We have two disks:
- /scratch (5.0T, not backed up)
- /data_bck (500G, backed up by IT)
Each of these has a private and a public folder.
Shared data should go in the public folder; individual users' project data should go in the private folder.
Public data should be documented in this wiki.
Set up access rights on your folders to match this.
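One way to set access rights is the Python sketch below. This page does not prescribe exact permissions, so the policy here is an assumption: public folders are readable by everyone on the system, and private folders are accessible only to their owner. The function name `apply_access` is hypothetical. It only works on files you own.

```python
import os
import stat

# Assumed policy (not prescribed by this page): public = readable by all,
# private = owner only. Keys are the value of the `public` flag.
DIR_MODES = {True: 0o755, False: 0o700}
FILE_MODES = {True: 0o644, False: 0o600}
EXEC_BITS = {True: 0o111, False: 0o100}


def apply_access(root, public):
    """Set permissions on root and everything below it.

    Directories get 0o755 (public) or 0o700 (private). Files get 0o644 or
    0o600, plus execute bits if the owner could already execute them.
    Symbolic links are skipped so chmod never follows a link out of root.
    """
    os.chmod(root, DIR_MODES[public])
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, DIR_MODES[public])
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            mode = FILE_MODES[public]
            if os.stat(path).st_mode & stat.S_IXUSR:
                mode |= EXEC_BITS[public]  # keep scripts executable
            os.chmod(path, mode)
```

For example, `apply_access("/scratch/private/<username>", public=False)` locks down a private folder. The path layout is hypothetical, so check the real folder names on the disks first. For a private folder, `chmod -R go-rwx <folder>` in the shell has roughly the same effect.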
Do not store large files in your home folder. Although we log in with our university accounts, $HOME on this system is not the same as your $HOME on the university system. It is part of the storage used by the OS.
From the README file:
The /scratch file space is dedicated to temporary data only. Data may be erased at regular intervals without prior notice. It cannot be used for permanent data storage. Data on /scratch will NOT be backed up.
For long-term storage and backup, please see the README on the /data_bck disk.
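Anything on /scratch can disappear without notice. /data_bck is backed up but much smaller (500G against 5.0T). Results worth keeping therefore have to be copied over deliberately, and there has to be room for them. The Python sketch below does this. It checks the free space on the destination disk before copying. The function names and the example paths are hypothetical, and the /data_bck README remains the authority on what belongs there.

```python
import os
import shutil
from pathlib import Path


def tree_size(path):
    """Total size in bytes of the regular files under path (symlinks not counted)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def copy_to_backup(src, dst, margin=1.1):
    """Copy the folder src (e.g. on /scratch) to dst (e.g. on /data_bck).

    Raises OSError if the disk holding dst has less free space than
    margin * size of src. Files already in dst are overwritten; symlinks
    are copied as links, not followed.
    """
    src, dst = Path(src), Path(dst)
    needed = int(tree_size(src) * margin)
    # shutil.disk_usage needs an existing path: use dst's nearest existing ancestor.
    probe = dst
    while not probe.exists():
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    if free < needed:
        raise OSError(f"{probe}: need {needed} bytes, only {free} free")
    shutil.copytree(src, dst, symlinks=True, dirs_exist_ok=True)
    return dst
```

For example, `copy_to_backup("/scratch/private/<username>/results", "/data_bck/private/<username>/results")` would copy one results folder. The folder layout is assumed. `dirs_exist_ok` requires Python 3.8 or later.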
Some projects generate data that other researchers, and perhaps students, can use in agreement with the data owners. That data must be documented so future users know what the data is, the terms of use, who generated or collected it, and how it was collected.
Data collection is expensive: it takes code, storage and man-hours. If your project has data it can share, please share it. Shared data makes bootstrapping a new project easier and lets researchers and students test ideas quickly. In the best case, more projects and publications can emerge from the same data. Also, please cite where data comes from; data collectors like to be recognised.
- Twitter data
Desc: SQL dump of tweets about Norwegian elections (hashtags such as #valg). Collected with yourTwapperKeeper.
People: Hallvard Moe and Eirik Stavelin have both published on this data. Ask them.
- Newspaper articles
Desc: Norwegian newspaper articles. Sorted by news category. Collected by Norsk Avis Korpus (NAK) and Norsk Telegrambyrå (NTB).
People: Dag Elgesem; Eirik Stavelin (converted this data from XML to plain text and automatically categorised the NAK articles); NAK/NTB.
Nelson runs a MongoDB instance that can only be accessed from localhost. The database files are stored on scratch (/scratch/mongodb) and should be considered volatile. If you need a backup, make regular dumps as appropriate. Although the data is not securely stored, do not alter other people's data without their consent.
- A database called "graball" contains a collection called "urls", which holds the URLs hit by the graball system in the diversity project.
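MongoDB's `mongodump` tool can make these dumps. The Python sketch below builds and runs a `mongodump` command that writes one database to a timestamped, gzipped archive. The sketch makes several assumptions:
- `mongodump` is installed and on the PATH on Nelson.
- The server listens on the default port 27017.
- No authentication is needed from localhost.
- The destination folder is hypothetical. It should sit on /data_bck, since /scratch is not backed up.

The function names are also hypothetical.

```python
import datetime
import subprocess
from pathlib import Path


def mongodump_command(db, out_root, host="localhost", port=27017, when=None):
    """Build a mongodump command that writes db to a gzipped archive file.

    Assumes the default MongoDB port and no authentication.
    """
    when = when or datetime.datetime.now()
    stamp = when.strftime("%Y%m%d-%H%M%S")
    archive = Path(out_root) / f"{db}-{stamp}.archive.gz"
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--db", db,
        f"--archive={archive}",  # --archive takes its file name after "="
        "--gzip",
    ]


def run_dump(db, out_root, **kwargs):
    """Create out_root if needed, then run mongodump (raises if it fails)."""
    cmd = mongodump_command(db, out_root, **kwargs)
    Path(out_root).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return cmd
```

For example, `run_dump("graball", "/data_bck/private/<username>/mongodb-dumps")` dumps the graball database. The path is hypothetical. To run this regularly, call it from a cron job. To restore an archive, use `mongorestore --archive=<file> --gzip`.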