Storing data

From imdb
Revision as of 18:25, 16 December 2015 by Est040

We have two disks:

  • /scratch (5.0T, not backed up)
  • /data_bck (500G, backed up by IT)

Each of these has a private and a public folder.

Shared data belongs in the public folder; individual users' project data belongs in the private folder.

Public data should be documented in this wiki.

Set access rights on your folders to match this.
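On a Linux server, one way to set access rights is with ordinary Unix permission bits. The Python sketch below assumes a policy the page does not spell out: private folders are open only to their owner (mode 700), and public folders are readable by all users but writable only by their owner (mode 755). The function name set_folder_access is hypothetical.

```python
import os
import stat

# Assumed policy (not stated on this page): private folders are accessible
# only by their owner; public folders are readable by everyone but
# writable only by their owner.
PRIVATE_DIR_MODE = 0o700  # rwx------
PUBLIC_DIR_MODE = 0o755   # rwxr-xr-x


def set_folder_access(path: str, public: bool) -> int:
    """Set the permission bits of a single directory and return the new mode.

    Only the directory itself is changed. For a private folder this is
    enough to keep other users out, because they can no longer enter it.
    For a public folder, the files inside must also be readable by others.
    """
    mode = PUBLIC_DIR_MODE if public else PRIVATE_DIR_MODE
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, `set_folder_access("/scratch/private/<username>", public=False)` would lock down a per-user folder, if the private folders are laid out that way (an assumption). With the common default umask of 022, new files are already readable by others, so a public folder usually needs only the directory change, but it is worth checking.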

Do not store large files in your home folder. Although we log in with our university accounts, $HOME on this system is not your $HOME on the university system; it is part of the storage used by the operating system.
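To find out what is taking up space in your home folder, a short script can walk the directory tree and list the largest files. This is a hypothetical Python sketch, not a tool installed on the server; the helper name largest_files and the default of ten results are illustrative choices.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return up to `count` (size_in_bytes, path) pairs for the biggest
    regular files below `root`, largest first.

    Symbolic links are skipped (os.walk does not follow linked directories
    by default), and files that vanish or cannot be read are ignored.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file removed or unreadable while scanning
    return heapq.nlargest(count, sizes)
```

Usage: `for size, path in largest_files(os.path.expanduser("~")): print(size // 2**20, "MiB", path)`. Anything large that turns up belongs on /scratch or /data_bck instead.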

From the README file

  • /scratch

The scratch file space is dedicated to temporary data only. Data can be erased at regular intervals without prior notice. It cannot be used for permanent data storage. Data on /scratch will NOT be backed up.

  • /data_bck

For long term storage and backup please see README on /data_bck disk.

Sharing data

Some projects generate data that can be used by other researchers, and perhaps students, in agreement with the data owners. Such data must be documented so future users know what the data is, the terms of use, who generated or collected it, and how it was collected.

  • Twitter data

Location: /data_bck/public/data_twitter_valgkamp
Desc: SQL dump of tweets about Norwegian elections (hashtags such as #valg). Collected with yourTwapperKeeper.
People: Hallvard Moe and Eirik Stavelin have both published on this data. Ask them.

  • Newspaper articles

Location: /data_bck/public/data_ntb_nak
Desc: Norwegian newspaper articles, sorted by news category. Collected by Norsk Avis Korpus (NAK) and Norsk Telegrambyrå (NTB).
People: Dag Elgesem; Eirik Stavelin (converted the data from XML to plain text and automatically categorised the NAK articles).
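The documentation that the Sharing data section asks for (what the data is, terms of use, who generated or collected it, how it was collected) can also be kept as a small structured record, which makes an incomplete entry easy to spot. This is a hypothetical Python sketch: the class DatasetEntry and the extra Collected: and Terms: labels are assumptions; only the Location:, Desc: and People: labels come from the entries on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetEntry:
    """One documented dataset in a public folder (hypothetical structure)."""
    location: str
    description: str
    terms_of_use: str
    collection_method: str
    people: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of documentation fields that are still empty."""
        required = ("location", "description", "terms_of_use", "collection_method")
        missing = [name for name in required if not getattr(self, name).strip()]
        if not self.people:
            missing.append("people")
        return missing

    def to_wiki_text(self) -> str:
        """Render the entry with one labelled line per field."""
        return "\n".join([
            f"Location: {self.location}",
            f"Desc: {self.description}",
            f"Collected: {self.collection_method}",
            f"Terms: {self.terms_of_use}",
            f"People: {', '.join(self.people)}",
        ])
```

Calling missing_fields() before publishing an entry would flag, for example, `["terms_of_use"]` when the terms have not been agreed with the data owners yet.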