Difference between revisions of "Lab: Semantic Lifting - CSV"

Revision as of 21:55, 12 March 2020

Lab 9: Semantic Lifting - CSV

Topics

Today's topic is lifting data in CSV format into RDF. The goal is to work through an example of converting non-semantic data into RDF.

CSV stands for Comma-Separated Values, meaning that the values in each row are separated by commas.

Fortunately, CSV is already structured in a way that makes the creation of triples relatively easy.
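One detail to watch for: in the sample data used in Task 1, some quoted fields contain commas themselves, so splitting a raw line on every comma produces too many fields. The sketch below compares a naive split with Python's standard csv module. The csv module is not among the libraries listed in this lab, so treat it as one possible choice rather than the required approach.

```python
import csv
import io

# One row from the Task 1 sample data.
line = '"Achille Blaise","M","France","Nancy","","Chess, computer games"'

# Naive splitting cuts the quoted "Interests" field in two.
naive = line.split(",")

# csv.reader honours the quotes and returns one value per column.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))
print(len(parsed))
print(parsed[5])
```

The multi-valued cell survives as a single string, which can then be split into individual values in a separate step.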

Relevant Libraries

  • Pandas
  • Python functions: split(), replace()
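As a quick illustration of the two string functions, here is a small sketch applied to values taken from the sample data in Task 1. The use of strip() to remove the space after each comma is an addition for this example, not something the lab text prescribes.

```python
# A cell that holds several values for one person.
cell = "Internet, mathematics, logistics"

# split(",") breaks the cell into its individual values;
# strip() removes the space left behind after each comma.
values = [v.strip() for v in cell.split(",")]

# replace() swaps spaces for underscores so the name can be used in a URI.
name = "Xun He Zhang".replace(" ", "_")

print(values)  # ['Internet', 'mathematics', 'logistics']
print(name)    # Xun_He_Zhang
```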

Tasks

Task 1

Below are four lines of CSV that could have been saved from a spreadsheet. Copy them into a file in your project folder and write a program with a loop that reads each line from that file (except the initial header line) and adds it to your graph as triples:

"Name","Gender","Country","Town","Expertise","Interests"
"Regina Catherine Hall","F","Great Britain","Manchester","Ecology, zoology","Football, music travelling"
"Achille Blaise","M","France","Nancy","","Chess, computer games"
"Nyarai Awotwi Ihejirika","F","Kenya","Nairobi","Computers, semantic networks","Hiking, botany"
"Xun He Zhang","M","China","Chengdu","Internet, mathematics, logistics","Dancing, music, trombone"

When solving the task, take note of the following:

  • The subject of each triple is the person's name. The header (first line) contains the column names, which should act as the predicates of the triples.
  • Some columns, such as Expertise, hold multiple values for one person. Create a separate triple for each of these values.
  • Spaces should be replaced with underscores to form a valid URI, e.g. Regina Catherine should be Regina_Catherine.
  • For consistency, make sure all resources start with a capital letter.
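The points above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a reference solution: the namespace http://example.org/ and the helper names to_resource and lift are hypothetical, the standard csv module is used instead of Pandas, two of the sample rows are embedded inline instead of being read from a file, and plain tuples of URI strings stand in for a real graph.

```python
import csv
import io

# Assumed namespace for illustration only; use your own project's namespace.
EX = "http://example.org/"

# Two of the sample rows, inline so the sketch is self-contained.
# In the lab you would open your CSV file instead.
CSV_TEXT = '''"Name","Gender","Country","Town","Expertise","Interests"
"Achille Blaise","M","France","Nancy","","Chess, computer games"
"Xun He Zhang","M","China","Chengdu","Internet, mathematics, logistics","Dancing, music, trombone"
'''

def to_resource(text):
    """Trim, capitalise the first letter, and replace spaces with underscores."""
    text = text.strip()
    return EX + (text[:1].upper() + text[1:]).replace(" ", "_")

def lift(csv_file):
    """Return a list of (subject, predicate, object) URI strings."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line: column names become predicates
    triples = []
    for row in reader:
        subject = to_resource(row[0])  # the person's name
        for column, cell in zip(header[1:], row[1:]):
            predicate = to_resource(column)
            for value in cell.split(","):  # one triple per value
                if value.strip():  # skip empty cells
                    triples.append((subject, predicate, to_resource(value)))
    return triples

triples = lift(io.StringIO(CSV_TEXT))
for triple in triples:
    print(triple)
```

With rdflib, each string would typically be wrapped in URIRef and the tuple passed to Graph.add(). Whether values such as "M" should be resources or literals is a modelling decision this sketch does not settle.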


Task 2


Task 3

Useful Readings