Difference between revisions of "Python Examples"

 
 
This page will be updated with Python examples related to the lectures and labs. We will add more examples after each lab has ended. The first examples will use Python's RDFLib library. We will introduce other relevant libraries later.
 
<!--
==Getting started==
  
  
Line 36: Line 36:
 
</syntaxhighlight>
  
==Basic RDF programming==
  
 
===Different ways to create an address===
Line 110: Line 110:
 
# Also using an existing vocabulary for places like California (e.g. http://dbpedia.org/resource/California from DBpedia).
  
schema = Namespace("https://schema.org/")
dbp = Namespace("http://dbpedia.org/resource/")
  
 
g.add((ex.Cade_Tracey, schema.address, ex.CadeAddress))
Line 150: Line 150:
 
</syntaxhighlight>
  
===Graph Binding===
<syntaxhighlight>
from rdflib import Graph, Namespace

# Binding prefixes to a graph is useful for at least two reasons:
# (1) We no longer need to declare prefixes in SPARQL queries if they are already bound to the graph.
# (2) When the graph is serialized, the output uses the expected prefixes
#     instead of default namespace names such as ns1, ns2, etc.

g = Graph()

ex = Namespace("http://example.org/")
dbp = Namespace("http://dbpedia.org/resource/")
schema = Namespace("https://schema.org/")

g.bind("ex", ex)
g.bind("dbp", dbp)
g.bind("schema", schema)
</syntaxhighlight>
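A minimal sketch of both effects, assuming a recent RDFLib version. The triple giving Cade Tracey a schema:name is made up for illustration. The query has no PREFIX lines, and the Turtle output uses the bound prefixes.

<syntaxhighlight lang="python">
from rdflib import Graph, Literal, Namespace

g = Graph()
ex = Namespace("http://example.org/")
schema = Namespace("https://schema.org/")
g.bind("ex", ex)
g.bind("schema", schema)

# Illustrative triple (made-up data)
g.add((ex.Cade_Tracey, schema.name, Literal("Cade Tracey")))

# No PREFIX declarations: Graph.query() falls back to the prefixes bound to the graph
result = g.query("SELECT ?name WHERE { ex:Cade_Tracey schema:name ?name }")
names = [str(row[0]) for row in result]
print(names)

# RDFLib 6+ returns a str here; older versions return bytes
turtle = g.serialize(format="turtle")
if isinstance(turtle, bytes):
    turtle = turtle.decode("utf-8")
print(turtle)
</syntaxhighlight>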
  
 
===Collection Example===
Line 175: Line 192:
 
</syntaxhighlight>
  
==SPARQL==

Also see the [[SPARQL Examples]] page!

===Querying a local ("in memory") graph===

Example contents of the file family.ttl:
<syntaxhighlight>
@prefix rex: <http://example.org/royal#> .
@prefix fam: <http://example.org/family#> .

rex:IngridAlexandra fam:hasParent rex:HaakonMagnus .
rex:SverreMagnus fam:hasParent rex:HaakonMagnus .
rex:HaakonMagnus fam:hasParent rex:Harald .
rex:MarthaLouise fam:hasParent rex:Harald .
rex:HaakonMagnus fam:hasSister rex:MarthaLouise .
</syntaxhighlight>

<syntaxhighlight>
import rdflib

g = rdflib.Graph()
g.parse("family.ttl", format="ttl")

# Find every child whose parent has a sister, i.e. the child's aunt
qres = g.query("""
    PREFIX fam: <http://example.org/family#>
    SELECT ?child ?sister WHERE {
        ?child fam:hasParent ?parent .
        ?parent fam:hasSister ?sister .
    }""")

for row in qres:
    print("%s has aunt %s" % row)
</syntaxhighlight>

With a prepared query, you can write the query once and then bind some of its variables each time you use it:
<syntaxhighlight>
import rdflib
from rdflib.plugins.sparql import prepareQuery

g = rdflib.Graph()
g.parse("family.ttl", format="ttl")

q = prepareQuery(
    """SELECT ?child ?sister WHERE {
           ?child fam:hasParent ?parent .
           ?parent fam:hasSister ?sister .
    }""",
    initNs={"fam": "http://example.org/family#"})

sm = rdflib.URIRef("http://example.org/royal#SverreMagnus")

# Bind ?child to SverreMagnus before the query is evaluated
for row in g.query(q, initBindings={"child": sm}):
    print(row)
</syntaxhighlight>

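To try these queries without creating family.ttl, the same triples can be parsed from a string using the data parameter of Graph.parse. The sketch below assumes RDFLib and runs one prepared query several times with a different ?child binding each time. The aunts dictionary and the list of names are illustrative.

<syntaxhighlight lang="python">
from rdflib import Graph, URIRef
from rdflib.plugins.sparql import prepareQuery

FAMILY_TTL = """
@prefix rex: <http://example.org/royal#> .
@prefix fam: <http://example.org/family#> .

rex:IngridAlexandra fam:hasParent rex:HaakonMagnus .
rex:SverreMagnus fam:hasParent rex:HaakonMagnus .
rex:HaakonMagnus fam:hasParent rex:Harald .
rex:MarthaLouise fam:hasParent rex:Harald .
rex:HaakonMagnus fam:hasSister rex:MarthaLouise .
"""

g = Graph()
g.parse(data=FAMILY_TTL, format="turtle")

# Prepared once, executed several times with different bindings
aunt_query = prepareQuery(
    """SELECT ?sister WHERE {
           ?child fam:hasParent ?parent .
           ?parent fam:hasSister ?sister .
    }""",
    initNs={"fam": "http://example.org/family#"})

REX = "http://example.org/royal#"
aunts = {}
for name in ["IngridAlexandra", "SverreMagnus", "HaakonMagnus"]:
    rows = g.query(aunt_query, initBindings={"child": URIRef(REX + name)})
    aunts[name] = [str(row[0]) for row in rows]

print(aunts)
</syntaxhighlight>

HaakonMagnus gets an empty list because his parent (Harald) has no fam:hasSister triple in the data.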
 
===Select all contents of lists (rdflib.Collection)===
Line 217: Line 254:
 
</syntaxhighlight>
  
===Using parameters/variables in rdflib queries===

<syntaxhighlight>
from rdflib import Graph, Namespace, URIRef
from rdflib.plugins.sparql import prepareQuery

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

g.add((ex.Cade, ex.livesIn, ex.France))
g.add((ex.Anne, ex.livesIn, ex.Norway))
g.add((ex.Sofie, ex.livesIn, ex.Sweden))
g.add((ex.Per, ex.livesIn, ex.Norway))
g.add((ex.John, ex.livesIn, ex.USA))

# Prepare the query once; ?country is bound each time the query is executed
q = prepareQuery(
    """
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE {
        ?person ex:livesIn ?country .
    }
    """)


def find_people_from_country(country):
    country = URIRef(ex + country)
    result = g.query(q, initBindings={'country': country})
    for row in result:
        print(row)


find_people_from_country("Norway")
</syntaxhighlight>
  
 
===SELECTING data from Blazegraph via Python===
Line 276: Line 347:
  
  
</syntaxhighlight>

===Retrieving data from Wikidata with SPARQLWrapper===
<syntaxhighlight>
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

# Select all vitamins in Wikidata: items that are a subclass of (wdt:P279) wd:Q34956.
sparql.setQuery("""
    SELECT ?nutrient ?nutrientLabel WHERE {
        ?nutrient wdt:P279 wd:Q34956 .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
""")

sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["nutrient"]["value"], "  ", result["nutrientLabel"]["value"])
</syntaxhighlight>

More examples can be found in the examples section of the official Wikidata Query Service: https://query.wikidata.org/.
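With the JSON return format, convert() gives a dictionary in the W3C SPARQL 1.1 Query Results JSON Format. In that dictionary, head.vars lists the variables and results.bindings holds one dict per row. A variable is simply missing from a row when it is unbound, for example because of an OPTIONAL pattern. The helper below flattens such a dictionary into plain Python dicts. It is a sketch only: the helper name and the sample response are made up, so no network access is needed.

<syntaxhighlight lang="python">
def bindings_to_rows(results):
    """Flatten a SPARQL 1.1 JSON results dict into a list of {var: value} dicts.

    Variables that are unbound in a row are mapped to None.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows


# Made-up sample in the shape returned by sparql.query().convert() for JSON
sample = {
    "head": {"vars": ["nutrient", "nutrientLabel"]},
    "results": {"bindings": [
        {"nutrient": {"type": "uri", "value": "http://example.org/VitaminA"},
         "nutrientLabel": {"type": "literal", "xml:lang": "en", "value": "vitamin A"}},
        {"nutrient": {"type": "uri", "value": "http://example.org/VitaminX"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["nutrient"], row["nutrientLabel"])
</syntaxhighlight>

Using a dict lookup with a None fallback avoids the KeyError that result["nutrientLabel"]["value"] would raise on a row without a label.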

===Download from BlazeGraph===

<syntaxhighlight>
"""
Dumps a database to a local RDF file.
You need to install the SPARQLWrapper package first...
"""

import datetime
import os
from SPARQLWrapper import SPARQLWrapper, RDFXML

# your namespace, the default is 'kb'
ns = 'kb'

# the SPARQL endpoint
endpoint = 'http://info216.i2s.uib.no/bigdata/namespace/' + ns + '/sparql'

# - the endpoint just moved, the old one was:
# endpoint = 'http://i2s.uib.no:8888/bigdata/namespace/' + ns + '/sparql'

# create wrapper
wrapper = SPARQLWrapper(endpoint)

# prepare the SPARQL query (a CONSTRUCT query that copies every triple)
wrapper.setQuery('CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }')
wrapper.setReturnFormat(RDFXML)

# execute the SPARQL query and convert the result to an rdflib.Graph
graph = wrapper.query().convert()

# the destination file, with code to make it timestamped (the folder is created if needed)
os.makedirs('rdf_dumps', exist_ok=True)
destfile = 'rdf_dumps/slr-kg4news-' + datetime.datetime.now().strftime('%Y%m%d-%H%M') + '.ttl'

# serialize the result to file, in Turtle to match the .ttl extension
graph.serialize(destination=destfile, format='ttl')

# report and quit
print('Wrote %u triples to file %s .' %
      (len(graph), destfile))
</syntaxhighlight>

===Query DBpedia with SPARQLWrapper===

<syntaxhighlight>
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

# Get the English rdfs:comment of Barack Obama
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?comment
    WHERE {
        dbr:Barack_Obama rdfs:comment ?comment .
        FILTER (langMatches(lang(?comment), "en"))
    }
""")

sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["comment"]["value"])
</syntaxhighlight>
  
==RDFS==

===RDFS-plus (OWL) Properties===
<syntaxhighlight>
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

g = Graph()
ex = Namespace("http://example.org/")

g.add((ex.married, RDF.type, OWL.SymmetricProperty))
g.add((ex.married, RDF.type, OWL.IrreflexiveProperty))
g.add((ex.livesWith, RDF.type, OWL.ReflexiveProperty))
g.add((ex.livesWith, RDF.type, OWL.SymmetricProperty))
g.add((ex.sibling, RDF.type, OWL.TransitiveProperty))
g.add((ex.sibling, RDF.type, OWL.SymmetricProperty))
g.add((ex.sibling, RDF.type, OWL.IrreflexiveProperty))
g.add((ex.hasFather, RDF.type, OWL.FunctionalProperty))
g.add((ex.hasFather, RDF.type, OWL.AsymmetricProperty))
g.add((ex.hasFather, RDF.type, OWL.IrreflexiveProperty))
g.add((ex.fatherOf, RDF.type, OWL.AsymmetricProperty))
g.add((ex.fatherOf, RDF.type, OWL.IrreflexiveProperty))

# Note: making sibling transitive, symmetric AND irreflexive is contradictory:
# A sibling B gives B sibling A (symmetry), and then A sibling A (transitivity).

# Sometimes there is no definite answer, and it comes down to how we want to model our properties.
# E.g. is livesWith a transitive property? Usually yes, but we may also want to say that a child lives with both of her divorced parents,
# which means that (mother livesWith child and child livesWith father) does not imply mother livesWith father. That makes it non-transitive.
</syntaxhighlight>
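To see roughly what a reasoner does with such declarations, the sketch below applies the symmetric and transitive rules repeatedly to a set of triples until nothing new is added. It uses plain Python without RDFLib or OWL-RL. The triples and the transitive property ancestorOf are made up for illustration (sibling is avoided because of the clash noted above). In practice you would let OWL-RL compute the closure.

<syntaxhighlight lang="python">
def closure(triples, symmetric=(), transitive=()):
    """Return the triples extended with the consequences of symmetric/transitive properties."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until a fixpoint is reached
        changed = False
        new = set()
        for s, p, o in inferred:
            if p in symmetric:
                # (s p o) gives (o p s)
                new.add((o, p, s))
            if p in transitive:
                # (s p o) and (o p x) give (s p x)
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


facts = {
    ("Anna", "married", "Bob"),
    ("Cade", "ancestorOf", "Dina"),
    ("Dina", "ancestorOf", "Emil"),
}
result = closure(facts, symmetric={"married"}, transitive={"ancestorOf"})
for triple in sorted(result - facts):
    print(triple)
</syntaxhighlight>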
  
 
===RDFS inference with RDFLib===
You can use the OWL-RL package to add inference capabilities to RDFLib. It can be installed with pip:
<syntaxhighlight>
pip install owlrl
</syntaxhighlight>
Or download it from [https://github.com/RDFLib/OWL-RL GitHub] and copy the ''owlrl'' subfolder into your project folder next to your Python files.
  
 
[https://owl-rl.readthedocs.io/en/latest/owlrl.html OWL-RL documentation.]
  
Example program to get you started. In this example we create the graph with a SPARQL update, but it is also possible to parse the data from a file.

<syntaxhighlight>
import rdflib.plugins.sparql.update
Line 304: Line 487:
 
}""")

rdfs = owlrl.RDFSClosure.RDFS_Semantics(g, False, False, False)
# RDFS_Semantics parameters:
# - graph (rdflib.Graph): the RDF graph to be extended.
# - axioms (bool): whether (non-datatype) axiomatic triples should be added or not.
# - daxioms (bool): whether datatype axiomatic triples should be added or not.
# - rdfs (bool): whether RDFS inference is also done (used in subclasses only).
# For now, you will in most cases set all of these to False in RDFS_Semantics.

# Generates the closure of the graph, i.e. the new entailed triples, but does not add them to the graph yet.
rdfs.closure()
# Adds the new triples to the graph and empties the RDFS triple container.
rdfs.flush_stored_triples()

# ASK query to check whether a new triple has been generated by the entailment.
b = g.query("""
PREFIX ex: <http://example.org#>
Line 318: Line 510:
 
</syntaxhighlight>
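As a rough illustration of the kind of triples RDFS_Semantics adds, the plain-Python sketch below applies three RDFS entailment rules until no new triples appear: subclass transitivity (rdfs11), type inheritance through rdfs:subClassOf (rdfs9), and rdfs:domain (rdfs2). It is not how OWL-RL is implemented. The classes, the property ex:teaches, and the individuals are made up, and terms are written as plain prefixed strings.

<syntaxhighlight lang="python">
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (x type A), (A subClassOf B) => (x type B)
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # rdfs2: (P domain C), (x P y) => (x type C)
                if p == DOMAIN and p2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= closed:
            return closed
        closed |= new


facts = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:teaches", DOMAIN, "ex:Teacher"),
    ("ex:Cade", RDF_TYPE, "ex:Student"),
    ("ex:Anne", "ex:teaches", "ex:Cade"),
}
inferred = rdfs_closure(facts) - facts
for triple in sorted(inferred):
    print(triple)
</syntaxhighlight>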
  
===Language-tagged RDFS labels===
<syntaxhighlight>
from rdflib import Graph, Namespace, Literal
Line 333: Line 525:
 
</syntaxhighlight>
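The sketch below, assuming RDFLib, attaches labels in several languages to one resource and then reads back the label for a single language. It does this in two ways: in Python through the language attribute of Literal, and in SPARQL with lang() and langMatches(). The resource ex:France and its labels are illustrative, and the helper labels_in is hypothetical.

<syntaxhighlight lang="python">
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

# One resource, three language-tagged labels
g.add((ex.France, RDFS.label, Literal("France", lang="en")))
g.add((ex.France, RDFS.label, Literal("Frankrike", lang="no")))
g.add((ex.France, RDFS.label, Literal("Frankreich", lang="de")))


def labels_in(resource, lang):
    """Return the rdfs:label values of resource that are tagged with the given language."""
    return [str(o) for o in g.objects(resource, RDFS.label)
            if isinstance(o, Literal) and o.language == lang]


norwegian = labels_in(ex.France, "no")
print(norwegian)

# The same kind of filter in SPARQL; ex: and rdfs: are bound to the graph
rows = g.query("""
    SELECT ?label WHERE {
        ex:France rdfs:label ?label .
        FILTER(langMatches(lang(?label), "de"))
    }""")
german = [str(row[0]) for row in rows]
print(german)
</syntaxhighlight>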
  
==OWL==
===Basic inference with RDFLib===

You can use the OWL-RL package again, as described in the RDFS section above.
Line 396: Line 588:
 
</syntaxhighlight>
  
  
==Lifting CSV to RDF==

<syntaxhighlight>
Line 488: Line 639:
 
</syntaxhighlight>
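Here is a minimal sketch of CSV lifting that uses Python's built-in csv module together with RDFLib. It assumes hypothetical columns name, country and age, which may not match the file used above. Each row becomes a person with a type, a country and a typed age. Spaces in names are replaced with underscores so that the values can be used in URIs.

<syntaxhighlight lang="python">
import csv
import io

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# Made-up CSV content; in practice you would open a file instead
CSV_DATA = """name,country,age
Cade Tracey,France,27
Anne Olsen,Norway,31
"""

ex = Namespace("http://example.org/")
g = Graph()
g.bind("ex", ex)

for row in csv.DictReader(io.StringIO(CSV_DATA)):
    # Turn cell values into URIs by replacing spaces with underscores
    person = URIRef(ex + row["name"].replace(" ", "_"))
    g.add((person, RDF.type, ex.Person))
    g.add((person, ex.livesIn, URIRef(ex + row["country"])))
    # Store the age as a typed literal rather than a plain string
    g.add((person, ex.age, Literal(int(row["age"]), datatype=XSD.integer)))

print(len(g))
</syntaxhighlight>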
  
==Lifting XML to RDF==
<syntaxhighlight>
from rdflib import Graph, Literal, Namespace, URIRef
Line 551: Line 702:
 
</data>
</syntaxhighlight>
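Here is a minimal sketch of the same lifting idea using Python's built-in xml.etree.ElementTree together with RDFLib. The person elements, their id attributes and their child fields are made up for illustration and may not match the data used above. Each person element becomes a resource with a type, a name and a country.

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Made-up XML with a <data> root element
XML_DATA = """
<data>
    <person id="Cade_Tracey">
        <name>Cade Tracey</name>
        <country>France</country>
    </person>
    <person id="Anne_Olsen">
        <name>Anne Olsen</name>
        <country>Norway</country>
    </person>
</data>
"""

ex = Namespace("http://example.org/")
g = Graph()
g.bind("ex", ex)

root = ET.fromstring(XML_DATA.strip())
for person in root.findall("person"):
    # The id attribute becomes the URI of the person
    subject = URIRef(ex + person.get("id"))
    g.add((subject, RDF.type, ex.Person))
    g.add((subject, ex.name, Literal(person.findtext("name"))))
    g.add((subject, ex.livesIn, URIRef(ex + person.findtext("country"))))

print(len(g))
</syntaxhighlight>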

==Lifting HTML to RDF==
<syntaxhighlight>
from bs4 import BeautifulSoup as bs, NavigableString
from rdflib import Graph, URIRef, Namespace
from rdflib.namespace import RDF

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

with open("tv_shows.html") as f:
    html = bs(f.read(), features="html.parser")

shows = html.find_all('li', attrs={'class': 'show'})
for show in shows:
    title = show.find("h3").text
    actors = show.find('ul', attrs={'class': 'actor_list'})
    for actor in actors:
        # skip the whitespace strings between the <li> elements
        if isinstance(actor, NavigableString):
            continue
        actor = actor.text.replace(" ", "_")
        g.add((URIRef(ex + title), ex.stars, URIRef(ex + actor)))
        g.add((URIRef(ex + actor), RDF.type, ex.Actor))

    g.add((URIRef(ex + title), RDF.type, ex.TV_Show))

# serialize() returns a str in RDFLib 6+ (older versions returned bytes and needed .decode())
print(g.serialize(format="turtle"))
</syntaxhighlight>

===HTML code for the example above===
<syntaxhighlight>
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title></title>
</head>
<body>
    <div class="tv_shows">
        <ul>
            <li class="show">
                <h3>The_Sopranos</h3>
                <div class="irrelevant_data"></div>
                <ul class="actor_list">
                    <li>James Gandolfini</li>
                </ul>
            </li>
            <li class="show">
                <h3>Seinfeld</h3>
                <div class="irrelevant_data"></div>
                <ul class="actor_list">
                    <li>Jerry Seinfeld</li>
                    <li>Jason Alexander</li>
                    <li>Julia Louis-Dreyfus</li>
                </ul>
            </li>
        </ul>
    </div>
</body>
</html>
</syntaxhighlight>
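For comparison, here is a sketch that performs the same extraction with Python's built-in html.parser instead of BeautifulSoup. This is a swapped-in technique, not the method used above. It writes the resulting triples as N-Triples lines without RDFLib. The class ShowParser and the function to_ntriples are hypothetical names. The sketch assumes that titles and actor names are safe to use in URIs once spaces are replaced, because no further escaping is done. The input is a trimmed copy of the HTML above.

<syntaxhighlight lang="python">
from html.parser import HTMLParser

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class ShowParser(HTMLParser):
    """Collects {show title: [actor names]} from markup shaped like tv_shows.html."""

    def __init__(self):
        super().__init__()
        self.shows = {}
        self._show = None        # title of the show currently being read
        self._in_title = False   # inside <h3>
        self._in_actors = False  # inside <ul class="actor_list">
        self._in_actor = False   # inside an <li> of the actor list

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3":
            self._in_title = True
        elif tag == "ul" and "actor_list" in classes:
            self._in_actors = True
        elif tag == "li" and self._in_actors:
            self._in_actor = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False
        elif tag == "ul":
            self._in_actors = False
        elif tag == "li":
            self._in_actor = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace between tags
        if self._in_title:
            self._show = text
            self.shows.setdefault(text, [])
        elif self._in_actor and self._show is not None:
            self.shows[self._show].append(text)


def to_ntriples(shows):
    """Build N-Triples lines mirroring the triples of the BeautifulSoup example."""
    lines = []
    for title, actors in shows.items():
        show = f"<{EX}{title}>"
        for actor in actors:
            actor_uri = f"<{EX}{actor.replace(' ', '_')}>"
            lines.append(f"{show} <{EX}stars> {actor_uri} .")
            lines.append(f"{actor_uri} <{RDF_TYPE}> <{EX}Actor> .")
        lines.append(f"{show} <{RDF_TYPE}> <{EX}TV_Show> .")
    return lines


HTML = """
<ul>
  <li class="show"><h3>The_Sopranos</h3>
    <ul class="actor_list"><li>James Gandolfini</li></ul></li>
  <li class="show"><h3>Seinfeld</h3>
    <ul class="actor_list"><li>Jerry Seinfeld</li><li>Jason Alexander</li></ul></li>
</ul>
"""

parser = ShowParser()
parser.feed(HTML)
triples = to_ntriples(parser.shows)
print("\n".join(triples))
</syntaxhighlight>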

==Web APIs with JSON==
<syntaxhighlight>
import requests
Line 606: Line 820:
  
 
<div class="credits" style="text-align: right; direction: ltr; margin-left: 1em;">''INFO216, UiB, 2017-2020. All code examples are [https://creativecommons.org/choose/zero/ CC0].'' </div>

==OWL - Complex Classes and Restrictions==
<syntaxhighlight>
import owlrl
from rdflib import Graph, Literal, Namespace, BNode
from rdflib.namespace import RDF, OWL, RDFS, XSD
from rdflib.collection import Collection

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)
g.bind("owl", OWL)

# A Season is either Autumn, Winter, Spring or Summer
seasons = BNode()
Collection(g, seasons, [ex.Winter, ex.Autumn, ex.Spring, ex.Summer])
g.add((ex.Season, OWL.oneOf, seasons))

# A Parent is a Father or a Mother
b = BNode()
Collection(g, b, [ex.Father, ex.Mother])
g.add((ex.Parent, OWL.unionOf, b))

# A Woman is a Person who has the "female" gender
br = BNode()
g.add((br, RDF.type, OWL.Restriction))
g.add((br, OWL.onProperty, ex.gender))
g.add((br, OWL.hasValue, ex.Female))
bi = BNode()
Collection(g, bi, [ex.Person, br])
g.add((ex.Woman, OWL.intersectionOf, bi))

# A Vegetarian is a Person who only eats vegetarian food
br = BNode()
g.add((br, RDF.type, OWL.Restriction))
g.add((br, OWL.onProperty, ex.eats))
g.add((br, OWL.allValuesFrom, ex.VegetarianFood))
bi = BNode()
Collection(g, bi, [ex.Person, br])
g.add((ex.Vegetarian, OWL.intersectionOf, bi))

# Alternatively, a Vegetarian is a Person who eats no meat
br = BNode()
g.add((br, RDF.type, OWL.Restriction))
g.add((br, OWL.onProperty, ex.eats))
g.add((br, OWL.qualifiedCardinality, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((br, OWL.onClass, ex.Meat))
bi = BNode()
Collection(g, bi, [ex.Person, br])
g.add((ex.Vegetarian, OWL.intersectionOf, bi))

# A WorriedParent is a Parent who has at least one sick child
br = BNode()
g.add((br, RDF.type, OWL.Restriction))
g.add((br, OWL.onProperty, ex.hasChild))
g.add((br, OWL.minQualifiedCardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((br, OWL.onClass, ex.Sick))
bi = BNode()
Collection(g, bi, [ex.Parent, br])
g.add((ex.WorriedParent, OWL.intersectionOf, bi))

# Using the restriction above, if we now write...
g.add((ex.Bob, RDF.type, ex.Parent))
g.add((ex.Bob, ex.hasChild, ex.John))
g.add((ex.John, RDF.type, ex.Sick))
# ...an OWL reasoner can infer that Bob is a WorriedParent even though we did not state it ourselves,
# because Bob fulfils both the Parent requirement and the restriction.
</syntaxhighlight>
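Collection also works in the other direction. Given the head node of an RDF list, it reads the members back in order, which is useful for inspecting what a class expression such as owl:unionOf contains. The sketch below assumes RDFLib. It rebuilds the Parent definition from the example above and recovers its union members.

<syntaxhighlight lang="python">
from rdflib import BNode, Graph, Namespace
from rdflib.collection import Collection
from rdflib.namespace import OWL

g = Graph()
ex = Namespace("http://example.org/")

# Same pattern as above: Parent is the union of Father and Mother
b = BNode()
Collection(g, b, [ex.Father, ex.Mother])
g.add((ex.Parent, OWL.unionOf, b))

# Follow owl:unionOf to the head of the list, then read the list back in order
head = g.value(ex.Parent, OWL.unionOf)
members = list(Collection(g, head))
print(members)
</syntaxhighlight>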

==Protege-OWL reasoning with HermiT==

[[:File:DL-reasoning-RoyalFamily-final.owl.txt | Example file]] from Lecture 13 about OWL-DL, rules and reasoning.

-->

Latest revision as of 18:11, 23 January 2022
