Given group assignments

Background

News Angler is a research project of the Intelligent Information Systems (I2S) research group in the Department of Information Science and Media Studies at the University of Bergen. In the project, we are developing News Hunter, a knowledge-graph platform for computer-augmented journalism. The News Hunter platform harvests texts in real time from online news and social media. Each text is lifted semantically into a small knowledge graph that represents the text item itself (for example a tweet or an RSS article) along with the people, places, organisations, etc. that it mentions. We want to enrich, combine, and organise these knowledge graphs in various ways to support journalists.

Below are some proposed tasks for your group projects in INFO216, Spring 2021. It is OK for several groups to choose the same assignment; we can often define variants so that each group does a slightly different thing :-)

Enriching news knowledge graphs

The News Hunter platform harvests texts in real time from online news and social media. Each text is lifted semantically into a small knowledge graph that represents the text item itself (for example a tweet or an RSS article) along with the people, places, organisations, etc. that it mentions. The assignment is to extend such knowledge graphs with additional information taken from reference sources like Wikidata, DBpedia, GeoNames, ConceptNet, YAGO3, YAGO4 and others. The assignment will deliver a software tool that inputs a small knowledge graph representing a news text and outputs a larger knowledge graph with related information from these reference sources. We will provide example knowledge graphs you can use as input. It is a bonus if the tool offers a FastAPI interface and runs in a Docker container.
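The core of such a tool can be sketched as "find the entities an item mentions, fetch related triples for each, and take the union". The sketch below is a minimal, dependency-free illustration under stated assumptions: graphs are plain Python sets of (subject, predicate, object) strings instead of rdflib graphs, `MENTIONS` is a hypothetical placeholder for whatever property the News Hunter ontology actually uses, and `lookup` is any function you supply that returns triples for an entity URI.

```python
from typing import Callable, Iterable, Set, Tuple

# A triple is (subject, predicate, object); literals are plain strings here,
# which is a simplification of real RDF terms.
Triple = Tuple[str, str, str]

# Hypothetical predicate: the actual News Hunter property that links an item
# to the entities it mentions may have a different URI.
MENTIONS = "http://example.org/nhterm/mentions"


def mentioned_entities(item_graph: Set[Triple]) -> Set[str]:
    """Collect the entities (objects of MENTIONS triples) in an item graph."""
    return {o for _, p, o in item_graph if p == MENTIONS}


def enrich(item_graph: Set[Triple],
           lookup: Callable[[str], Iterable[Triple]]) -> Set[Triple]:
    """Return the item graph extended with the triples that `lookup` returns
    for each mentioned entity; the input graph itself is left unchanged."""
    enriched = set(item_graph)
    for entity in mentioned_entities(item_graph):
        enriched.update(lookup(entity))
    return enriched
```

In practice, `lookup` could wrap a SPARQL CONSTRUCT query against Wikidata or the local YAGO3 endpoint, and an RDF library such as rdflib would replace the plain triple sets. Keeping the lookup pluggable makes it easy to add or swap reference sources without touching the merging logic.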

Here are some examples of item graphs in Turtle (rename the files from *.txt to *.ttl):

If you need more item graphs, you can continue with the ones posted below for the "News graph aggregator" task!

The ontology behind the graphs is described in Figure 4 of this paper: https://link.springer.com/article/10.1007/s10270-020-00801-w (though it may have changed slightly since then).

To get started, you can use the SPARQL endpoints for ConceptNet and YAGO3 that are available locally at http://nelson.uib.no/bg-conceptnet and http://nelson.uib.no/bg-yago3 (remember to "Use" the right namespace!).
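Querying such an endpoint from code follows the standard SPARQL 1.1 Protocol: send the query as a `query` parameter and ask for JSON results. The sketch below uses only the Python standard library. The exact query URL for each namespace on nelson.uib.no is not given here, so `ENDPOINT` is a placeholder you must replace; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: substitute the SPARQL query URL of the namespace you "Use".
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str) -> list:
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    data = json.loads(body)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


def select(endpoint: str, query: str) -> list:
    """Run a SELECT query and return its bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Separating request building and result parsing from the network call keeps both parts testable offline; only `select` actually contacts the endpoint.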

External linking of CAMEO codes

The Conflict and Mediation Event Observations (CAMEO) codes are a standard for describing potentially newsworthy events. We want to use CAMEO codes in our News Hunter platform, but the codes are not currently linked to URIs from the Linked Open Data (LOD) cloud, for example from Wikidata, WordNet, ConceptNet or DBpedia. The assignment is to develop a software tool that parses the definitions of CAMEO codes in the CAMEO Event Data Codebook and uses the textual description of each code to find matching concepts in available LOD resources. The assignment will deliver the software tool along with LOD URIs for each CAMEO code.
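One simple starting point is to extract "code: name" pairs from the codebook text and rank candidate LOD concepts by word overlap (Jaccard similarity) between the code's description and each candidate's label. This is a minimal sketch, not a recommended final method: it assumes the codebook has already been converted to plain text with one code per line, the stopword list is an arbitrary choice, and the candidate URIs and labels would come from your own queries against Wikidata, ConceptNet, etc.

```python
import re

# Assumed input format: one code per line, "<digits>: <name>".
# The real codebook layout (a PDF) may need more careful extraction.
CODE_LINE = re.compile(r"^\s*(\d{2,4})\s*:\s*(.+?)\s*$")

# Illustrative stopword list, including boilerplate phrases from code names.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "or", "and",
             "not", "specified", "below"}


def parse_codes(text: str) -> dict:
    """Map each CAMEO code found in `text` to its name."""
    codes = {}
    for line in text.splitlines():
        m = CODE_LINE.match(line)
        if m:
            codes[m.group(1)] = m.group(2)
    return codes


def tokens(s: str) -> set:
    """Lowercased alphabetic words of `s`, minus stopwords."""
    return {t for t in re.findall(r"[a-z]+", s.lower()) if t not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def best_match(description: str, candidates: dict) -> tuple:
    """Return (uri, score) for the candidate {uri: label} most similar to the description."""
    desc = tokens(description)
    return max(((uri, jaccard(desc, tokens(label)))
                for uri, label in candidates.items()),
               key=lambda pair: pair[1], default=(None, 0.0))
```

Word overlap is crude; embedding-based similarity or ConceptNet relatedness could replace `jaccard` later without changing the parsing step.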

GDELT themes

The Global Database of Events, Language, and Tone (GDELT, gdeltproject.org) harvests an enormous amount of news from the net in real time. It identifies themes in news articles using its own taxonomy, with concepts such as TAX_FNCACT_SEPARATIST and TAX_DISEASE_MENTAL_ILLNESS. We want to use GDELT themes in our News Hunter platform, but the themes are not currently linked to URIs from the Linked Open Data (LOD) cloud, for example from Wikidata, WordNet, ConceptNet or DBpedia. The assignment is to develop a software tool that parses the definitions of GDELT themes and uses the textual description of each theme to find matching concepts in available LOD resources. The assignment will deliver the software tool along with LOD URIs for each GDELT theme.
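A first step for many themes is to turn the code itself into a phrase that can be matched against LOD labels, for example TAX_DISEASE_MENTAL_ILLNESS into "mental illness". The sketch below assumes a small, hand-picked list of category prefixes to strip; GDELT's taxonomy contains many more, so the list is only a starting point, and the dropped category could also be kept to help disambiguate matches.

```python
# Assumed category prefixes; longer prefixes must come before shorter ones
# so that "TAX_FNCACT_" is stripped rather than just "TAX_".
PREFIXES = ("TAX_FNCACT_", "TAX_DISEASE_", "TAX_")


def theme_to_label(theme: str) -> str:
    """Turn a GDELT theme code into a lowercase phrase for label lookup."""
    for prefix in PREFIXES:
        if theme.startswith(prefix):
            theme = theme[len(prefix):]
            break
    # Underscores separate words; drop empty parts from doubled underscores.
    return " ".join(part.lower() for part in theme.split("_") if part)
```

The resulting phrase can then be used in a label search against Wikidata, ConceptNet or WordNet, with the candidates ranked as for the CAMEO codes.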

News graph aggregator

The News Hunter platform harvests texts in real time from online news and social media. Each text is lifted semantically into a small knowledge graph that represents the text item itself (for example a tweet or an RSS article) along with the people, places, organisations, etc. that it mentions. Sometimes these texts report the same events, which means that the small knowledge graphs that represent them should be merged into a larger event graph. The assignment is to develop a software tool that inputs a stream of small news graphs and identifies groups of graphs that may represent the same or closely related events, for example because they mention the same people, places or organisations around the same time. The tool shall then merge related graphs into larger, single graphs and output them. We will give you access to a stream of news graphs you can use as input. It is a bonus if the tool offers a FastAPI interface and runs in a Docker container.
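The grouping step can be sketched as a greedy, single-pass clustering over a time-ordered stream: each new graph joins the recent cluster whose entities overlap most with its own, or starts a new cluster. This is one simple technique among many, not a prescribed design. The overlap coefficient, the 6-hour window and the 0.5 threshold are illustrative assumptions, and each graph is reduced to a name, a time and the set of entity URIs it mentions; merging a cluster's graphs is then a union of their triples.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class NewsGraph:
    name: str
    time: datetime
    entities: frozenset  # URIs of people, places, organisations mentioned


@dataclass
class Cluster:
    graphs: List[NewsGraph] = field(default_factory=list)
    entities: set = field(default_factory=set)
    last_time: Optional[datetime] = None


def overlap(a: set, b: set) -> float:
    """Overlap coefficient: shared entities divided by the smaller set's size."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def aggregate(stream, window=timedelta(hours=6), threshold=0.5) -> List[Cluster]:
    """Greedily cluster a chronologically ordered stream of news graphs."""
    clusters: List[Cluster] = []
    for g in stream:
        best, best_score = None, 0.0
        for c in clusters:
            # Only clusters updated within the time window are candidates.
            if g.time - c.last_time > window:
                continue
            score = overlap(g.entities, c.entities)
            if score > best_score:
                best, best_score = c, score
        if best is None or best_score < threshold:
            best = Cluster()
            clusters.append(best)
        best.graphs.append(g)
        best.entities |= g.entities
        best.last_time = g.time
    return clusters
```

Because clusters only accept graphs inside the time window, the same entities mentioned a day later start a new event, which matches the "around the same time" criterion.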

Example data:

(After you unzip the last file, you get a tar-archive that you must unpack to get the files.)

Each file name starts with a time stamp. For example '20200929T120002-003bd40e-c4da-401d-8e7e-0645e80fcf9d.ttl' is from 2020-09-29, two seconds after 12:00. Hence, sorting the files by name also sorts them chronologically. The time stamp is also available from each graph using the nhterm:sourceDateTime property.
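Parsing the time stamp from a file name takes one `strptime` call on the first 15 characters. The sketch below assumes every file name follows the 'YYYYMMDDTHHMMSS-...' pattern shown above. The stamp carries no time zone, so the result is a naive datetime; checking it against nhterm:sourceDateTime in the graph is a good sanity test.

```python
from datetime import datetime
from pathlib import Path


def file_timestamp(name: str) -> datetime:
    """Parse the 'YYYYMMDDTHHMMSS' prefix of a news graph file name (or path)."""
    return datetime.strptime(Path(name).name[:15], "%Y%m%dT%H%M%S")


def chronological(names):
    """Sort file names by their parsed time stamps (equivalent to sorting by name)."""
    return sorted(names, key=file_timestamp)
```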

EventKG enricher

EventKG is a large knowledge graph of events represented in RDF. It connects each event with the people, places, organisations, etc. that it involves. The assignment is to enrich a subset of EventKG with additional information taken from reference sources such as Wikidata, DBpedia, GeoNames, ConceptNet, YAGO3, YAGO4, etc. The assignment will deliver a software tool that inputs small knowledge graphs from EventKG, each representing an event, and outputs larger, enriched knowledge graphs with related information from the chosen sources. We will provide access to EventKG and some relevant reference sources. It is a bonus if the tool offers a FastAPI interface and runs in a Docker container.

A SPARQL endpoint for EventKG is locally available at http://nelson.uib.no/bg-eventkg (remember to "Use" the right namespace!). This paper explains EventKG in more detail, including the ontology it uses: https://link.springer.com/chapter/10.1007/978-3-319-93417-4_18.
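To obtain the small per-event input graphs, one option is to first select a batch of event URIs and then CONSTRUCT each event's outgoing triples. The query builders below are a sketch under assumptions: they take the class `sem:Event` from the Simple Event Model namespace, which EventKG builds on, but you should check the prefixes against the paper and the endpoint. EventKG may also represent many relations as separate relation resources, so a one-hop query can miss participants and is only a starting point.

```python
# Assumed vocabulary: the Simple Event Model (SEM) namespace.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"

# Characters that may not appear inside a SPARQL <IRI> reference.
_BAD_IRI_CHARS = '<>" {}|\\^`'


def _check_iri(uri: str) -> str:
    """Reject strings that could break out of <...> in a query."""
    if not uri or any(ch in uri for ch in _BAD_IRI_CHARS):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return uri


def event_list_query(limit: int = 100) -> str:
    """SELECT query for a batch of event URIs."""
    return (f"PREFIX sem: <{SEM}>\n"
            "SELECT ?event WHERE { ?event a sem:Event . }\n"
            f"LIMIT {int(limit)}")


def event_graph_query(event_uri: str) -> str:
    """CONSTRUCT query for the triples whose subject is the given event."""
    iri = _check_iri(event_uri)
    return f"CONSTRUCT {{ <{iri}> ?p ?o . }}\nWHERE {{ <{iri}> ?p ?o . }}"
```

Both strings can be sent to the EventKG namespace with any SPARQL client; the IRI check guards against malformed URIs breaking the generated query.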

ConceptNet and YAGO3 are also locally available at http://nelson.uib.no/bg-conceptnet and http://nelson.uib.no/bg-yago3.