Before you start thinking about your French lessons again: don’t worry, this article is not about the French language. It is about semantics and how computers handle it. “Fanfreluche est une poupée”, or “Schanulleke is een pop” in Dutch, is a sentence often used to teach French. Schanulleke is the doll of Wiske, a character from the Suske & Wiske comic books. So, it’s a simple concept: Wiske has a doll, and that doll is called Schanulleke. Up to this point, the human reader can follow along. For a computer, I can guarantee you, this is a bigger challenge.

multiple triples make linked data

This article occasionally contains code examples: don’t be intimidated, they complement the text itself. The goal is not to be a “get started” guide for linked data technologies, but rather to explain the principles behind them.

Schanulleke or Fanfreluche?

Suske & Wiske is the creation of the Belgian comic book artist Willy Vandersteen. Belgium is a multilingual country, so the comic “Suske & Wiske” was published in both Dutch and French. Regardless of the language, Schanulleke and Fanfreluche are the same doll.

Schanulleke is een pop
01010011 01100011 01101000 01100001 01101110 01110101 01101100 
01101100 01100101 01101011 01100101 00100000 01101001 01110011 
00100000 01100101 01100101 01101110 00100000 01110000 01101111 
01110000 00001010

Fanfreluche est une poupée
01000110 01100001 01101110 01100110 01110010 01100101 01101100 
01110101 01100011 01101000 01100101 00100000 01100101 01110011 
01110100 00100000 01110101 01101110 01100101 00100000 01110000 
01101111 01110101 01110000 11000011 10101001 01100101

For the human reader, it is clear that these sentences describe the same concept, albeit in two languages. A computer stores them in binary, and in binary the two sentences are different. For a computer, these sentences therefore represent two different concepts. Good news: humans can tell things apart where computers struggle. Humans reason with ease, while computers do not. How do you make a computer understand that Schanulleke is a doll?
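The difference is easy to check yourself. The Python sketch below converts both sentences to bits and compares them. The helper name to_bits is made up for this illustration, and UTF-8 is assumed as the encoding.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


dutch = "Schanulleke is een pop"
french = "Fanfreluche est une poupée"

dutch_bits = to_bits(dutch)
french_bits = to_bits(french)

print(dutch_bits)
print(french_bits)

# The computer only sees two different bit sequences, not one shared concept.
print(dutch_bits == french_bits)  # False
```

Nothing in these bits says that both sentences are about the same doll: that knowledge lives in the head of the reader.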

Tower of Babel

The sentence “Fanfreluche est une poupée” can be divided into three parts:

Schanulleke              is               een pop
Fanfreluche              est              une poupée

In both French and Dutch, these three parts correspond: they are translations of each other. Each part can be described with its own Uniform Resource Identifier (URI):

<https://www.wikidata.org/wiki/Q2731058> <https://www.wikidata.org/wiki/Property:P31> <https://www.wikidata.org/wiki/Q2918349> .

Three URIs replace the words: the triple is born! A triple always consists of three parts: a subject, a predicate, and an object. Within linked data, the triple is the smallest possible unit of information.

The handy thing about these URIs is that they have the same form as a URL: you can click on them. The URI of Schanulleke points to a WikiData page. That page contains more information about the concept “Schanulleke”: translations, links to other concepts, etc.

The above triple is difficult to display on a screen. To make triples more readable, the long, repeated parts are abbreviated with a prefix. https://www.wikidata.org/wiki/ is abbreviated to wd:.

@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix wdt: <https://www.wikidata.org/wiki/Property:> .

wd:Q2731058 wdt:P31 wd:Q2918349 .

“wd:Q2731058 wdt:P31 wd:Q2918349” is thus a more universal representation of “Fanfreluche is a doll”. This significantly reduces the complexity for a computer.
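The abbreviation is purely a notation: a prefixed name can always be expanded back to the full URI. The following Python sketch shows that expansion. The names Triple, PREFIXES and expand are made up for this illustration, and the prefix values are copied from the @prefix lines above.

```python
from typing import NamedTuple

# Prefixes as declared in the @prefix lines of the article.
PREFIXES = {
    "wd": "https://www.wikidata.org/wiki/",
    "wdt": "https://www.wikidata.org/wiki/Property:",
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def expand(name: str) -> str:
    """Turn a prefixed name such as 'wd:Q1' into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


short = Triple("wd:Q2731058", "wdt:P31", "wd:Q2918349")
full = Triple(*(expand(part) for part in short))

for part in full:
    print(part)
```

The short and the full form carry exactly the same information; the short one is simply easier to read.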

1 + 1 is More Than 2

Of course, Schanulleke is not a standalone concept: Schanulleke is a doll from the comic series Suske & Wiske. In language, these two facts look like this:

  • Schanulleke is a doll
  • Schanulleke belongs to Wiske

These two facts about Schanulleke can be translated into two triples. Schanulleke appears twice within these facts (Schanulleke) and thus also appears twice within the triples (wd:Q2731058).

@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix wdt: <https://www.wikidata.org/wiki/Property:> .

wd:Q2731058 wdt:P31 wd:Q2918349 .
wd:Q2731058 wdt:P127 wd:Q2667500 . 

These two facts/triples tell us more than they each do individually:

  • Schanulleke is a doll
  • Schanulleke belongs to Wiske
  • Wiske owns a doll

Because the concept of Schanulleke is reused, more meaning can be extracted from the two triples: 1 + 1 is more than 2.
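How the third fact follows from the first two can be sketched in a few lines of Python. This is only an illustration of the idea, not how a real knowledge base reasons. The function name owners_of and the constant names are invented for this example.

```python
SCHANULLEKE = "wd:Q2731058"
WISKE = "wd:Q2667500"
DOLL = "wd:Q2918349"
IS_A = "wdt:P31"       # "instance of"
OWNED_BY = "wdt:P127"  # "owned by"

triples = {
    (SCHANULLEKE, IS_A, DOLL),
    (SCHANULLEKE, OWNED_BY, WISKE),
}


def owners_of(kind, triples):
    """Return everyone who owns at least one thing of the given kind."""
    # Step 1: every subject that "is a" <kind>.
    things = {s for (s, p, o) in triples if p == IS_A and o == kind}
    # Step 2: the owners of those subjects, found via the shared subject.
    return {o for (s, p, o) in triples if p == OWNED_BY and s in things}


print(owners_of(DOLL, triples))  # {'wd:Q2667500'}
```

The new fact “Wiske owns a doll” is stored nowhere: it only appears because both triples share the subject wd:Q2731058.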

Multiple triples add up to more information than a single triple

Wiki What? WikiData!

The term has been mentioned a few times already: WikiData. WikiData is an open and free knowledge base. The knowledge base contains 15 billion triples. Fortunately, the world is much larger than just Suske & Wiske: only a very small subset of these 15 billion triples pertains to Schanulleke.

WikiData is a project of the Wikimedia Foundation, the same organization behind Wikipedia. Just like Wikipedia, anyone can add, modify, or delete information. Is the WikiData knowledge base always 100% accurate? No, of course not. Does WikiData contain a remarkable amount of data that gives an idea of magnitudes? Absolutely. WikiData is a valuable knowledge base for demonstration purposes: it contains concepts that everyone knows.

WikiData’s first entity is wd:Q1, in human language “the universe,” followed by wd:Q2, the earth, and wd:Q3, life.

Sprinkles of SPARQL

15 billion triples is a lot. To be able to reason over all this information, a specific language is needed: SPARQL. SPARQL allows you to select certain triples and display the results in table form.

SELECT ?doll ?dollLabel
WHERE
{
  ?doll wdt:P31 wd:Q2918349 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} limit 100

Just like language, this query can be dissected:

  • ?doll wdt:P31 wd:Q2918349 looks like a triple but isn’t. ?doll is a placeholder. The query searches for x in the sentence “x is a doll.”
  • You see SERVICE: this service translates the URIs into a human-readable name.

The query should fetch all dolls from WikiData. The result is a table with two columns: doll contains the URIs, and dollLabel contains a readable name. Feel free to test the query on the Wikidata query service: interactive version

result
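What the placeholder ?doll does can be imitated in plain Python. The sketch below is a strongly simplified stand-in for a SPARQL engine, working on a small in-memory list of triples instead of WikiData. The function name match is made up for this illustration.

```python
def match(pattern, triples):
    """Return one {placeholder: value} row per triple that fits the pattern.

    Terms starting with '?' are placeholders; all other terms must match exactly.
    """
    rows = []
    for triple in triples:
        row = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                row[term] = value  # a placeholder accepts any value
            elif term != value:
                break              # a fixed term must match exactly
        else:
            rows.append(row)       # no break: the triple fits the pattern
    return rows


triples = [
    ("wd:Q2731058", "wdt:P31", "wd:Q2918349"),
    ("wd:Q2731058", "wdt:P127", "wd:Q2667500"),
]

rows = match(("?doll", "wdt:P31", "wd:Q2918349"), triples)
print(rows)  # [{'?doll': 'wd:Q2731058'}]
```

Each row of the result corresponds to one line in the result table of the query: the placeholder becomes the column name.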

The next query builds on the previous one: an additional limitation has been added.

  • ?doll wdt:P31 wd:Q2918349 . searches for dolls
  • ?doll wdt:P127 wd:Q2667500 . searches for things owned by Wiske

SELECT ?doll ?dollLabel
WHERE
{
  ?doll wdt:P31 wd:Q2918349 .
  ?doll wdt:P127 wd:Q2667500 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} limit 100

Both limitations together ensure that only dolls owned by Wiske are displayed. The interactive version can be found here.

result
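Combining two limitations boils down to keeping only the subjects that satisfy both. A minimal Python sketch of that idea, using set intersection, could look like this. The two entries with the ex: prefix are hypothetical and only exist to show what gets filtered out.

```python
triples = [
    ("wd:Q2731058", "wdt:P31", "wd:Q2918349"),   # Schanulleke is a doll
    ("wd:Q2731058", "wdt:P127", "wd:Q2667500"),  # Schanulleke is owned by Wiske
    ("ex:otherDoll", "wdt:P31", "wd:Q2918349"),  # hypothetical doll without owner
    ("ex:bicycle", "wdt:P127", "wd:Q2667500"),   # hypothetical non-doll owned by Wiske
]


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


dolls = subjects("wdt:P31", "wd:Q2918349")           # first limitation
owned_by_wiske = subjects("wdt:P127", "wd:Q2667500")  # second limitation

# Only subjects that satisfy both limitations remain.
result = dolls & owned_by_wiske
print(result)  # {'wd:Q2731058'}
```

Because both patterns use the same placeholder ?doll, a result has to fit both of them at once.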

Linked data

WikiData makes its data available as linked data: it is possible to link data from WikiData to data from other knowledge bases using the correct interfaces. For example:

  • https://www.wikidata.org/wiki/Q1296 (WikiData)
  • https://www.openstreetmap.org/relation/897671 (OpenStreetMap)

Both URIs describe the same concept, each within their own knowledge base. WikiData stores the link between these two synonyms:

@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix wdt: <https://www.wikidata.org/wiki/Property:> .

wd:Q1296 wdt:P402 <https://www.openstreetmap.org/relation/897671> .

Thanks to this link, data from WikiData can be combined with data from OpenStreetMap. The dataset with 15 billion triples can thus be further expanded with other datasets. SPARQL can be used to link these sources together:

SELECT DISTINCT ?cityName ?nickname ?coordinates
WHERE {
  hint:Query hint:optimizer "None" .
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT ?cityName ?nickname (IRI(CONCAT("https://www.openstreetmap.org/relation/", STR(?osmId))) AS ?osm)
    WHERE {
      ?stad wdt:P31 wd:Q493522 .
      ?stad wdt:P1705 ?cityName .
      ?stad wdt:P1449 ?nickname .
      FILTER(lang(?nickname) = 'nl') 
      ?stad wdt:P402 ?osmId .
    } limit 50
  }
  ?osm osmm:loc ?coordinates .
}
LIMIT 50

Suppose you want to show the nickname of every Belgian city on a map: then you can use the query above (OpenStreetMap query).

The query combines two data sources:

  • WikiData knows what nicknames a city has.
  • WikiData knows which URI the cities have in OpenStreetMap.
  • OpenStreetMap knows where cities are located on the map.

Combined, this results in a map:

Nicknames on a map

This is the power of Linked Data: data is published in the right way so that it can be reused. Data sources can be combined, allowing you to ask complex questions in various directions to one or more data sources simultaneously.
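The way the two sources are glued together can be sketched in Python. Both data sets below are hypothetical stand-ins: the city name, nickname and coordinates are placeholders, not real query results. Only the relation number 897671 comes from the triple shown earlier. The function name combine is made up for this illustration.

```python
OSM_RELATION = "https://www.openstreetmap.org/relation/"

# Hypothetical stand-in for what WikiData returns: names, nicknames, OSM ids.
wikidata_rows = [
    {"cityName": "Example City", "nickname": "Example nickname", "osmId": "897671"},
]

# Hypothetical stand-in for OpenStreetMap: URI -> coordinates (placeholder value).
osm_locations = {
    OSM_RELATION + "897671": "Point(0.0 0.0)",
}


def combine(rows, locations):
    """Attach coordinates from the second source to the rows of the first."""
    combined = []
    for row in rows:
        # Same step as IRI(CONCAT(...)) in the query: build the OSM URI.
        osm_uri = OSM_RELATION + row["osmId"]
        if osm_uri in locations:  # keep only cities that both sources know
            combined.append((row["cityName"], row["nickname"], locations[osm_uri]))
    return combined


print(combine(wikidata_rows, osm_locations))
```

The shared URI is the only thing the two sources need to agree on: everything else stays in its own knowledge base.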

Conclusion

We are sure of one fact: Schanulleke is a doll. You can make a computer believe that Schanulleke is a doll by applying a few principles:

  • Facts can be described with triples. These triples consist of URIs.
  • A collection of triples can be stored in a knowledge base like WikiData.
  • Questions can be asked with SPARQL.

Can you fully get started with linked data with just this article? Probably not. Does it give an impression of linked data and the possibilities it opens? Hopefully yes!

References

  • This article was created in preparation for a Nerdlab DataLab
  • A large Linked Data source - WikiData
  • Proofread by Marnix Rummens