Introducing SPARQL

There are a good number of online resources for learning SPARQL including:

As well, we recommend the following books:

And, of course, there is the W3C SPARQL spec and their published Glossary of Linked Data terms.

We assume you can learn SPARQL syntax elsewhere. In this exercise, we will write a series of SPARQL queries over the data you've just loaded in the previous exercise.

We have provided versions of all the queries for this exercise in an associated Query Console workspace, ts-sparql.xml. You may wish to try formulating the queries yourself before reading the solutions in the workspace. Or, if you're feeling challenged, you can simply download the workspace, import it into Query Console, and try the queries :).

Browsing the graph

In order to understand what's in our data, it's helpful to explore a bit. The REST API exposes an endpoint for browsing around your graph. Point your browser at http://localhost:9910/v1/graphs/things (replace localhost as needed) and you will see the first 10,000 nodes (listed by IRIs) in the database.

I happen to be interested in bridges, so when I did this I clicked on <http://dbpedia.org/resource/Brooklyn_Bridge> and got back all the triples that reference the Brooklyn Bridge. Go ahead and do that yourself.

From there, I clicked on the predicate for geographic points (<http://www.georss.org/georss/point>). If you do this, you will see the first 10,000 geo points we have. Scrolling down the results, you'll eventually see a subject: <http://dbpedia.org/resource/Brooklyn>. We've found what looks to be a resource identifier for the borough of Brooklyn. You can then click on it to see all the facts about Brooklyn. (Alternatively, if you were looking for Brooklyn to start with, you could have read about DBpedia and learned that it uses the prefix <http://dbpedia.org/resource/> for resources.)

Asking Questions of DBpedia

You have an identifier for Brooklyn. So, let's see what we can find out about it.

You can see from the things endpoint that we have facts that use the predicate <http://dbpedia.org/ontology/birthPlace>. So you can ask "Who was born in Brooklyn?". You can write that in SPARQL as:
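
A minimal sketch of that question using full IRIs. The predicate is the one seen in the things endpoint and the object is the Brooklyn identifier found while browsing; the variable name ?person is an arbitrary choice.

```sparql
# Who was born in Brooklyn? Full IRIs, no prefixes.
SELECT ?person
WHERE {
  ?person <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Brooklyn> .
}
```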

You can actually write that so it is a little more readable, with prefixes, as:
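
The Brooklyn birthplace question with prefixes, as a sketch. The prefix labels db: and onto: are arbitrary names chosen here for illustration; any labels work as long as they are bound to the right namespaces.

```sparql
# Bind short labels to the two DBpedia namespaces used in the question.
PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>

SELECT ?person
WHERE {
  ?person onto:birthPlace db:Brooklyn .
}
```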

You can now see that Danny Kaye was born in Brooklyn. What else do you know about him? You can ask that as:
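
One way to ask for everything the database states about Danny Kaye. This sketch assumes his identifier follows the usual DBpedia naming pattern, <http://dbpedia.org/resource/Danny_Kaye>; check the IRI returned by the birthplace query if yours differs.

```sparql
PREFIX db: <http://dbpedia.org/resource/>

# Every predicate/object pair that has Danny Kaye as its subject.
# The IRI db:Danny_Kaye is an assumption based on DBpedia naming.
SELECT ?predicate ?object
WHERE {
  db:Danny_Kaye ?predicate ?object .
}
```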

You can use Query Console to execute these SPARQL queries against the tutsem database (make sure to choose Query Type: SPARQL). I'll leave the next few to you:

  1. Find all predicates and objects with Danny Kaye as subject
    1. Return the answer as triples - i.e. Danny Kaye - predicate - object (Hint: SPARQL SELECT returns "solutions"; SPARQL CONSTRUCT returns "triples")
    2. Alternatively, do this via a DESCRIBE query
  2. Who else was born in the same place as Danny Kaye?
  3. Who was born in the same place as Danny Kaye AND died in Seattle?
  4. Find everyone who was born in the same place as Danny Kaye OR who died in Washington DC. Return results in descending order of name.
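
As a nudge for the first exercise, the sketch below shows the CONSTRUCT form from the hint: instead of a table of solutions, it returns the matching triples themselves. The Danny Kaye IRI is assumed to follow the usual DBpedia pattern, and the solutions in the workspace may be written differently.

```sparql
PREFIX db: <http://dbpedia.org/resource/>

# Returns triples rather than variable bindings.
# The template (first braces) says what to emit; WHERE says what to match.
CONSTRUCT { db:Danny_Kaye ?predicate ?object }
WHERE     { db:Danny_Kaye ?predicate ?object }
```

The DESCRIBE variant is shorter still: DESCRIBE db:Danny_Kaye with the same PREFIX line. Note that the SPARQL spec leaves the exact contents of a DESCRIBE result up to the implementation.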

News Data

The BBC data contains news articles and metadata stored as triples. One of the vocabularies used is rnews (<http://iptc.org/std/rNews/2011-10-07#>). You can go learn about rnews when you have time, but for now, let's take as given that it uses the following identifiers:

  • NewsItem - ID of the news item
  • headline
  • datePublished

If you recall, we loaded the news triples into the graph "http://www.bbc.co.uk/news/graph".

  1. Can you find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date? Try this:

  2. Now, try finding all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the items newer than July 11 2013. (hint: use FILTER)
  3. Next, find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the second "page" of results (a page is 25 items).
  4. What if a news item doesn't have a datePublished? Modify your headlines query to include headlines of items that don't have a date. (NB: The dataset doesn't actually have items with missing dates.)
  5. Find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the items where the headline contains "Elton John".
  6. Are there any news items in the graph "http://www.bbc.co.uk/news/graph" newer than August 1st 2013?
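
For the first question in the list above, a query along the following lines should work. It is a sketch that assumes headline and datePublished hang directly off each news item as rnews: predicates; the variable names are arbitrary.

```sparql
PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>

# Headlines and publication dates from the BBC news graph, oldest first.
SELECT ?headline ?date
FROM <http://www.bbc.co.uk/news/graph>
WHERE {
  ?item rnews:headline ?headline ;
        rnews:datePublished ?date .
}
ORDER BY ?date
```

The later questions are variations on this pattern: FILTER narrows by date or headline text, LIMIT and OFFSET page through results, OPTIONAL tolerates a missing date, and ASK answers a yes/no question.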

If you recall from our data loading exercise, we learned a little about how the IRIs of our news documents are expressed. Now, let's say we want to find out something about one of the news documents. Let's get all the subjects added by OpenCalais and organize them by type. We can do that by issuing a SPARQL query against a specific IRI, for example <http://www.bbc.co.uk/news/world-asia-22965046>:
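
The OpenCalais predicates are not listed in this section, so the sketch below avoids naming them. It assumes the article IRI is the subject of the tagging triples and that each tagged resource carries an rdf:type; under those assumptions it lists everything the article points at, sorted by type.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Follow any predicate out of the article, then look up the type
# of whatever it points to. ?link is left unbound on purpose because
# the tagging predicate names are not known here.
SELECT ?type ?thing
WHERE {
  <http://www.bbc.co.uk/news/world-asia-22965046> ?link ?thing .
  ?thing rdf:type ?type .
}
ORDER BY ?type ?thing
```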

Nifty!

For additional credit:

  • Try running the SPARQL queries via REST
  • Run some SPARQL queries as XQuery Search API extensions

See also News_Search.xml for additional advanced queries.

References

Loading Data

SPARQL and XQuery Together

Comments

  • How do we model relationships of a profile that are time-bound? Suppose we have a document for a person profile, and it has several memberships in various organizations at certain points in time. If I have anywhere from two to several hundred such relationships in the profile, how would you suggest we model triples embedded in this profile document? E.g. alpha studiedIn MIT, during 2000 upTo 2004. I think our issue here is how to distinguish date ranges and their context.
    • I don't have a detailed answer, but one of my colleagues suggested pointing you to the Time Ontology (https://www.w3.org/TR/2016/WD-owl-time-20160712/).