Building a Semantic Infopanel

Micah Dubinko
Last updated December 22, 2014

If you haven't seen it, the keynote at MarkLogic World 2013 is worth a look. I was on stage demonstrating new Semantics features built into MarkLogic Server. Two of the three demos were based on MarkMail, a database of some 60 million messages, with enhanced search capabilities driven by semantics. (The third demo was a built-from-the-ground-up semantic application.)

The first demo was of an infopanel, like this:

Since then, several folks have asked about the code behind the demo. What I showed was a fully operational MarkMail instance, including millions of email messages, running on Amazon Web Services. As you can with utility computing, we simply took the demo down after the keynote. A huge part of the demo was showing operation at scale, but reading between the lines, what folks are more interested in is something more portable--a way to see the code in operation and play with it without having to stand up an entire cluster or go through a lengthy setup procedure.

Space won't allow for a full semantics tutorial here. For that, see either the Semantics Getting Started Exercises or the free tutorial from MarkLogic University.

So, in this posting, let's recreate something on a similar level using built-in features. We'll use the Oscars sample application that ships with the product. To get started, create an Application Builder sample project and deploy it. We'll call the relevant database names 'oscar' and 'oscar-modules' throughout. Since Application Builder ships with only a small amount of data, you may also want to run the sample Information Studio collector that will fetch the rest of the dataset.

Before we can query, we need to actually turn on the semantics index. The easiest place to do this is on the page at http://localhost:8000/appservices/. Select the oscar database and hit configure. On the page that comes up, tick the box for Semantics and wait for the yellow flash.

Semantic Data

This wouldn't be much of a semantic application without triple data. Entire books have been written on this kind of data modeling, but one huge advantage of semantics is that there's lots of data already set up and ready to go. We'll use dbpedia. The most recent release as of this writing is version 3.9.

From there we'll grab data that looks relevant to the Oscar application: anything about people and/or movies, picking and choosing from things most likely to have relevant facts:

In all, about 38 million triples--not even enough to make MarkLogic break a sweat, but still a large enough chunk to be inconvenient to download. Fortunately, the oscar data and dbpedia ultimately derive from the same source: Wikipedia itself. Since the oscar data preserved URLs, it was straightforward to extract all triples that had a matching subject, once prefixed with "http://dbpedia.org/resource/".
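To make the extraction step concrete, here is a minimal sketch of that kind of filter. It is not the script I actually ran; it assumes the dump is in a one-triple-per-line form with the subject IRI at the start of each line, and that you already have a list of Wikipedia page names pulled from the oscar data (the pageNames argument is hypothetical).

```javascript
// Sketch only: keep the lines whose subject is a dbpedia resource we care about.
var DBPEDIA_PREFIX = "http://dbpedia.org/resource/";

// pageNames: hypothetical list of Wikipedia page names taken from the oscar
// data, e.g. "Zorba_the_Greek". Returns a lookup keyed by full subject IRI.
function buildSubjectSet(pageNames) {
  var set = Object.create(null); // no inherited keys to collide with
  pageNames.forEach(function (name) {
    set[DBPEDIA_PREFIX + name] = true;
  });
  return set;
}

// lines: array of strings, one triple per line, subject written as <IRI>.
// Lines that do not start with an <IRI> (comments, blanks) are dropped.
function filterTriples(lines, subjectSet) {
  return lines.filter(function (line) {
    var m = /^<([^>]+)>\s/.exec(line);
    return m !== null && subjectSet[m[1]] === true;
  });
}
```

For a 38-million-triple dump you would want to stream the file line by line rather than hold it in an array, but the matching logic stays the same.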

I extracted all these triples: grab them from here and put them somewhere on your local system.

Then simply load these triples via query console. Point the target database to 'oscar' and run this:

import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";
sem:rdf-load("/path/to/oscartrips.ttl")

Infopanel widget

An 'infopanel' is the part of the MLW demo that showed the Hadoop logo, committers, downloads, and other facts about the current query. The default oscar app already has something like this: widgets. Let's create a new widget type that looks up and displays facts about the current query. To start, if you haven't already, build the example application in App Builder. There's some excellent documentation that walks through this process.

Put on your Front End Dev hat and let's build a widget. All the code we will use and modify is in the oscar-modules database, so either hook up a WebDav server or copy the files out to your filesystem to work on them. Back in AppBuilder on the Assemble page, click the small X at the upper-right corner of the pie chart widget. This will clear space for the widget we're about to create, specifically in the div <div id="widget-2" class="widget widget-slot">.

The way to do this is to modify the file application/custom/app-config.js. All changes to files in the custom/ directory will survive a redeployment in AppBuilder, which means your changes will be safe, even if you need to go back and change things in Application Builder.

function infocb(dat) {
  $("#widget-2").html("<h2>Infopanel</h2><p>The query is " +
      JSON.stringify(dat.query) + "</p>");
}
var infopanel = ML.createWidget($("#widget-2"), infocb, null, null);

This gives us the bare minimum possible widget. Now all that's left is to add semantics.

Hooking up the Infopanel query

We need a semantic query, the shape of which is: "starting with a string, find the matching concept, and from that concept return lots of facts to sift through later".

And we have everything we need at hand with MarkLogic 7. The REST endpoint, already part of the deployed app, includes a SPARQL endpoint. So we need to make the new widget fire off a semantic query in the SPARQL language, then render the results into the widget. One nice thing about the triples in use here is that they consistently use the foaf:name property to map between a concept and its string label. So pulling all the triples based on a string-named topic works like this. Again, we'll use Query Console to experiment:

import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";
let $str := "Zorba the Greek"
let $sparql := "
prefix foaf: <http://xmlns.com/foaf/0.1/>
construct { ?topic ?p ?o }
where
{ ?topic foaf:name $str .
?topic ?p ?o . }
"
return sem:sparql($sparql, map:entry("str", $str))

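Here, to make this runnable in Query Console, we pass in a hard-coded string ("Zorba the Greek"), but in the infopanel this will come from the query. Note that the names in the loaded data carry a language tag (as in "Zorba the Greek"@en), so the value has to match the stored literal, tag included, for the query to return anything.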
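Once the string comes from whatever a user typed, it has to be spliced into the SPARQL text safely. Here is a small sketch of one way to do that in the browser. The function names are my own inventions for illustration, not part of any MarkLogic API, and the escaping covers only the common string-literal escapes (backslash, double quote, newline, carriage return, tab).

```javascript
// Sketch only: escape user text for use inside a double-quoted SPARQL literal.
function escapeSparqlLiteral(text) {
  return String(text)
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Build the infopanel query for a name. lang is an optional language tag
// such as "en"; it is assumed to come from the developer, not from the user.
function buildInfopanelQuery(text, lang) {
  var literal = '"' + escapeSparqlLiteral(text) + '"' + (lang ? "@" + lang : "");
  return 'prefix foaf: <http://xmlns.com/foaf/0.1/> ' +
    'construct { ?topic ?p ?o } ' +
    'where { ?topic foaf:name ' + literal + ' . ?topic ?p ?o . }';
}
```

Calling buildInfopanelQuery("Zorba the Greek", "en") yields the same shape of query as the one above, with the name written as a language-tagged literal.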

Of course, deciding what parts of the query to use could be quite an involved process. For example, if the query included [decade:1980s] you can imagine all kinds of interesting semantic queries that might produce useful and interesting results. But to keep things simple, we will look for only a single-word query, which includes quoted phrases like "Orson Welles". Also in the name of simplicity, the code sample will only use a few possible predicates. Choosing which predicates to use, and in what order to display them, is a big part of making an infopanel useful.
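If you want to go a step beyond the single-word case, one option is to walk the structured query and collect every word-query text it contains, however deeply nested. The sketch below assumes only what the code in this post relies on, namely that a word query keeps its text under word-query[n].text._value. The function name is hypothetical, and I have not checked what wrapper objects the search API produces for multi-word queries, so it simply descends into whatever it finds.

```javascript
// Sketch only: collect the text of every "word-query" found anywhere inside
// a structured query object. Returns an array of strings (possibly empty).
function findWordQueryTexts(node, found) {
  found = found || [];
  if (Array.isArray(node)) {
    node.forEach(function (child) { findWordQueryTexts(child, found); });
  } else if (node !== null && typeof node === "object") {
    Object.keys(node).forEach(function (key) {
      if (key === "word-query" && Array.isArray(node[key])) {
        node[key].forEach(function (wq) {
          if (wq && wq.text && typeof wq.text._value === "string") {
            found.push(wq.text._value);
          }
        });
      } else {
        findWordQueryTexts(node[key], found); // descend into any wrapper
      }
    });
  }
  return found;
}
```

What to do with more than one result (join the words, try each in turn, or give up) is a design decision; for this post we only act on a single one.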

Here's the code. Put this in application/custom/app-config.js:

function infocb(dat) {
  var qtxt = dat.query && dat.query["word-query"] &&
        dat.query["word-query"][0] && dat.query["word-query"][0].text &&
        dat.query["word-query"][0].text._value;
  if (qtxt) {
    $.ajax({
      url: "/v1/graphs/sparql",
      accepts: { json:"application/rdf+json" },
      dataType: "json",
      data: {query:
        'prefix foaf: <http://xmlns.com/foaf/0.1/> ' +
        'construct { ?topic ?p ?o } ' +
        'where ' +
        '{ ?topic foaf:name "' + qtxt + '"@en . ' +
        '?topic ?p ?o . }'
      },
      success: function(data) {
        var subj = Object.keys(data)[0]; // first subject; Object.keys is ECMAScript 5th ed, IE9+
        var ptitle = "http://xmlns.com/foaf/0.1/name";
        var pdesc = "http://purl.org/dc/elements/1.1/description";
        var pthumb = "http://dbpedia.org/ontology/thumbnail";
        var title = "-";
        var desc = "";
        var thumb = "";
        if (data[subj]) {
          if (data[subj][ptitle]) {
            title = data[subj][ptitle][0].value;
          }
          if (data[subj][pdesc]) {
            desc = "<p>" + data[subj][pdesc][0].value + "</p>";
          }
          if (data[subj][pthumb]) {
            thumb = "<img style='width:150px; height:150px' src='" +
                data[subj][pthumb][0].value + "'/>";
          }
        }
        $("#widget-2").html("<h2>" + title + "</h2>" + desc + thumb );
      }
    });
  } else {
    $("#widget-2").html("no data");
  }
}

var infopanel = ML.createWidget($("#widget-2"), infocb, null, null);

This works by crafting a SPARQL query and sending it off to the server. The response comes back in RDF/JSON format, with the subject as a root object in the JSON, and each predicate against that subject as a sub-object. The code looks through the predicates and picks out interesting information for the infopanel, formatting it as HTML.
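The predicate-picking part can be pulled out into a small helper that is easy to test away from the page. This is a sketch, not the code from the demo: the function names are made up, it looks only at the first subject in the response, and the fallback from dc:description to rdfs:comment is an idea borrowed from a reader's suggestion rather than something I have verified against the data.

```javascript
// Sketch only: return the first object value found for any of the given
// predicates on a subject in an RDF/JSON document, or null if none exist.
function firstValue(rdfJson, subject, predicates) {
  var preds = rdfJson[subject] || {};
  for (var i = 0; i < predicates.length; i++) {
    var objects = preds[predicates[i]];
    if (objects && objects.length > 0) {
      return objects[0].value;
    }
  }
  return null;
}

// Summarize the first subject of an RDF/JSON response for display.
// Returns null when the response has no subjects at all.
function summarizeTopic(rdfJson) {
  var subjects = Object.keys(rdfJson || {});
  if (subjects.length === 0) { return null; }
  var s = subjects[0];
  return {
    subject: s,
    title: firstValue(rdfJson, s, ["http://xmlns.com/foaf/0.1/name"]),
    description: firstValue(rdfJson, s, [
      "http://purl.org/dc/elements/1.1/description",
      "http://www.w3.org/2000/01/rdf-schema#comment" // assumed fallback
    ]),
    thumbnail: firstValue(rdfJson, s, ["http://dbpedia.org/ontology/thumbnail"])
  };
}
```

Keeping the data selection separate from the HTML building also makes it easier to experiment with which predicates to show and in what order.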

I noticed while working on this that many of the images referenced in the dbpedia image dataset actually return 404 on the web. If you are not seeing thumbnail images for some queries, this may be why. An infopanel implementation can only be as helpful as the data underneath. If anyone knows of more recent data than the official dbpedia 3.9 data, do let me know.

Where to go from here

I hope this provides a base upon which many developers can play and experiment. Any kind of app, but especially a semantic app, comes about through an iterative process. There's a lot of room for expansion in these techniques. Algorithms to select and present semantic data can get quite involved; this only scratches the surface.

The other gem in here is the widget framework, which has actually been part of all Application Builder apps since MarkLogic 6. Having that technology as a backdrop made it far easier to zoom in and focus on the semantic technology. Try it out, and let me know in the comments how it works for you.

Comments

  • In order to get the multiple word match to work, I just changed the javascript to be var qtxt = dat.qtext instead of var qtxt = dat.query && dat.query["word-query"] && dat.query["word-query"][0] && dat.query["word-query"][0].text && dat.query["word-query"][0].text._value This works. However, note that you have to use the correct case "Zorba the Greek" to get it to match. Also note that I changed the input triples to change the thumbnail entries two ways: :%s/\/commons\//\/en\//g :%s/\/200px/\/150px/g One thing that I'd still like to know is perhaps what the thought is behind case-insensitivity with regards to SPARQL. Ideally, I'd like the SPARQL query to take in what the user typed in and insensitively match the appropriate field. It seems like you can only do this with a FILTER and regular expression. Is that the case? If so, is there a better way to do it using XQuery rather than SPARQL? In other words, a recommendation for using XQuery in this instance rather than SPARQL?
    • David, thank you for all of your analysis and fixes in this tutorial. Some of what you found was intentional, in that this was a quick demo, and other things -- well, they need fixing. The last fix you found might not work in all cases. It actually relied on some unsupported code! I will review this tutorial and help incorporate the changes. The case-insensitivity of "Zorba the Greek" is due simply to a hack -- the tutorial uses this string to construct an IRI -- I wouldn't recommend doing that at all! In a non-demo scenario, you'd want to construct a SPARQL query whereby you match some property of the :ZorbaTheGreek object, rather than use its IRI directly.
  • The description is narrowed to people by http://purl.org/dc/elements/1.1/description and all people have two names, so you have to fix the part about being able to enter two words before being able to retrieve the description. And if you type in a movie name, you'll never get the description because the movies don't have them. I might try adding http://www.w3.org/2000/01/rdf-schema#comment in case the description doesn't exist.
  • The javascript in application/custom/app-config.js only works for single word queries. As soon as there's a second word, there's an "and query" around the "word query" structure, which then keeps the qtxt variable from having anything in it. Another observation is that the thumbnail links for Wikipedia don't seem to be correct any longer - there is a "commons" in the path that now appears to be "en". The description also doesn't seem to work, but I haven't found the reason for that yet. On a bright note, title works - that is, if you use a one word title. Try "Amarcord".
  • I changed the qconsole code to this and it worked.

    xquery version "1.0-ml";
    import module namespace sem = "http://marklogic.com/semantics"
        at "/MarkLogic/semantics.xqy";
    let $str := '"Zorba the Greek"@en'
    let $sparql := fn:concat("
    prefix foaf: <http://xmlns.com/foaf/0.1/>
    construct { ?topic ?p ?o }
    where
    { ?topic foaf:name ", $str, " .
    ?topic ?p ?o . }
    ")
    return sem:sparql($sparql, map:entry("str", $str))

    The "no data" may be because of the REST call not succeeding - going to try to get that call to work outside of js...
  • A few comments:
    1. Hard coded query for "Zorba the Greek" didn't work in query console. Checked oscartips.ttl file and saw it was displayed as "Zorba the Greek"@en. After removing @en the hardcoded query worked in query console.
    2. Do you mean the code should reside in custom/app-config.js and not config/app-config.js?
    3. Got the app to work and show "no data" but can't seem to fill the infobox with anything else. Checked that triples were loaded properly and they seem to be in the database. Will keep trying.