Semantic triples are less widely understood than some other data models, and combining them with documents is a capability unique to MarkLogic. This leads to some questions. Happily, Stephen Buxton has answers.
The Semantics Guide describes inferencing.
I’ve attached a Query Console workspace that does “Hello World” inferencing and steps you through using one of the built-in rulesets (RDFS); creating and using your own; and combining rulesets. You can do this via Java or Jena or REST too.
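To make that concrete, here is a minimal sketch of "Hello World" inference in XQuery. The Employee IRI and the data behind it are made up for illustration; sem:ruleset-store attaches the built-in rdfs.rules ruleset to the query's store, and you can pass several ruleset names to combine them:

```xquery
xquery version "1.0-ml";

(: Run a SPARQL query with RDFS inference. "rdfs.rules" is one of the
   built-in rulesets; pass a sequence of names to combine rulesets,
   e.g. ("rdfs.rules", "owl-horst.rules"). The Employee IRI is a
   made-up example. :)
sem:sparql(
  "SELECT ?person WHERE { ?person a <http://example.org/Employee> }",
  (), (),
  sem:ruleset-store("rdfs.rules", sem:store())
)
```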
Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts. A workspace lets you import a set of pre-written queries. See instructions to import a workspace.
Inference is a bit tricky to get your head around – you need data (triples); an ontology that tells you about the data (also triples); and rules that define the ontology language (rulesets). It may help to watch this video of Stephen’s talk at MarkLogic World in San Francisco (start at 18:50).
In general yes, inference is expensive no matter what tool you use. When considering inference, keep your rulesets as small as you can: the most expensive query you can run is { ?s ?p ?o } with inference where all possible rulesets are included. Many users of Triple Stores start off with very complex inferencing, and whittle it down as they move toward production.
A combination query brings together queries about triples and documents in a single search.
In MarkLogic you can do a SPARQL query and restrict the results according to the context document (the document the triples are embedded in). See the Semantics Guide for an example.
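As a rough sketch of this (the predicate IRI and the search term are invented for illustration), you can pass a cts:query to sem:store, so that SPARQL sees only the triples whose context documents match that query:

```xquery
xquery version "1.0-ml";

(: Only triples embedded in documents matching the cts:query are
   visible to the SPARQL query. The predicate IRI and search term
   are hypothetical. :)
sem:sparql(
  "SELECT ?author WHERE { ?article <http://example.org/writtenBy> ?author }",
  (), (),
  sem:store((), cts:word-query("climate"))
)
```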
You can also search documents, and restrict the results according to the triples embedded in them.
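A sketch of that direction, with hypothetical IRIs: cts:triple-range-query matches documents by the triples embedded in them, and composes with any other cts:query:

```xquery
xquery version "1.0-ml";

(: Find documents containing a triple that matches the pattern;
   () for the subject means "any subject". IRIs are hypothetical. :)
cts:search(
  fn:collection(),
  cts:triple-range-query(
    (),
    sem:iri("http://example.org/writtenBy"),
    sem:iri("http://example.org/person/1234")
  )
)
```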
The biggest difference between these two approaches is that the first returns solutions (the things that SPARQL returns) while the second returns documents, or parts of documents. (Side note: many people assume SPARQL returns triples. A SPARQL query returns solutions — that is, a sequence of “rows” according to what you specify in the SELECT clause).
For more examples of combination queries and inference, see the materials for the half-day workshop on MarkLogic semantics, including data, a setup script, and Query Console workspaces.
It depends.
Here’s another place where MarkLogic supports the standards around Triple Stores, AND provides a document store, AND provides a bridge between the two.
If you treat MarkLogic like a Triple Store, then a triple can only belong to one Named Graph; when you DROP that graph (using SPARQL Update), then all the triples in that graph will be deleted. You can also create permissions on the Named Graph, which will apply to all triples in that Named Graph.
If you treat MarkLogic like a Document Store, then Named Graphs map to MarkLogic collections. If the document containing the triple is in collection-A, then you can query Named Graph <collection-A> and find that triple. A document can be in any number of collections, and so triples can be in any number of Named Graphs. If you do an xdmp:collection-delete(), all the documents in that collection will be deleted, even if those documents belong to other collections too. See workspace collections.xml.
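For example (the predicate IRI is hypothetical), a triple embedded in a document in collection-A can be queried with an ordinary SPARQL FROM clause naming that collection as the graph:

```xquery
xquery version "1.0-ml";

(: Triples in documents in collection "collection-A" are visible as
   Named Graph <collection-A>. The predicate IRI is hypothetical. :)
sem:sparql('
  SELECT ?s ?o
  FROM <collection-A>
  WHERE { ?s <http://example.org/knows> ?o }
')
```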
A Named Graph is a convenient way to partition triples when using MarkLogic as a Triple Store only. In that case, you may well want to DROP a graph and all its contents.
Document collections are more flexible, but have slightly different semantics (see above).
You can get the equivalent query power of Named Graphs by doing a combination query (SPARQL + a document query), where the document query restricts results to some collection. This is exactly as efficient as querying by Named Graph in SPARQL, but more flexible.
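For instance, the Named Graph query sketched earlier can be written as a combination query (same hypothetical predicate), with a cts:collection-query standing in for the FROM clause:

```xquery
xquery version "1.0-ml";

(: Equivalent to FROM <collection-A>, but expressed as a document
   query, so it can be combined with any other cts:query. :)
sem:sparql(
  "SELECT ?s ?o WHERE { ?s <http://example.org/knows> ?o }",
  (), (),
  sem:store((), cts:collection-query("collection-A"))
)
```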
Remember, SPARQL queries don’t return triples, they return solutions. So it doesn’t make sense to “return the documents that the resulting triples came from”. You can filter the results according to some document query with a combination query (see above). And you can find the documents that contain triples that match some graph pattern using cts:triple-range-query (see above).
It requires some kind of combination query. There’s no way to express in SPARQL “… and return the context document”, especially since a SPARQL query returns solutions rather than triples.
However, every document in MarkLogic is addressed via a unique URI — the “name” of the document. These URIs can be subjects or objects in triples. SPARQL can certainly return document URIs, which you can then de-reference using fn:doc().
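For example (the sourceDoc predicate is made up): each SPARQL solution is a map keyed by variable name, so you can pull out the URI binding and de-reference it:

```xquery
xquery version "1.0-ml";

(: Each solution is a map:map; the "uri" key holds the ?uri binding.
   The sourceDoc predicate is hypothetical. :)
for $solution in sem:sparql(
  "SELECT ?uri WHERE { ?article <http://example.org/sourceDoc> ?uri }"
)
return fn:doc(xs:string(map:get($solution, "uri")))
```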
This depends on the overall architecture of the system you are building. All of the ways of interacting with MarkLogic have access to the semantic functionality. If you are using a two-tier architecture, you’ll work with XQuery or JavaScript. If you are using a three-tier architecture with the REST API, your calls will go through the semantics endpoints, possibly using the Java or Node.js Client API. If you’re working with Java, you may want to use the Jena library.
Even with a three-tier architecture, at some point you may want to write some server-side code (much the way you’d write PL/SQL code in Oracle) — then you should choose between XQuery and JavaScript, which are equivalent in terms of expressive power. If you want to access that server-side code via REST, you can write a REST extension.
Since you can specify the kind of inference on a per-query basis, you can run the same query with and without inference and examine the difference.
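A minimal sketch of that, assuming a made-up Person class:

```xquery
xquery version "1.0-ml";

(: Run the same query with no rulesets and with RDFS inference,
   then compare the result counts. :)
let $query := "SELECT ?s WHERE { ?s a <http://example.org/Person> }"
let $plain    := sem:sparql($query, (), (), sem:store())
let $inferred := sem:sparql($query, (), (),
                   sem:ruleset-store("rdfs.rules", sem:store()))
return (fn:count($plain), fn:count($inferred))
```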
You should look at DESCRIBE queries. Also, take a look at sem:transitive-closure, an XQuery library function that lives in $MARKLOGIC/Modules/MarkLogic/semantics/sem-impl.xqy. If it doesn’t do exactly what you want, you can copy it and make changes.
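For example, a DESCRIBE query returns the triples surrounding a given node (the IRI here is made up), which is often a good starting point for "show me everything about X":

```xquery
xquery version "1.0-ml";

(: DESCRIBE returns sem:triple values rather than solution maps. :)
sem:sparql("DESCRIBE <http://example.org/person/1234>")
```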
Faceted Search lets you search over documents and display a value+count alongside the search results, the way a product search on amazon.com shows you the facets for brand, color, price band, and so on. You can build semantics-driven facets by writing a custom constraint.
Yes, MarkLogic works well as a Triple Store. It supports all the major standards – SPARQL 1.1, SPARQL 1.1 Update, Graph Protocol – so it can be used anywhere a regular Triple Store is used. In addition, MarkLogic has Enterprise features such as security, ACID transactions, scale-out, HA/DR, and so on which most Triple Stores don’t have. And many people find that they start out using MarkLogic as “just a Triple Store” and over time they move much of their data – the data that represents entities in the real world – into documents. It’s nice to have that option!
Data is often grouped into entities (such as Person or Article). Consider modeling most entity data as documents and modeling only some of the “attributes” of your entities as triples — those attributes where you need to query across a graph, or up and down a hierarchy, or you need inference. You should also model the semantics of the data as triples — for example, you may want an ontology that indicates “CA” is a state in the USA, and it’s the same as “California”; that “CA” is part of the address; and so on.
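As a sketch of that split (all names and IRIs are hypothetical), an entity document might carry its data as XML and its graph edges as embedded triples:

```xquery
xquery version "1.0-ml";

(: The document holds the entity; the embedded triple holds the graph
   edge you want to query across. All names/IRIs are hypothetical. :)
xdmp:document-insert("/person/1234.xml",
  <person>
    <name>Jane Smith</name>
    <state>CA</state>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
      <sem:triple>
        <sem:subject>http://example.org/person/1234</sem:subject>
        <sem:predicate>http://example.org/livesIn</sem:predicate>
        <sem:object>http://example.org/state/CA</sem:object>
      </sem:triple>
    </sem:triples>
  </person>)
```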
For additional perspectives, you can watch David Gorbet’s Escape the Matrix MarkLogic World keynote or Pete Aven and Mike Bower’s Multi-Model Data Integration in the Real World.
Do you have additional questions about Semantics best practices? Ask away!