Design Patterns: The Triple Provenance Pattern

by Tom Ternquist

MarkLogic design patterns are reusable solutions for many of the commonly occurring problems encountered when designing MarkLogic applications. These patterns may be unique to applications on MarkLogic or may be industry patterns that have MarkLogic-specific considerations. Unlike recipes, MarkLogic design patterns are generally more abstract and applicable in multiple scenarios.

Triple Provenance with Document Annotations Design Pattern

Intent

Semantics applications often need to capture provenance information at the triple level.

Using the Envelope Pattern, annotate the JSON or XML serialization of triples with provenance details.

Motivation

When building applications that leverage data from disparate sources, especially in a semantics context, it is common to want to capture provenance information, such as the source and last updated time. With RDF alone, reification (that is, statements about statements; see Reification on the Semantic Web) is one technique that can be used, but it significantly expands the number of triples needed and can greatly complicate SPARQL queries.
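To make the cost of reification concrete, here is a minimal sketch in plain JavaScript that counts the extra triples needed to attach two provenance details to a single statement. The `http://example.org/` IRIs and the `source` and `lastUpdated` names are hypothetical; only the `rdf:` vocabulary terms are standard.

```javascript
// Standard RDF namespace; the EX namespace and all example IRIs are hypothetical.
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const EX = "http://example.org/";

// Describe one statement using RDF reification and attach provenance to it.
function reify(statementIri, triple, provenance) {
  const reified = [
    { subject: statementIri, predicate: RDF + "type", object: RDF + "Statement" },
    { subject: statementIri, predicate: RDF + "subject", object: triple.subject },
    { subject: statementIri, predicate: RDF + "predicate", object: triple.predicate },
    { subject: statementIri, predicate: RDF + "object", object: triple.object },
  ];
  // One additional triple per provenance detail.
  for (const [name, value] of Object.entries(provenance)) {
    reified.push({ subject: statementIri, predicate: EX + name, object: value });
  }
  return reified;
}

const original = {
  subject: EX + "alice",
  predicate: EX + "worksFor",
  object: EX + "acme",
};
const extra = reify(EX + "statement1", original, {
  source: "crm-export",
  lastUpdated: "2018-01-15T00:00:00Z",
});

console.log(extra.length); // 6 triples in addition to the original one
```

Any SPARQL query that needs the provenance would also have to join through the statement node, which is where the added query complexity comes from.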

A solution that can provide provenance details for a triple without the added complexity of reification would be ideal. Fortunately, triples stored on documents in MarkLogic can take advantage of their serialization as JSON and XML to provide an additional level of context. This is achieved through additional metadata on the document, specifically on the triple objects.

Applicability

This pattern is applicable when you need to capture triple-level provenance details. It requires that triples be persisted on documents, not stored as MarkLogic Managed Triples. It is suitable for cases where the provenance details do not need to be returned directly as part of a SPARQL query and it is acceptable to retrieve them from the document instead.

Participants

The participants involved in implementing this pattern are as follows:

  • Update code for annotating triples
  • Retrieval code for getting the provenance details for a triple

Examples of each can be found under Sample Code below.

Collaborations

The retrieval code must be aware of how the update code has persisted the provenance details.

Consequences

This pattern enables the persistence of provenance details for a given triple by storing annotations on triples serialized in JSON or XML. Retrieval of provenance details is done with JavaScript or XQuery code that paths into the document, identifies the matching target triple, and returns its annotations.

To take advantage of this pattern, you cannot use Managed Triples and need to add provenance annotations during document insertion / update or prior to ingestion. You must also be able to identify the document where the triple resides.

A trade-off of using this pattern is that you cannot use pure SPARQL to get to the provenance details.

Implementation

If you are implementing this pattern, it is important that there is a consistent process for adding and retrieving provenance details.

If your application uses Template Driven Extraction (TDE), you can wrap elements/properties you would like to annotate with provenance details like this:

Here's a sample TDE template:

Sample Code

On XML documents this can be most easily achieved by using attributes on the triples:

Here is the same approach in JSON; instead of using attributes, we add properties to the triple object:
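As a minimal sketch of the JSON form, the plain JavaScript below adds provenance properties to a matching triple in an envelope-style document. The document shape (an `envelope` with a `triples` array of `{ triple: { subject, predicate, object } }` entries), the `source` and `lastUpdated` property names, and the `annotateTriple` helper are assumptions made for illustration, not a prescribed format. The example uses IRI-valued objects so that triples can be compared as plain strings.

```javascript
// Assumed shape: doc.envelope.triples is an array of { triple: {...} } entries.
// The annotation property names ("source", "lastUpdated") are hypothetical.

// Two triples match when subject, predicate, and object are all equal.
function sameTriple(a, b) {
  return a.subject === b.subject &&
    a.predicate === b.predicate &&
    a.object === b.object;
}

// Return a copy of the document with provenance properties added to the
// triple that matches `target`; other triples are left unchanged.
function annotateTriple(doc, target, provenance) {
  const triples = doc.envelope.triples.map((entry) =>
    sameTriple(entry.triple, target)
      ? { triple: { ...entry.triple, ...provenance } }
      : entry
  );
  return { ...doc, envelope: { ...doc.envelope, triples } };
}

const doc = {
  envelope: {
    instance: { name: "Alice" },
    triples: [
      { triple: { subject: "http://example.org/alice", predicate: "http://example.org/worksFor", object: "http://example.org/acme" } },
      { triple: { subject: "http://example.org/alice", predicate: "http://example.org/knows", object: "http://example.org/bob" } },
    ],
  },
};

const annotated = annotateTriple(
  doc,
  { subject: "http://example.org/alice", predicate: "http://example.org/worksFor", object: "http://example.org/acme" },
  { source: "crm-export", lastUpdated: "2018-01-15T00:00:00Z" }
);

console.log(JSON.stringify(annotated.envelope.triples[0], null, 2));
```

The first triple now carries `source` and `lastUpdated` alongside its subject, predicate, and object, while the second triple and the original `doc` are untouched. In a real application this step would run during document insertion or update, or before ingestion.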

Here is an example of how you might retrieve the provenance details:
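One way to approach retrieval, sketched in plain JavaScript under stated assumptions: the document has already been loaded as an object, triples live in `envelope.triples` as `{ triple: {...} }` entries, and every property other than `subject`, `predicate`, and `object` is treated as a provenance annotation. The `getProvenance` helper and the property names are hypothetical.

```javascript
// Properties that make up the triple itself; everything else on the triple
// object is treated as a provenance annotation (an assumption of this sketch).
const CORE_KEYS = new Set(["subject", "predicate", "object"]);

// Find the triple matching `target` in the document and return its
// annotations, or null when the document does not contain that triple.
function getProvenance(doc, target) {
  const entries = (doc.envelope && doc.envelope.triples) || [];
  const match = entries.find(({ triple }) =>
    triple.subject === target.subject &&
    triple.predicate === target.predicate &&
    triple.object === target.object
  );
  if (!match) return null;
  return Object.fromEntries(
    Object.entries(match.triple).filter(([key]) => !CORE_KEYS.has(key))
  );
}

const storedDoc = {
  envelope: {
    triples: [
      {
        triple: {
          subject: "http://example.org/alice",
          predicate: "http://example.org/worksFor",
          object: "http://example.org/acme",
          source: "crm-export",
          lastUpdated: "2018-01-15T00:00:00Z",
        },
      },
    ],
  },
};

const provenance = getProvenance(storedDoc, {
  subject: "http://example.org/alice",
  predicate: "http://example.org/worksFor",
  object: "http://example.org/acme",
});

console.log(provenance);
```

In MarkLogic Server-Side JavaScript, the document object would typically come from something like `cts.doc(uri).toObject()`, which presumes you already know the URI of the document where the triple resides.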

Related Patterns

Envelope Pattern

Conclusion

With triples alone, it can be challenging to capture provenance details without introducing complexity that negatively impacts the usability of your triples and query performance. Through use of MarkLogic's multi-model support, we are able to take advantage of embedding triples on documents with annotations that provide additional context and can be retrieved easily using a small amount of JavaScript or XQuery code.

For more information on additional ways to take advantage of embedding triples on documents, see the Semantics guide and the chapter on Unmanaged Triples.
