The revolutionary Optic API blends the relational world with rich NoSQL document features by providing the capability to perform joins and aggregates over documents. One of the enabling features, Template Driven Extraction, makes it possible to create a relational lens over documents stored in MarkLogic by using templates to specify the parts of a document that make up a row in a view. You can access that data using either SQL or the Optic API.
The MarkLogic Optic API makes it possible to perform powerful NoSQL joins and aggregates across documents. Developers can use document search to retrieve rows projected from documents, join documents as columns within rows, and construct document structures dynamically — all performed efficiently within the database and accessed programmatically from your application.
NoSQL databases bias towards denormalization of data, where documents represent real world entities. This data model supports development and search — all information about an entity is conveniently in one place for efficient retrieval and update. But in some cases your entities are naturally in normalized form and it is better to keep them that way. Or, your questions rely on the relationships across entities. The flexible Optic API allows you to use your data as-is, producing denormalized data on read for use in your applications. The API makes it possible to harness the full power of MarkLogic's document and search capabilities using familiar syntax, incorporating common SQL concepts, regardless of the underlying shape of your data.
For the examples in this section you need MarkLogic 9 installed and configured. See the Installation Procedures if you have not previously done this step.
Sample queries use data from the Samplestack project. You can use Samplestack's provided gradle scripts to secure and load the database.
To make use of Samplestack's set up process:
- Make sure you have git and Java 8 installed and on your path
git clone https://github.com/marklogic/marklogic-samplestack
- Create and configure the database by running
./gradlew assemble(it's okay if some of the tests fail; note that you must have port 8006 free before setting up Samplestack)
- Load the data set using
By default this process will secure MarkLogic with the username admin, password admin. If you have already secured MarkLogic, modify appserver/java-spring/gradle.properties before running the gradle scripts using your own admin credentials.
Alternatively, you can view and load the raw data yourself.
Example 0 -- Templates for Samplestack Data
We'll start by surfacing key information from our data set by creating templates. Template Driven Extraction (TDE) indexes documents as-is in the row index for scalable aggregates and joins. Read more about template driven extraction for details on laying a relational lens over your data for use in the Optic API.
Samplestack seed data comes from Stack Overflow. The data set consists of two types of JSON documents, which are modeled on real-world entities and activities:
- One type of document represents Question and Answer discussion threads (sample QnA doc)
- The other document type represents those who contribute to QnA threads by asking and answering questions, like a user profile (sample Contributor doc)
There can be a many-to-many relationship between QnA and Contributor documents. The document data model is well suited to Samplestack, which prioritizes searchability, so users can find the QnA thread most relevant to their needs. The application also supports updates, and the flexible document model easily adapts to capture additional answers, comments, or other data that might later be part of a QnA thread.
By putting key fields, like User Reputation, into the row index, we're adding a relational lens over our documents. For example, we can quickly aggregate Reputation across Users. Data in rows can also be efficiently joined with other distributed, yet related, data -- such as QnA threads where the Users participate.
Template Driven Extraction
The included examples make use of a few different views. Open Query Console to define lenses over the sample data. Download and import the Samplestack-TDE workspace and follow the instructions to load the templates.
Follow the same process to load QnA, Answer, and QTag templates by running the included tabs. Using
tde.templateInsert will upload the provided template to the Schemas database with the
tde collection and automatically trigger population of the row index based on content in MarkLogic. In the provided workspace you'll also find a couple of tabs to validate and explore the views created over your documents. Here is a look at the three relational lenses we are using over our two Samplestack document types:
Now it's time to use the Optic API to query the data with the exposed relational lenses. For the best experience, use the language driver you are most comfortable with:
- The MarkLogic Java Client API supports Optic query construction and row result processing -- see detailed Java Optic examples here.
- XQuery Examples are available at the bottom of the current page.
- The REST API supports Optic queries and is great for those who want to build their own language wrappers. See REST API Examples farther down the page for tips on getting started.
Example 1 -- Row Query
Begin using the Optic API by performing a basic query on a view over documents. Querying the view will return rows. Run the following in Query Console to Query User Rows:.
That's it! You've just retrieved your first row set. In Query Console, you can choose to view your results as a SQL table, JSON, or text. As JSON, your results should look something like this:
In this example, a query is constructed to return rows
fromView "Contributors", in the "Samplestack" schema. It is specifically retrieving the columns
UserLocation. Resulting columns are qualified by "SchemaName.ViewName".
Additional functions modify the result set, and should look familiar to those with SQL experience:
orderBysorts the row results by the UserReputation column
wherefilters the results to rows where the UserReputation column is greater than the specified minimum reputation level
limitconstrains the number of rows returned (to 25 in this case)
The final step is to execute the query using
result. Until this point you are only building the query plan.
Note the import of the
The SQL equivalent for this query is provided for reference:
Take a minute to explore the other included Query tab to Query QnA Rows.
In these basic examples, each of the rows returned is contained within a single document. The following examples introduce increasingly more powerful capabilities, including the opportunity to return rows that span documents.
Example 2 – Aggregates and Grouping
Use the MarkLogic Optic API to conveniently perform aggregate functions on values across documents. In this example, several operations are performed to get a sense of basic statistics about the Samplestack User data. Run the following query to aggregate user stats:
The first parameter in the
groupBy is empty to groupBy all rows. A set of aggregates are defined to count the users, calculate the minimum/maximum/average values for Reputation, and compute the fraction of users who have specified a location in their profile. All of these aggregate functions are in the Optic library and the operations must be qualified. In the
select statement, further processing is done to format the output: average reputation is truncated and the location fraction is displayed as a percentage. The Optic API has access to MarkLogic’s other function libraries such as
fn. Calls to built-in libraries must be prefixed using the convention
op so calls are deferred and executed with the rest of the query plan as optimized by the engine. When passing column values to built-in libraries as parameters, use the
col identifier, especially to distinguish from strings.
The next query performs similar operations in order to aggregate QnA stats. One difference from the previous query is that a grouping parameter – column name ‘Accepted’ – is applied to the
groupBy statement to separate summary statistics for questions that have an accepted answer from those questions that do not have an accepted answer.
Example 3 – Row Joins
This example joins Question & Answer information with related user data to return a richer result. That is, more data elements (columns) are returned than are stored in the QnA document itself – this is denormalized data-on-read, in action!
Run the following query to join rows in order to answer the question, "What is the reputation of the asker for the highest voted questions?":
The questions (QnA) view is joined with the users (Contributors) view
on user name – which is called "Asker" and "ContributorUserName" in the respective source views. This value is used as a key to join related information. The inner join yields one output row set that concatenates one left row and one right row for each match between the keys in the left and right row sets. An inner join is a type of filter. If there is no user profile row corresponding to the question document, fewer than 10 results will be returned.
The Optic API is purposefully flexible about the ability to specify the sequence and frequency of operations.
orderBy is deliberately used twice in this example. Because we are only interested in the 10 most voted questions, we can filter (order+limit) so we only join on the rows of interest. Because relational joins are not guaranteed to preserve order, we sort the final output.
desc specifies the sort order.