The revolutionary Optic API blends the relational world with rich NoSQL document features by providing the capability to perform joins and aggregates over documents. One of the enabling features, Template Driven Extraction, makes it possible to create a relational lens over documents stored in MarkLogic by using templates to specify the parts of a document that make up a row in a view. You can access that data using either SQL or the Optic API.
The Optic API uses row, triple, and/or lexicon lenses over documents, and is powered by the new row index. With the Optic API, you can use full-text document search as a filter, perform row operations to join and aggregate data, and retrieve or construct documents on output. It’s available in all of MarkLogic’s supported languages: JavaScript, XQuery, REST, Java, and soon Node.js. Each implementation adopts language-specific patterns so it feels conceptually familiar if you have relational experience and syntactically natural given your existing programming knowledge.
Here is an example of a SQL query converted to using the Optic API in JavaScript:
select UserLocation, count(QnATitle) as questionCount from Samplestack.Contributors inner join Samplestack.QnA on Asker=ContributorUserName group by UserLocation limit 50
op.fromView('Samplestack', 'Contributors') .joinInner(op.fromView('Samplestack','QnA'), op.on('Asker','ContributorUserName')) .groupBy('UserLocation', op.count('questionCount', 'QnATitle')) .limit(50) .result();
We’ll get into more detail regarding the similarities as we go along.
The MarkLogic Optic API makes it possible to perform powerful NoSQL joins and aggregates across documents. Developers can use document search to retrieve rows projected from documents, join documents as columns within rows, and construct document structures dynamically — all performed efficiently within the database and accessed programmatically from your application.
NoSQL databases bias towards denormalization of data, where documents represent real world entities. This data model supports development and search — all information about an entity is conveniently in one place for efficient retrieval and update. But in some cases your entities are naturally in normalized form and it is better to keep them that way.
Or, your questions rely on the relationships across entities. The flexible Optic API allows you to use your data as-is, producing denormalized data on read for use in your applications. The API makes it possible to harness the full power of MarkLogic’s document and search capabilities using familiar syntax, incorporating common SQL concepts, regardless of the underlying shape of your data.
For the examples in this section, you need MarkLogic 9 installed and configured. Sample queries use data from the Samplestack project. You can use Samplestack’s provided gradle scripts to secure and load the database.
To make use of Samplestack’s set up process:
git clone https://github.com/marklogic/marklogic-samplestack
cd marklogic-sampletack/appserver/java-spring
./gradlew assemble
(it’s okay if some of the tests fail; note that you must have port 8006 free before setting up Samplestack)./gradlew dbload
By default this process will secure MarkLogic with the username admin, password admin. If you have already secured MarkLogic, modify appserver/java-spring/gradle.properties before running the gradle scripts using your own admin credentials.
Alternatively, you can view and load the raw data yourself (download and extract), by running the following script requiring MLCP:
/path/to/mlcp.sh import -host localhost -port 8000 -username admin -password admin -input_file_path seed-data -output_uri_replace ".*seed-data,''"
This will load the documents into the “Documents” database.
We’ll start by surfacing key information from our data set by creating templates. Template Driven Extraction (TDE) indexes documents as-is in the row index for scalable aggregates and joins. Read more about template driven extraction for details on laying a relational lens over your data for use in the Optic API.
Samplestack seed data comes from Stack Overflow. The data set consists of two types of JSON documents, which are modeled on real-world entities and activities:
There can be a many-to-many relationship between QnA and Contributor documents. The document data model is well suited to Samplestack, which prioritizes searchability, so users can find the QnA thread most relevant to their needs. The application also supports updates, and the flexible document model easily adapts to capture additional answers, comments, or other data that might later be part of a QnA thread.
By putting key fields, like User Reputation, into the row index, we’re adding a relational lens over our documents. For example, we can quickly aggregate Reputation across Users. Data in rows can also be efficiently joined with other distributed, yet related, data — such as QnA threads where the Users participate.
Template-Driven Extraction
The included examples make use of a few different views. Open Query Console to define lenses over the sample data.
Download and import the Samplestack-TDE workspace and follow the instructions to load the templates. Please make sure to select the samplestack database as the workspace defaults to “Documents” database.
Use JavaScript to load the user template:
'use strict' declareUpdate(); var tde = require("/MarkLogic/tde.xqy"); // Load the user template for user profile rows var template = xdmp.toJSON( { "template":{ "context":"/com.marklogic.samplestack.domain.Contributor", "rows":[ { "schemaName":"Samplestack", "viewName":"Contributors", "columns":[ { "name":"ContributorUserName", "scalarType":"string", "val":"userName" }, { "name":"UserReputation", "scalarType":"long", "val":"reputation", "nullable": true }, { "name":"UserLocation", "scalarType":"string", "val":"location", "nullable": true, "invalidValues":"ignore" }, { "name":"UserURI", "scalarType":"string", "val":"xdmp:node-uri(.)" } ] } ] } } ); tde.templateInsert("/ss-ea/user-TDE.tdej", template);
Follow the same process to load QnA, Answer, and QTag templates by running the included tabs. Using tde.templateInsert
will upload the provided template to the Schemas database with the tde
collection and automatically trigger population of the row index based on content in MarkLogic.
In the provided workspace, you’ll also find a couple of tabs to validate and explore the views created over your documents. Here is a look at the three relational lenses we are using over our two Samplestack document types:
Now it’s time to use the Optic API to query the data with the exposed relational lenses. For the best experience, use the language driver you are most comfortable with:
To explore the Optic API’s capabilities, you can paste these examples into Query Console. Choose “JavaScript” or “XQuery” as the query type and use the “samplestack” database.
Begin using the Optic API by performing a basic query on a view over documents. Querying the view will return rows. Run the following in Query Console to Query User Rows:
'use strict' const op = require('/MarkLogic/optic'); const minRep = 5000; op.fromView('Samplestack', 'Contributors') .select(['ContributorUserName', 'UserReputation', 'UserLocation']) .orderBy('UserReputation') .where(op.gt(op.col('UserReputation'), minRep)) .limit(25) .result();
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; declare default function namespace "http://marklogic.com/optic"; declare option xdmp:mapping "false"; let $min-rep := 5000 return from-view("Samplestack", "Contributors") => select(("ContributorUserName", "UserReputation", "UserLocation")) => order-by("UserReputation") => where(gt(col("UserReputation"), $min-rep)) => limit(25) => result()
That’s it! You’ve just retrieved your first row set. In Query Console, you can choose to view your results as a SQL table, JSON, or text. As JSON, your results should look something like this:
{ "Samplestack.Contributors.ContributorUserName": "souser499497@example.com", "Samplestack.Contributors.UserReputation": 5001, "Samplestack.Contributors.UserLocation": "Romania" } { "Samplestack.Contributors.ContributorUserName": "souser543693@example.com", "Samplestack.Contributors.UserReputation": 5003, "Samplestack.Contributors.UserLocation": "Warren, MI" } { "Samplestack.Contributors.ContributorUserName": "souser358163@example.com", "Samplestack.Contributors.UserReputation": 5005, "Samplestack.Contributors.UserLocation": "Karachi, Pakistan" }
In this example, a query is constructed to return rows fromView
“Contributors”, in the “Samplestack” schema. It is specifically retrieving the columns ContributorUserName
, UserReputation
, and UserLocation
. Resulting columns are qualified by “SchemaName.ViewName”.
Additional functions modify the result set, and should look familiar to those with SQL experience:
orderBy
sorts the row results by the UserReputation columnwhere
filters the results to rows where the UserReputation column is greater than the specified minimum reputation levellimit
constrains the number of rows returned (to 25 in this case)The final step is to execute the query using result
. Until this point you are only building the query plan.
Note the import of the /MarkLogic/optic
JavaScript library at the beginning of the code sample, using the convention op
.
The SQL equivalent for this query is provided for reference:
SELECT ContributorUserName, UserReputation, UserLocation FROM Samplestack.Contributors WHERE UserReputation > 5000 ORDER BY UserReputation LIMIT 25 OFFSET 0;
You will notice the names and purposes of the Optic API operations are familiar from SQL, but have the benefit of language integration. Not only is the idiomatic, chainable function style familiar to JavaScript programmers, but the ability to incorporate variables makes data retrieval a safer, smoother experience. The use of the minRep
constant in this example demonstrates the benefits of using the Optic API over using Common Table Expressions and string concatenation to submit a SQL query from JavaScript.
Take a minute to explore the other included Query tab to Query QnA Rows.
In these basic examples, each of the rows returned is contained within a single document. The following examples introduce increasingly more powerful capabilities, including the opportunity to return rows that span documents.
Use the MarkLogic Optic API to conveniently perform aggregate functions on values across documents. In this example, several operations are performed to get a sense of basic statistics about the Samplestack User data.
Run the following query to aggregate user stats:
'use strict' const op = require('/MarkLogic/optic'); op.fromView('Samplestack', 'Contributors') .groupBy(null, [ op.count('userCount', 'contributorusername'), op.min('minReputation', 'userReputation'), op.avg('aRep', 'userReputation'), op.max('maxReputation', 'userReputation'), op.count('locationCount', 'userlocation') ]) .select(['userCount', 'minReputation', op.as('avgReputation', op.math.trunc(op.col('aRep'))), 'maxReputation', op.as('locationPercent', op.fn.formatNumber(op.xs.double( op.divide(op.col('locationCount'), op.col('userCount'))),'##%')) ]) .result();
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; import module namespace ofn="http://marklogic.com/optic/expression/fn" at "/MarkLogic/optic/optic-fn.xqy"; import module namespace omath="http://marklogic.com/optic/expression/math" at "/MarkLogic/optic/optic-math.xqy"; import module namespace oxs="http://marklogic.com/optic/expression/xs" at "/MarkLogic/optic/optic-xs.xqy"; declare option xdmp:mapping "false"; (: Get basic stats about some of the numeric values in Samplestack: * Users - count, reputation range, sparseness of location :) op:from-view("Samplestack", "Contributors") => op:group-by((), ( op:count("userCount", "contributorusername"), op:min("minReputation", "userReputation"), op:avg("aRep", "userReputation"), op:max("maxReputation", "userReputation"), op:count("locationCount", "userlocation") )) => op:select(("userCount", "minReputation", op:as("avgReputation", omath:trunc(op:col("aRep"))), "maxReputation", op:as("locationPercent", ofn:format-number(oxs:double( op:divide(op:col("locationCount"), op:col("userCount"))), "##%")) )) => op:result()
The first parameter in the groupBy
is empty to groupBy all rows, based on no particular column nor value. A set of aggregates are defined to count the users, calculate the minimum/maximum/average values for Reputation, and compute the fraction of users who have specified a location in their profile. All of these aggregate functions are in the Optic library and the operations must be qualified. In the select
statement, further processing is done to format the output: average reputation is truncated and the location fraction is displayed as a percentage.
The Optic API has access to MarkLogic’s other function libraries such as math
and fn
. Calls to built-in libraries must be prefixed using the convention op
so calls are deferred and executed with the rest of the query plan as optimized by the engine. When passing column values to built-in libraries as parameters, use the col
identifier, especially to distinguish from strings.
The next query performs similar operations in order to aggregate QnA stats. One difference from the previous query is that a grouping parameter – column name ‘Accepted’ – is applied to the groupBy
statement to separate summary statistics for questions that have an accepted answer from those questions that do not have an accepted answer.
'use strict' const op = require('/MarkLogic/optic'); op.fromView('Samplestack', 'QnA') .groupBy('Accepted', [ op.count('questionCount', 'QnATitle'), op.min('earliestDate', 'QAskDate'), op.max('latestDate', 'QAskDate'), op.avg('avgNumAnswers', 'AnswerCount'), op.avg('avgQuestionVotes', 'QVoteCount') ]) .result();
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; declare option xdmp:mapping "false"; (: Get basic stats about some of the numeric values in Samplestack: * QnA Threads - count, date range, % answered, avg # answers, vote count/itemtally :) op:from-view("Samplestack", "QnA") => op:group-by("Accepted", ( op:count("questionCount", "QnATitle"), op:min("earliestDate", "QAskDate"), op:max("latestDate", "QAskDate"), op:avg("avgNumAnswers", "AnswerCount"), op:avg("avgQuestionVotes", "QVoteCount") )) => op:result()
This example joins Question & Answer information with related user data to return a richer result. That is, more data elements (columns) are returned than are stored in the QnA document itself – this is denormalized data-on-read, in action!
Run the following query to join rows in order to answer the question, “What is the reputation of the asker for the highest voted questions?”:
'use strict' const op = require('/MarkLogic/optic'); const qs = op.fromView('Samplestack','QnA') const users = op.fromView ('Samplestack', 'Contributors') qs.orderBy(op.desc('QVoteCount')) .limit(10) .joinInner(users, op.on('Asker','ContributorUserName')) .orderBy(op.desc('QVoteCount')) .select(['QnATitle', 'QVoteCount', 'Asker', 'UserReputation']) .result();
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; declare option xdmp:mapping "false"; (: Join 2 types of rows - Users & Questions What is the reputation of the asker for the highest voted questions? :) let $qs := op:from-view("Samplestack", "QnA") let $users := op:from-view("Samplestack", "Contributors") return $qs => op:order-by(op:desc("QVoteCount")) => op:limit(10) (: only interested in the top 10 :) => op:join-inner($users, op:on("Asker", "ContributorUserName")) => op:order-by(op:desc("QVoteCount")) => op:select(("QnATitle", "QVoteCount", "Asker", "UserReputation")) => op:result()
The questions (QnA) view is joined with the users (Contributors) view on
user name– which is called “Asker” and “ContributorUserName” in the respective source views. This value is used as a key to join related information. The inner join yields one output row set that concatenates one left row and one right row for each match between the keys in the left and right row sets. An inner join is a type of filter. If there is no user profile row corresponding to the question document, fewer than 10 results will be returned.
The Optic API is purposefully flexible about the ability to specify the sequence and frequency of operations. orderBy
is deliberately used twice in this example. Because we are only interested in the 10 most voted questions, we can filter (order+limit) so we only join on the rows of interest. Because relational joins are not guaranteed to preserve order, we sort the final output. desc
specifies the sort order. You can try removing the second `op:order-by`/`op.orderBy` to see how it affects the final result.
Almost all activities with a database start with exploration of what tables are available and how those tables look like.
Here’s an example of that activity in SQL:
SHOW DATABASES USE database_name SHOW TABLES DESC table_name
There is currently no direct counterpart for displaying the columns of a table. But the below query will provide a good background regarding available views:
xdmp.invokeFunction( function() { return fn.collection('http://marklogic.com/xdmp/tde') } , {'database':xdmp.schemaDatabase()} )
function() { fn:collection('http://marklogic.com/xdmp/tde') } , <options xmlns="xdmp:eval"> <database>{xdmp:schema-database()}</database> </options> )
SELECT * FROM samplestack.contributors LIMIT 10
'use strict' const op = require('/MarkLogic/optic'); op.fromView("samplestack", "contributors") .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors") =>op:limit(10) =>op:result()
The sample result would be as follows:
In JSON:
[ { "Samplestack.Contributors.ContributorUserName": "souser1730908@example.com", "Samplestack.Contributors.UserReputation": 1108, "Samplestack.Contributors.UserLocation": "London, Canada", "Samplestack.Contributors.UserURI": "/com.marklogic.samplestack.domain.Contributor/sou1730908.json" } , ... ]
Hrm… what if I want those column names to be the key? That is, without the schema and table prefix. I don’t really want my client to keep handling the schema and table names with each request.
Using Optic API:
'use strict' const op = require('/MarkLogic/optic'); op.fromView("samplestack", "contributors", "") .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors","") =>op:limit(10) =>op:result()
Resulting in:
In JSON:
[ { "ContributorUserName": "souser1730908@example.com", "UserReputation": 1108, "UserLocation": "London, Canada", "UserURI": "/com.marklogic.samplestack.domain.Contributor/sou1730908.json" } , ... ], ... ]
Now that we have some basic syntax going, let’s explore the data some more.
SELECT contributorusername, userreputation FROM samplestack.contributors WHERE userlocation = 'London' LIMIT 10
const op = require('/MarkLogic/optic');xquery version "1.0-ml"; op.fromView("samplestack", "contributors", "") .where(op.eq(op.col('userlocation'), 'London')) .select((op.col('contributorusername'), 'userreputation')) .limit(10) .result()
import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors", "") =>op:where(op:eq(op:col('userlocation'), 'London')) =>op:select((op:col('contributorusername'), 'userreputation')) =>op:limit(10) =>op:result()
The sample result would be:
Some things that are worth noting above:
SELECT * FROM samplestack.contributors WHERE userlocation = 'London' AND userreputation <= 1000 OR userlocation = 'London, Canada' AND userreputation >= 1000 LIMIT 10
const op = require('/MarkLogic/optic'); op.from-view("samplestack", "contributors", "") .where( op.or([ op.and([ op.eq(op.col('userlocation'), 'London') , op.le(op.col('userreputation'), 1000) ]) , op.and([ op.eq(op.col('userlocation'), 'London, Canada') , op.ge(op.col('userreputation'), 1000) ]) ]) ) .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors", "") =>op:where( op:or(( op:and(( op:eq(op:col('userlocation'), 'London') , op:le(op:col('userreputation'), 1000) )) , op:and(( op:eq(op:col('userlocation'), 'London, Canada') , op:ge(op:col('userreputation'), 1000) )) )) ) =>op:limit(10) =>op:result()
The sample result would be:
Now that we have a good grasp of our data, let’s try something a little bit more challenging. Let’s try to look for the top ‘asker’ per month.
SELECT asker, month, count(1) as qcount FROM (SELECT asker, year(qaskdate)|| '-' || right('0'||month(qaskdate), 2) as month FROM samplestack.qna) GROUP BY asker, month ORDER BY qcount desc LIMIT 10
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "qna") .groupBy( [ op.col('asker') , /* * NOTE: we do not need to do a separate ‘require’ * supported fn, math and cts functions are part of the sjs library module. */ op.as('month', op.fn.formatDateTime(op.col('qaskdate'), '[Y0001]-[M01]')) ] , op.count('qcount') ) .orderBy(op.desc('qcount')) .limit(10) .result()
xquery version "1.0-ml"; (: The base module required to use Optic API :) import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; (: The module required for functions, e.g. fn:concat, fn:format-dateTime, etc. :) import module namespace ofn="http://marklogic.com/optic/expression/fn" at "/MarkLogic/optic/optic-fn.xqy"; (: The module required for type-casting values, e.g. xs:long, xs:dateTime :) import module namespace oxs="http://marklogic.com/optic/expression/xs" at "/MarkLogic/optic/optic-xs.xqy"; (: : A similar set of libraries is available for math, sem and other namespaces : Please check your <install>/Modules/MarkLogic/optic/ folder for a complete list. :) op:from-view("samplestack", "qna") =>op:group-by( ( op:col('asker') , op:as('month', ofn:format-dateTime(op:col('qaskdate'), '[Y0001]-[M01]')) ) , op:count('qcount') ) =>op:order-by(op:desc('qcount')) =>op:limit(10) =>op:result()
The above query makes use of supported functions in MarkLogic. XQuery does not support child modules, so you need to do multiple imports. SJS, on the other hand, embeds these functions as part of the Optic library module. These functions can be referenced in a similar fashion, for example fn.formatDateTime
is referenced as op.fn.formatDateTime
.
Note: Do NOT try to use the XQuery modules on SJS. It will not function well.
The list of supported SQL functions is available here while the list of supported Optic API functions is available here.
version "1.0-ml"; (: The base module required to use Optic API :) import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; (: The module required for functions, e.g. fn:concat, fn:format-dateTime, etc. :) import module namespace ofn="http://marklogic.com/optic/expression/fn" at "/MarkLogic/optic/optic-fn.xqy"; (: The module required for type-casting values, e.g. xs:long, xs:dateTime :) import module namespace oxs="http://marklogic.com/optic/expression/xs" at "/MarkLogic/optic/optic-xs.xqy";
Let’s limit this to those having more than 10 questions.
SELECT asker, month, count(1) as qcount FROM ( SELECT asker, year(qaskdate)|| '-' || right('0'||month(qaskdate), 2) as month FROM samplestack.qna ) GROUP BY asker, month HAVING qcount >= 10 ORDER BY qcount DESC LIMIT 10
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "qna") .groupBy( [ op.col('asker') , op.as('month', op.fn.formatDateTime(op.col('qaskdate'), '[Y0001]-[M01]')) ] , op.count('qcount') ) .where(op.ge(op.col('qcount'), 10)) .orderBy(op.desc('qcount')) .limit(10) .result()
op:from-view("samplestack", "qna") =>op:group-by( ( op:col('asker') , op:as('month', ofn:format-dateTime(op:col('qaskdate'), '[Y0001]-[M01]')) ) , op:count('qcount') ) =>op:where(op:ge(op:col('qcount'), 10)) =>op:order-by(op:desc('qcount')) =>op:limit(10) =>op:result()
It is important to note that the ‘where’ clause appears after the ‘group by’ clause. The earlier clause (modify plan) will dictate the data (row set) available to the succeeding clause, i.e.:
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "qna") .groupBy( [ op.col('asker') , op.as('month', op.fn.formatDateTime(op.col('qaskdate'), '[Y0001]-[M01]')) ] , op.count('qcount') ) .where(op.and( op.ge(op.col('qcount'), 10) , //(. THIS NEXT LINE SHOULDN'T GO HERE .) op.eq(op.fn.yearFromDateTime(op.col('qaskdate')), 2012) )) .orderBy(op.desc('qcount')) .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "qna") =>op:group-by( ( op:col('asker') , op:as('month', ofn:format-dateTime(op:col('qaskdate'), '[Y0001]-[M01]')) ) , op:count('qcount') ) =>op:where(op:and(( op:ge(op:col('qcount'), 10) , (: THIS NEXT LINE SHOULDN'T GO HERE :) op:eq(ofn:year-from-dateTime(op:col('qaskdate')), 2012) ))) =>op:order-by(op:desc('qcount')) =>op:limit(10) =>op:result()
The above will not work since ‘qaskdate’ is no longer part of the rowset specified by the ‘group-by’ modifier. To limit the result to a particular constraint, the query would have to be adjusted as follows:
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "qna") //(. THIS NEXT LINE SHOULD GO HERE INSTEAD .) .where(op.eq(op.fn.yearFromDateTime(op.col('qaskdate')), 2012)) .groupBy( [ op.col('asker') , op.as('month', op.fn.formatDateTime(op.col('qaskdate'), '[Y0001]-[M01]')) ] , op.count('qcount') ) .where(op.ge(op.col('qcount'), 10)) .orderBy(op.desc('qcount')) .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "qna") (: THIS NEXT LINE SHOULD GO HERE INSTEAD :) =>op:where(op:eq(ofn:year-from-dateTime(op:col('qaskdate')), 2012)) =>op:group-by( ( op:col('asker') , op:as('month', ofn:format-dateTime(op:col('qaskdate'), '[Y0001]-[M01]')) ) , op:count('qcount') ) =>op:where(op:ge(op:col('qcount'), 10)) =>op:order-by(op:desc('qcount')) =>op:limit(10) =>op:result()
All the columns of ‘samplestack.qna’ are available to the initial ‘where’ plan modifier and the succeeding ‘group-by’ plan modifier.
But how do I now limit the count to users that come from ‘London’?
SELECT asker, month, count(1) as qcount FROM ( SELECT asker, year(qaskdate)|| '-' || right('0'||month(qaskdate), 2) as month FROM samplestack.qna WHERE asker in ( SELECT contributorusername FROM samplestack.contributors WHERE userlocation='London' ) ) GROUP BY asker, month ORDER BY qcount desc LIMIT 10
Unfortunately, such queries cannot be directly ported to Optic API. What we can do is to adjust the approach.
SELECT asker, month, count(1) as qcount FROM ( SELECT asker, year(qaskdate)|| '-' || right('0'||month(qaskdate), 2) as month FROM samplestack.qna LEFT JOIN samplestack.contributors ON contributors.contributorusername = qna.asker WHERE userlocation='London' ) GROUP BY asker, month ORDER BY qcount desc LIMIT 10
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "qna") .joinInner(op.fromView("samplestack", "contributors"), null, op.eq(op.col('contributorusername'), op.col('asker'))) .where(op.eq(op.col('userlocation'), 'London')) .groupBy( [ op.col('asker') , op.as('month', op.fn.formatDateTime(op.col('qaskdate'), '[Y0001]-[M01]')) ] , op.count('qcount') ) .orderBy(op.desc('qcount')) .limit(10) .result()
version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "qna") =>op:join-inner(op:from-view("samplestack", "contributors"), (), op:eq(op:col('contributorusername'), op:col('asker'))) =>op:where(op:eq(op:col('userlocation'), 'London')) =>op:group-by( ( op:col('asker') , op:as('month', ofn:format-dateTime(op:col('qaskdate'), '[Y0001]-[M01]')) ) , op:count('qcount') ) =>op:order-by(op:desc('qcount')) =>op:limit(10) =>op:result()
One major limitation of SQL is the inability to search for a particular word across all fields. Optic API supports use of ‘cts:query’ objects to be used in the ‘where’ modifier:
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "contributors") .where(cts.wordQuery('london')) .limit(10) .result()
version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors") =>op:where(cts:word-query('london')) =>op:limit(10) =>op:result()
It is important to note, though, that this word search will apply to the entire document and not just the fields involved in the view.
You can also sort the results using `score` to take advantage of MarkLogic’s search engine capabilities:
'use strict'; const op = require('/MarkLogic/optic'); let docSearch = op.fromSearch(cts.jsonPropertyWordQuery('displayName', ['adam', 'alan'])) let viewSearch = op.fromView('Samplestack', 'Contributors', null, op.fragmentIdCol("$$sourceId")) docSearch .joinInner(viewSearch, op.on('fragmentId', '$$sourceId')) .orderBy(op.desc('score')) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; let $doc-search := op:from-search(cts:json-property-word-query('displayName', ('adam', 'alan'))) let $view-search := op:from-view('Samplestack', 'Contributors', (), op:fragment-id-col("$$sourceId")) return $doc-search => op:join-inner($view-search, op:on('fragmentId', '$$sourceId')) => op:order-by(op:desc('score')) => op:result()
This particular example allows us to retrieve the `contributor` row for a user whose displayName contains the word ‘adam’ or ‘alan’. Note that this displayName property is not part of the view, but part of the source document. Notice that the result now includes a `score` column:
Information about the source document can also be included as part each row:
const op = require('/MarkLogic/optic'); op.fromView("samplestack", "contributors", "", op.fragmentIdCol("$$sourceId")) .joinDocUri('source_uri', op.fragmentIdCol('$$sourceId')) .joinDoc('source_doc', op.fragmentIdCol('$$sourceId')) .where(op.eq(true, op.fn.matches(op.xpath('source_doc', '//aboutMe'),'man'))) .select([ 'source_uri' , 'source_doc' , op.as('displayName', op.xpath('source_doc', '//displayName')) , op.as('aboutMe', op.xpath('source_doc', '//aboutMe')) ]) .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-view("samplestack", "contributors", "", op:fragment-id-col("$$sourceId")) =>op:join-doc-uri('source_uri', op:fragment-id-col('$$sourceId')) =>op:join-doc('source_doc', op:fragment-id-col('$$sourceId')) =>op:where(op:eq(true(), ofn:matches(op:xpath('source_doc', '//aboutMe'),'man'))) =>op:select(( 'source_uri' , 'source_doc' , op:as('displayName', op:xpath('source_doc', '//displayName')) , op:as('aboutMe', op:xpath('source_doc', '//aboutMe')) )) =>op:limit(10) =>op:result()
`$$sourceId` is just a column name to reference the fragmentId of each row found. This fragmentId can then be referenced and used as a column to retrieve the source fragment. Additionally, this fragmentId can also be used to join the active row with other rows coming from the same fragment. By default, this means document. So if a single document results in multiple rows, e.g. `answers` in our current test data, then those rows are likely to return the same fragmentId. More information about fragments is available in the Fragments chapter of the Administrator’s Guide.
Optic API is not limited to views. We can also make use of existing indexes. Referencing the data we have in this article, we can now produce the same result using with and without the Optic API.
XQuery without Optic API:
let result = []; let tuples = cts.valueTuples( [ cts.elementReference(xs.QName('age')) , cts.elementReference(xs.QName('gender')) ], ["item-frequency", "frequency-order"] ) for (let tuple of tuples){ let query = cts.andQuery([ cts.elementValueQuery( xs.QName('age') , xs.string(tuple[0]) ) , cts.elementValueQuery( xs.QName('gender') , tuple[1] ) ]) let aggregate = cts.avgAggregate( cts.elementReference(xs.QName( 'height')) , null, query ) result.push(fn.stringJoin([ xs.string(tuple[0]), tuple[1], aggregate != null ? xs.string(aggregate) : null ], '-')) } result
xquery version "1.0-ml"; for $tuple in cts:value-tuples( ( cts:element-reference(xs:QName('age')) , cts:element-reference(xs:QName('gender')) ), ("item-frequency", "frequency-order")) let $query := cts:and-query(( cts:element-value-query( xs:QName('age') , xs:string($tuple[1]) ) , cts:element-value-query( xs:QName('gender') , xs:string($tuple[2]) ) )) return $tuple[1] || " - " || $tuple[2] || " - " || cts:avg-aggregate( cts:element-reference(xs:QName( 'height')) , (), $query )
Using Optic API:
const op = require('/MarkLogic/optic'); op.fromLexicons( { "age": cts.elementReference(xs.QName("age")), "gender": cts.elementReference(xs.QName("gender")), "height": cts.elementReference(xs.QName("height")) } , "person" ) .groupBy(["age", "gender"], op.avg('averageHeight', op.col('height'))) .limit(10) .result()
xquery version "1.0-ml"; import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy"; op:from-lexicons( map:new(( map:entry("age", cts:element-reference(xs:QName("age"))) , map:entry("gender", cts:element-reference(xs:QName("gender"))) , map:entry("height", cts:element-reference(xs:QName("height"))) )) , "person" ) =>op:group-by(("age", "gender"), op:avg('averageHeight', op:col('height'))) =>op:limit(10) =>op:result()
Your final results should look something like this:
By continuing to use this website you are giving consent to cookies being used in accordance with the MarkLogic Privacy Statement.