Search by Root Element

Problem

You want to look for documents that have a particular root XML element or JSON property and combine that with other search criteria.

Solution

Applies to MarkLogic versions 7+

(: Return a query that finds docs with the specified root element :)
declare function local:query-root($qname as xs:QName) 
{
  let $ns := fn:namespace-uri-from-QName($qname)
  let $prefix := if ($ns eq "") then "" else "pre:"
  return
    xdmp:with-namespaces(
      map:new(map:entry("qry", "http://marklogic.com/cts/query")), 
      cts:term-query(
        xdmp:value(
          "xdmp:plan(/" || $prefix || fn:local-name-from-QName($qname) || ")",
          map:entry("pre", $ns)
        )/qry:final-plan//qry:term-query/qry:key
      )
    )
};

You can then call it like this:

cts:search(
  fn:doc(),
  cts:and-query((
    local:query-root(xs:QName("ml:base")),
    cts:collection-query("published")
  ))
)

Discussion

It’s easy to find all the documents that have a particular root element or property: use XPath (/ml:base). However, that limits the other search criteria you can use. For instance, you can’t combine a cts:collection-query with XPath. What we need is a way to express /ml:base as a cts:query.

The local:query-root function in the solution returns a cts:term-query that finds the target element as a root. We’re using a bit of trickery to get there (including the fact that cts:term-query is an undocumented function). Let’s dig in a bit deeper to see what’s happening.

We can ask MarkLogic how it will evaluate an XPath expression like this:

declare namespace ml = "https://marklogic.com";
xdmp:plan(/ml:base)

The result looks like this (note that if you run this, the identifiers will be different):

<qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
  <qry:expr-trace>xdmp:eval("declare namespace ml = &quot;https://marklogic.com&quot;;&#10;xdm...", (), <options xmlns="xdmp:eval"><database>17588436587394393575</database>...</options>)</qry:expr-trace>
  <qry:info-trace>Analyzing path: fn:collection()/ml:base</qry:info-trace>
  <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
  <qry:info-trace>Step 2 is searchable: ml:base</qry:info-trace>
  <qry:info-trace>Path is fully searchable.</qry:info-trace>
  <qry:info-trace>Gathering constraints.</qry:info-trace>
  <qry:info-trace>Executing search.</qry:info-trace>
  <qry:final-plan>
    <qry:and-query>
      <qry:term-query weight="0">
        <qry:key>682925892541848129</qry:key>
        <qry:annotation>doc-root(element(ml:base),doc-kind(document))</qry:annotation>
      </qry:term-query>
    </qry:and-query>
  </qry:final-plan>
  <qry:info-trace>Selected 0 fragments</qry:info-trace>
  <qry:result estimate="0"/>
</qry:query-plan>

Looking at the term-query in the <final-plan> element, we get some visibility into the Universal Index: the index that stores terms and structure for every XML, JSON, and text document that we store in MarkLogic. This index records things like words, XML elements or JSON properties, parent/child relationships among elements and properties, and words that occur within specific elements or properties. Exactly what is recorded depends on the settings you have configured in your database. In each case, the word or structure is mapped to a key.

Take another look at the <final-plan> element — this is the query that MarkLogic will run. We can see that it’s using a term query and the annotation tells us what it means. A bit of XPath pulls out that key, which we then use to build a cts:query that we can combine with other queries.

declare namespace qry = "http://marklogic.com/cts/query";
declare namespace ml  = "https://marklogic.com";
xdmp:plan(/ml:base)/qry:final-plan//qry:term-query/qry:key
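To make that last step concrete, here is a minimal sketch that feeds the extracted key straight into cts:term-query and combines it with a collection query. It assumes the plan for /ml:base contains exactly one term-query key, as in the sample output above; the "published" collection is just the example name used in the Solution, and xdmp:estimate is used only to get a quick count from the indexes.

```xquery
xquery version "1.0-ml";

declare namespace qry = "http://marklogic.com/cts/query";
declare namespace ml  = "https://marklogic.com";

(: Pull the term key out of the plan for /ml:base. :)
let $key := xdmp:plan(/ml:base)/qry:final-plan//qry:term-query/qry:key
return
  (: Estimate how many fragments match both the root-element term and the collection. :)
  xdmp:estimate(
    cts:search(
      fn:doc(),
      cts:and-query((
        cts:term-query(xs:unsignedLong($key)),
        cts:collection-query("published")
      ))
    )
  )
```

The hard-coded path is the limitation here: it only works for ml:base, which is why the recipe goes on to build the path dynamically.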

So why are we using xdmp:value? We can run xdmp:plan with an explicit XPath expression, but if we want to work with a dynamic path (provided at run-time), then we can’t build a string and pass it to xdmp:plan. However, we can build a string that includes the reference to xdmp:plan and then pass the whole thing to xdmp:value, which will evaluate it. xdmp:value also accepts bindings, which allow us to use namespaces in the string we pass into xdmp:plan.
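As a rough illustration of that idea in isolation, the sketch below builds the xdmp:plan call as a string and hands it to xdmp:value along with a namespace binding. The local name "base" and the namespace URI are the example values from this recipe; in real code they would come from a parameter.

```xquery
xquery version "1.0-ml";

(: Example value; in practice this would be supplied at run time. :)
let $local-name := "base"
(: Build the whole xdmp:plan call as a string. :)
let $expr := "xdmp:plan(/pre:" || $local-name || ")"
return
  (: The map binds the "pre" prefix used inside the string to a namespace URI. :)
  xdmp:value($expr, map:entry("pre", "https://marklogic.com"))
```

This should return the same kind of query plan shown earlier, with the path chosen at run time rather than written into the module.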

I used xdmp:with-namespaces so that the function can be self-contained. Without that, the code would require the qry namespace declaration at the top of the module where the local:query-root function lives.

One more interesting bit: notice $prefix as part of the string passed to xdmp:value. With a QName, there might be a prefix (if constructed with xs:QName) or there might not be (if constructed with fn:QName without a prefix, or if the QName doesn’t use a namespace). To handle all these cases, the recipe assigns whatever namespace is present to the prefix “pre”. However, if the namespace URI is the empty string, then we skip the prefix in the XPath that we send to xdmp:plan.

That last complexity is there because the parameter to the function takes an xs:QName. The function could be written to take a string (like “/ml:base”) or a namespace and a localname. Requiring an xs:QName lets the caller build the QName using any of the available methods (xs:QName, or fn:QName, which adds a prefix only if the name passed to it includes one), but also limits what goes into xdmp:value. Keeping tight control over this is important to prevent code injection.
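For comparison, here is a sketch of the "namespace and localname" alternative mentioned above. The function name local:query-root-by-name is hypothetical and not part of the original recipe. Typing the local name as xs:NCName is one possible way to keep arbitrary strings out of the expression passed to xdmp:value; treat it as an assumption rather than a complete defense.

```xquery
xquery version "1.0-ml";

(: Hypothetical variant: takes a namespace URI and a local name instead of an xs:QName. :)
declare function local:query-root-by-name(
  $ns as xs:string,
  $local-name as xs:NCName
)
{
  (: Skip the prefix when there is no namespace, as in the original function. :)
  let $prefix := if ($ns eq "") then "" else "pre:"
  return
    xdmp:with-namespaces(
      map:new(map:entry("qry", "http://marklogic.com/cts/query")),
      cts:term-query(
        xdmp:value(
          "xdmp:plan(/" || $prefix || $local-name || ")",
          map:entry("pre", $ns)
        )/qry:final-plan//qry:term-query/qry:key
      )
    )
};

(: Example call, using the namespace and collection from this recipe. :)
cts:search(
  fn:doc(),
  cts:and-query((
    local:query-root-by-name("https://marklogic.com", xs:NCName("base")),
    cts:collection-query("published")
  ))
)
```

The xs:QName version in the Solution remains the simpler choice for callers, since they can build the QName however they like.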
