
Understanding map:map operators, aggregates and use cases

by Gary Vidal

In this article I am going to discuss maps and their operators in depth, and show even more functionality exposed through maps. In a previous post, Returning Lexicon Values using XPath Expressions, I alluded to the fact that map:map supports operators and used them to compute the difference of two maps in order to filter results for processing. Here I want to provide a more in-depth discussion of how maps work and then delve deeper into the powerful features they provide.

Introduction to Maps

To begin, let's formally define some basic constructs for how a map:map works. Maps are in-memory key/value structures, introduced to MarkLogic in version 5. Out of the box, maps provide fast in-memory inserts, updates, lookups and existence checks. Maps are also mutable, so you can change them in place without creating copies, as you would have to when changing XML nodes. This lets operations execute very efficiently, but it also means they have side effects, which is uncommon in functional programming languages like XQuery.

The basic operations you can perform with maps are well documented on the MarkLogic map functions page.

Map Operations

  • map:map - Creates a new map or creates a map with data from an xml serialization of a map:map.
  • map:put - Puts a value by key into a map
  • map:get - Gets a value from a map by key
  • map:keys - Returns all the keys present in a map.
  • map:remove - Removes a value by key from a map
  • map:count - Returns the count of the keys in the map
  • map:clear - Clears the map of all key/values

Introduced in MarkLogic 7

  • map:new - Creates a new map:map, but accepts a sequence of existing maps or map:entry(k,v) results. This is a very composable and convenient way to join multiple maps together.
  • map:entry - Creates a map:map containing a single key/value pair.
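
As a small sketch of how map:new composes maps (the keys and values below are made up for illustration), the value from the last map is used when the same key appears in more than one argument:

```xquery
xquery version "1.0-ml";
(: Hypothetical data: default settings plus a single override. :)
let $defaults := map:new((
  map:entry("color", "red"),
  map:entry("size", "M")
))
let $overrides := map:entry("size", "L")
(: Combine both maps; for the repeated key "size" the later map is used. :)
let $merged := map:new(($defaults, $overrides))
return
  (map:get($merged, "color"), map:get($merged, "size"))
```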

Lexicon Support for Maps

Maps are also supported as output for many lexicon-based functions, including:

  • Scalar lexicon functions (cts:element-*-values, cts:values) - Returns a map where the key and the value are the same.
  • Value co-occurrence functions (cts:element-*-value-co-occurrences, cts:value-co-occurrences) - Returns a map where the key is the first value of each tuple and the value is the sequence of second values that co-occur with it.
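
A minimal sketch of asking a lexicon function for a map, assuming an element range index exists on a hypothetical author element (the element name is not from this article):

```xquery
xquery version "1.0-ml";
(: Assumption: an element range index is configured on a hypothetical <author> element. :)
let $authors := cts:element-values(xs:QName("author"), (), ("map"))
(: For scalar lexicon functions the key and the value are the same. :)
for $key in map:keys($authors)
return
  fn:concat($key, " => ", xdmp:describe(map:get($authors, $key)))
```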

Maps by Example

Let's now walk through various examples of using maps, to get a better understanding of how and why to use them. The first example sticks a series of different value types inside a map, then walks the keys to describe each value.

xquery version "1.0-ml";
let $map := map:map()
let $puts := (
  map:put($map, "a", "a"),
  map:put($map, "b", <node>Some node</node>),
  map:put($map, "c", (1,2,3,4,5)),
  map:put($map, "d", function() {"Hello World"})
)
for $key in map:keys($map)
return
  fn:concat("Key:", $key, " is ", xdmp:describe(map:get($map, $key)))

Executed in Query Console, this returns

Key:c is (1, 2, 3, ...)
Key:b is <node>Some node</node>
Key:d is function() as item()*
Key:a is "a"

As you can see from the example, a map can flexibly store values, nodes and even functions.

Passing Maps By Reference

Another feature of maps is that they can be passed around by reference. This allows you to share information between different modules/transactions while maintaining a single instance across them. In the example below, we are going to take attendance by allowing multiple spawned functions to add entries to a global map across separate transactions. In the final function we will check whether a value is present and answer accordingly.

xquery version "1.0-ml";
let $map := map:map()
let $foo := xdmp:spawn-function(function() {
  map:put($map, "foo", "Foo is here")
})
let $bar := xdmp:spawn-function(function() {
  map:put($map,"bar", "Bar is hear(yawn)")
})
let $baz := xdmp:spawn-function(function() {
  if(map:contains($map, "bar")) then 
    map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="bar">
    <map:value xsi:type="xs:string">Bar is hear(yawn)</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, only if bar is here.</map:value>
  </map:entry>
</map:map>

But be aware that each xdmp:spawn-function call is "non-blocking", so the main query can return before all of the spawned functions have finished. In the next example, we will have the "bar" function sleep for 1s before it executes its map:put.

...
let $bar := xdmp:spawn-function(function() {
  xdmp:sleep(1000),
  map:put($map,"bar", "I am lazy bar")
})
...
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, but why is bar always late</map:value>
  </map:entry>
</map:map>

As you can see, "baz" is pretty upset that bar is not present.

To provide "blocking" you can pass the result=true option to xdmp:spawn-function or use xdmp:invoke-function in its stead.
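
A sketch of the xdmp:invoke-function variant, under the assumption (taken from the discussion above) that the map is shared by reference with the invoked functions. Because each invocation is synchronous, "baz" should only run after "bar" has finished sleeping:

```xquery
xquery version "1.0-ml";
let $map := map:map()
(: Synchronous call: control returns only after the function has completed. :)
let $bar := xdmp:invoke-function(function() {
  xdmp:sleep(1000),
  map:put($map, "bar", "I am lazy bar")
})
let $baz := xdmp:invoke-function(function() {
  if (map:contains($map, "bar")) then
    map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return
  $map
```

The trade-off is that the work no longer runs in parallel on the Task Server, so this only makes sense when ordering matters more than throughput.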

Maps and JSON

Maps are also directly serializable to JSON using xdmp:to-json. In fact, map:map and json:object in MarkLogic are something like cousins: they can be used interchangeably and support casting between the two types. A fundamental difference is that json:object maintains key order, but map:map does not. So in cases where you care about the ordering of the keys, you can use a json:object and all puts will preserve the order. As the example below shows, you can create a json:object, use the map functions to populate it with data and render the output as JSON. The following examples compose the same JSON structure using map:map and then json:object:

let $json-object := map:map()
let $puts := (
  map:put($json-object, "name", "Gary Vidal"),
  map:put($json-object, "age", 40),
  map:put($json-object, "birthdate", xs:date("1974-09-09"))
)
return
  xdmp:to-json($json-object)

Returns:
{"birthdate":"1974-09-09", "name":"Gary Vidal", "age":40}

let $json-object := json:object()
...
return
  xdmp:to-json($json-object)

Returns:
{"name":"Gary Vidal", "age":40, "birthdate":"1974-09-09"}

In the example above, the key order is preserved when using json:object, whereas with map:map it is not.

Map Operators

Map operators were formally introduced in MarkLogic 7, but have been available since MarkLogic 5. The documentation for map operators can be found at Map Operators.

Operator   Description
+          Computes the (distinct) union of two maps (ex. $map1 + $map2).
-          Computes the difference of two maps, like a set difference (ex. $map1 - $map2). It also works as a unary operator: -$map1 returns a map with the keys and values reversed.
*          Computes the intersection of two maps (ex. $map1 * $map2), where only the key/value pairs present in both maps are returned.
div        Computes an inference, or simply a join (ex. $map1 div $map2). The result consists of keys from $map1 and values from $map2, where $map1's value is equal to $map2's key.
mod        Equivalent to -$map1 div $map2 (ex. $map1 mod $map2).

Now that you have a basic understanding of the operators let's apply them to some examples.

Union (Distinct) ($map + $map)

xquery version "1.0-ml";
let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
return
  $map1 + $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="b">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="c">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="a">
    <map:value xsi:type="xs:string">a</map:value>
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

As you can see, all keys from $map1 and $map2 are combined and only the distinct values are returned. This distinction is important: if you count the values after a union, you get the count of the distinct values, not the count of a merge of the two maps in which duplicate values are repeated.
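
To make the counting point concrete, here is a small sketch that reuses the two maps from the example above and counts the values per key after the union. Going by the output shown above, key "b" should report one value rather than two:

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
let $union := $map1 + $map2
(: Count the values stored under each key of the union. :)
for $key in map:keys($union)
return
  fn:concat($key, ": ", fn:count(map:get($union, $key)), " value(s)")
```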

Difference ($map - $map)

In the example below we want to compute the difference between two maps.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
  $map1 - $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="e">
    <map:value xsi:type="xs:string">e</map:value>
  </map:entry>
</map:map>

Wait! Why did it return only the entry for key 'e'? The difference is ordered: it returns only the entries of $map1 that are not present in $map2. To compute the differences in both directions you need a bit more math, but the answer is quite simple.

  ($map1 - $map2) + ($map2 - $map1)

Returns keys (c, d, e)
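
Put together as a complete query, using the same two maps as above, the two-way difference looks like this (the order of the returned keys is not guaranteed):

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")
))
(: Entries only in $map1, unioned with entries only in $map2. :)
return
  map:keys(($map1 - $map2) + ($map2 - $map1))
```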

Inversion (-$map)

Inverting a map is quite simple: each value becomes a key and each key becomes a value. Since all keys are strings, you will lose type information if your values are non-string types, because the string value is computed for every non-string value during inversion.

xquery version "1.0-ml";
let $map := map:new((
  map:entry("a", 1),
  map:entry("b", ("v1", "v2")),
  map:entry("c", function() {"Hello World"}),
  map:entry("d", <node>Some node</node>)
))
return -$map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
<map:entry key="v2">
<map:value xsi:type="xs:string">b</map:value>
</map:entry>
<map:entry key="function() as item()*">
<map:value xsi:type="xs:string">c</map:value>
</map:entry>
<map:entry key="&lt;node&gt;Some node&lt;/node&gt;">
<map:value xsi:type="xs:string">d</map:value>
</map:entry>
<map:entry key="1">
<map:value xsi:type="xs:string">a</map:value>
</map:entry>
<map:entry key="v1">
<map:value xsi:type="xs:string">b</map:value>
</map:entry>
</map:map>

Intersects Operator ($map * $map)

In the following example, we are assuming we have key/values present in both maps and only want those key/values that intersect.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
   $map1 * $map2

Returns keys (a, b)

It is important to note that the intersects operation is computed on key and value, so in cases where both maps share the same key, but not the same value for that key, then the keys do not intersect.
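
A small sketch of that case, with made-up data in which key "b" exists in both maps but with different values. Based on the behavior described above, only "a" is expected to survive the intersection:

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "x")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
(: "b" has a different value in each map, so it should not be part of the result. :)
return
  map:keys($map1 * $map2)
```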

Inference/Join Operator ($map1 div $map2)

For inferencing/join we will focus on a more practical example of joining student names to test scores. In the example below each student is assigned an id, stored as the value of the map:entry. Another map stores the id and all the scores for each test. You can see in the result that the scores are joined directly to the name via its id value.

For a real-world use case, this data could easily be stored in MarkLogic as xml/json fragments, with range indexes enabled for (name, id, score). This is where cts:value-co-occurrences should be used to return maps as output for (name, id) and (id, score).


let $map1 := map:new((
   map:entry("jenny", "a1"),
   map:entry("bob"  , "b1"),
   map:entry("tom"  , "c1"),
   map:entry("rick" , "d1")
))

let $map2 := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
return
 $map1 div $map2

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">90</map:value>
    <map:value xsi:type="xs:integer">95</map:value>
    <map:value xsi:type="xs:integer">100</map:value>
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="tom">
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">85</map:value>
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="bob">
    <map:value xsi:type="xs:integer">77</map:value>
    <map:value xsi:type="xs:integer">68</map:value>
    <map:value xsi:type="xs:integer">82</map:value>
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>
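
For the real-world variant mentioned above, a sketch might look like the following. It assumes element range indexes on hypothetical name, id and score elements, with each pair co-occurring in the same fragment; none of these names come from an actual schema:

```xquery
xquery version "1.0-ml";
(: Assumption: range indexes exist on hypothetical <name>, <id> and <score> elements. :)
let $name-to-id := cts:value-co-occurrences(
  cts:element-reference(xs:QName("name")),
  cts:element-reference(xs:QName("id")),
  ("map")
)
let $id-to-scores := cts:value-co-occurrences(
  cts:element-reference(xs:QName("id")),
  cts:element-reference(xs:QName("score")),
  ("map")
)
(: Join the two lexicon maps on id, keeping the name as the key. :)
return
  $name-to-id div $id-to-scores
```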

mod Operator ($map1 mod $map2)

I am going to skip the example for the mod operator since it equates to (-$map1 div $map2).
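
If you want to check that equivalence yourself, a sketch like the one below returns both forms side by side so they can be compared. The data is made up, and no output is shown here because it has not been verified against a server:

```xquery
xquery version "1.0-ml";
(: Made-up data: $names maps a name to an id, $scores is keyed by name. :)
let $names := map:new((
  map:entry("jenny", "a1"),
  map:entry("bob", "b1")
))
let $scores := map:new((
  map:entry("jenny", (90, 95)),
  map:entry("bob", (77, 68))
))
return (
  (: The mod operator ... :)
  $names mod $scores,
  (: ... and its stated equivalent: invert first, then join. :)
  (-$names) div $scores
)
```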

Combining Techniques to Solve Complex Issues

So we have learned a little bit about each of the operators. Now let's start combining them to solve more complicated problems.

Diffing Data

One problem I have encountered very often is determining the difference between 2 structures. Let's consider 2 documents that have similar structures, but differ in ordering or values. We want to create a diff-gram that determines what inserts or updates need to occur between the 2 documents. Using the difference and intersects operators we can compose a complete diff-gram that runs quite efficiently. In the example below, we take 2 nodes, iterate over the child elements of each node and put each path inside a map:map, then compute the difference using map operators.

xquery version "1.0-ml";
let $version1 := 
  <node>
    <last-modifed>2001-01-01</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Bob Franco</author>
    <author>Billy Bob Thornton</author>
    <added>I am added</added>  
  </node>
let $version2 := 
  <node>
    <last-modifed>2001-01-12</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Billy Bob Thornton</author>
    <author>James Franco</author>
    <added1>I am added1</added1>
  </node>
let $version1-map := map:map()
let $version2-map := map:map()
let $_ := ( 
  (:Map values paths to maps:)
  $version1/element() ! map:put($version1-map, xdmp:path(.), fn:data(.)),
  $version2/element() ! map:put($version2-map, xdmp:path(.), fn:data(.))
)
let $same    := $version1-map * $version2-map
let $inserts := $version1-map - $version2-map
let $deletes := $version2-map - $version1-map
return 
  <diff>{(
    map:keys($same)    ! <same path="{.}">{map:get($same,.)}</same>,
    map:keys($deletes) ! <delete path="{.}">{map:get($deletes,.)}</delete>,
    map:keys($inserts) ! <insert path="{.}">{map:get($inserts,.)}</insert>
  )}</diff>

Returns

<diff>
  <same path="/node/subtitle">Same ole title</same>
  <same path="/node/title">Here is a title</same>
  <delete path="/node/last-modifed">2001-01-12</delete>
  <delete path="/node/added1">I am added1</delete>
  <delete path="/node/author[2]">James Franco</delete>
  <delete path="/node/author[1]">Billy Bob Thornton</delete>
  <insert path="/node/last-modifed">2001-01-01</insert>
  <insert path="/node/added">I am added</insert>
  <insert path="/node/author[2]">Billy Bob Thornton</insert>
  <insert path="/node/author[1]">Bob Franco</insert>
</diff>
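
The diff-gram above reports a changed value as a delete plus an insert on the same path. As a possible extension (not part of the original example), paths that appear in both maps can be reported as updates. The sketch below uses small stand-in maps in place of the $inserts and $deletes computed above:

```xquery
xquery version "1.0-ml";
(: Stand-ins for the $inserts ($version1 - $version2) and $deletes ($version2 - $version1) maps. :)
let $inserts := map:new((
  map:entry("/node/last-modifed", "2001-01-01"),
  map:entry("/node/added", "I am added")
))
let $deletes := map:new((
  map:entry("/node/last-modifed", "2001-01-12"),
  map:entry("/node/added1", "I am added1")
))
(: A path present in both maps has a different value in each version. :)
for $path in map:keys($inserts)[map:contains($deletes, .)]
return
  <update path="{$path}">
    <version1>{map:get($inserts, $path)}</version1>
    <version2>{map:get($deletes, $path)}</version2>
  </update>
```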

Creating Fast 2-Way Lookup Tables

When working with Excel (2007+) in MarkLogic, you will often need to convert between a column's alphabetic name (A, B, ... AA) and its index value, and vice versa. Calculating the column name for more than 255 columns on every row can be expensive, so computing this lookup table only once can drastically improve the performance of an application. In the example below, note that I compute the table only once and use the inversion (-) operator to create the reverse direction.

xquery version "1.0-ml";
declare variable $ALPHA-INDEX-MAP := 
  let $map := map:map()
  let $alpha := ("", (65 to 65+25 ) ! fn:codepoints-to-string(.))
  let $calcs := 
    for $c1 in $alpha
    for $c2 in $alpha
    for $c3 in $alpha[2 to fn:last()]
    where $c1 = "" or fn:not($c2 = "")
    return 
      ($c1 || $c2|| $c3)
  let $_ := for $col at $pos in $calcs return map:put($map, $col, $pos)
  return
    $map
;  
declare variable $INDEX-ALPHA-MAP := -$ALPHA-INDEX-MAP;

(: Which index corresponds to ZA? :)
map:get($ALPHA-INDEX-MAP, "ZA"),
(: Which alpha corresponds to 32? :)
map:get($INDEX-ALPHA-MAP, "32")
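
Note that the last lookup passes "32" as a string. Since inversion turns the values into string keys, a numeric index has to be converted before the lookup. A minimal sketch with a two-entry stand-in table:

```xquery
xquery version "1.0-ml";
(: Tiny stand-in for $ALPHA-INDEX-MAP. :)
let $alpha-index := map:new((
  map:entry("A", 1),
  map:entry("B", 2)
))
let $index-alpha := -$alpha-index
let $index := 2
(: Keys of the inverted map are strings, so convert the integer first. :)
return
  map:get($index-alpha, fn:string($index))
```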

Aggregating map:map data

Starting with MarkLogic 7, maps now support aggregate functions such as min/max/sum/avg. To perform aggregates, we will use built-in functions corresponding to the aggregate we want to apply and pass the map:map as the argument. Let's look at our student scores example and assume that we are getting values from MarkLogic lexicon functions like cts:value-co-occurrences, but we will illustrate examples with sample code below:

let $student-scores := map:new((
  map:entry("jenny", (90,95,100,88)),
  map:entry("bobby", (77,68,82,60)),
  map:entry("rick",  (0,0,85,89))
))
return
  $student-scores

Now we want to compute the average score per student

 fn:avg($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:decimal">43.5</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:decimal">93.25</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:decimal">71.75</map:value>
  </map:entry>
</map:map>

Now let's compute the max score per student.

fn:max($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">100</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">82</map:value>
  </map:entry>
</map:map>

The same can be done for min

fn:min($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">0</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

... And Sum

fn:sum($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">174</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">373</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">287</map:value>
  </map:entry>
</map:map>

And finally count?

fn:count($student-scores)

Returns

1

Well, that was not expected ... or was it? The count function is ambiguous about what it should count. Should it count the values of the map by key, or the map itself? A simple solution is shown below; it satisfies our need to count the number of tests per student (although it is not as efficient as using an aggregate function):

 map:new(
   map:keys($student-scores) ! map:entry(., fn:count(map:get($student-scores, .)))
 )

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
</map:map>

I hope this helps you understand the potential power of using map:map operators and aggregates. In the future, I will be writing a series of articles on harnessing the power of maps.

Stay Tuned ...

Returning Lexicon Values using XPath Expressions

by Gary Vidal

I am often asked: how can I evaluate an XPath expression to return a lexicon of values, such as cts:uris, without pulling each document from disk? This often arises in scenarios for bulk processing documents using tools like corb or MarkLogic's Task Server to spawn processing across your cluster. When performing bulk operations, you need to ensure you can process documents that meet or do not meet a specific condition. Additionally, you must ensure that if the processing fails you can continue where you left off, without reprocessing all documents.

For this article we will focus on a specific problem: how do I find the URIs of documents that do not have a deeply nested structure? In general, you will find this problem is not easily solvable using pure cts:query constructs ... until now.

Consider the following code for 2 documents having similar nested structures, but the second is missing the /p:parent/p:outer/p:last element.

declare namespace p = "p";
let $doc1 :=
  <p:parent>
    <p:outer>
      <p:first>pf1</p:first>
      <p:last>pl1</p:last>
    </p:outer>
    <p:child>
      <p:inner>
        <p:first>cf1</p:first>
        <p:last>cl1</p:last>
      </p:inner>
    </p:child>
  </p:parent>

let $doc2 :=
  <p:parent>
    <p:outer>
      <p:first>pf2</p:first>
    </p:outer>
    <p:child>
      <p:inner>
        <p:first>cf2</p:first>
        <p:last>cl2</p:last>
      </p:inner>
    </p:child>
 </p:parent>

return (
  xdmp:document-insert("doc1",$doc1),
  xdmp:document-insert("doc2",$doc2)
)

Determining which uris have this structure is very simple using an XPath expression, as below:

doc()[p:parent/p:outer/p:last]/xdmp:node-uri(.)
[Returns]
doc1

The problem with this approach is that the XPath expression runs "filtered", which requires fragments to be pulled from disk to return the uri. While this works for a small database with a few thousand records, at some point you will hit the dreaded "XDMP-EXPNTREECACHEFULL" error. In essence, this means you have tried to return more documents than would fit into memory for the transaction. So putting on your MarkLogic black belt, you construct a complex cts:query using nested cts:element-query's to simulate a path structure such as:

cts:uris((),(),
  cts:element-query(xs:QName("p:parent"),
    cts:element-query(xs:QName("p:outer"),
      cts:element-query(xs:QName("p:last"), cts:and-query(()))
  ))
)

WOW, that is complicated and also incorrect, as it returns both doc1 and doc2. So why did this happen? Well, the short answer is that MarkLogic resolves the cts:query "unfiltered", relying on indexes in memory and not the fragments themselves. To resolve this correctly, the index must determine that p:parent is the parent element of a p:outer that has a p:last child element. Sure, you could try to tinker with positions and proximity, but even that may not yield the correct result. So how come we can do this in XPath, but not with cts:query? To answer this question, we will look deeper into a handy function called xdmp:plan. The documentation for xdmp:plan states the following:

xdmp:plan(
   $expression as item()*,
   [$maximum as xs:double?]
) as element()

Returns an XML element recording information about how the given expression will be processed by the index. The information is a structured representation of the information provided in the error log when query trace is enabled. The query will be processed up to the point of getting an estimate of the number of fragments returned by the index.

So let's dig a bit deeper into what is inside the plan by wrapping our XPath Expression with the xdmp:plan function.

xdmp:plan(/p:parent/p:outer/p:last)

The output is an XML Fragment with the following information:

<qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
  <qry:info-trace>xdmp:eval("declare namespace p = &amp;quot;p&amp;quot;;&amp;#10;xdmp:plan(/p:parent/p:o...", (), &lt;options xmlns="xdmp:eval"&gt;&lt;database&gt;14817900035712326498&lt;/database&gt;&lt;root&gt;c:\users\gvidal\w...&lt;/options&gt;)</qry:info-trace>
  <qry:info-trace>Analyzing path: fn:collection()/p:parent/p:outer/p:last</qry:info-trace>
  <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
  <qry:info-trace>Step 2 is searchable: p:parent</qry:info-trace>
  <qry:info-trace>Step 3 is searchable: p:outer</qry:info-trace>
  <qry:info-trace>Step 4 is searchable: p:last</qry:info-trace>
  <qry:info-trace>Path is fully searchable.</qry:info-trace>
  <qry:info-trace>Gathering constraints.</qry:info-trace>
  <qry:info-trace>Executing search.</qry:info-trace>
  <qry:final-plan>
    <qry:and-query>
      <qry:term-query weight="0">
        <qry:key>4523426088818201359</qry:key>
        <qry:annotation>descendant(doc-root(element(p:parent),doc-kind(document)) )</qry:annotation>
      </qry:term-query>
      <qry:term-query weight="0">
        <qry:key>11698328636857559070</qry:key>
        <qry:annotation>descendant(element-child(p:parent/p:outer))</qry:annotation>
      </qry:term-query>
      <qry:term-query weight="0">
        <qry:key>17573168699309579415</qry:key>
        <qry:annotation>element-child(p:outer/p:last)</qry:annotation>
      </qry:term-query>
    </qry:and-query>
  </qry:final-plan>
  <qry:info-trace>Selected 1 fragment</qry:info-trace>
  <qry:result estimate="1"/>
</qry:query-plan>

Notice from the output above that the query is fully resolvable from the indexes, as denoted by the following lines:

<qry:info-trace>Analyzing path: fn:collection()/p:parent/p:outer/p:last</qry:info-trace>
  <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
  <qry:info-trace>Step 2 is searchable: p:parent</qry:info-trace>
  <qry:info-trace>Step 3 is searchable: p:outer</qry:info-trace>
  <qry:info-trace>Step 4 is searchable: p:last</qry:info-trace>
  <qry:info-trace>Path is fully searchable.</qry:info-trace>
  <qry:info-trace>Gathering constraints.</qry:info-trace>
  <qry:info-trace>Executing search.</qry:info-trace>

Well, this is all well and good, but how does it resolve my issue? The simple answer is that it doesn't. But what is returned afterwards does. Once every step of the path is resolvable, the end result is the query plan itself. As you can see in the excerpt below, the qry:final-plan contains a series of qry:term-query elements, each of which defines a qry:key.

<qry:final-plan>
    <qry:and-query>
      <qry:term-query weight="0">
        <qry:key>4523426088818201359</qry:key>
        <qry:annotation>descendant(doc-root(element(p:parent),doc-kind(document)) )</qry:annotation>
      </qry:term-query>
      <qry:term-query weight="0">
        <qry:key>11698328636857559070</qry:key>
        <qry:annotation>descendant(element-child(p:parent/p:outer))</qry:annotation>
      </qry:term-query>
      <qry:term-query weight="0">
        <qry:key>17573168699309579415</qry:key>
        <qry:annotation>element-child(p:outer/p:last)</qry:annotation>
      </qry:term-query>
    </qry:and-query>
  </qry:final-plan>

These keys actually resolve to term keys in the Universal Index. Terms within the Universal Index cover both words and structure. Each term-query's annotation describes what its key represents. You will notice that the first key, descendant(doc-root(element(p:parent), doc-kind(document)) ), represents the doc() axis to the p:parent element, and the next key, descendant(element-child(p:parent/p:outer)), represents the relationship between the p:parent and p:outer elements. The final key, element-child(p:outer/p:last), completes the path step between the p:outer and p:last elements.
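
To see the keys and their annotations side by side without reading the whole plan, a short sketch like this can help. It only assumes the qry namespace URI shown in the plan output above; the actual key values will differ per database:

```xquery
xquery version "1.0-ml";
declare namespace p = "p";
declare namespace qry = "http://marklogic.com/cts/query";

(: List every term key in the plan together with the annotation that describes it. :)
for $term in xdmp:plan(/p:parent/p:outer/p:last)//qry:term-query
return
  fn:concat($term/qry:key, " => ", $term/qry:annotation)
```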

Okay, this is getting more interesting, but we still have not seen how to resolve the problem. So now we are going to go into undocumented territory and hack the plan FTW.

A little known feature outside of MarkLogic's walls is a function called cts:term-query(xs:unsignedLong), which resolves a query based on a term key. Now if we take the keys from the plan above, we can craft a cts:query to combine all of those term keys into a single composable query. Since the results of the plan are xml this is as simple as the following statement:

cts:uris((),(),
  cts:and-query(
    xdmp:plan(/p:parent/p:outer/p:last)//*:key/cts:term-query(.)
  )
)
[returns]
doc1

Whoa! Is that for real? Indeed it is. So if that is true, what other things can we query using this method?

How about finding all uris for a given root element?

cts:uris((),(),
  cts:and-query(
    xdmp:plan(/p:parent)//*:key/cts:term-query(.)
  )
)
[Returns]
doc1
doc2

What about all binary documents?

cts:uris((),(),
  cts:and-query(
    xdmp:plan(/binary())//*:key/cts:term-query(.)
  )
)
[Returns all binary document uris]

What about all documents that don't have the /p:parent/p:outer/p:last path?

cts:uris((),(),
  cts:not-query(cts:and-query(
    xdmp:plan(/p:parent/p:outer/p:last)//*:key/cts:term-query(.)
  ))
)
[Returns]
All documents in the database except 'doc1'

Why did this not work? We wanted all documents that have a p:parent but do not have the p:outer/p:last element. The simple answer is that by using a not-query you inverted the whole query, so it returns all documents that do not resolve to every step in the plan, including documents that have no p:parent element at all. So, after some head scratching, how can we fix this?

Here I will get into another neat and little-known feature of a structure called map:map. Maps are mutable key/value structures that perform extremely fast hash insert/lookup operations. The map:map structure has been available for quite some time (since MarkLogic 5) and most lexicon functions (cts:uris, cts:element-*-values) support maps as an alternative output to sequences. But what is less well known about these structures is that they support operators such as (+, -, *, div, mod) to mutate and combine maps. Again, I am not doing this topic justice here, but I will revisit it in upcoming blog posts.

So for the purposes of solving our original problem, we will use maps to compute the difference (map - map) of two cts:uris calls. Revisiting our original example, we wanted to return all p:parent documents that do not have the p:outer/p:last element. The solution is the following code:

map:keys(
   cts:uris((),("map"), cts:element-query(xs:QName("p:parent"), cts:and-query(()))) 
   -  
   cts:uris((), ("map"),
      cts:and-query(xdmp:plan(/p:parent/p:outer/p:last)//*:key ! cts:term-query(.))
))
[Returns]
doc2

Which translates to:

cts:uris((), ("map"), cts:element-query(xs:QName("p:parent"),cts:and-query(())))

Return all uris that match (/p:parent) as a map:map

- (: Notice the minus sign :)
cts:uris((), ("map"),
   cts:and-query(xdmp:plan(/p:parent/p:outer/p:last)//*:key ! cts:term-query(.)))

Subtract (-) all uris that match (/p:parent/p:outer/p:last), also returned as a map:map

map:keys($map1 - $map2)

The outer map:keys flattens the map back to a sequence of uri values.
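
Written out as one complete query with the namespace declarations included, the pieces above might be assembled like this. It is a sketch that assumes the URI lexicon is enabled on the database; the variable names are only for readability:

```xquery
xquery version "1.0-ml";
declare namespace p = "p";
declare namespace qry = "http://marklogic.com/cts/query";

(: All uris of documents containing a p:parent element, as a map. :)
let $with-parent := cts:uris((), ("map"),
  cts:element-query(xs:QName("p:parent"), cts:and-query(())))
(: All uris of documents matching the full path, resolved from the plan's term keys. :)
let $with-path := cts:uris((), ("map"),
  cts:and-query(
    xdmp:plan(/p:parent/p:outer/p:last)//qry:key ! cts:term-query(.)
  ))
(: The difference leaves the p:parent documents that lack p:outer/p:last. :)
return
  map:keys($with-parent - $with-path)
```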

Well that was quite a lot to digest and I am exposing quite a bit of juju and dark magic, but you can see that this provides a powerful tool in your arsenal of using MarkLogic in ways never possible before. Good Luck and Happy Coding.

DISCLAIMER

(BTW, the techniques in this article, including the use of cts:term-query(), may or may not be sanctioned by MarkLogic and are subject to change in the product. So use at your own RISK! But hey, "No Risk No Reward".)

MarkLogic Version Manager

by Dave Cassel

If you do development work with different versions of MarkLogic, you've probably set up virtual machines. This has the advantage of complete separation among the different versions, but it can be a hassle. Matt Pileggi, part of MarkLogic's Vanguard team, put together the MarkLogic Version Manager. mlvm is an open source tool he uses to switch among versions of MarkLogic he has installed on his laptop, without using virtual machines. 

Matt was inspired to write mlvm after using the Node Version Manager, which solves the same problem for working with multiple versions of Node.js. Right now, mlvm is a Mac-only tool, but Matt would welcome contributors to help with this or other improvements.

Matt's mlvm is a development tool that's off to a good start -- check it out on GitHub! 
