Understanding map:map operators, aggregates and use cases

by Gary Vidal

In this article I take a comprehensive look at maps and their operators, and show more of the functionality exposed through maps. In a previous post, Returning Lexicon Values using XPath Expressions, I alluded to the fact that map:map supports operators and used one to compute the difference of two maps in order to filter results for processing. Here I want to explain in more depth how maps work and then delve into the more powerful features they provide.

Introduction to Maps

To begin, let's formally define some basic constructs for how a map:map works. Maps are in-memory key/value structures, introduced to MarkLogic in version 5. Out of the box, maps provide fast in-memory inserts, updates, lookups and existence checks. Maps are also mutable structures, so you can change them in place without creating copies, as you would have to when changing XML node types. This lets all operations execute very efficiently, but through side effects, which are uncommon in functional programming languages like XQuery.

The basic operations you can perform with maps are well documented on the MarkLogic map functions page.

Map Operations

  • map:map - Creates a new map or creates a map with data from an xml serialization of a map:map.
  • map:put - Puts a value by key into a map
  • map:get - Gets a value from a map by key
  • map:keys - Returns all the keys present in a map.
  • map:remove - Removes a value by key from a map
  • map:count - Returns the count of the keys in the map
  • map:clear - Clears the map of all key/values

Introduced in MarkLogic 7

  • map:new - Creates a new map:map from a sequence of existing maps and/or map:entry(k,v) results. This is a very composable and convenient way to join multiple maps together.
  • map:entry - Creates a map:map with a single key/value pair.
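
As a minimal sketch of how these functions fit together, the query below builds a map from two smaller maps with map:new and then changes it in place. The keys and values are invented for illustration.

```xquery
xquery version "1.0-ml";
(: Sketch only: the keys and values are made up for illustration. :)
let $defaults := map:new((
  map:entry("color", "red"),
  map:entry("size", "M")
))
let $overrides := map:entry("size", "L")
(: map:new copies the entries of each map it is given into a fresh map :)
let $merged := map:new(($defaults, $overrides))
(: Maps are mutable, so map:put and map:remove change $merged in place :)
let $_ := map:put($merged, "stock", 12)
let $_ := map:remove($merged, "color")
return (
  map:count($merged),             (: number of keys left :)
  map:get($merged, "size"),       (: value stored under "size" :)
  map:contains($merged, "color")  (: was "color" really removed? :)
)
```

As I read the map:new documentation, when the same key occurs in several input maps the value from the last one is used, so "size" should come back as "L" here. Treat that as an assumption to verify on your own server version.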

Lexicon Support for Maps

Maps are also supported as output for many lexicon-based functions (through the "map" option), including:

  • Scalar lexicon functions (cts:element-*-values, cts:values) - Returns a map where the key and the value are the same.
  • Value co-occurrence functions (cts:element-*-value-co-occurrences, cts:value-co-occurrences) - Returns a map where the key is the first value of each tuple and the value is the sequence of second values that co-occur with it.
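
To make the co-occurrence case concrete, here is a hedged sketch that asks for a map rather than a sequence of co-occurrence elements. It assumes element range indexes exist on two hypothetical elements, name and id; without those indexes the call raises an error.

```xquery
xquery version "1.0-ml";
(: Assumption: element range indexes are configured on <name> and <id>.
   Both element names are hypothetical. :)
let $ids-by-name := cts:element-value-co-occurrences(
  xs:QName("name"),
  xs:QName("id"),
  "map"   (: ask for a single map:map as the result :)
)
for $name in map:keys($ids-by-name)
return
  fn:concat(
    $name, " -> ",
    fn:string-join(map:get($ids-by-name, $name) ! fn:string(.), ", ")
  )
```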

Maps by Example

Let's now walk through various examples of using maps, to get a better understanding of how and why to use them. The first example sticks a series of different value types inside a map, then walks the keys to describe each value.

xquery version "1.0-ml";
let $map := map:map()
let $puts := (
  map:put($map, "a", "a"),
  map:put($map, "b", <node>Some node</node>),
  map:put($map, "c", (1,2,3,4,5)),
  map:put($map, "d", function() {"Hello World"})
)
for $key in map:keys($map)
return
  fn:concat("Key:", $key, " is ", xdmp:describe(map:get($map, $key)))

Executed in Query Console, this returns

Key:c is (1, 2, 3, ...)
Key:b is <node>Some node</node>
Key:d is function() as item()*
Key:a is "a"

As you can see from the example, a map can flexibly store values, nodes and even functions.
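
Storing functions in a map makes a simple dispatch table possible. The sketch below is one way to do it, with made-up keys ("upper", "lower"): it looks a function up by name and then calls it.

```xquery
xquery version "1.0-ml";
(: Hypothetical dispatch table: each key names a string handler :)
let $handlers := map:new((
  map:entry("upper", function($s as xs:string) { fn:upper-case($s) }),
  map:entry("lower", function($s as xs:string) { fn:lower-case($s) })
))
(: Fetch the function value, then invoke it like any other function item :)
let $handler := map:get($handlers, "upper")
return $handler("Hello World")
```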

Passing Maps By Reference

Another feature of maps is that they can be passed around by reference. This allows you to share information between different modules/transactions while maintaining a single instance across them. In the example below, we take attendance by allowing multiple spawned functions to add entries to a shared map across separate transactions. In the final function we check whether a value is present and answer accordingly.

xquery version "1.0-ml";
let $map := map:map()
let $foo := xdmp:spawn-function(function() {
  map:put($map, "foo", "Foo is here")
})
let $bar := xdmp:spawn-function(function() {
  map:put($map,"bar", "Bar is hear(yawn)")
})
let $baz := xdmp:spawn-function(function() {
  if(map:contains($map, "bar")) then 
    map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="bar">
    <map:value xsi:type="xs:string">Bar is hear(yawn)</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, only if bar is here.</map:value>
  </map:entry>
</map:map>

But be aware that each xdmp:spawn-function call is "non-blocking", so the main query can return before all of the spawned functions have finished. In the next example, the "bar" function sleeps for one second before it executes its map:put.

...
let $bar := xdmp:spawn-function(function() {
  xdmp:sleep(1000),
  map:put($map,"bar", "I am lazy bar")
})
...
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, but why is bar always late</map:value>
  </map:entry>
</map:map>

As you can see, "baz" is pretty upset that bar is not present.

To provide "blocking", you can pass the result=true option to xdmp:spawn-function, or use xdmp:invoke-function in its stead.
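
A minimal sketch of the xdmp:invoke-function variant follows. It assumes the map is shared by reference with the invoked function in the same way as in the spawn examples; the messages are invented.

```xquery
xquery version "1.0-ml";
let $map := map:map()
(: xdmp:invoke-function runs the function and waits for it to finish :)
let $_ := xdmp:invoke-function(function() {
  xdmp:sleep(1000),
  map:put($map, "bar", "I am lazy bar")
})
return
  if (map:contains($map, "bar"))
  then "bar made it on time"
  else "bar is late again"
```

Because the caller waits for the invoked function, the check in the return clause runs only after the sleep and the map:put have completed. The trade-off is that the work no longer runs in the background.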

Maps and JSON

Maps are also directly serializable to JSON using xdmp:to-json. In fact, map:map and json:object in MarkLogic are something like cousins: they can be used interchangeably and support identity and casting between the two types. A fundamental difference is that json:object maintains key order, but map:map does not. So in cases where you care about the ordering of the keys, you can use a json:object and all puts will preserve the order. As the examples below show, you can create a json:object, populate it with the map functions and render the output as JSON. The following examples compose the same JSON structure, first using map:map and then json:object:

let $json-object := map:map()
let $puts := (
  map:put($json-object, "name", "Gary Vidal"),
  map:put($json-object, "age", 40),
  map:put($json-object, "birthdate", xs:date("1974-09-09"))
)
return
  xdmp:to-json($json-object)

Returns:

{"birthdate":"1974-09-09", "name":"Gary Vidal", "age":40}

let $json-object := json:object()
...
return
  xdmp:to-json($json-object)

Returns:
{"name":"Gary Vidal", "age":40, "birthdate":"1974-09-09"}

In the example above, the key order is preserved when using json:object, whereas with map:map it is not.
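
For a self-contained way to try this, the sketch below inserts keys in a deliberately non-alphabetical order. The key names are hypothetical, and I am assuming that map:keys reports the same insertion order that xdmp:to-json serializes.

```xquery
xquery version "1.0-ml";
(: Hypothetical data; the point is only the order of the keys :)
let $ordered := json:object()
let $_ := (
  map:put($ordered, "zebra", 1),
  map:put($ordered, "apple", 2),
  map:put($ordered, "mango", 3)
)
return (
  map:keys($ordered),      (: keys as the object reports them :)
  xdmp:to-json($ordered)   (: the same object serialized as JSON :)
)
```

Swapping json:object() for map:map() in the first line is a quick way to compare the two orderings side by side.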

Map Operators

Map operators were formally introduced in MarkLogic 7, but have been available since MarkLogic 5. The documentation for map operators can be found at Map Operators.

Operator   Description
+          Union: computes the distinct union of two maps (ex. $map1 + $map2).
-          Difference: computes the difference of two maps, much like a set difference (ex. $map1 - $map2). It also works as a unary operator: -$map1 returns $map1 with its keys and values reversed.
*          Intersection: only the keys present in both maps are returned (ex. $map1 * $map2).
div        Inference, or simply a join: $map1 div $map2 consists of the keys from $map1 and the values from $map2, wherever a value in $map1 equals a key in $map2.
mod        Equivalent to -$map1 div $map2 (ex. $map1 mod $map2).

Now that you have a basic understanding of the operators let's apply them to some examples.

Union (Distinct) ($map + $map)

xquery version "1.0-ml";
let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
return
  $map1 + $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="b">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="c">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="a">
    <map:value xsi:type="xs:string">a</map:value>
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

As you can see, all keys from $map1 and $map2 are combined and only the distinct values are returned. It's important to understand this distinction: if you count the values after a union, you get the count of the distinct union, not of a merge of the two maps in which duplicate values are repeated.

Difference ($map - $map)

In the example below we want to compute the difference between two maps.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
  $map1 - $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="e">
    <map:value xsi:type="xs:string">e</map:value>
  </map:entry>
</map:map>

Wait! Why did it return only the entry for key 'e'? The difference is directional: $map1 - $map2 returns only the entries of $map1 that are not present in $map2. To compute all differences you need a bit more math, but the answer is quite simple.

  ($map1 - $map2) + ($map2 - $map1)

Returns keys (c, d, e)
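
The two-way difference can be wrapped in a small helper. This is a sketch: the function name local:symmetric-difference is my own, and the maps carry the same keys as the example above.

```xquery
xquery version "1.0-ml";
(: Hypothetical helper: entries that appear in exactly one of the two maps :)
declare function local:symmetric-difference(
  $a as map:map,
  $b as map:map
) as map:map
{
  ($a - $b) + ($b - $a)
};

(: Same keys as the example above; each key maps to itself :)
let $map1 := map:new(("a", "b", "e") ! map:entry(., .))
let $map2 := map:new(("a", "b", "c", "d") ! map:entry(., .))
for $key in map:keys(local:symmetric-difference($map1, $map2))
order by $key
return $key
```

Sorting the keys is only there to make the result easy to compare, since map:map does not guarantee key order.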

Inversion (-$map)

Inverting a map is quite simple: each value becomes a key and each key becomes a value. Since all keys are strings, you will lose type information if your values are not strings; a string representation is computed for every non-string value during inversion. Note also that an entry with several values (key b below) produces one inverted entry per value.

xquery version "1.0-ml";
let $map := map:new((
  map:entry("a", 1),
  map:entry("b", ("v1", "v2")),
  map:entry("c", function() {"Hello World"}),
  map:entry("d", <node>Some node</node>)
))
return -$map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="v2">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="function() as item()*">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="&lt;node&gt;Some node&lt;/node&gt;">
    <map:value xsi:type="xs:string">d</map:value>
  </map:entry>
  <map:entry key="1">
    <map:value xsi:type="xs:string">a</map:value>
  </map:entry>
  <map:entry key="v1">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

Intersects Operator ($map * $map)

In the following example, we are assuming we have key/values present in both maps and only want those key/values that intersect.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
   $map1 * $map2

Returns keys (a, b)

It is important to note that the intersects operation is computed on both key and value. If both maps share a key but not the same value for that key, that entry is not part of the intersection.
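
A small sketch of that caveat, using made-up values: key "a" exists in both maps but with different values, while key "b" matches on both key and value.

```xquery
xquery version "1.0-ml";
let $map1 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "x"),   (: same key as $map1, different value :)
  map:entry("b", "b")    (: same key and same value :)
))
return map:keys($map1 * $map2)
```

If the intersection behaves as described above, only "b" should survive.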

Inference/Join Operator ($map1 div $map2)

For inferencing/join we will focus on a more practical example: joining students' names to test scores. In the example below, each student is assigned an id, stored as the value of the map:entry. Another map stores each id together with all the scores for each test. In the result, you can see the scores joined directly to the name via its id value.

For a real-world use case, this data could easily be stored in MarkLogic as XML/JSON fragments, with range indexes enabled for (name, id, score). This is where cts:value-co-occurrences should be used to return a map as output for (name, id) and (id, score).


let $map1 := map:new((
   map:entry("jenny", "a1"),
   map:entry("bob"  , "b1"),
   map:entry("tom"  , "c1"),
   map:entry("rick" , "d1")
))

let $map2 := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
return
 $map1 div $map2

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">90</map:value>
    <map:value xsi:type="xs:integer">95</map:value>
    <map:value xsi:type="xs:integer">100</map:value>
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="tom">
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">85</map:value>
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="bob">
    <map:value xsi:type="xs:integer">77</map:value>
    <map:value xsi:type="xs:integer">68</map:value>
    <map:value xsi:type="xs:integer">82</map:value>
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

mod Operator ($map1 mod $map2)

I am going to skip the example for the mod operator since it equates to (-$map1 div $map2).
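
For readers who want to try it anyway, here is a hedged sketch with invented data. The first map is keyed by id (id to name), so inverting it gives name to id, and the div then joins those ids to the scores.

```xquery
xquery version "1.0-ml";
(: Hypothetical data: both maps are keyed by student id :)
let $names-by-id := map:new((
  map:entry("a1", "jenny"),
  map:entry("b1", "bob")
))
let $scores-by-id := map:new((
  map:entry("a1", (90, 95, 100, 88)),
  map:entry("b1", (77, 68, 82, 60))
))
return (
  (: the mod form :)
  $names-by-id mod $scores-by-id,
  (: the spelled-out form: invert first, then join :)
  (-$names-by-id) div $scores-by-id
)
```

If the equivalence stated above holds, both items in the result should be the same map, keyed by student name with the scores as values.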

Combining Techniques to Solve Complex Issues

So we have learned a little about each of the operators; now let's start combining them to solve more complicated problems.

Diffing Data

One problem I have encountered very often is determining the difference between two structures. Let's consider two documents that have similar structures, but differ in ordering or values. We want to create a diff-gram that determines what inserts or updates need to occur between the two documents. Using the difference and intersects operators we can compose a complete diff-gram that runs quite efficiently. In the example below, we take two nodes, iterate over the child elements of each and put every path inside a map:map, then compute the difference using map operators.

xquery version "1.0-ml";
let $version1 := 
  <node>
    <last-modifed>2001-01-01</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Bob Franco</author>
    <author>Billy Bob Thornton</author>
    <added>I am added</added>  
  </node>
let $version2 := 
  <node>
    <last-modifed>2001-01-12</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Billy Bob Thornton</author>
    <author>James Franco</author>
    <added1>I am added1</added1>
  </node>
let $version1-map := map:map()
let $version2-map := map:map()
let $_ := ( 
  (:Map values paths to maps:)
  $version1/element() ! map:put($version1-map, xdmp:path(.), fn:data(.)),
  $version2/element() ! map:put($version2-map, xdmp:path(.), fn:data(.))
)
let $same    := $version1-map * $version2-map
let $inserts := $version1-map - $version2-map
let $deletes := $version2-map - $version1-map
return 
  <diff>{(
    map:keys($same)    ! <same path="{.}">{map:get($same,.)}</same>,
    map:keys($deletes) ! <delete path="{.}">{map:get($deletes,.)}</delete>,
    map:keys($inserts) ! <insert path="{.}">{map:get($inserts,.)}</insert>
  )}</diff>

Returns

<diff>
  <same path="/node/subtitle">Same ole title</same>
  <same path="/node/title">Here is a title</same>
  <delete path="/node/last-modifed">2001-01-12</delete>
  <delete path="/node/added1">I am added1</delete>
  <delete path="/node/author[2]">James Franco</delete>
  <delete path="/node/author[1]">Billy Bob Thornton</delete>
  <insert path="/node/last-modifed">2001-01-01</insert>
  <insert path="/node/added">I am added</insert>
  <insert path="/node/author[2]">Billy Bob Thornton</insert>
  <insert path="/node/author[1]">Bob Franco</insert>
</diff>
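
In the output above, a path such as /node/last-modifed appears on both sides of the difference because only its value changed. One possible refinement, sketched here with small hypothetical maps standing in for $version1-map and $version2-map, is to report such paths as updates.

```xquery
xquery version "1.0-ml";
(: Hypothetical path/value maps standing in for the two document versions :)
let $v1 := map:new((
  map:entry("/node/title", "Here is a title"),
  map:entry("/node/last-modifed", "2001-01-01"),
  map:entry("/node/added", "I am added")
))
let $v2 := map:new((
  map:entry("/node/title", "Here is a title"),
  map:entry("/node/last-modifed", "2001-01-12"),
  map:entry("/node/added1", "I am added1")
))
let $only-v1 := $v1 - $v2
let $only-v2 := $v2 - $v1
(: A path that shows up in both differences has a changed value :)
let $changed := map:keys($only-v1)[map:contains($only-v2, .)]
return
  <diff>{(
    $changed !
      <update path="{.}" from="{map:get($only-v1, .)}" to="{map:get($only-v2, .)}"/>,
    map:keys($only-v1)[fn:not(. = $changed)] !
      <only-in-v1 path="{.}">{map:get($only-v1, .)}</only-in-v1>,
    map:keys($only-v2)[fn:not(. = $changed)] !
      <only-in-v2 path="{.}">{map:get($only-v2, .)}</only-in-v2>
  )}</diff>
```

The element names update, only-in-v1 and only-in-v2 are my own. This relies on the difference operator comparing key and value together, which is what the output above suggests.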

Creating Fast 2-Way Lookup Tables

When working with Excel (2007+) in MarkLogic, you will often need to convert between a column's alphabetic name (such as ZA) and its index value, and vice versa. Calculating the column name for over 255 columns on every row can be expensive, so computing the lookup table only once can drastically improve the performance of an application. In the example below, note that I compute the table only once and use the inversion (-) operator to create the reverse direction.

xquery version "1.0-ml";
declare variable $ALPHA-INDEX-MAP := 
  let $map := map:map()
  let $alpha := ("", (65 to 65+25 ) ! fn:codepoints-to-string(.))
  let $calcs := 
    for $c1 in $alpha
    for $c2 in $alpha
    for $c3 in $alpha[2 to fn:last()]
    where $c1 = "" or fn:not($c2 = "")
    return 
      ($c1 || $c2|| $c3)
  let $_ := for $col at $pos in $calcs return map:put($map, $col, $pos)
  return
    $map
;  
declare variable $INDEX-ALPHA-MAP := -$ALPHA-INDEX-MAP;

(: Which index corresponds to ZA? :)
map:get($ALPHA-INDEX-MAP, "ZA"),
(: Which alpha corresponds to 32? :)
map:get($INDEX-ALPHA-MAP, "32")

Aggregating map:map data

Starting with MarkLogic 7, maps support aggregate functions such as min/max/sum/avg. To aggregate, we call the built-in function for the aggregate we want and pass the map:map as the argument. Let's return to our student scores example. In practice the values would come from MarkLogic lexicon functions like cts:value-co-occurrences, but here we illustrate with sample data:

let $student-scores := map:new((
  map:entry("jenny", (90,95,100,88)),
  map:entry("bobby", (77,68,82,60)),
  map:entry("rick",  (0,0,85,89))
))
return
  $student-scores

Now we want to compute the average score per student

 fn:avg($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:decimal">43.5</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:decimal">93.25</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:decimal">71.75</map:value>
  </map:entry>
</map:map>

Now let's compute the max score per student.

fn:max($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">100</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">82</map:value>
  </map:entry>
</map:map>

The same can be done for min

fn:min($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">0</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

... and sum

fn:sum($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">174</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">373</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">287</map:value>
  </map:entry>
</map:map>

And finally, count?

fn:count($student-scores)

Returns

1

Well, that was not expected ... or was it? The count function is ambiguous about what it should count: the values under each key, or the map itself? A simple solution is shown below; it satisfies our need to count the number of tests per student, although it is not as efficient as an aggregate function:

 map:new(
   map:keys($student-scores) ! map:entry(., fn:count(map:get($student-scores, .)))
 )

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
</map:map>
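
Since the operators return maps, their results can in principle be passed straight to an aggregate. The sketch below joins names to scores with div and then averages per name; the data is invented, and I am assuming the aggregate accepts the map produced by an operator just as it accepts one built with map:new.

```xquery
xquery version "1.0-ml";
(: Hypothetical data: names point at ids, ids point at scores :)
let $ids-by-name := map:new((
  map:entry("jenny", "a1"),
  map:entry("bob", "b1")
))
let $scores-by-id := map:new((
  map:entry("a1", (90, 95, 100, 88)),
  map:entry("b1", (77, 68, 82, 60))
))
(: Join name -> id -> scores, then average the scores under each name :)
return fn:avg($ids-by-name div $scores-by-id)
```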

I hope this helps you understand the potential power of using map:map operators and aggregates. In the future, I will be writing a series of articles on harnessing the power of maps.

Stay Tuned ...
