
A UDF for Ranged Buckets

by Dave Cassel

Last week I wrote a blog post about working with Ranged Buckets. To summarize the problem, we have data that look like this:

<doc>
  <lo>2</lo>
  <hi>9</hi>
  <id>1154</id>
</doc>

We want to build a facet with buckets like 0-4, 5-8, 9-12, 13-16, and 17-20. The "lo" and "hi" values in the sample document represent a range, so the document should be counted for the 0-4, 5-8, and 9-12 buckets, even though no value from 5-8 appears in the document. 

In my earlier post, I showed how to solve this problem using a normal custom constraint. Today, I took a crack at it with a more involved technique -- a User Defined Function. Also referred to as "Aggregate User Defined Functions", UDFs let MarkLogic application developers write C++ code to implement map/reduce jobs. For me, this took some effort as I haven't written much meaningful C++ since I came to MarkLogic about 5 years ago (the notable exception being the other UDF that I wrote). I got through it, though, and I found some interesting results. (Feel free to suggest improvements to the code.)

Implementation

I'll refer you to the documentation for the general background on UDFs, but essentially, you need to think about four functions.

start

The start function handles any arguments used to customize this run of the UDF. In my case, I needed to pass in the buckets that I wanted to use. I dynamically allocate an array of buckets that I'll use throughout the job. 

map

Two range indexes get passed in -- one for the "lo" element and one for the "hi" element. The map function gets called for each forest stand in the database, examining the values in the input range indexes. When two indexes are passed in, the map function sees the values as tuples. For instance, the values in the sample document above show up as the tuple (2, 9). Always check the frequency of that tuple, in case the same pair occurs in multiple documents. Once this function has been called for a stand, we know the counts for each bucket for the values in that particular stand. 
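To make the counting concrete, here is a minimal sketch of the per-stand logic in plain C++. It is not the UDF itself: the Bucket and Tuple types and the count_stand function are hypothetical names for illustration, and the real code reads its tuples through MarkLogic's plugin API rather than from a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only (not part of MarkLogic's API).
struct Bucket { int lo; int hi; };  // inclusive bounds, e.g. {5, 8}
struct Tuple  { int lo; int hi; std::uint64_t frequency; };  // one (lo, hi) pair and how often it occurs

// A document's range [lo, hi] counts toward a bucket when the two inclusive ranges overlap.
bool overlaps(const Tuple& t, const Bucket& b) {
    return t.lo <= b.hi && t.hi >= b.lo;
}

// Count one stand's tuples into per-bucket totals, weighting each tuple by its frequency.
std::vector<std::uint64_t> count_stand(const std::vector<Tuple>& tuples,
                                       const std::vector<Bucket>& buckets) {
    std::vector<std::uint64_t> counts(buckets.size(), 0);
    for (const Tuple& t : tuples) {
        assert(t.lo <= t.hi);  // assumption: the data never has lo greater than hi
        for (std::size_t i = 0; i < buckets.size(); ++i) {
            if (overlaps(t, buckets[i])) counts[i] += t.frequency;
        }
    }
    return counts;
}
```

With the buckets 0-4, 5-8, and 9-12, the tuple (2, 9) with frequency 1 adds one to each of the three counts, which is the behavior described for the sample document.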

reduce

The reduce function combines the per-stand counts, aggregating them until a set of values for the entire database is known. My implementation just needed to add the counts for each bucket. 
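The combining step is small enough to sketch in a few lines of plain C++. The function name reduce_counts is hypothetical, and the sketch assumes each side holds one count per bucket, in the same bucket order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fold another partial result (one count per bucket) into the running total.
void reduce_counts(std::vector<std::uint64_t>& total,
                   const std::vector<std::uint64_t>& other) {
    assert(total.size() == other.size());  // both sides were built from the same bucket list
    for (std::size_t i = 0; i < total.size(); ++i) {
        total[i] += other[i];
    }
}
```

Because addition is associative and commutative, the order in which partial results arrive does not change the final counts.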

finish

The last step is to organize the results in a way that they can be sent back to XQuery. The finish function builds a map, using "0-4" as the key for the first bucket and the count as the value. 
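As a rough illustration of that labeling step, the following plain C++ sketch pairs each bucket with its count under a key such as "0-4". The Bucket type and label_counts function are hypothetical, and a std::map stands in for the map that the real UDF hands back to XQuery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical type for illustration only.
struct Bucket { int lo; int hi; };  // inclusive bounds

// Build a label -> count map, using "lo-hi" as the key for each bucket.
std::map<std::string, std::uint64_t> label_counts(const std::vector<Bucket>& buckets,
                                                  const std::vector<std::uint64_t>& counts) {
    assert(buckets.size() == counts.size());  // one count per bucket
    std::map<std::string, std::uint64_t> labeled;
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        const std::string key = std::to_string(buckets[i].lo) + "-" + std::to_string(buckets[i].hi);
        labeled[key] = counts[i];
    }
    return labeled;
}
```

Note that std::map orders its keys as strings, so "13-16" sorts before "5-8"; whoever consumes the result should use the bucket list, not the key order, to decide display order.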

Encoding and Decoding

When working in a cluster, the encode and decode functions are important too. For my simple tests, I implemented them but used the UDF on a single MarkLogic instance, so these functions weren't called. 

Deploying

Building the UDF is pretty simple using the Makefile provided by MarkLogic [1]. I customized the two places where the name needed to match my filename, but otherwise left it alone.

After compiling, I uploaded the UDF to MarkLogic using Query Console. I exported the workspace, which is available on GitHub.

You can call a UDF using the /v1/values endpoint, but I decided to wrap it in a custom constraint to provide a straightforward comparison with the custom constraint built in the previous post. After all, the goal is to provide a facet. A custom constraint requires some XML for the search options and some XQuery.

The Results

I figured UDFs would be more interesting with multiple forests, as mapping a job to a single forest that has just one stand doesn't gain any parallelism. With that in mind, I bumped my database up to four forests, then to six, and compared my UDF implementation with the two-function approach I described in the previous post. I tested with the same 100,000 documents used in the previous post.

Median Seconds   4 forests   6 forests
UDF              0.002898    0.002858
two-function     0.003909    0.004261

The numbers are the median seconds returned in the facet-resolution-time part of the response to /v1/search?options=udf or /v1/search?options=startfinish. A couple of things jump out at me. First, the UDF out-performed the two-function XQuery custom facet. Second, the UDF improved very slightly when moving from four forests to six (0.002898 to 0.002858 seconds) -- slight enough that let's call it even. The two-function approach, however, slowed down by a noticeable amount (0.003909 to 0.004261 seconds).

Thoughts on UDFs

When should you reach for a UDF? When your data don't support directly getting your values, it might be worthwhile. For instance, with ranged buckets we can't simply do a facet on "lo" or "hi", because we wouldn't represent the values in between. Writing a UDF is more complicated and more dangerous than other approaches, but appears to have some performance benefits. 

There is usually an alternative. For instance, in this case I could have supplemented my data such that the sample document would have all values from two through nine inclusive, allowing me to use a standard facet. That leads to the tradeoff -- do I want to spend a little more time at ingest and take up a little more space, or do I want to dynamically compute the values I need? The answer to that question is certainly application specific, but UDFs provide a handy (and sharp!) tool to work with. 

 


[1] You'll find the MarkLogic Makefile for UDFs in /opt/MarkLogic/Samples/NativePlugins/ (Linux), ~/Library/MarkLogic/Samples/NativePlugins/ (Mac), or C:\Program Files\MarkLogic\Samples\NativePlugins\ (Windows). 

Working with Ranged Buckets

by Dave Cassel

One of my colleagues ran into an interesting problem a while ago. He had data with high and low values for some field, and he wanted to display bucketed facets on those values. Let's take a look at how to implement that.

Note: all the code for this post is available at my ranged-bucket GitHub repo, so you're welcome to clone and follow along.

The Data

To illustrate the point, let's look at some sample data.

<doc>
  <lo>2</lo>
  <hi>9</hi>
  <id>1154</id>
</doc>

This represents a document whose valid values range from 2 to 9. Now suppose we want to get a bucketed facet on these documents, showing how many fall into ranges like 0-4, 5-8, 9-12, etc. The first observation is this is different from how we usually do facets or buckets. The sample document should be counted for the 5-8 bucket, even though no value from five to eight appears in the document.

The next observation is that a document may well fall into more than one bucket. The example document will be represented in three of the buckets we've specified so far.

Generating Data

We need some data to work with, so let's generate some. The repository has a Query Console workspace that you can import, with a buffer to generate sample data with "lo" values ranging from zero to ten (inclusive) and "hi" values ranging from zero to twenty. The high value is a random number added to the low, ensuring that the high is never less than the low.
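The actual generator is the XQuery buffer in the workspace. Purely to illustrate the rule "high = low + a random number", here is a small sketch in plain C++ (chosen to match the UDF's language). The make_range name is hypothetical, and the assumption that the added number is drawn uniformly from zero to ten is mine, inferred from the stated ranges.

```cpp
#include <random>
#include <utility>

// Returns one (lo, hi) pair: lo is in [0, 10] and hi is lo plus a random
// addend in [0, 10], so hi stays within [lo, 20].
std::pair<int, int> make_range(std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, 10);  // assumed distribution
    const int lo = pick(rng);
    const int hi = lo + pick(rng);
    return std::make_pair(lo, hi);
}
```

Seeding the std::mt19937 with a fixed value makes the generated data set repeatable, which helps when comparing timing runs.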

The Code

To implement this, two approaches occurred to me: a custom constraint facet and a UDF. This post shows the custom constraint approach; I'll return to the UDF another time.

Custom Constraint

To implement a custom constraint facet, there are three functions we need to know about. The first is used when someone selects a facet value, or otherwise makes use of a constraint -- the function parses the request and turns it into a cts:query. This function is important for any constraint, whether used as a facet or not.

The text part of the incoming request is expected to look like "5-8", or some other pair of numbers. These are split and used to build an and-query.
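The constraint itself is written in XQuery, but the parsing logic is easy to sketch in plain C++ (the language of the UDF). The parse_range and range_matches names are hypothetical. The overlap test is my reading of what the and-query has to express, namely that a document matches when its lo is at most the bucket's upper bound and its hi is at least the bucket's lower bound. The sketch handles non-negative bounds only.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Parse text such as "5-8" into lo and hi. Returns false (leaving lo and hi
// untouched) when the text is not two integers separated by a dash.
bool parse_range(const std::string& text, int& lo, int& hi) {
    const std::size_t dash = text.find('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 >= text.size()) return false;
    const std::string lo_part = text.substr(0, dash);
    const std::string hi_part = text.substr(dash + 1);
    try {
        std::size_t lo_used = 0;
        std::size_t hi_used = 0;
        const int lo_value = std::stoi(lo_part, &lo_used);
        const int hi_value = std::stoi(hi_part, &hi_used);
        // Reject trailing junk and inverted ranges.
        if (lo_used != lo_part.size() || hi_used != hi_part.size() || lo_value > hi_value) return false;
        lo = lo_value;
        hi = hi_value;
        return true;
    } catch (const std::exception&) {
        return false;  // std::stoi throws on non-numeric or out-of-range input
    }
}

// Assumed matching rule: the document's [doc_lo, doc_hi] overlaps the bucket [lo, hi].
bool range_matches(int doc_lo, int doc_hi, int lo, int hi) {
    return doc_lo <= hi && doc_hi >= lo;
}
```

For the sample document (lo 2, hi 9) and the text "5-8", parsing yields 5 and 8, and the document matches because 2 <= 8 and 9 >= 5.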

To make a custom constraint work as a facet, you need to implement functions to return values and counts. These are split into start-facet and finish-facet functions. The job of the start function is to make the lexicon API calls needed to identify the values; the finish function formats the results as the Search API and REST API expect.

You're not technically required to implement the start function -- you can make the lexicon calls in the finish function if you want. That's actually a simple way to get started. You will get some performance improvement if you split the work properly, however. To illustrate this, I implemented both ways. I'll only show the split code here, but you can see the single-function approach at GitHub.

Here's the split implementation:

You can see the call to xdmp:estimate() in the start function. The values returned end up in the $start parameter to the finish function. Why split them up this way? Because MarkLogic can do some of this work in the background, allowing for a faster overall response.

Sidebar: why estimate and not count?

Note that what you return from the start function is important. In my first attempt, my start function constructed elements with attributes for count, hi, and low, then the finish function pulled out what it needed to make the search:facet-value elements. That was (not surprisingly) slower than just doing everything in the finish function. My revised implementation just returns the results of the xdmp:estimate() calls. The finish function already knows what order they will be in, so it's able to map those to the correct hi-lo values to construct the search:facet-values.

It's fair to ask how much difference the one-function versus two-function approaches makes. I generated 100,000 sample documents and ran some simple tests on my MacBook Pro (MarkLogic 7.0-4.1). (I should caveat this by saying I didn't trouble to shut everything else down; I just wanted to get an idea of the difference.) I threw in a single-value, standard bucket facet for comparison. Each approach was run as a single facet, making calls through the REST API.

Approach               Median Facet-resolution Time
Two function           0.003622 sec
One function           0.009369 sec
Single-value buckets   0.001049 sec

Take these as rough numbers, but if you'd like to run more precise tests, you can get the full code and configuration from GitHub.

Understanding map:map operators, aggregates and use cases

by Gary Vidal

In this article I am going to discuss maps and their operators in depth, and show even more functionality exposed through maps. In a previous post, Returning Lexicon Values using XPath Expressions, I alluded to the fact that map:map supports operators and used them to compute the difference of two maps to filter results for processing. I want to provide a more in-depth discussion of how maps work and then delve deeper into the powerful features they provide.

Introduction to Maps

To begin, let's formally define some basic constructs for how a map:map works. Maps are in-memory key/value structures, introduced to MarkLogic in version 5. Out of the box, maps provide the ability to perform fast in-memory inserts/updates/lookups/existence checks. Maps are also mutable structures, so you can change them without creating copies as you would when changing XML nodes. This allows all operations to execute very efficiently, and with side effects, which is not common in functional programming languages like XQuery.

The basic operations you can perform with maps are well documented on the MarkLogic map functions page.

Map Operations

  • map:map - Creates a new map or creates a map with data from an xml serialization of a map:map.
  • map:put - Puts a value by key into a map
  • map:get - Gets a value from a map by key
  • map:keys - Returns all the keys present in a map.
  • map:remove - Removes a value by key from a map
  • map:count - Returns the count of the keys in the map
  • map:clear - Clears the map of all key/values

Introduced in MarkLogic 7

  • map:new - Creates a new map:map, but accepts a sequence of existing maps or map:entry(k,v) values. This is a very composable and convenient way to join multiple maps together.
  • map:entry - Creates a map:map with a single key/value pair.

Lexicon Support for Maps.

Maps are also supported as output for many lexicon based functions including

  • Scalar lexicon functions (cts:element-*-values,cts:values) - Returns a map where the key and the value are the same.
  • Value co-occurrence functions (cts:element-*-value-co-occurrences, cts:value-co-occurrences) - Returns a map where each key is the first value of a tuple and the value is the sequence of second values that co-occur with it.

Maps by Example

Let's now walk through various examples of using maps, to get a better understanding of how and why to use them. The first example sticks a series of different value types inside a map, then walks the keys to describe each value.

xquery version "1.0-ml";
let $map := map:map()
let $puts := (
  map:put($map, "a", "a"),
  map:put($map, "b", <node>Some node</node>),
  map:put($map, "c", (1,2,3,4,5)),
  map:put($map, "d", function() {"Hello World"})
)
for $key in map:keys($map)
return
  fn:concat("Key:", $key, " is ", xdmp:describe(map:get($map, $key)))

Executed in Query Console, this returns

Key:c is (1, 2, 3, ...)
Key:b is <node>Some node</node>
Key:d is function() as item()*
Key:a is "a"

As you can see from the example, a map can flexibly store values, nodes and even functions.

Passing Maps By Reference

Another feature of maps is that they can be passed around by reference. This lets you share information between different modules/transactions while maintaining a single instance across them. In the example below, we are going to take attendance by allowing multiple spawned functions to add entries to a global map across separate transactions. The final function checks whether a value is present and answers accordingly.

xquery version "1.0-ml";
let $map := map:map()
let $foo := xdmp:spawn-function(function() {
  map:put($map, "foo", "Foo is here")
})
let $bar := xdmp:spawn-function(function() {
  map:put($map,"bar", "Bar is hear(yawn)")
})
let $baz := xdmp:spawn-function(function() {
  if(map:contains($map, "bar")) then 
    map:put($map, "baz", "Baz is here, only if bar is here.")
  else map:put($map, "baz", "Baz is here, but why is bar always late")
})
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="bar">
    <map:value xsi:type="xs:string">Bar is hear(yawn)</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, only if bar is here.</map:value>
  </map:entry>
</map:map>

But be aware that each spawn-function call is "non-blocking", so the calling query can return before all of the spawned functions have finished. In the next example, we will have the "bar" function sleep for 1s before it executes its map:put.

...
let $bar := xdmp:spawn-function(function() {
  xdmp:sleep(1000),
  map:put($map,"bar", "I am lazy bar")
})
...
return
  $map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="foo">
    <map:value xsi:type="xs:string">Foo is here</map:value>
  </map:entry>
  <map:entry key="baz">
    <map:value xsi:type="xs:string">Baz is here, but why is bar always late</map:value>
  </map:entry>
</map:map>

As you can see, "baz" is pretty upset that "bar" is not present.

To provide "blocking", you can pass the result=true option to xdmp:spawn-function or use xdmp:invoke-function in its stead.

Maps and JSON

Maps are also directly serializable to JSON using xdmp:to-json. In fact, map:map and json:object in MarkLogic are something like cousins: they can be used interchangeably and support casting between the two types. A fundamental difference is that json:object maintains key order, but map:map does not. So in cases where you care about the ordering of the keys, you can use a json:object and all puts will preserve the order. As you can see from the example below, you can compose a json:object, use map functions to populate it with data, and render the output as JSON. The following example composes the same JSON structure using map:map and json:object:

let $json-object := map:map()
let $puts := (
  map:put($json-object, "name", "Gary Vidal"),
  map:put($json-object, "age", 40),
  map:put($json-object, "birthdate", xs:date("1974-09-09"))
)
return
  xdmp:to-json($json-object)

Returns:
{"birthdate":"1974-09-09", "name":"Gary Vidal", "age":40}

let $json-object := json:object()
...
return
  xdmp:to-json($json-object)

Returns:
{"name":"Gary Vidal", "age":40, "birthdate":"1974-09-09"}

In the example above, the order is preserved using json:object, where the map:map is not.

Map Operators

Map operators were formally introduced in MarkLogic 7, but have been available since MarkLogic 5. The documentation for map operators can be found at Map Operators.

Operator   Description
+          Union (distinct) of two maps, e.g. $map1 + $map2.
-          Difference of two maps (think of it as set difference), e.g. $map1 - $map2. It also works as a unary operator: -$map1 has the keys and values reversed.
*          Intersection, e.g. $map1 * $map2, where only the keys present in both maps are returned.
div        Inference, or simply a join: A div B consists of keys from map A and values from map B, where A's value is equal to B's key.
mod        $map1 mod $map2 is equivalent to -$map1 div $map2.

Now that you have a basic understanding of the operators let's apply them to some examples.

Union (Distinct) ($map + $map)

xquery version "1.0-ml";
let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b")
))
let $map2 := map:new((
  map:entry("a", "b"),
  map:entry("b", "b"),
  map:entry("c", "c")
))
return
  $map1 + $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="b">
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
  <map:entry key="c">
    <map:value xsi:type="xs:string">c</map:value>
  </map:entry>
  <map:entry key="a">
    <map:value xsi:type="xs:string">a</map:value>
    <map:value xsi:type="xs:string">b</map:value>
  </map:entry>
</map:map>

As you can see all keys from $map1 and $map2 are combined and only the distinct values are returned. It's important to understand this distinction, because if you are counting the values after you union, you will get the distinct union's count not a merge of the 2 maps where duplicate values are repeated.

Difference ($map - $map)

In the example below we want to compute the difference between two maps.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
  $map1 - $map2

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="e">
    <map:value xsi:type="xs:string">e</map:value>
  </map:entry>
</map:map>

Wait! Why did it return only the entry for key 'e'? The difference is ordered: it returns only the keys in $map1 that are not present in $map2. So to compute all differences you need a bit more math, but the answer is quite simple.

  ($map1 - $map2) + ($map2 - $map1)

Returns keys (c, d, e)

Inversion (-$map)

Inversion of a map is quite simple: you are simply inverting the map:map, so each value becomes a key and every key becomes a value. Since all keys are strings, you will lose type information if your values are non-string types. The string function will be computed for all non-string values during inversion.

xquery version "1.0-ml";
let $map := map:new((
  map:entry("a", 1),
  map:entry("b", ("v1", "v2")),
  map:entry("c", function() {"Hello World"}),
  map:entry("d", <node>Some node</node>)
))
return -$map

Returns

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
<map:entry key="v2">
<map:value xsi:type="xs:string">b</map:value>
</map:entry>
<map:entry key="function() as item()*">
<map:value xsi:type="xs:string">c</map:value>
</map:entry>
<map:entry key="&lt;node&gt;Some node&lt;/node&gt;">
<map:value xsi:type="xs:string">d</map:value>
</map:entry>
<map:entry key="1">
<map:value xsi:type="xs:string">a</map:value>
</map:entry>
<map:entry key="v1">
<map:value xsi:type="xs:string">b</map:value>
</map:entry>
</map:map>

Intersects Operator ($map * $map)

In the following example, we are assuming we have key/values present in both maps and only want those key/values that intersect.

let $map1 := map:new((
   map:entry("a", "a"),
   map:entry("b", "b"),
   map:entry("e", "e")
))
let $map2 := map:new((
  map:entry("a", "a"),
  map:entry("b", "b"),
  map:entry("c", "c"),
  map:entry("d", "d")

))
return
   $map1 * $map2

Returns keys (a, b)

It is important to note that the intersects operation is computed on key and value, so in cases where both maps share the same key, but not the same value for that key, then the keys do not intersect.

Inference/Join Operator ($map1 div $map2)

For inferencing/join we will focus on a more practical example of joining student names to test scores. In the example below each user is assigned an id, noted as the value of the map:entry. Another map stores the id and all the scores for each test. You can see the scores are now joined directly to the name via its id value.

For a real-world use case, this could easily be stored in MarkLogic as xml/json fragments, with range indexes enabled for (name, id, score). This is where cts:value-co-occurrences should be used to return maps as output for (name, id) and (id, score).


let $map1 := map:new((
   map:entry("jenny", "a1"),
   map:entry("bob"  , "b1"),
   map:entry("tom"  , "c1"),
   map:entry("rick" , "d1")
))

let $map2 := map:new((
  map:entry("a1", (90,95,100,88)),
  map:entry("b1", (77,68,82,60)),
  map:entry("c1", (0,0,85,89))
))
return
 $map1 div $map2

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">90</map:value>
    <map:value xsi:type="xs:integer">95</map:value>
    <map:value xsi:type="xs:integer">100</map:value>
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="tom">
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">0</map:value>
    <map:value xsi:type="xs:integer">85</map:value>
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="bob">
    <map:value xsi:type="xs:integer">77</map:value>
    <map:value xsi:type="xs:integer">68</map:value>
    <map:value xsi:type="xs:integer">82</map:value>
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

mod Operator ($map1 mod $map2)

I am going to skip the example for the mod operator since it equates to (-$map1 div $map2).

Combining Techniques to Solve Complex Issues

So we have learned a little bit about all the operators; now let's start to combine them to solve more complicated problems.

Diffing Data

One problem I have encountered very often is determining the difference between 2 structures. Let's consider 2 documents that have similar structures, but have differences in ordering or values. We want to create a diff-gram that determines what inserts or updates need to occur between the 2 documents. Using the difference and intersects operators we can compose a complete diff-gram that runs quite efficiently. In the example below, we take 2 nodes, iterate over each node's child elements and put each path inside a map:map, then compute the difference using map operators.

xquery version "1.0-ml";
let $version1 := 
  <node>
    <last-modifed>2001-01-01</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Bob Franco</author>
    <author>Billy Bob Thornton</author>
    <added>I am added</added>  
  </node>
let $version2 := 
  <node>
    <last-modifed>2001-01-12</last-modifed>
    <title>Here is a title</title>
    <subtitle>Same ole title</subtitle>
    <author>Billy Bob Thornton</author>
    <author>James Franco</author>
    <added1>I am added1</added1>
  </node>
let $version1-map := map:map()
let $version2-map := map:map()
let $_ := ( 
  (:Map values paths to maps:)
  $version1/element() ! map:put($version1-map, xdmp:path(.), fn:data(.)),
  $version2/element() ! map:put($version2-map, xdmp:path(.), fn:data(.))
)
let $same    := $version1-map * $version2-map
let $inserts := $version1-map - $version2-map
let $deletes := $version2-map - $version1-map
return 
  <diff>{(
    map:keys($same)    ! <same path="{.}">{map:get($same,.)}</same>,
    map:keys($deletes) ! <delete path="{.}">{map:get($deletes,.)}</delete>,
    map:keys($inserts) ! <insert path="{.}">{map:get($inserts,.)}</insert>
  )}</diff>

Returns

<diff>
  <same path="/node/subtitle">Same ole title</same>
  <same path="/node/title">Here is a title</same>
  <delete path="/node/last-modifed">2001-01-12</delete>
  <delete path="/node/added1">I am added1</delete>
  <delete path="/node/author[2]">James Franco</delete>
  <delete path="/node/author[1]">Billy Bob Thornton</delete>
  <insert path="/node/last-modifed">2001-01-01</insert>
  <insert path="/node/added">I am added</insert>
  <insert path="/node/author[2]">Billy Bob Thornton</insert>
  <insert path="/node/author[1]">Bob Franco</insert>
</diff>

Creating Fast 2-Way Lookup Tables

When working with Excel (2007+) in MarkLogic, you will often need to convert between R1C1 and the index value of a column and vice-versa. Calculating the column name for more than 255 columns on every row can be expensive, so computing this lookup table only once can drastically improve the performance of an application. In the example below, note that I compute the table only once and use the inversion (-) operator to create the reverse direction.

xquery version "1.0-ml";
declare variable $ALPHA-INDEX-MAP := 
  let $map := map:map()
  let $alpha := ("", (65 to 65+25 ) ! fn:codepoints-to-string(.))
  let $calcs := 
    for $c1 in $alpha
    for $c2 in $alpha
    for $c3 in $alpha[2 to fn:last()]
    where $c1 = "" or fn:not($c2 = "")
    return 
      ($c1 || $c2|| $c3)
  let $_ := for $col at $pos in $calcs return map:put($map, $col, $pos)
  return
    $map
;  
declare variable $INDEX-ALPHA-MAP := -$ALPHA-INDEX-MAP;

(: Which index corresponds to ZA? :)
map:get($ALPHA-INDEX-MAP, "ZA"),
(: Which alpha corresponds to 32? :)
map:get($INDEX-ALPHA-MAP, "32")
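For readers who want to check the table's contents by hand, the same column-letter/index mapping can be computed directly. The sketch below is plain C++ rather than XQuery, the function names are hypothetical, and it assumes the table numbers columns the way I read the loops above: "A" to "Z" are 1 to 26, "AA" to "ZZ" are 27 to 702, and so on.

```cpp
#include <cassert>
#include <string>

// Column letters -> 1-based index, e.g. "A" -> 1, "Z" -> 26, "AA" -> 27.
int column_index(const std::string& name) {
    int index = 0;
    for (char c : name) {
        assert(c >= 'A' && c <= 'Z');  // upper-case letters only
        index = index * 26 + (c - 'A' + 1);
    }
    return index;
}

// 1-based index -> column letters, the inverse of column_index.
std::string column_name(int index) {
    assert(index >= 1);
    std::string name;
    while (index > 0) {
        const int rem = (index - 1) % 26;  // 0 means 'A', 25 means 'Z'
        name.insert(name.begin(), static_cast<char>('A' + rem));
        index = (index - 1) / 26;
    }
    return name;
}
```

Under that assumption, "ZA" works out to 26 * 26 + 1 = 677, and index 32 works out to "AF". Computing each value on demand like this trades the one-time cost of building the table for a little arithmetic on every lookup.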

Aggregating map:map data

Starting with MarkLogic 7, maps now support aggregate functions such as min/max/sum/avg. To perform aggregates, we will use built-in functions corresponding to the aggregate we want to apply and pass the map:map as the argument. Let's look at our student scores example and assume that we are getting values from MarkLogic lexicon functions like cts:value-co-occurrences, but we will illustrate examples with sample code below:

let $student-scores := map:new((
  map:entry("jenny", (90,95,100,88)),
  map:entry("bobby", (77,68,82,60)),
  map:entry("rick",  (0,0,85,89))
))
return
  $student-scores

Now we want to compute the average score per student

 fn:avg($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:decimal">43.5</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:decimal">93.25</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:decimal">71.75</map:value>
  </map:entry>
</map:map>

Now let's compute the max score per student.

fn:max($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">89</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">100</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">82</map:value>
  </map:entry>
</map:map>

The same can be done for min

fn:min($student-scores)

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">0</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">88</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">60</map:value>
  </map:entry>
</map:map>

... And Sum

fn:sum($student-scores)

<map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:integer">174</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:integer">373</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:integer">287</map:value>
  </map:entry>
</map:map>

And finally count?

fn:count($student-scores)
returns
1

Well, that was not expected ... or was it? The count function is ambiguous about what it should count. Should it count the values of the map by key, or the map itself? A simple solution is shown below; it satisfies our need to count the number of tests per student (although it is not as efficient as using an aggregate function):

 map:new(
   map:keys($student-scores) ! map:entry(., fn:count(map:get($student-scores, .)))
 )

Returns

 <map:map xmlns:map="http://marklogic.com/xdmp/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <map:entry key="rick">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="jenny">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
  <map:entry key="bobby">
    <map:value xsi:type="xs:unsignedLong">4</map:value>
  </map:entry>
</map:map>

I hope this helps you understand the potential power of using map:map operators and aggregates. In the future, I will be writing a series of articles on harnessing the power of maps.

Stay Tuned ...
