Solutions

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Learn

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Lazy Evaluation Magic

by Joe Crean

Laziness is rarely ever viewed as a good quality, but when used appropriately, lazy evaluation can be a really great technique!

Lazy evaluation is the ability to process data only when needed. The example presented here shows how a call to cts:search() can be processed but actually evaluated at a later point when it is directly bound into an expression. This is particularly useful when we want to process a very long sequence (e.g. the search matches millions of documents) and we don't really want to access all members of the sequence. Lazy Evaluation can be very useful however it is not always desired. For example, if we need to read out the contents of a range index, we may explicitly want to read all the contents of the index into memory. See Result Streams from Range Indexes for an example of lazy evolution in action.

We recently implemented a common request for one of our customers: to produce a sequence of search results for “child” documents grouped by their “parent” document, and return only the most relevant hit for each parent. To clarify further, let's imagine that the parent documents are chapters with tables of contents and the child documents are sections with the actual content. We want to search all sections and receive a list of chapters whose children match the query. This list is in descending relevance order. Each chapter with a hit should only appear once in the list.

Child documents can be identified by an element is-chapter containing a Boolean value set to false. Each child document has a parent chapter whose id is stored in the element chapter-id.

To help you visualize this, let's look at some sample XML.

Here is a sample of a chapter document:

And here is a sample section document. Note that it has an element chapter-id which identifies the chapter to which it belongs:

And here is an XQuery implementation of this requirement that makes use of lazy evaluation:

In the and-query we are performing a word search in all "non-chapter” docs (is-chapter = false). We then set up the final call of the query which returns a list of parents of the matched children in most relevant order. 

Profiling demonstrates how lightning-fast the query can be executed, and how an increased amount of distinct values (the chapter-id's)can be found with remarkably little overhead even though the actual search query could match many documents.

In order to get the first 10 distinct values in our test case, for example, it needed to look at the first 12 search hits (the inline function $p above is called 12 times). To retrieve the first 20 distinct values, it needed to examine the first 31 search hits. 

But shouldn't there be massive performance penalty for retrieving all search results and then extracting the distinct values? Luckily, lazy evaluation and some useful features of the MarkLogic Search API can prevent that.

The call to cts:search returns an iterator and the profiler shows that this function is called just once. The call to fn:map also returns an iterator (although a profiler bug shows this as being invoked many times, it is only invoked once). The function fn:distinct-values takes advantage of these iterators and uses them to fill up the sequence of unique values. Thanks to these iterators, the query does not need to initially retrieve all search results, and in turn is super-efficient!

In summary, lazy evolution can be a very useful technique when selectively processing large result sets!!!

Stack Overflow iconStack Overflow: Get the most useful answers to questions from the MarkLogic community, or ask your own question.

Comments

The commenting feature on this page is enabled by a third party. Comments posted to this page are publicly visible.
  • Lazy Evaluation Magic we have to follow this. <a href=http://afsid.org/>Gclub</a>
  • Thanks for this nice example of lazy evaluation! We've had a similar requirement of grouping documents. It needs to be said that in case of chapters containing lots of documents this solution for the grouping problem can be slow. Worst case example: a chapter contains hundreds of documents and these documents make up the first few hundred search results. Then all these documents have to be loaded from disk to be pumped through the map function.