Range Index Scoring

by Adam Fowler

A new feature of MarkLogic 7's search API is range index scoring – affecting relevancy based on a value within a document. Here I detail a couple of use cases.

Range index scoring allows you to determine relevancy by values in a document, rather than matching values against a term exactly.

A good use case for this is ratings: a higher-rated document should appear nearer the top of search results.

A second use case is distance from the centre point of a geospatial query – just like you get on hotel search websites.

We can now do these directly in MarkLogic without any special voodoo from a developer. Just set up the search options and perform a query. Easy!
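As a sketch of the idea (a hedged example, not the exact query from the demo – it assumes an element range index exists on a `rating` element, and uses the MarkLogic 7 `score-function` range-query options):

```xquery
(: Sketch: boost higher-rated documents in an ordinary word search.
   Assumes an element range index on a <rating> element. :)
cts:search(
  fn:collection(),
  cts:and-query((
    cts:word-query("hotel"),
    (: New in MarkLogic 7: range queries can contribute to score.
       score-function=linear means higher values score higher;
       slope-factor controls how strongly. :)
    cts:element-range-query(
      xs:QName("rating"), ">=", 1,
      ("score-function=linear", "slope-factor=10")
    )
  ))
)
```

Because the range query participates in the score rather than just filtering, the word match and the rating both shape the result order.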

Show me!

Below is the feature in action:


This uses MLJS for rendering results, but the functionality is in core MarkLogic, not MLJS. MarkLogic also calculates a heatmap on the fly. This calculated data is passed to heatmap-openlayers.js. Much more efficient than just sending lots of data to heatmap.js, especially for thousands of visible points.

Note that the MLJS widgets interact with each other – hovering over a marker on the map highlights it in the search results list with a different background colour.

Isn’t this like sorting?

In a word, no.

Sorting is based purely on a value in a document. By changing relevancy scores, you can combine different search terms: for example, rating, distance, and a word query could all contribute to the relevancy score. A result that is a little further away but much higher rated may trump one that's dead centre on the map but has a low rating.

How does it work?

Under the hood you provide a set of options and a query. I've documented the REST search options I'm using, the search query I'm sending, and the raw results I'm getting back in a Gist. Go have a read – it's pretty straightforward. (I tend to go overkill in setting search options, though!)

In Summary

Ever wanted to tweak relevancy by values in a document? Now you can! Go have a read of this new V7 feature, download MarkLogic today too, and have a play!

This post originally appeared on Adam's blog and we thank him for his permission to re-post it here. The post is related to several of his other posts.

Partial Document Updates with the REST API

by David Cassel

I just got my first taste of one of the new features in MarkLogic 7: the ability to do partial document updates through the built-in REST API. It made me happy.

You may be aware that the REST API was introduced in MarkLogic 6. That version handles search and full-document CRUD operations. However, if you wanted to supplement a document with a little piece of information, you had two choices:

  1. maintain the entire document on the client side, update it there, then PUT the entire document to /v1/documents, replacing the entire document
  2. write an extension to the REST API that could do a more surgical update on the server side

I tended to do the second. MarkLogic 7 includes a new feature that provides a third choice, one that will largely replace the second:

  3. send a PATCH command to /v1/documents, specifying what the update should be
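For instance, a patch payload that removes a single element might look something like this (a sketch – the element names and select path are illustrative, not taken from the original demo):

```xml
<rapi:patch xmlns:rapi="http://marklogic.com/rest-api">
  <!-- Delete one comment, selected by an XPath expression -->
  <rapi:delete select="/event/comments/comment[@id eq '42']"/>
</rapi:patch>
```

The server applies the change in place, so the client never has to round-trip the whole document.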

There is now a chapter in the REST API Developer's Guide dedicated to this rich feature.

Deleting a comment

I’ve been working on a demo that has event data. When you click on an event, you can record a comment on it. My next task was to let the user delete a comment; each comment within an event is identified by an id. The site is implemented using AngularJS, so here’s the code:
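The original listing wasn't preserved in this re-post, so here is a hedged AngularJS-style sketch of what such a delete call might look like. The function and variable names (`buildCommentDeletePatch`, `docUri`) are illustrative, and the payload follows the JSON patch shape described in the guide:

```javascript
// Hedged sketch; names are illustrative, not from the original demo.
// Builds a MarkLogic partial-update payload that deletes the comment
// with the given id from an event document.
function buildCommentDeletePatch(commentId) {
  return {
    patch: [
      { "delete": { select: "/event/comments[id = '" + commentId + "']" } }
    ]
  };
}

// AngularJS controller sketch: send the patch to the REST API.
// $http and docUri are supplied by the surrounding application.
function deleteComment($http, docUri, commentId) {
  return $http({
    method: 'PATCH',
    url: '/v1/documents?uri=' + encodeURIComponent(docUri),
    headers: { 'Content-Type': 'application/json' },
    data: buildCommentDeletePatch(commentId)
  });
}
```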

That’s it. Not a line of XQuery needed, just out-of-the-box functionality. Cool.

This article first appeared as a post on David's blog.

Result Streams from Range Indexes

by Bradley Mann

The other day I needed to write an XQuery script to collect all the values from a range index and group them into "sets" of 1000. I ended up with something like this:
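The exact listing didn't survive the re-post; a hedged reconstruction of the kind of script described (the `sample` namespace and the output shape are illustrative) might be:

```xquery
xquery version "1.0-ml";
(: Hedged reconstruction; the sample namespace is illustrative. :)
declare namespace sample = "http://example.com/sample";

let $groupsize := 1000
let $values := cts:element-values(xs:QName("sample:value"))
let $count := fn:count($values)
for $i in (0 to xs:integer(fn:ceiling($count div $groupsize)) - 1)
(: Slice out the $i-th group of up to $groupsize values :)
let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]
return <set number="{$i + 1}" size="{fn:count($group)}"/>
```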

This query performs fine on a small set of values (5000), but when we increase the number of values pulled from the range index, we see that this call

let $group := $values[(($i * $groupsize) + 1) to (($i + 1) * $groupsize)]

quickly becomes the long pole in the tent. In fact, for a sample size of 50,000 values (50 groups), 91% of the execution time is taken by this one call: 2.3 seconds for just 50 calls. Increase the sample size above 1,000,000 values and it's clear that this query will no longer run in a reasonable amount of time. So what's going on here? Shouldn't sequence accesses be lightning fast?

As it turns out, our cts:element-values() call isn't doing exactly what one might initially think. Rather than returning an in-memory sequence of values, it actually returns a stream, which is loaded lazily as needed. This optimization limits memory use in the (common) situations where you don't need the entire sequence. In my case, though, it doesn't help. 50 sequence accesses are actually 50 stream accesses, each time streaming the results from the first item (grabbing items at the end of the stream takes longer).

In order to get around this issue, there's a handy function called xdmp:eager(), which avoids lazy evaluation. But there's another easy "trick" that will reliably ensure you're working with an in-memory sequence rather than a stream: simply drop into a "sub-FLWOR" expression to generate a sequence from the return value:

let $values := for $i in cts:element-values(xs:QName("sample:value")) return $i

Now, $values is no longer a stream, but rather an in-memory sequence. Accessing subsequences within it is now much faster, particularly values at the end of the sequence.
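The xdmp:eager() route mentioned above achieves the same thing directly – a sketch, using the same illustrative element name:

```xquery
(: Force eager evaluation so the range-index values are
   materialised once, instead of re-streamed per access. :)
let $values := xdmp:eager(cts:element-values(xs:QName("sample:value")))
return $values[9001 to 10000]
```

Either way, the cost of reaching values late in the sequence is paid once, not on every subsequence access.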

This made my day.
