Facets

If you want to know how many mails have a PowerPoint attachment, that's pretty easy to write.
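The query itself isn't preserved here, so what follows is only a minimal sketch. It assumes each attachment is stored as an `attachment` element whose `extension` attribute holds values such as "ppt". Those element, attribute, and value names are assumptions, not taken from the text. The sketch resolves the count from the indexes rather than by walking documents:

```xquery
xquery version "1.0-ml";

(: Count mails that have at least one PowerPoint attachment.
   ASSUMPTION: attachments are <attachment extension="ppt"/> elements;
   adjust the QNames and values to match your actual schema. :)
xdmp:estimate(
  cts:search(
    fn:doc(),
    cts:element-attribute-value-query(
      xs:QName("attachment"),
      xs:QName("extension"),
      ("ppt", "pptx")  (: either extension counts as PowerPoint :)
    )
  )
)
```

xdmp:estimate() counts matches straight from the indexes without loading documents, which is what keeps it fast. The trade-off is that it reports an unfiltered count.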

But what if you want to know which extensions are out there: the full list that people have sent? Users often like to see this alongside search results because it shows them the "facets" of their results and lets them drill in. Someone unfamiliar with MarkLogic might write a straightforward XPath expression that visits every attachment in the database.
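Here is one plausible form of that naive query. It is a sketch, not the tutorial's original code, and the `attachment`/`extension` names are the same assumptions as above:

```xquery
xquery version "1.0-ml";

(: Naive approach: find every attachment element in every document and
   de-duplicate the extension attribute values. The result is correct,
   but every matching document has to be loaded into memory first.
   ASSUMPTION: element and attribute names are illustrative. :)
fn:distinct-values(//attachment/@extension)
```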

That's perfectly valid code, but it works by brute force, loading document after document much like count() does, and it won't finish within the 10-second window. It may also fail with an XDMP-EXPANDEDTREECACHEFULL error. That error means the expanded tree cache, the memory that holds the documents a request is working on, has filled up. The right fix isn't to grow your memory sizes; it's to take a wholly different, index-based approach.
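A minimal sketch of the index-based version follows. It assumes a string element-attribute range index on `attachment/@extension`, and those names are assumptions, as before:

```xquery
xquery version "1.0-ml";

(: Distinct attachment extensions, read directly from an
   element-attribute range index instead of from the documents.
   ASSUMPTION: a string range index exists on attachment/@extension. :)
cts:element-attribute-values(
  xs:QName("attachment"),  (: element name :)
  xs:QName("extension")    (: attribute name :)
)
```

If no matching range index has been configured, the call raises an error instead of silently falling back to a document scan.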

The index-based approach uses a MarkLogic extension function, cts:element-attribute-values(). It's a bit longer to type, but it returns in about 60 milliseconds. It relies on an element-attribute range index (one I've already configured for you) to extract the values without touching disk. You specify the element name and the attribute name as xs:QName values, and it returns the distinct values. ("QName" stands for "qualified name": a local name plus an optional namespace.)

You can also request frequency counts and/or ask for results in frequency order.
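A sketch of such a call follows, using the same assumed `attachment`/`extension` names. It lists each extension together with its count, most frequent first:

```xquery
xquery version "1.0-ml";

(: Attachment extensions with counts, most frequent first.
   ASSUMPTION: same attachment/@extension range index as above. :)
for $item in cts:element-attribute-values(
  xs:QName("attachment"),
  xs:QName("extension"),
  "",                 (: start value: "" begins at the first value :)
  "frequency-order"   (: return the most frequent values first :)
)
return fn:concat($item, ": ", cts:frequency($item))
```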

To do that, pass two more arguments to cts:element-attribute-values(). The optional third argument specifies the starting value; pass "" to start at the beginning. The fourth argument is a sequence of options; pass "frequency-order" so more frequent values are returned first. Calling cts:frequency() on each returned $item reports how many documents (strictly, fragments) contain that value.

There's also an optional fifth argument that restricts the returned values to those found in documents matching a particular query. This is how MarkMail populates the facet list in the bottom-left corner of its search results page for any query you type. More on that in the next section.
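As a preview, here is a sketch of a facet restricted by that fifth argument. The search term "marklogic" is a hypothetical example, and the `attachment`/`extension` names are the same assumptions as above:

```xquery
xquery version "1.0-ml";

(: Extension facet limited to mails matching a query.
   ASSUMPTION: "marklogic" is an illustrative search term only. :)
for $item in cts:element-attribute-values(
  xs:QName("attachment"),
  xs:QName("extension"),
  "",
  "frequency-order",
  cts:word-query("marklogic")  (: only count values in matching documents :)
)
return fn:concat($item, ": ", cts:frequency($item))
```

The frequencies now reflect only the documents matching the query. That is what lets a facet list change as the user's search changes.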

Constraints

Text Search
