Facets
If you want to know how many messages have a PowerPoint attachment, that's pretty easy to write:
But what if you want to know which extensions are out there: the full list of everything people have sent? Users often like to see this in search results because it shows them "facets" of their results that they can drill into. Someone unfamiliar with MarkLogic might write this:
It's perfectly valid code, but it works by brute force, loading document after document (similar to count()), and it won't finish within the 10-second window. It might also return an XDMP-EXPANDEDTREECACHEFULL error. That error means the expanded tree cache, the memory that holds the documents loaded during the request, has filled up. The right solution isn't to grow your memory sizes; it's to take a wholly different, index-based approach:
This uses a MarkLogic extension function. It's a bit longer to type, but it returns in about 60 milliseconds. It uses what's called an element-attribute range index to extract the values without touching disk (an index I've already configured for you). You specify the element name and the attribute name as XML QNames, and it returns the distinct values. QName is short for "qualified name": a local name plus an optional namespace.
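The speed difference comes from where the answer lives. The Python sketch below is a conceptual model only, not MarkLogic's implementation. The `MESSAGES` data, its `attachments` field and the `RangeIndex` class are all hypothetical stand-ins.

The brute-force function has to open every document on every call. The toy range index is different:

* It is built once, at load time.
* It maps each attribute value to the IDs of the documents that contain it.
* It answers a distinct-values request from its keys alone.

```python
from collections import defaultdict

# Hypothetical message documents; each lists the extensions of its attachments.
MESSAGES = {
    1: {"attachments": ["ppt", "pdf"]},
    2: {"attachments": ["ppt", "ppt"]},
    3: {"attachments": []},
    4: {"attachments": ["doc", "ppt"]},
    5: {"attachments": ["pdf"]},
}


def distinct_extensions_brute_force(messages):
    """Open every document and collect its values (the slow approach)."""
    seen = set()
    for doc in messages.values():  # cost grows with the number of documents
        seen.update(doc["attachments"])
    return sorted(seen)


class RangeIndex:
    """Toy range index: value -> set of IDs of documents containing it.

    Built once when documents are loaded, so later queries never reopen them.
    """

    def __init__(self, messages):
        self.postings = defaultdict(set)
        for doc_id, doc in messages.items():
            for ext in doc["attachments"]:
                self.postings[ext].add(doc_id)

    def values(self):
        """Distinct values in ascending order, read from the index keys only."""
        return sorted(self.postings)


index = RangeIndex(MESSAGES)
print(distinct_extensions_brute_force(MESSAGES))  # ['doc', 'pdf', 'ppt']
print(index.values())                             # ['doc', 'pdf', 'ppt']
```

Both calls return the same list. The difference is cost: the index lookup grows with the number of distinct values, not with the number of documents.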
You can also request frequency counts and/or ask for results in frequency order:
The optional third argument given here specifies the starting value; we pass "" to start at the beginning. The fourth argument controls the execution options; we pass "frequency-order" so that more frequent items are returned first. The cts:frequency() call at the end reports the number of documents containing the given $item.
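Frequency ordering and counts fall out of the same structure. The toy model below again uses hypothetical data and function names, not MarkLogic's API. A value's frequency is the size of its set of document IDs, which is why a message with two .ppt attachments still counts only once. Sorting by that size gives frequency order. Ties are broken alphabetically here, which is an arbitrary choice for the sketch.

```python
from collections import defaultdict

# Hypothetical message documents; message 2 has two .ppt attachments.
MESSAGES = {
    1: {"attachments": ["ppt", "pdf"]},
    2: {"attachments": ["ppt", "ppt"]},
    3: {"attachments": []},
    4: {"attachments": ["doc", "ppt"]},
    5: {"attachments": ["pdf"]},
}


def build_index(messages):
    """Toy range index mapping each extension to the IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in messages.items():
        for ext in doc["attachments"]:
            postings[ext].add(doc_id)
    return postings


def values_in_frequency_order(postings):
    """Distinct values, most frequent first; ties broken alphabetically."""
    return sorted(postings, key=lambda v: (-len(postings[v]), v))


def frequency(postings, value):
    """Number of documents containing value (counts documents, not occurrences)."""
    return len(postings.get(value, ()))


postings = build_index(MESSAGES)
for ext in values_in_frequency_order(postings):
    print(f"{ext}: {frequency(postings, ext)}")  # ppt: 3, then pdf: 2, then doc: 1
```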
There's also an optional fifth argument, not yet shown, that limits the returned values to those found in documents matching a particular query. This is how MarkMail fills in the bottom-left corner of its search results page for any query you type. More on that in the next section.
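The query-constrained form can be modeled the same way. Intersect each value's document IDs with the IDs of the documents matching the query, and keep only the values whose intersection is non-empty. In this hypothetical sketch, the `list` field and the `facet` function are illustrative assumptions.

In MarkLogic the query itself would be resolved from indexes. Here the matching IDs come from a simple scan, only to keep the example short.

```python
from collections import defaultdict

# Hypothetical messages tagged with the mailing list they were posted to.
MESSAGES = {
    1: {"list": "dev", "attachments": ["ppt", "pdf"]},
    2: {"list": "dev", "attachments": ["ppt"]},
    3: {"list": "users", "attachments": []},
    4: {"list": "users", "attachments": ["doc", "ppt"]},
    5: {"list": "users", "attachments": ["pdf"]},
}


def build_index(messages):
    """Toy range index mapping each extension to the IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in messages.items():
        for ext in doc["attachments"]:
            postings[ext].add(doc_id)
    return postings


def facet(postings, matching_ids):
    """Values and document counts restricted to documents matching a query."""
    counts = {}
    for value, doc_ids in postings.items():
        n = len(doc_ids & matching_ids)  # only documents that satisfy the query
        if n:
            counts[value] = n
    # Most frequent first, ties broken alphabetically.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


postings = build_index(MESSAGES)
# Stand-in for a query such as "messages posted to the dev list".
dev_ids = {doc_id for doc_id, doc in MESSAGES.items() if doc["list"] == "dev"}
print(facet(postings, dev_ids))  # {'ppt': 2, 'pdf': 1}
```

Because the per-value document sets already exist, the facet counts for a query only require intersecting sets. No document has to be reopened, which is what makes this kind of sidebar cheap to compute.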