Facets

If you want to know how many mails have a PowerPoint attachment, that's pretty easy to write:
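A minimal sketch of such a count follows. The page does not show the document structure, so the `message` and `attachment` element names, the `extension` attribute, and the `"ppt"` value are assumptions for illustration.

```xquery
xquery version "1.0-ml";

(: Assumed shape: each mail is a <message> document containing
   <attachment extension="..."/> elements. These names are hypothetical. :)

(: xdmp:estimate() resolves the count from the indexes
   rather than loading every matching document. :)
xdmp:estimate(/message[.//attachment/@extension = "ppt"])
```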

But what if you want to know what extensions are out there, the full list that people have sent? Users often like to see this in search results because it lets them see "facets" of their results and drill in. Someone unfamiliar with MarkLogic might write this:
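The brute-force version might look something like the sketch below. It uses only the standard `distinct-values()` function; the `attachment` element and `extension` attribute names are assumed, as the page does not confirm them.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: <attachment extension="..."/>. :)

(: The path expression visits every attachment in the database,
   so the work grows with the number of documents. :)
distinct-values(//attachment/@extension)
```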

It's perfectly valid code, but it operates through brute force by loading document after document, similar to count(), and it won't finish within the 10-second window. It might also return an XDMP-EXPANDEDTREECACHEFULL error, which tells you memory has filled up while running the request. The right solution isn't to grow your memory sizes; it's to take a wholly different index-based approach:
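An index-based query along these lines could be sketched as follows. It assumes an element-attribute range index already exists on the (hypothetical) `attachment` element and `extension` attribute; without that index the call raises an error instead of returning values.

```xquery
xquery version "1.0-ml";

(: Assumes a configured element-attribute range index on
   attachment/@extension. The names are illustrative, not from the page. :)

(: Returns the distinct attribute values straight from the range index. :)
cts:element-attribute-values(
  xs:QName("attachment"),
  xs:QName("extension")
)
```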

This uses a MarkLogic extension function. It's a bit longer to type, but it returns in about 60 milliseconds. It uses what's called an element-attribute range index (one I've already configured for you) to extract the values without touching disk. You specify the element name and attribute name as XML QNames ("qualified names", meaning a name plus an optional namespace prefix), and it returns the distinct values.

You can also request frequency counts and/or ask for results in frequency order:
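One way this might be written is sketched below, again assuming a range index on the hypothetical `attachment` element and `extension` attribute. The `concat()` output format is only an illustration.

```xquery
xquery version "1.0-ml";

(: Assumed names: attachment/@extension, backed by a range index. :)
for $item in cts:element-attribute-values(
  xs:QName("attachment"),   (: element name :)
  xs:QName("extension"),    (: attribute name :)
  "",                       (: starting position: begin at the first value :)
  "frequency-order"         (: option: most frequent values first :)
)
(: cts:frequency() reports the count associated with each returned value. :)
return concat($item, ": ", cts:frequency($item))
```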

The optional third argument given here specifies the starting position. We pass "" to indicate we want to start at the beginning. The fourth argument controls the execution options. We pass "frequency-order" so more frequent items are returned first. The cts:frequency() call at the end reports the number of documents having the given $item.

There's also an optional fifth argument, not yet shown, that limits the returned values to those found in documents matching a particular query constraint. This is how MarkMail manages the bottom-left corner of its search results page for any query you type. More on that in the next section.
