Learning with xdmp:query-trace()

by Evan Lenz

Posted on July 20, 2011

Early on my MarkLogic learning path, one tool I found useful for understanding how MarkLogic evaluates queries was the xdmp:query-trace() function, which has helped me understand why a query runs fast or slow. As an example, in my last post, on Good XML design and performance, I claimed that the following query would run fast by leveraging MarkLogic’s Universal Index:

//group[@type eq 'widget']

It sure seemed like it should be fast, and based on what I had read about MarkLogic internals, it certainly sounded like it would be. But just to be extra paranoid, I ran a test in Query Console. The first step was to generate some sample data, using the following query:

for $n in (1 to 300) return
xdmp:document-insert(concat("/group",$n,".xml"),
  document {
    let $pos := ($n mod 3) + 1
    let $type := ("widget","person","place")[$pos] return
    <group type="{$type}">stuff</group>
  }
)

A third of the documents will contain a <group> with type="widget", a third with type="person", and a third with type="place". After loading the documents, I ran my test query in conjunction with xdmp:query-trace():

xdmp:query-trace(true()),
//group[@type eq 'widget']

Passing true() to xdmp:query-trace() tells the server to write information to the error log about how it plans to run any searchable expressions it encounters in the code that follows: specifically, which constraints are used and how many fragments are selected from the index for filtering. What I wanted to make sure of was that MarkLogic would retrieve only the documents I was interested in. If it selected 300 fragments (all the docs I loaded), it would have to look inside each one before filtering out two-thirds of them (the ones whose @type value is something other than “widget”). The number I wanted to see instead was 100 (just the “widget” ones). Looking in the error log, this is what I saw (not including the timestamp and line number info):

Analyzing path: fn:collection()/descendant::group[@type eq "widget"]
Step 1 is searchable: fn:collection()
Step 2 is searchable: descendant::group[@type eq "widget"]
Path is fully searchable.
Gathering constraints.
Comparison contributed hash value constraint: group/@type = "widget"
Step 2 predicate 1 contributed 1 constraint: @type eq "widget"
Comparison contributed hash value constraint: group/@type = "widget"
Step 2 predicate 1 contributed 1 constraint: @type eq "widget"
Step 2 contributed 2 constraints: descendant::group[@type eq "widget"]
Executing search.
Selected 100 fragments to filter

Fortunately, I could tell from the output that the index magic was indeed doing its job, since it only selected 100 fragments (documents), i.e. the ones that contain “widget”. And I could see that the XPath predicate, @type eq 'widget', is successfully interpreted as a constraint that can be resolved from the index. Yay! I can write with confidence!
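The fragment count can also be checked without opening the error log. The following is a minimal sketch, not something from the test run above, that uses the built-in xdmp:estimate() function to count the fragments the index selects and compares that with the filtered result. It assumes the database holds only the 300 sample documents, and because each sample document has exactly one <group>, counting fragments and counting elements are comparable here. The <check> wrapper and its child element names are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Number of fragments the index selects for the path, before any filtering. :)
let $selected := xdmp:estimate(//group[@type eq 'widget'])

(: Number of <group> elements that actually match after filtering. :)
let $matched := count(//group[@type eq 'widget'])

(: Hypothetical wrapper element, used only to show both numbers side by side. :)
return
  <check>
    <index-selected>{$selected}</index-selected>
    <after-filtering>{$matched}</after-filtering>
  </check>
```

If the two numbers agree, the index is selecting only the fragments that end up in the result, which is the same thing the "Selected 100 fragments to filter" line indicates.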

Being paranoid, I wanted to run another test, so I used the following query to generate some more sample data (very similar to the one above):

for $n in (1 to 300) return
xdmp:document-insert(concat("/logfile",$n,".xml"),
  document {
    let $pos := ($n mod 3) + 1
    let $host := concat("host",$pos) return
    <logfile host="{$host}"/>
  }
)

Here’s the test query:

xdmp:query-trace(true()),
//logfile[@host eq 'host1']

And here’s the line I saw (and was hoping to see) at the end of the Error Log:

Selected 100 fragments to filter

Because of the small data set, the two examples here are fast regardless of what constraint is used (resolvable from the index or not). But when I’m dealing with millions of documents, I want to make sure that I’m effectively using the index. Using a small test data set with xdmp:query-trace() is one way to find out whether the index is being leveraged effectively, and thus whether my queries will scale.
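To see what the opposite case might look like, one experiment is to trace a predicate that the index is less likely to resolve. The sketch below reuses the sample <group> documents and swaps the equality test for fn:contains(). That choice is an assumption about what needs filtering, and the actual trace output depends on the server version and index settings, so treat it as something to try rather than a stated result.

```xquery
xquery version "1.0-ml";

(: Experiment: compare the "Selected N fragments to filter" lines in the error log. :)
xdmp:query-trace(true()),

(: Equality test from the post; the trace above reported 100 fragments selected. :)
//group[@type eq 'widget'],

(: Substring test, assumed here to be harder to resolve from the index.
   If it is not resolved, the trace should report more fragments selected
   than the number of elements finally returned. :)
//group[fn:contains(@type, 'widg')]
```

Both expressions return the same 100 elements on the sample data; the point of the comparison is the number of fragments each one has to read to get there.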

Experimenting with xdmp:query-trace() (and the related xdmp:plan() function) is a great way to learn from the “bottom up”. For “top-down” learning, I highly recommend Jason Hunter’s paper “Inside MarkLogic”.
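For reference, a minimal sketch of the xdmp:plan() variant, assuming the same sample <group> documents are loaded. Instead of writing to the error log, it returns the plan for the expression as an XML element, which shows up directly in the Query Console results.

```xquery
xquery version "1.0-ml";

(: Returns the query plan as an XML element rather than logging it. :)
xdmp:plan(//group[@type eq 'widget'])
```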

What about you? What functions or tools have you found helpful for learning MarkLogic? Feel free to comment below.
