[MarkLogic Dev General] XPath performance with attribute lookup [was Re: [MarkLogic Dev General] ReIndexing takes too long]
michael.blakeley at marklogic.com
Mon Mar 2 13:20:16 PST 2009
The best way to understand the performance of an absolute XPath
expression is to trace its evaluation:
//*[@id eq 'a1234']
In ErrorLog.txt I see:
2009-03-02 12:58:21.340 Info: Docs: line 2: Analyzing path:
collection()/descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 1 is searchable:
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 predicate 1 is
conditionally searchable: @id eq "a1234"
2009-03-02 12:58:21.340 Info: Docs: line 2: Step 2 is conditionally
searchable:descendant::*[@id eq "a1234"]
2009-03-02 12:58:21.340 Info: Docs: line 2: First step of path is
2009-03-02 12:58:21.340 Info: Docs: line 2: Gathering constraints.
2009-03-02 12:58:21.340 Info: Docs: line 2: Executing search.
2009-03-02 12:58:21.390 Info: Docs: line 2: Selected 30017 fragments to
The crucial step is "Gathering constraints": none were found. So the
server has to scan all the fragments in the database (in my case, 30,017
of them) to find the matching fragments (0, in my case). This could
drive a fair amount of disk I/O.
This happens because the server indexes attributes as
element-attribute-value tuples. A wildcard step such as * names no
element, so the server cannot form a complete tuple to look up. For best
performance we should query the way the index is built. In most
situations it is straightforward to enumerate all the possible parent
elements:
//(a|b|c|d)[@id eq 'a1234']
Now the query trace shows the constraints being used, and performance is
2009-03-02 13:00:41.174 Info: Docs: line 2: Comparison contributed hash
value constraint: a/@id = "a1234"
2009-03-02 13:00:41.181 Info: Docs: line 2: Comparison contributed hash
value constraint: d/@id = "a1234"
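To make the difference concrete, here is a toy model in Python of an
index keyed on element-attribute-value tuples. It is only a sketch of
the idea, not how the server is implemented: the fragment names, the
sample data, and the functions build_index and candidates are all
hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: fragment name -> (element, attribute, value) tuples.
fragments = {
    "doc1.xml": [("a", "id", "a1234"), ("b", "id", "b0001")],
    "doc2.xml": [("c", "id", "c0002")],
    "doc3.xml": [("d", "id", "a1234"), ("e", "class", "note")],
}


def build_index(fragments):
    """Map each (element, attribute, value) tuple to the fragments containing it."""
    index = defaultdict(set)
    for frag_id, tuples in fragments.items():
        for key in tuples:
            index[key].add(frag_id)
    return index


def candidates(index, all_fragments, elements, attribute, value):
    """Return the fragments that would have to be opened and filtered.

    elements=None models the wildcard step `*`: without an element name
    no complete index key exists, so every fragment is a candidate.
    """
    if elements is None:
        return set(all_fragments)
    selected = set()
    for element in elements:
        # One index lookup per named element, like one constraint per name.
        selected |= index.get((element, attribute, value), set())
    return selected


index = build_index(fragments)
wildcard = candidates(index, fragments, None, "id", "a1234")
named = candidates(index, fragments, ["a", "b", "c", "d"], "id", "a1234")
print(len(wildcard), sorted(named))
```

In this model the wildcard form selects every fragment, while the
enumerated form selects only the fragments whose tuples match, which
mirrors the difference between the two traces.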
P.S. It's generally considered polite to start a new thread for a new
subject, or at least change the subject line.
On 2009-03-02 11:26, Paul Vanderveen wrote:
> I have an XPath/XQuery to find links in a set of documents that looks like
> I am searching on somewhere around 15,000 documents, and this seems to take
> several seconds to execute. I was wondering if there is a way to index the
> ID attribute so that this can be accomplished much faster, or if there is a
> better way to find the element that matches a specified ID.
> Paul Vanderveen