[MarkLogic Dev General] Advice on improving "join" on attribute performance

Michael Blakeley mike at blakeley.com
Fri Mar 16 06:23:33 PDT 2012


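XPath requires evaluation of every predicate for every context item. If you are spending too much time in a predicate, refactor it so that terms which are constant across those evaluations, like $x/@localisedtextid in your query, are computed once outside the predicate.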

for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $xid := $x/@localisedtextid/string()
let $e := doc('/db/language/lang_en.xml')//text[@id=$xid]
order by $x/@id
return <localisedtext quest="{$e}"/>
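Taking the same idea one step further: the whole id-to-text lookup is constant for the query, so it could be built once instead of re-filtering lang_en.xml for every element. This is a minimal sketch of a different technique (an in-memory hash join), not the refactoring above. It assumes MarkLogic's `map:map`, `map:put` and `map:get` functions (`xquery version "1.0-ml"`), that lang_en.xml exists, and that every `text/@id` is unique. The helper name `local:text-by-id` is hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: walks lang_en.xml once and indexes every text
   element by its @id. Assumes ids are unique; a later duplicate id
   would overwrite an earlier entry in the map. :)
declare function local:text-by-id($lang as document-node()) as map:map
{
  let $m := map:map()
  return (
    (: map:put returns the empty sequence, so this sequence evaluates
       to just $m, but only after every put has run :)
    for $t in $lang//text[@id]
    return map:put($m, fn:string($t/@id), $t),
    $m
  )
};

let $lookup := local:text-by-id(fn:doc('/db/language/lang_en.xml'))
for $x in fn:collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
(: one map lookup per element instead of a filtered scan of lang_en.xml :)
let $e := map:get($lookup, fn:string($x/@localisedtextid))
order by $x/@id
return <localisedtext quest="{$e}"/>
```

Unlike the predicate form, this returns at most one text element per id, so it is only equivalent when ids are unique.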

As Mike mentioned, a range index might help with the "order by" portion. Note that I did not use $xid for the order-by, because that might interfere with range index utilization.

Also, it's best to avoid '//' when possible, and instead state the paths explicitly.
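For illustration only: the real document structure is not shown in this thread, so the sketch below assumes a hypothetical `<content>` root in each content document and a hypothetical `<texts>` root in lang_en.xml. The point is to replace the descendant-or-self::node() step with explicit child steps.

```xquery
xquery version "1.0-ml";

(: Hypothetical shapes: /content/(elementa|...|elementf) in each content
   document and /texts/text in lang_en.xml. Adjust to the real paths. :)
for $x in fn:collection('/db/content')/content/(elementa|elementb|elementc|elementd|elemente|elementf)
let $xid := $x/@localisedtextid/fn:string()
let $e := fn:doc('/db/language/lang_en.xml')/texts/text[@id = $xid]
order by $x/@id
return <localisedtext quest="{$e}"/>
```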

-- Mike

On 16 Mar 2012, at 12:26, Nick Tuckett wrote:

> I'm evaluating MarkLogic as a possible way to store and access around 25 MB (and growing) of fairly complex XML data. For one common type of query in my application, I'm seeing drastically different performance between MarkLogic and eXist. I would be very grateful for any feedback or advice on how to improve this performance.
> 
> One common feature of this data is attributes containing identifying values that reference other elements in the collection; for example, they are used to reference localised text in a common XML file. To benchmark performance, I have been using a fairly simple query that looks like this:
> 
> for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
> let $e := doc('/db/language/lang_en.xml')//text[@id=$x/@localisedtextid]
> order by $x/@id
> return 
> <localisedtext quest="{$e}"/> 
> 
> The benchmark content has around 2500 instances for this particular case. With everything else constant (hardware, OS, content), I see drastically different performance between MarkLogic and eXist: MarkLogic takes around 59 seconds to return the data for all instances, while eXist takes 8 seconds.
> 
> As I understand it, MarkLogic sets up indexing automatically, including indexing on element-attribute pairs. To match this, I created an equivalent explicit index on the text/@id pair in eXist for this case.
> 
> For MarkLogic, running the query with the profiler showed that around 75% of the execution time was spent on '@id = $x/@localisedtextid', and query tracing produced the following output:
> 
> Initial part of query:
> 
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Analyzing path for $x: fn:collection("/db/content")/descendant-or-self::node()/(elementa|elementb|elementc|elementd|elemente|elementf)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 is searchable: fn:collection("/db/content")
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 2 does not use indexes: descendant-or-self::node()
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 3 is searchable: (elementa|elementb|elementc|elementd|elemente|elementf)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Path is fully searchable.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Gathering constraints.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 contributed 1 constraint: fn:collection("/db/content")
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:48: Step 3 contributed 1 constraint: elementa
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:62: Step 3 contributed 1 constraint: elementb
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:81: Step 3 contributed 1 constraint: elementc
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:110: Step 3 contributed 1 constraint: elementd
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:122: Step 3 contributed 1 constraint: elemente
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:132: Step 3 contributed 1 constraint: elementf
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Executing search.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Selected 8 fragments to filter.
> 
> Iterated part of query (repeated N times...)
> 
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Analyzing path: fn:doc("/db/language/lang_en.xml")/descendant::text[@id = xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 1 is searchable: fn:doc("/db/language/lang_en.xml")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 is searchable: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Path is fully searchable.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Gathering constraints.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:10: Step 1 contributed 1 constraint: fn:doc("/db/language/lang_en.xml")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 contributed 2 constraints: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Executing search.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Selected 1 fragment to filter
> 
> Query meters:
> 
>   <qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>     <qm:elapsed-time>PT56.655659S</qm:elapsed-time>
>     <qm:requests>0</qm:requests>
>     <qm:list-cache-hits>1043</qm:list-cache-hits>
>     <qm:list-cache-misses>0</qm:list-cache-misses>
>     <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
>     <qm:expanded-tree-cache-hits>519</qm:expanded-tree-cache-hits>
>     <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>     <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
>     <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
>     <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
>     <qm:value-cache-hits>6672643</qm:value-cache-hits>
>     <qm:value-cache-misses>6673683</qm:value-cache-misses>
>     <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
>     <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
>     <qm:link-cache-hits>0</qm:link-cache-hits>
>     <qm:link-cache-misses>0</qm:link-cache-misses>
>     <qm:filter-hits>0</qm:filter-hits>
>     <qm:filter-misses>0</qm:filter-misses>
>     <qm:fragments-added>0</qm:fragments-added>
>     <qm:fragments-deleted>0</qm:fragments-deleted>
>     <qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
>     <qm:fs-program-cache-misses>1</qm:fs-program-cache-misses>
>     <qm:db-program-cache-hits>0</qm:db-program-cache-hits>
>     <qm:db-program-cache-misses>0</qm:db-program-cache-misses>
>     <qm:env-program-cache-hits>0</qm:env-program-cache-hits>
>     <qm:env-program-cache-misses>0</qm:env-program-cache-misses>
>     <qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
>     <qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
>     <qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
>     <qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
>     <qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
>     <qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
>     <qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
>     <qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
>     <qm:fragments>
>       <qm:fragment>
> 	<qm:root>contents</qm:root>
> 	<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:fragment>
>       <qm:fragment>
> 	<qm:root>database</qm:root>
> 	<qm:expanded-tree-cache-hits>8</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:fragment>
>     </qm:fragments>
>     <qm:documents>
>       <qm:document>
> 	<qm:uri>/db/content/file1.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file2.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/language/lang_en.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file3.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file4.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file5.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file6.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file7.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>       <qm:document>
> 	<qm:uri>/db/content/file8.xml</qm:uri>
> 	<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> 	<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
>       </qm:document>
>     </qm:documents>
>     <qm:hosts/>
>   </qm:query-meters>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general


