[MarkLogic Dev General] Advice on improving "join" on attribute performance

Mike Sokolov sokolov at ifactory.com
Fri Mar 16 06:16:42 PDT 2012


Nick - you can also create range indexes explicitly in MarkLogic, and 
these will really help with the performance of joins, just as they do in 
eXist.
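
As a rough sketch only: one way to add such an index is through the Admin
API. It assumes the lookup elements are "text" elements in no namespace
with an "id" attribute (as in the query quoted below), that the values
should be indexed as strings under the root collation, and that the
database is called "content-db", which is a made-up name for illustration.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Hypothetical database name; substitute the real one. :)
let $db := xdmp:database("content-db")

(: Describe a string range index on text/@id (no namespaces assumed). :)
let $index := admin:database-range-element-attribute-index(
  "string",                           (: scalar type :)
  "", "text",                         (: parent element namespace, local name :)
  "", "id",                           (: attribute namespace, local name :)
  "http://marklogic.com/collation/",  (: collation; assumed root collation :)
  fn:false())                         (: no range value positions :)

(: Add the index to the database configuration and save it. :)
let $config := admin:get-configuration()
let $config := admin:database-add-range-element-attribute-index($config, $db, $index)
return admin:save-configuration($config)
```

The same index can be defined in the Admin UI instead. The database has
to reindex before the new index is usable, and the collation chosen here
needs to match the one in effect for the query's comparisons.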

-Mike

On 03/16/2012 08:26 AM, Nick Tuckett wrote:
> I'm evaluating MarkLogic as a possible way to store and access around 
> 25 MB (and growing) of fairly complex XML data. For one particular type 
> of common query for my application, I'm seeing drastically different 
> performance between MarkLogic and eXist.  I would be very grateful for 
> any feedback or advice on how to improve this performance.
>
> One common feature of this data is the use of attributes containing identifying 
> values that reference other elements in the collection - an example of 
> this is for referencing localised text from a common XML file. I have 
> been using a fairly simple query to benchmark performance that looks 
> like this:
>
> for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
> let $e := doc('/db/language/lang_en.xml')//text[@id=$x/@localisedtextid]
> order by $x/@id
> return
> <localisedtext quest="{$e}"/>
>
> The benchmark content has around 2500 instances for this particular 
> case. With everything else constant (hardware, OS, content) I see 
> drastically different performance between MarkLogic and eXist. The 
> former takes around 59 seconds to return the data for all instances, 
> the latter takes 8 seconds.
>
> As I understand it, MarkLogic sets up indexing automatically, 
> including indexing on element-attribute pairs. To match this, I 
> created an explicit equivalent index for eXist for the text/@id pair 
> for use in this case.
>
> For MarkLogic, running the query with the profiler showed that around 
> 75% of the execution time went on '@id = $x/@localisedtextid', and 
> query tracing produced the following output:
>
> Initial part of query:
>
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: 
> xdmp:eval("xdmp:query-trace(true()),&#10;for $x in 
> collection('/db/content...", (), <options 
> xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Analyzing path 
> for $x: 
> fn:collection("/db/content")/descendant-or-self::node()/(elementa|elementb|elementc|elementd|elemente|elementf)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 is 
> searchable: fn:collection("/db/content")
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 2 does not 
> use indexes: descendant-or-self::node()
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 3 is 
> searchable: (elementa|elementb|elementc|elementd|elemente|elementf)
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Path is fully 
> searchable.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Gathering 
> constraints.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 
> contributed 1 constraint: fn:collection("/db/content")
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:48: Step 3 
> contributed 1 constraint: elementa
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:62: Step 3 
> contributed 1 constraint: elementb
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:81: Step 3 
> contributed 1 constraint: elementc
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:110: Step 3 
> contributed 1 constraint: elementd
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:122: Step 3 
> contributed 1 constraint: elemente
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:132: Step 3 
> contributed 1 constraint: elementf
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Executing search.
> 2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Selected 8 
> fragments to filter.
>
> Iterated part of query (repeat N times...)
>
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: 
> xdmp:eval("xdmp:query-trace(true()),&#10;for $x in 
> collection('/db/content...", (), <options 
> xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Analyzing path: 
> fn:doc("/db/language/lang_en.xml")/descendant::text[@id = 
> xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 1 is 
> searchable: fn:doc("/db/language/lang_en.xml")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 is 
> searchable: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Path is fully 
> searchable.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Gathering 
> constraints.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:10: Step 1 
> contributed 1 constraint: fn:doc("/db/language/lang_en.xml")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison 
> contributed hash value constraint: text/@id = 
> xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 
> 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison 
> contributed hash value constraint: text/@id = 
> xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 
> 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 
> contributed 2 constraints: descendant::text[@id = 
> xs:untypedAtomic("ElementName143001")]
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Executing search.
> 2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Selected 1 
> fragment to filter
>
> Query meters:
>
> <qm:query-meters 
> xsi:schemaLocation="http://marklogic.com/xdmp/query-meters 
> query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters" 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> <qm:elapsed-time>PT56.655659S</qm:elapsed-time>
> <qm:requests>0</qm:requests>
> <qm:list-cache-hits>1043</qm:list-cache-hits>
> <qm:list-cache-misses>0</qm:list-cache-misses>
> <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
> <qm:expanded-tree-cache-hits>519</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
> <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
> <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
> <qm:value-cache-hits>6672643</qm:value-cache-hits>
> <qm:value-cache-misses>6673683</qm:value-cache-misses>
> <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
> <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
> <qm:link-cache-hits>0</qm:link-cache-hits>
> <qm:link-cache-misses>0</qm:link-cache-misses>
> <qm:filter-hits>0</qm:filter-hits>
> <qm:filter-misses>0</qm:filter-misses>
> <qm:fragments-added>0</qm:fragments-added>
> <qm:fragments-deleted>0</qm:fragments-deleted>
> <qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
> <qm:fs-program-cache-misses>1</qm:fs-program-cache-misses>
> <qm:db-program-cache-hits>0</qm:db-program-cache-hits>
> <qm:db-program-cache-misses>0</qm:db-program-cache-misses>
> <qm:env-program-cache-hits>0</qm:env-program-cache-hits>
> <qm:env-program-cache-misses>0</qm:env-program-cache-misses>
> <qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
> <qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
> <qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
> <qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
> <qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
> <qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
> <qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
> <qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
> <qm:fragments>
> <qm:fragment>
> <qm:root>contents</qm:root>
> <qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:fragment>
> <qm:fragment>
> <qm:root>database</qm:root>
> <qm:expanded-tree-cache-hits>8</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:fragment>
> </qm:fragments>
> <qm:documents>
> <qm:document>
> <qm:uri>/db/content/file1.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file2.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/language/lang_en.xml</qm:uri>
> <qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file3.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file4.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file5.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file6.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file7.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> <qm:document>
> <qm:uri>/db/content/file8.xml</qm:uri>
> <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> </qm:document>
> </qm:documents>
> <qm:hosts/>
> </qm:query-meters>
>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>    