[MarkLogic Dev General] Advice on improving the performance of a "join" on attributes

Nick Tuckett nick.tuckett at playfish.com
Fri Mar 16 05:26:01 PDT 2012


I'm evaluating MarkLogic as a possible way to store and access around 25 MB
(and growing) of fairly complex XML data. For one common type of query in my
application, I'm seeing drastically different performance between MarkLogic
and eXist. I would be very grateful for any feedback or advice on how to
improve this performance.

A common feature of this data is attributes containing identifying values
that reference other elements in the collection; one example is referencing
localised text in a common XML file. I have been using a fairly simple query
to benchmark performance that looks like this:

for $x in
collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $e := doc('/db/language/lang_en.xml')//text[@id=$x/@localisedtextid]
order by $x/@id
return
<localisedtext quest="{$e}"/>
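As written, the `let` clause re-evaluates `//text[@id=...]` against lang_en.xml once for every `$x`, which amounts to a nested-loop join. For comparison, the same lookup can be phrased as a hash join: one pass over lang_en.xml builds an id-to-node table, then each `$x` does a single table probe. This is a minimal sketch, not a measured fix, assuming MarkLogic's built-in `map:map` functions and assuming each `text/@id` value is unique (`map:put` keeps only the last node stored under a key, whereas the original predicate returns every match). It also relies on the comma-separated sequence being evaluated left to right, so the table is filled before it is probed.

```xquery
xquery version "1.0-ml";

(: Sketch: replace the per-item path lookup with a hash join.
   Assumes text/@id values in lang_en.xml are unique. :)
let $lookup := map:map()
return (
  (: Pass 1: index every text element by its id attribute.
     map:put returns the empty sequence, so this pass adds no output. :)
  for $t in doc('/db/language/lang_en.xml')//text
  return map:put($lookup, fn:string($t/@id), $t),

  (: Pass 2: probe the table once per referencing element. :)
  for $x in
    collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
  let $e := map:get($lookup, fn:string($x/@localisedtextid))
  order by $x/@id
  return
    <localisedtext quest="{$e}"/>
)
```

The output shape is the same as the original query's; only the way `$e` is found changes.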

The benchmark content has around 2500 instances for this particular case.
With everything else held constant (hardware, OS, content), MarkLogic takes
around 59 seconds to return the data for all instances, while eXist takes
8 seconds.

As I understand it, MarkLogic sets up indexing automatically, including
indexes on element-attribute pairs. To match this, I created an equivalent
explicit index in eXist on the text/@id pair for this case.

For MarkLogic, running the query with the profiler showed that around 75%
of the execution time was spent on '@id = $x/@localisedtextid', and query
tracing produced the following output:

Initial part of query:

2012-03-16 11:35:35.743 Info: App-Services: at 2:10: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Analyzing path for $x: fn:collection("/db/content")/descendant-or-self::node()/(elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 is searchable: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 2 does not use indexes: descendant-or-self::node()
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 3 is searchable: (elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 contributed 1 constraint: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:48: Step 3 contributed 1 constraint: elementa
2012-03-16 11:35:35.743 Info: App-Services: at 2:62: Step 3 contributed 1 constraint: elementb
2012-03-16 11:35:35.743 Info: App-Services: at 2:81: Step 3 contributed 1 constraint: elementc
2012-03-16 11:35:35.743 Info: App-Services: at 2:110: Step 3 contributed 1 constraint: elementd
2012-03-16 11:35:35.743 Info: App-Services: at 2:122: Step 3 contributed 1 constraint: elemente
2012-03-16 11:35:35.743 Info: App-Services: at 2:132: Step 3 contributed 1 constraint: elementf
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Selected 8 fragments to filter.

Iterated part of the query (repeated N times):

2012-03-16 11:35:35.743 Info: App-Services: at 3:48: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Analyzing path: fn:doc("/db/language/lang_en.xml")/descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 1 is searchable: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 is searchable: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 3:10: Step 1 contributed 1 constraint: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 contributed 2 constraints: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Selected 1 fragment to filter

Query meters:

  <qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <qm:elapsed-time>PT56.655659S</qm:elapsed-time>
    <qm:requests>0</qm:requests>
    <qm:list-cache-hits>1043</qm:list-cache-hits>
    <qm:list-cache-misses>0</qm:list-cache-misses>
    <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
    <qm:expanded-tree-cache-hits>519</qm:expanded-tree-cache-hits>
    <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
    <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
    <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
    <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
    <qm:value-cache-hits>6672643</qm:value-cache-hits>
    <qm:value-cache-misses>6673683</qm:value-cache-misses>
    <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
    <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
    <qm:link-cache-hits>0</qm:link-cache-hits>
    <qm:link-cache-misses>0</qm:link-cache-misses>
    <qm:filter-hits>0</qm:filter-hits>
    <qm:filter-misses>0</qm:filter-misses>
    <qm:fragments-added>0</qm:fragments-added>
    <qm:fragments-deleted>0</qm:fragments-deleted>
    <qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
    <qm:fs-program-cache-misses>1</qm:fs-program-cache-misses>
    <qm:db-program-cache-hits>0</qm:db-program-cache-hits>
    <qm:db-program-cache-misses>0</qm:db-program-cache-misses>
    <qm:env-program-cache-hits>0</qm:env-program-cache-hits>
    <qm:env-program-cache-misses>0</qm:env-program-cache-misses>
    <qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
    <qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
    <qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
    <qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
    <qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
    <qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
    <qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
    <qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
    <qm:fragments>
      <qm:fragment>
        <qm:root>contents</qm:root>
        <qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:fragment>
      <qm:fragment>
        <qm:root>database</qm:root>
        <qm:expanded-tree-cache-hits>8</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:fragment>
    </qm:fragments>
    <qm:documents>
      <qm:document>
        <qm:uri>/db/content/file1.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file2.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/language/lang_en.xml</qm:uri>
        <qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file3.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file4.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file5.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file6.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file7.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
      <qm:document>
        <qm:uri>/db/content/file8.xml</qm:uri>
        <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
        <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
      </qm:document>
    </qm:documents>
    <qm:hosts/>
  </qm:query-meters>
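The trace lines above show the query being run with an `xdmp:query-trace(true())` prefix, and the meters have the shape returned by `xdmp:query-meters()`. A sketch of a harness that captures both in a single request, assuming it is run from Query Console (App-Services) against the same content database; the query body is unchanged from the original.

```xquery
xquery version "1.0-ml";

(: Enable query tracing for this request; trace entries are written
   to the server error log at Info level, as in the output above. :)
xdmp:query-trace(true()),

let $results :=
  for $x in
    collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
  let $e := doc('/db/language/lang_en.xml')//text[@id = $x/@localisedtextid]
  order by $x/@id
  return
    <localisedtext quest="{$e}"/>
(: Return the results ahead of the meters so the query has been
   evaluated by the time xdmp:query-meters() is read. :)
return ($results, xdmp:query-meters())
```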