[MarkLogic Dev General] Result set size and performance

Patrick Force patrickf at arc90.com
Wed Jun 20 15:55:45 PDT 2007


We are trying to deal with problems around result set sizes, timeouts, and
performance in general.  I believe that because our particular use of
MarkLogic doesn't exactly follow the norm, we keep running into the same
roadblocks.

A common issue is that our query result sets (and possibly the documents
themselves) are too large for MarkLogic to handle in a single
return.  A basic XQuery example to illustrate:

for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
return $policy

That query maxes out the expanded tree cache (we have already raised the
tree cache setting in the admin to its maximum), which happens often and,
to the best of my knowledge, is due to the return size.  To give you an
idea, each document returned by $policy is about 116 KB.  So we try to
limit the result to just the root node attributes:

for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
return <InsurancePolicy> {$policy/@*} </InsurancePolicy>

That takes quite a while or usually times out (there are currently about
50,000 items in that result set), and it still isn't close to what we need.
In reality, we need the result set to also return some things within the
InsurancePolicy document, not just the root node attributes.  So we've
used document properties to limit the amount of data we have to traverse:

let $properties := xdmp:collection-properties('/c/pxquote/policies/active')
for $property in $properties
return $property

A single document property result is about 4 KB.  For 50,000 records
(roughly 200 MB in total), the expanded tree cache fills up on that call.
So for certain needs, like updates on larger result sets, we've used
positional index access and outside code to step through the result set in
batches, like this (currently run from a ColdFusion/Java connection to
MarkLogic):

let $properties :=
(xdmp:collection-properties('/c/pxquote/policies/active'))[1 to 100]
for $property in $properties
return xdmp:node-replace($property//attributes,
<attributes>newvalues</attributes>)

The page then waits and refreshes itself with URL parameters that say where
the next batch begins and ends, like ?begin=101&end=200, and then we
run:

let $properties :=
(xdmp:collection-properties('/c/pxquote/policies/active'))[101 to 200]
for $property in $properties
return xdmp:node-replace($property//attributes,
<attributes>newvalues</attributes>)
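
As a rough sketch of how the two batch queries above could become one, the
window can be passed in as external variables instead of being written into
the query text each time.  The names $begin and $end only mirror the URL
parameters and are illustrative, the 1.0-ml version declaration is an
assumption about the server release, and fn:subsequence is the standard
function standing in for the [101 to 200] predicate:

```xquery
xquery version "1.0-ml";

(: Assumed to be supplied by the calling ColdFusion/Java code,
   e.g. begin=101 and end=200. Names are illustrative only. :)
declare variable $begin as xs:integer external;
declare variable $end   as xs:integer external;

(: Take only the requested window of properties documents. :)
let $properties :=
  fn:subsequence(
    xdmp:collection-properties('/c/pxquote/policies/active'),
    $begin,
    $end - $begin + 1)
for $property in $properties
return xdmp:node-replace($property//attributes,
  <attributes>newvalues</attributes>)
```

The outside code would then only have to advance begin and end between
calls; this does not change how much work each batch does.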

We've battled with all of this for quite some time now, finding small
answers along the way, but none of it seems to address our long-term need,
since most of the acceptable answers we've discovered above don't really buy
us much as result set sizes increase.

Basically we're looking for tricks, methods or built-in MarkLogic
functionality that can help us deal with our larger result sizes.  Whether
that means returning results in batches or combining built-in function
calls in MarkLogic, anything that might guide us in the right direction
from our current understanding would help.  Thanks, and any ideas or
suggestions would be greatly appreciated.

Patrick