[MarkLogic Dev General] Result set size and performance

Michael Blakeley michael.blakeley at marklogic.com
Wed Jun 20 16:18:30 PDT 2007


Patrick,

I'd suggest that you step back from the query and tell us about the 
task you are trying to solve. Perhaps a different query will be more 
suitable to the task.

-- Mike

Patrick Force wrote:
> We are attempting to deal with issues in result set sizes, timeouts, and
> performance in general. I believe that because our use of MarkLogic
> doesn't follow the usual pattern, we keep running into the same
> roadblocks.
> 
> A common issue with our query result sets (and possibly our documents) has
> been that they are too large for MarkLogic to return in a single response.
> A basic XQuery example to illustrate:
> 
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return $policy
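
For scale, here is a back-of-the-envelope sketch of how much data one request asks the server to materialize. It uses the figures Patrick gives further down in this message: about 116 KB per InsurancePolicy document, about 4 KB per properties result, and roughly 50,000 active policies. Treating 1 KB as 1,024 bytes is an assumption, and the helper name is illustrative only:

```python
# Rough estimate of the data a single request must materialize.
# Sizes are the approximate figures quoted in this thread;
# treating 1 KB as 1,024 bytes is an assumption.
KB = 1024


def total_bytes(doc_count: int, bytes_per_doc: int) -> int:
    """Total bytes if every matching document is returned at once."""
    return doc_count * bytes_per_doc


POLICIES = 50_000        # "50,000 or so" active policies
FULL_DOC = 116 * KB      # one InsurancePolicy document
PROPERTIES_DOC = 4 * KB  # one document-properties result

full = total_bytes(POLICIES, FULL_DOC)
props = total_bytes(POLICIES, PROPERTIES_DOC)

print(f"full documents:       {full / 2**30:.2f} GiB")   # 5.53 GiB
print(f"properties documents: {props / 2**20:.1f} MiB")  # 195.3 MiB
```

Switching to properties cuts the volume per document by roughly 29x (116 KB vs 4 KB). Even so, a single request over the whole collection still covers about 200 MB.
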
> 
> That query exhausts the expanded tree cache (we have already raised the tree
> cache setting to its maximum in the Admin interface). This happens often,
> and as far as I can tell it is caused by the size of the result: each
> document returned as $policy is about 116 KB. So we try to limit the result
> to just the root node's attributes:
> 
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return <InsurancePolicy> {$policy/@*} </InsurancePolicy>
> 
> That query takes quite a while and usually times out (the result set is
> currently about 50,000 documents), and it isn't close enough to what we
> actually need. In reality, the result set must also include some values from
> within each InsurancePolicy document, not just the root node's attributes.
> So we have used document properties to limit the amount of data we have to
> traverse:
> 
> let $properties := xdmp:collection-properties('/c/pxquote/policies/active')
> for $property in $properties
> return $property
> 
> A single document-properties result is about 4 KB, yet for 50,000 records
> the expanded tree cache still fills up on that call. So for certain needs,
> such as updates over larger result sets, we use positional (index) access
> plus outside code (currently a ColdFusion/Java connection to MarkLogic) to
> step through the result set in batches, like this:
> 
> (: first batch: positions 1 to 100 of the collection's properties :)
> let $properties :=
> (xdmp:collection-properties('/c/pxquote/policies/active'))[1 to 100]
> for $property in $properties
> return xdmp:node-replace($property//attributes,
> <attributes>newvalues</attributes>)
> 
> We then pause, refresh the page with URL parameters that say where the next
> batch begins and ends (for example ?begin=101&end=200), and run:
> 
> (: next batch: positions 101 to 200, taken from ?begin=101&end=200 :)
> let $properties :=
> (xdmp:collection-properties('/c/pxquote/policies/active'))[101 to 200]
> for $property in $properties
> return xdmp:node-replace($property//attributes,
> <attributes>newvalues</attributes>)
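
The following is a minimal sketch of the begin/end bookkeeping that the outside ColdFusion/Java code does between page refreshes. It is written in Python for illustration and is not Patrick's actual code. `batch_ranges` and `batch_query` are hypothetical helper names. The XQuery text it produces mirrors the update snippets above, using 1-based, inclusive `[begin to end]` predicates, and leaves the `newvalues` placeholder as-is:

```python
def batch_ranges(total: int, batch_size: int) -> list[tuple[int, int]]:
    """Split positions 1..total into 1-based, inclusive (begin, end) pairs,
    matching XQuery's [begin to end] positional predicate."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ranges = []
    for begin in range(1, total + 1, batch_size):
        end = min(begin + batch_size - 1, total)
        ranges.append((begin, end))
    return ranges


def batch_query(collection: str, begin: int, end: int) -> str:
    """Build the XQuery text for one batch (hypothetical helper that
    mirrors the update snippets quoted above)."""
    return (
        "let $properties :=\n"
        f"  (xdmp:collection-properties('{collection}'))[{begin} to {end}]\n"
        "for $property in $properties\n"
        "return xdmp:node-replace($property//attributes,\n"
        "  <attributes>newvalues</attributes>)"
    )


if __name__ == "__main__":
    ranges = batch_ranges(50_000, 100)
    print(len(ranges), ranges[0], ranges[1], ranges[-1])
    print(batch_query("/c/pxquote/policies/active", *ranges[1]))
```

At 100 documents per batch, 50,000 policies take 500 requests. Each request touches about 400 KB of properties (100 x 4 KB) instead of about 200 MB. The positions are recomputed on every request, so if a batch's updates ever add documents to the collection or remove them from it, later ranges shift and items can be skipped or visited twice.
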
> 
> We've battled with all of this for quite some time, finding small answers
> along the way, but none of them addresses our long-term need: the workable
> answers we've found above buy us less and less as result sets grow.
> 
> Basically, we're looking for tricks, methods, or built-in MarkLogic
> functionality that can help us deal with larger result sets, whether that
> means returning results in batches, combining MarkLogic function calls, or
> anything else that might point us in the right direction. Thanks; any ideas
> or suggestions would be greatly appreciated.
> 
> Patrick
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general
