[MarkLogic Dev General] Result set size and performance
Michael Blakeley
michael.blakeley at marklogic.com
Wed Jun 20 16:18:30 PDT 2007
Patrick,
I'd suggest that you step back from the query, and tell us about the
task you are trying to solve. Perhaps a different query will be more
suitable to the task.
-- Mike
Patrick Force wrote:
> We are attempting to deal with issues in result set sizes, timeouts, and
> performance in general. I believe because our particular use of MarkLogic
> doesn't exactly follow the norm, we have been plagued by some recurring
> roadblocks.
>
> A common issue is that our query result sets (and possibly documents) are
> too large for MarkLogic to handle in a single return. A basic XQuery
> example to illustrate:
>
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return $policy
>
> That query will max out the expanded tree cache (We've maxed out the tree
> cache setting in the admin), which is a common occurrence, and to the best
> of my knowledge is due to the return size. To give you an idea, each
> document being returned by $policy is about 116 KB or so. So, we try to
> limit the result to just the root node attributes:
>
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return <InsurancePolicy> {$policy/@*} </InsurancePolicy>
>
> That takes quite a while or usually times out (about 50,000 results in that
> result set currently), and it still isn't close to what we actually need to
> do. In reality, we need the result set to also return some things w/in the
> InsurancePolicy document, not just at the root node attribute level. So,
> we've been able to utilize document properties to limit the amount of data
> we have to travel over:
>
> let $properties := xdmp:collection-properties('/c/pxquote/policies/active')
> for $property in $properties
> return $property
>
> A single document property result is about 4KB. For 50,000 records, the
> expanded tree cache fills up on that call. So for certain needs, like
> updates on larger result sets, we've used index access and outside code to
> step through the result set in batches (currently run over a
> ColdFusion/Java connection to MarkLogic):
>
> let $properties :=
> (xdmp:collection-properties('/c/pxquote/policies/active'))[1 to 100]
> for $property in $properties
> return xdmp:node-replace($property//attributes,
> <attributes>newvalues</attributes>)
>
> We then delay and perform a page refresh w/ URL parameters indicating where
> the next batch begins and ends, like ?begin=101&end=200, and then run:
>
> let $properties :=
> (xdmp:collection-properties('/c/pxquote/policies/active'))[101 to 200]
> for $property in $properties
> return xdmp:node-replace($property//attributes,
> <attributes>newvalues</attributes>)
>
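The two batch queries quoted above differ only in their range, so the window could be bound once and reused. The following is only a minimal sketch under stated assumptions: `$begin` and `$end` are hypothetical names standing in for the `?begin=101&end=200` URL parameters, and `fn:subsequence` is swapped in for the positional predicate. It does not by itself address the cache pressure described earlier.

```xquery
(: Hypothetical batch window; in practice the calling code would supply
   these values, e.g. from the ?begin=...&end=... URL parameters. :)
let $begin := 101
let $end := 200
(: Take only the properties documents that fall inside the window. :)
let $properties :=
  fn:subsequence(
    xdmp:collection-properties('/c/pxquote/policies/active'),
    $begin, $end - $begin + 1)
for $property in $properties
(: Same update as in the quoted queries, with the closing tag matched. :)
return xdmp:node-replace($property//attributes,
  <attributes>newvalues</attributes>)
```

The calling code would advance `$begin` and `$end` by the batch size on each request, as in the page-refresh approach described above.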
> We've battled with all of this for quite some time now, finding small
> answers along the way, but none of it seems to address our long-term need,
> since most of the acceptable answers we've discovered above don't really buy
> us much as result set sizes increase.
>
> Basically we're looking for tricks, methods or built-in MarkLogic
> functionality that can help us deal with our larger result sizes, whether
> that means returning results in batches, combining function calls in
> MarkLogic, or anything else that might guide us in the right direction
> from our current understanding. Thanks; any ideas or suggestions would
> be greatly appreciated.
>
> Patrick