[MarkLogic Dev General] Result set size and performance

Jason Hunter jhunter at marklogic.com
Wed Jun 20 16:42:23 PDT 2007


Patrick Force wrote:
> We are attempting to deal with issues in result set sizes, timeouts, and 
> performance in general.  I believe because our particular use of 
> MarkLogic doesn't exactly follow the norm, we have been plagued by some 
> recurring roadblocks.
> 
> A common issue with our query result sets (and possibly documents) has 
> been that they are too large for MarkLogic to handle in a single 
> return.  A basic XQuery example to illustrate:
> 
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return $policy

I expect (haven't tested of course) that this will work better if you 
write it the shorter way:

collection('/c/pxquote/policies/active')/InsurancePolicy

The server doesn't do a lot of automated query rewriting yet.  In most 
cases the shorter (and arguably more elegant) form is also the fastest.

> That query will max out the expanded tree cache (we've maxed out the 
> tree cache setting in the admin), which is a common occurrence, and to 
> the best of my knowledge is due to the return size.  To give you an 
> idea, each document being returned by $policy is about 116 KB or so.  
> So, we try to limit the result to just the root node attributes:
> 
> for $policy in collection('/c/pxquote/policies/active')/InsurancePolicy
> return <InsurancePolicy> {$policy/@*} </InsurancePolicy>

I expect (again, haven't tested) that this will work better if you write 
it like this:

for $policy in
  data(collection('/c/pxquote/policies/active')/InsurancePolicy/@*)
return <InsurancePolicy> {$policy} </InsurancePolicy>

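By calling data() you let the server atomize each attribute to a scalar 
value instead of handing back attribute nodes.  Note that the loop now 
runs over attribute values, so you get one <InsurancePolicy> element per 
attribute value rather than one per policy.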
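
To see how the two forms differ in shape, here is a minimal sketch that 
runs against inline elements rather than the collection.  The id and 
state attributes and their values are made up for illustration; they are 
not from Patrick's data.

```xquery
(: Hypothetical stand-ins for the documents in the collection. :)
let $policies := (
  <InsurancePolicy id="P-1" state="CA"/>,
  <InsurancePolicy id="P-2" state="NY"/>
)
return (
  (: Original form: copies the attribute nodes, one element per policy. :)
  for $policy in $policies
  return <InsurancePolicy> {$policy/@*} </InsurancePolicy>,

  (: data() form: atomizes the attributes, one element per value. :)
  for $value in data($policies/@*)
  return <InsurancePolicy> {$value} </InsurancePolicy>
)
```

The first loop yields two elements that carry the original attributes; 
the second yields four elements, each holding a single attribute value 
as text.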

Hopefully this gets you started down the right path.  My plane's about 
to take off, so I need to send and run.  Others may be able to help you 
with your later queries, although some of them seemed to be workarounds 
that (I think) the above should help you avoid.

I think Mike's suggestion to take a step back is a good one.  There's 
often more than one way to attack a problem -- some good, some bad, and 
some great.  The rewrites above will, I think, let your query run 
without consuming so much memory, but they still won't be 
index-optimized because you're doing a huge amount of iteration.  Things 
like lexicons may be able to do more for you.  But what's best depends 
on what you're *really* trying to do.
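
As a rough sketch of the lexicon idea: if all you need is the values of 
one attribute, a value lexicon can return them straight from the index 
without loading each 116 KB document.  This assumes a hypothetical 
policyNumber attribute on InsurancePolicy and an element-attribute range 
index configured for it; neither is from Patrick's post.  The call 
follows the cts:element-attribute-values signature documented in later 
MarkLogic releases, so the arguments may differ on your version.

```xquery
(: Assumption: policyNumber is a made-up attribute name, and an
   element-attribute range index exists for it on InsurancePolicy. :)
cts:element-attribute-values(
  xs:QName("InsurancePolicy"),   (: element name :)
  xs:QName("policyNumber"),      (: attribute name (hypothetical) :)
  (),                            (: no starting value :)
  (),                            (: default options :)
  (: restrict the values to documents in the active collection :)
  cts:collection-query("/c/pxquote/policies/active")
)
```

This returns atomic values rather than nodes, so nothing has to pass 
through the expanded tree cache.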

-jh-
