[MarkLogic Dev General] my link resolver is slow

Danny Sokolsky Danny.Sokolsky at marklogic.com
Wed Aug 8 11:48:17 PDT 2012


Hi Mike,

It is hard to answer a question like this in generalities, but here are a few random ideas, most of which you have probably already tried:

Have you tried profiling the query?  That can point you to hot spots fairly quickly.  I'm not totally sure what you mean by "the breakdown reported by the optimizer".  Do you mean in xdmp:plan?  xdmp:query-trace?  Something else?

What version of MarkLogic are you running (xdmp:version())?

Are you using range index lookups to find the links (with a cts:query param, for example)?
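The idea behind a range-index lookup is not MarkLogic-specific: resolve every link target from a prebuilt id-to-URI table instead of running a separate search per link. Here is a minimal sketch in Python (not XQuery, and not a MarkLogic API). It assumes the batch fits in a dict of URI -> XML string, that targets are identified by an `id` attribute, and that ids are unique. The function name and sample documents are hypothetical. In MarkLogic, an element-attribute range index on `@id` could play a similar role to this table.

```python
import xml.etree.ElementTree as ET

def build_id_index(documents):
    """Map every @id value found in any document to that document's URI.

    documents: dict of URI -> XML string (a hypothetical stand-in for the database).
    If the same id occurs in several documents, the first URI seen wins.
    """
    index = {}
    for uri, text in documents.items():
        root = ET.fromstring(text)
        # root.iter() visits the root element and all of its descendants.
        for elem in root.iter():
            target_id = elem.get("id")
            if target_id is not None:
                index.setdefault(target_id, uri)
    return index

docs = {
    "/a.xml": '<doc><section id="intro"/><link target="figure-1"/></doc>',
    "/b.xml": '<doc><figure id="figure-1"/></doc>',
}
index = build_id_index(docs)
print(index)  # {'intro': '/a.xml', 'figure-1': '/b.xml'}
```

After the table is built once, each link costs one dictionary lookup rather than one search.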

When you say you are doing node replaces, do you mean you are writing each document multiple times?  That can get expensive; it is often faster to build the new version of the document in memory and then write the document once.  There is also a library for doing node replaces in memory, if you don't feel like writing that yourself.
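The "rewrite in memory, write once" pattern can be sketched in Python as a language-neutral illustration. This is not MarkLogic code. Every link in a document is resolved in one pass over the parsed tree, and the result is stored with a single write. The `<link target="...">` input, the `<resolved-link uri="...">` output, leaving unresolved links untouched, and the `store` dict standing in for the database are all hypothetical assumptions.

```python
import xml.etree.ElementTree as ET

def resolve_links(xml_text, id_to_uri):
    """Rewrite every resolvable <link target="..."> in one in-memory pass.

    Resolved links become <resolved-link target="..." uri="...">; links whose
    target is unknown are left unchanged so a renderer can skip them.
    Returns the new serialized document and the number of links resolved.
    """
    root = ET.fromstring(xml_text)
    resolved = 0
    # Materialize the list first because elements are renamed inside the loop.
    for link in list(root.iter("link")):
        uri = id_to_uri.get(link.get("target"))
        if uri is not None:
            link.tag = "resolved-link"
            link.set("uri", uri)
            resolved += 1
    return ET.tostring(root, encoding="unicode"), resolved

# Hypothetical stand-in for the database: URI -> serialized document.
store = {"/a.xml": '<doc><link target="fig-1"/><link target="missing"/></doc>'}
new_text, count = resolve_links(store["/a.xml"], {"fig-1": "/b.xml"})
# One write per document, no matter how many links were changed.
store["/a.xml"] = new_text
print(count)
print(store["/a.xml"])
```

The per-document cost is then one parse, one traversal, and one write, instead of one write per link.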

-Danny



-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Mike Sokolov
Sent: Wednesday, August 08, 2012 10:41 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] my link resolver is slow

I've written some code to resolve links in a batch process; the links 
can point to any of several different element/@id combinations in any 
document, and we are trying to record the destination document URI with 
the link so it can be rendered quickly at run-time, and missing links 
won't be rendered at all.

Basically the process is: for each of some batch of documents, for each 
of its links, search for the matching document, and replace the link 
with an element having a uri attribute pointing to that document.

Overall, this process is running much slower than I had expected.  I've 
been examining the query using the profiler, and after doing some 
optimization of the searches, I find something a bit strange.  The 
breakdown reported by the optimizer doesn't seem to account for the 
total time.  It looks to me as if all the searches are completing fairly 
quickly, based on logging statements that indicate all the documents in 
the batch have been "processed", and then the query just seems to hang 
for a while before returning.  It seems to spend about 90% of the total 
time in this second stage.  My assumption is that this time is spent 
performing the updates, committing, indexing, writing a journal file, or 
something like that.

My question is: should I expect this to be reflected in the optimizer?  
And is there some way I can figure out why it is taking so long, and 
what I can do about it?  Maybe inserting a node would be faster than 
replacing?  I've tried a tree-walk rather than lots of node-replaces, 
but that actually seemed quite a bit slower.

Thanks for any suggestions!

-- 
Michael Sokolov
Engineering Director
www.ifactory.com

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

