[MarkLogic Dev General] my link resolver is slow

Ron Hitchens ron at ronsoft.com
Wed Aug 8 14:02:26 PDT 2012


   If you're looking at lots of documents, but only making
updates to one or a few, you may be incurring a lot of unnecessary
locking.  Even if there is no contention for the locked documents,
it takes resources to acquire, track and release the locks,
especially in a cluster.

   If you can structure your code so that it runs mostly as a
read-only query, it will run lock-free.  (You can check with
xdmp:request-timestamp(): if it returns an empty sequence, the
request is an update, even if you never make any actual changes.)
You can then use xdmp:eval or xdmp:invoke to do the actual updates
in a separate transaction, which will only need to lock the
document(s) that it's updating.
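
   A rough sketch of that structure, under some assumptions: the
document URIs, the <link> element and its "uri" attribute are made-up
names for illustration, and the real search for the target document
is replaced by a constant.  The outer module calls no update
functions itself, so it should run as a query; only the code passed
to xdmp:eval is an update.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: the URIs and the <link> element are
   illustrative only, not taken from the original code. :)
declare variable $doc-uri := "/example/doc.xml";

(: A query statement has a timestamp; an update statement gets the
   empty sequence here. :)
let $is-query := fn:exists(xdmp:request-timestamp())
let $_ := xdmp:log(fn:concat("outer statement is a query: ", $is-query))

(: Do the expensive lookups here, without taking locks. :)
let $link := (fn:doc($doc-uri)//link)[1]
let $target-uri := "/example/target.xml"  (: stand-in for the real search :)

return
  if (fn:empty($link)) then ()
  else
    (: Only the eval'd code is an update.  It runs in its own
       transaction and re-fetches the node by URI, so it locks only
       the document it changes. :)
    xdmp:eval('
      xquery version "1.0-ml";
      declare variable $uri as xs:string external;
      declare variable $target as xs:string external;
      let $old := (fn:doc($uri)//link)[1]
      return
        if (fn:exists($old))
        then xdmp:node-replace($old,
               <link uri="{$target}">{$old/node()}</link>)
        else ()
    ',
    (xs:QName("uri"), $doc-uri, xs:QName("target"), $target-uri),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
```

   Note that the document could change between the lock-free read
and the separate update, which is why the eval'd code looks the node
up again rather than trusting what the outer query saw.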

@Geert: Yes, updates within a transaction are cumulative.  That's
why you will get a conflicting update exception
(XDMP-CONFLICTINGUPDATES) if you change the same node more than
once.
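
   As an illustration of that last point, here is a small sketch
(the document URI, the <link> element and the "checked" attribute are
hypothetical).  Rather than replacing the same node twice, which
should raise the conflicting update error, it builds the final
element in memory and replaces the node once.

```xquery
xquery version "1.0-ml";

(: Hypothetical document and element names. :)
let $link := (fn:doc("/example/doc.xml")//link)[1]

(: Two replaces of the same node in one statement, e.g.
     xdmp:node-replace($link, <link uri="/a.xml"/>),
     xdmp:node-replace($link, <link uri="/b.xml"/>)
   should fail with XDMP-CONFLICTINGUPDATES.
   Instead, build the final element in memory and replace once. :)
let $final :=
  <link uri="/example/target.xml" checked="true">{$link/node()}</link>

return
  if (fn:exists($link))
  then xdmp:node-replace($link, $final)
  else ()
```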

On Aug 8, 2012, at 9:06 PM, Geert Josten wrote:

> Hi Mike,
> 
> Another 2 cents: are you running a lot of parallel batches? Could they be
> interfering with each other, for instance through directory locking?
> 
> @Danny: aren't node replaces within a single request cumulated
> automatically?
> 
> Kind regards,
> Geert
> 
>> -----Original Message-----
>> From: general-bounces at developer.marklogic.com [mailto:general-
>> bounces at developer.marklogic.com] On Behalf Of Danny Sokolsky
>> Sent: Wednesday, August 8, 2012 20:48
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] my link resolver is slow
>> 
>> Hi Mike,
>> 
>> It is hard to answer a question like this in generalities.  But here are
>> a few random ideas, probably most of which you have tried:
>> 
>> Have you tried profiling the query?  That can point you to hot spots
>> fairly quickly.  I'm not totally sure what you mean by "the breakdown
>> reported by the optimizer".  Do you mean in xdmp:plan?
>> xdmp:query-trace?  Something else?
>> 
>> What version of MarkLogic are you running (xdmp:version() )?
>> 
>> Are you using range index lookups to find the links (with a cts:query
>> param, for example)?
>> 
>> When you say you are doing node replaces, do you mean you are writing
>> each document multiple times?  That can get expensive, and it is often
>> faster to create a new version of the document in memory and then write
>> the document once.  There is a library to do in-memory node-replaces too
>> if you don't feel like writing that yourself.
>> 
>> -Danny
>> 
>> 
>> 
>> -----Original Message-----
>> From: general-bounces at developer.marklogic.com [mailto:general-
>> bounces at developer.marklogic.com] On Behalf Of Mike Sokolov
>> Sent: Wednesday, August 08, 2012 10:41 AM
>> To: MarkLogic Developer Discussion
>> Subject: [MarkLogic Dev General] my link resolver is slow
>> 
>> I've written some code to resolve links in a batch process; the links
>> can point to a number of different element/@id in any document, and we
>> are trying to record the destination document uri with the link so it
>> can be rendered quickly at run-time, and missing links won't be rendered
>> at all.
>> 
>> Basically the process is: for each of some batch of documents, for each
>> of its links, search for the matching document, and replace the link
>> with an element having a uri attribute pointing to that document.
>> 
>> Overall, this process is running much slower than I had expected.  I've
>> been examining the query using the profiler, and after doing some
>> optimization of the searches, I find something a bit strange.  The
>> breakdown reported by the optimizer doesn't seem to account for the
>> total time.  It looks to me as if all the searches are completing fairly
>> quickly, based on logging statements that indicate all the documents in
>> the batch have been "processed", and then the query just seems to hang
>> for a while before returning.  It seems to spend about 90% of the total
>> time in this second stage.  My assumption is this time is spent
>> performing the updates, committing, indexing, writing a journal file, or
>> something like that.
>> 
>> My question is: should I expect this to be reflected in the optimizer?
>> And is there some way I can figure out why it is taking so long, and
>> what I can do about it?  Maybe inserting a node would be faster than
>> replacing?  I've tried a tree-walk rather than lots of node-replaces,
>> but that actually seemed quite a bit slower.
>> 
>> Thanks for any suggestions!
>> 
>> --
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>> 
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general

---
Ron Hitchens {mailto:ron at ronsoft.com}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown





