[MarkLogic Dev General] my link resolver is slow

Geert Josten geert.josten at dayon.nl
Wed Aug 8 13:06:14 PDT 2012


Hi Mike,

Another 2 cents: are you running a lot of parallel batches? Could they be
interfering with each other, for instance through directory locking?

@Danny: aren't node replaces within a single request accumulated
automatically?
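
Either way, the single-write approach Danny suggests below could be
sketched roughly like this. It is untested and rests on assumptions: the
element name "link", its "ref" attribute, the added "uri" attribute, the
helper local:find-uri and the external variable $batch-uris are all made
up for illustration, and the lookup would need to match your actual
indexes.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the URIs of the documents in this batch. :)
declare variable $batch-uris as xs:string* external;

(: Hypothetical lookup: URI of the first document containing an element
   whose @id equals $id, or the empty sequence if there is none. :)
declare function local:find-uri($id as xs:string) as xs:string?
{
  for $hit in (fn:collection()//*[@id = $id])[1]
  return xdmp:node-uri($hit)
};

(: Rebuild the tree in memory, adding @uri to every link that resolves.
   Unresolved links are copied through unchanged. :)
declare function local:rewrite($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:rewrite($c) }
    case element(link) return
      let $uri := local:find-uri(fn:string($node/@ref))
      return
        if (fn:exists($uri))
        then element link {
               $node/@* except $node/@uri,
               attribute uri { $uri },
               $node/node()
             }
        else $node
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:rewrite($c)
      }
    default return $node
};

(: One write per document; permissions and collections are passed
   explicitly so the insert does not reset them to the defaults. :)
for $uri in $batch-uris
return
  xdmp:document-insert(
    $uri,
    local:rewrite(fn:doc($uri)),
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri))
```

Mike mentioned that a tree walk was slower for him, so this is only worth
trying if the profiler shows the time really goes into the updates.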

Kind regards,
Geert

> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-
> bounces at developer.marklogic.com] On Behalf Of Danny Sokolsky
> Sent: Wednesday, August 8, 2012 20:48
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] my link resolver is slow
>
> Hi Mike,
>
> It is hard to answer a question like this in generalities.  But here are a few
> random ideas, probably most of which you have tried:
>
> Have you tried profiling the query?  That can point you to hot spots fairly
> quickly.  I'm not totally sure what you mean by "the breakdown reported by the
> optimizer".  Do you mean in xdmp:plan?  xdmp:query-trace?  Something else?
>
> What version of MarkLogic are you running (xdmp:version() )?
>
> Are you using range index lookups to find the links (with a cts:query param,
> for example)?
>
> When you say you are doing node replaces, do you mean you are writing each
> document multiple times?  That can get expensive, and it is often faster to
> create a new version of the document in memory and then write the document
> once.  There is a library to do in-memory node-replaces too if you don't feel
> like writing that yourself.
>
> -Danny
>
>
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-
> bounces at developer.marklogic.com] On Behalf Of Mike Sokolov
> Sent: Wednesday, August 08, 2012 10:41 AM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] my link resolver is slow
>
> I've written some code to resolve links in a batch process; the links
> can point to a number of different element/@id in any document, and we
> are trying to record the destination document uri with the link so it
> can be rendered quickly at run-time, and missing links won't be rendered
> at all.
>
> Basically the process is: for each of some batch of documents, for each
> of its links, search for the matching document, and replace the link
> with an element having a uri attribute pointing to that document.
>
> Overall, this process is running much slower than I had expected.  I've
> been examining the query using the profiler, and after doing some
> optimization of the searches, I find something a bit strange.  The
> breakdown reported by the optimizer doesn't seem to account for the
> total time.  It looks to me as if all the searches are completing fairly
> quickly, based on logging statements that indicate all the documents in
> the batch have been "processed", and then the query just seems to hang
> for a while before returning.  It seems to spend about 90% of the total
> time in this second stage.  My assumption is this time is spent
> performing the updates, committing, indexing, writing a journal file, or
> something like that.
>
> My question is: should I expect this to be reflected in the optimizer?
> And is there some way I can figure out why it is taking so long, and
> what I can do about it?  Maybe inserting a node would be faster than
> replacing?  I've tried a tree-walk rather than lots of node-replaces,
> but that actually seemed quite a bit slower.
>
> Thanks for any suggestions!
>
> --
> Michael Sokolov
> Engineering Director
> www.ifactory.com
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general