[MarkLogic Dev General] my link resolver is slow

Geert Josten geert.josten at dayon.nl
Wed Aug 8 16:14:06 PDT 2012


Another two cents:

Keeping track of which links you have already resolved and which you
have not is not trivial. I struggled with something similar recently,
though in your case link targets could (in theory at least) disappear,
which they could not in mine. Do you take that into account too? It
means you may need to revisit existing links once in a while to recheck
them. In my case I could keep track with an extra attribute (or
something like that); you, on the other hand, may need to store a
timestamp for each link and schedule recheck processes.
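
A minimal sketch of the timestamp idea, in Python rather than XQuery
purely for illustration. The record layout, field names, and function
names below are all hypothetical (nothing in this thread defines them),
and the actual lookup of a link target is left out:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LinkRecord:
    # All field names are hypothetical; the thread does not define a schema.
    link_id: str                      # the @id value the link refers to
    target_uri: Optional[str]         # resolved document URI, None if the target is missing
    last_checked: Optional[datetime]  # None means the link has never been resolved


def links_due_for_recheck(links: List[LinkRecord], now: datetime,
                          max_age: timedelta) -> List[LinkRecord]:
    """Return links never checked, or last checked more than max_age ago."""
    return [
        link for link in links
        if link.last_checked is None or now - link.last_checked > max_age
    ]


def record_check(link: LinkRecord, target_uri: Optional[str], now: datetime) -> None:
    """Store the outcome of a (re)check together with its timestamp."""
    link.target_uri = target_uri
    link.last_checked = now
```

A scheduled job could call links_due_for_recheck with some maximum age,
re-run the search for each returned link, and then call record_check
with whatever the search found (or None if the target has disappeared).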

You do know you can pass multiple element names to a single
cts:element-attribute-value-query, right? Just to be sure.. ;-P

Kind regards,
Geert

> -----Original Message-----
> From: Mike Sokolov [mailto:sokolov at ifactory.com]
> Sent: Wednesday, August 8, 2012 22:34
> To: MarkLogic Developer Discussion
> Cc: Geert Josten
> Subject: Re: [MarkLogic Dev General] my link resolver is slow
>
> Thanks -
>
> Yes, I am pretty sure the node-replaces are accumulated and written all
> at once.  I did try creating an entire new document by walking the tree,
> and using element constructors, a typeswitch in a recursive function,
> and so on, and it was slower than using node-replace multiple times.
>
> After some testing, I think my measurements may have been confused and
> in fact the slow part is just performing hundreds of searches (one per
> link) for each document we update.  I am still confused by the profiler
> output (this is in CQ, using the Profile button), but I am now thinking
> I may just be misinterpreting its output based on some other tests.
>
> I am not running parallel batches, although I think I should be.  I have
> a separate issue preventing that at the moment, which is that I want to
> know which documents were updated so I can flush them from my front-end
> application's cache.  At the moment I rely on reading the query result
> to do that, but I think we may need to switch over to decouple that
> process so we can run these updates in parallel using the task server.
>
> We have some ideas about how to speed up the querying - possibly using a
> range index, although we are just doing element-attribute-value queries,
> so I wouldn't have thought that would help.  However, one test seemed to
> show that searching a *single* element/@id pair was quite a bit faster
> than searching the 10 pairs we have to search in order to match ids on
> all of the elements on which they occur (I can't tell you how nice it
> would be to have an index on //*/@id, sort of like what the id()
> function is supposed to provide!).
>
> So I think we'll pursue speeding up the searches and running jobs in
> parallel.
>
> Thanks for your help!
>
> -Mike
>
> On 08/08/2012 04:06 PM, Geert Josten wrote:
> > Hi Mike,
> >
> > Another 2 cents: are you running a lot of parallel batches? Could
> > they be interfering with each other, for instance through directory
> > locking?
> >
> > @Danny: aren't node replaces within a single request accumulated
> > automatically?
> >
> > Kind regards,
> > Geert
> >
> >
> >> -----Original Message-----
> >> From: general-bounces at developer.marklogic.com [mailto:general-
> >> bounces at developer.marklogic.com] On Behalf Of Danny Sokolsky
> >> Sent: Wednesday, August 8, 2012 20:48
> >> To: MarkLogic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] my link resolver is slow
> >>
> >> Hi Mike,
> >>
> >> It is hard to answer a question like this in generalities.  But here
> >> are a few random ideas, probably most of which you have tried:
> >>
> >> Have you tried profiling the query?  That can point you to hot spots
> >> fairly quickly.
> >>
> >> I'm not totally sure what you mean by "the breakdown reported by the
> >> optimizer".  Do you mean in xdmp:plan?  xdmp:query-trace?  Something
> >> else?
> >>
> >> What version of MarkLogic are you running (xdmp:version() )?
> >>
> >> Are you using range index lookups to find the links (with a cts:query
> >> param, for example)?
> >>
> >> When you say you are doing node replaces, do you mean you are writing
> >> each document multiple times?  That can get expensive, and it is often
> >> faster to create a new version of the document in memory and then
> >> write the document once.  There is a library to do in-memory
> >> node-replaces too if you don't feel like writing that yourself.
> >>
> >> -Danny
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: general-bounces at developer.marklogic.com [mailto:general-
> >> bounces at developer.marklogic.com] On Behalf Of Mike Sokolov
> >> Sent: Wednesday, August 08, 2012 10:41 AM
> >> To: MarkLogic Developer Discussion
> >> Subject: [MarkLogic Dev General] my link resolver is slow
> >>
> >> I've written some code to resolve links in a batch process; the links
> >> can point to a number of different element/@id in any document, and we
> >> are trying to record the destination document uri with the link so it
> >> can be rendered quickly at run-time, and missing links won't be
> >> rendered at all.
> >>
> >> Basically the process is: for each of some batch of documents, for
> >> each of its links, search for the matching document, and replace the
> >> link with an element having a uri attribute pointing to that document.
> >>
> >> Overall, this process is running much slower than I had expected.
> >> I've been examining the query using the profiler, and after doing some
> >> optimization of the searches, I find something a bit strange.  The
> >> breakdown reported by the optimizer doesn't seem to account for the
> >> total time.  It looks to me as if all the searches are completing
> >> fairly quickly, based on logging statements that indicate all the
> >> documents in the batch have been "processed", and then the query just
> >> seems to hang for a while before returning.  It seems to spend about
> >> 90% of the total time in this second stage.  My assumption is this
> >> time is spent performing the updates, committing, indexing, writing a
> >> journal file, or something like that.
> >>
> >> My question is: should I expect this to be reflected in the
> >> optimizer?  And is there some way I can figure out why it is taking so
> >> long, and what I can do about it?  Maybe inserting a node would be
> >> faster than replacing?  I've tried a tree-walk rather than lots of
> >> node-replaces, but that actually seemed quite a bit slower.
> >>
> >> Thanks for any suggestions!
> >>
> >> --
> >> Michael Sokolov
> >> Engineering Director
> >> www.ifactory.com
> >>
> >> _______________________________________________
> >> General mailing list
> >> General at developer.marklogic.com
> >> http://developer.marklogic.com/mailman/listinfo/general

