[MarkLogic Dev General] How to improve performance of update query?

Geert Josten geert.josten at dayon.nl
Thu Aug 30 12:17:30 PDT 2012


Hi Tomo,



I think xdmp:spawn is indeed the way to go, especially if you don’t need to
wait for a response from xdmp:eval. Spawned tasks are typically processed in
parallel, in bunches of 16 or 32, depending on how many Task Server threads
are configured. I’d also recommend not creating a separate task for each
document, but one task per batch of, let’s say, 100 documents. That reduces
the total number of disk writes. You might also want to push the calculation
work into the spawned tasks, doing as little as possible in the main task.
That ensures as much as possible is done in parallel.
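The batching advice above can be sketched in Python. This illustrates the scheduling idea only; it is not MarkLogic or XQuery code. The work done for one batch is a hypothetical function passed in as `work`. A `ThreadPoolExecutor` with 16 workers stands in for the Task Server running 16 spawned tasks at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(items: Sequence[T], batch_size: int = 100) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_batches(
    items: Sequence[T],
    work: Callable[[List[T]], R],
    batch_size: int = 100,
    workers: int = 16,
) -> List[R]:
    """Run `work` once per batch, with up to `workers` batches in flight.

    The thread pool is only an analogy for the Task Server's threads: each
    call to `work` plays the role of one spawned task that receives a whole
    batch of documents instead of a single one.
    """
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the batches.
        return list(pool.map(work, batches))


if __name__ == "__main__":
    # Hypothetical document URIs; the "work" here just reports each batch size.
    uris = [f"/doc/{n}.xml" for n in range(250)]
    print(run_batches(uris, len))  # [100, 100, 50]
```

Fewer, larger tasks mean fewer transactions and therefore fewer commits. To keep every worker busy, you still need at least as many batches as workers. The batch size of 100 is the rule of thumb from the message, not a tuned value.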



Kind regards,

Geert



*From:* general-bounces at developer.marklogic.com [mailto:
general-bounces at developer.marklogic.com] *On Behalf Of* Tomo Simeonov
*Sent:* Thursday, 30 August 2012 21:02
*To:* General at developer.marklogic.com
*Subject:* [MarkLogic Dev General] How to improve performance of update
query?



Hi,



I have been working on a query for updating elements in the database.

I am providing many values, and then for the elements that have any of these
values I am doing a node-replace. Those elements can be in any of the
documents in the database, not just in one.

It doesn't run very fast if I want to update more than 100 elements in one
go, and this is just a test; later on I could end up updating thousands.



What I am doing:

1) I search for elements that have one of those values; this performs well
because it uses indexes.

2) I create a map in which I store maps; each stored map contains the nodes
that need to be changed in one specific document.

3) For each document to be updated I create a separate transaction with
xdmp:eval and pass it the corresponding map. In that transaction I do the
node-replaces, because I do not want to lock all the documents I am
searching through.
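Step 2 above, grouping the matched nodes per document, can be sketched in Python. This illustrates the grouping logic only; it is not MarkLogic code. Each search hit is assumed to be a hypothetical (document URI, node identifier, new value) tuple. The original approach stores the nodes themselves in nested maps instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical shape of one search hit: (document URI, node identifier, new value).
Match = Tuple[str, str, str]


def group_by_document(matches: Iterable[Match]) -> Dict[str, Dict[str, str]]:
    """Build a "map of maps": document URI -> {node identifier -> new value}.

    Each inner map holds every change for one document, so that document
    can be updated in a single transaction.
    """
    per_doc: Dict[str, Dict[str, str]] = defaultdict(dict)
    for uri, node_id, new_value in matches:
        per_doc[uri][node_id] = new_value
    return dict(per_doc)


if __name__ == "__main__":
    hits = [
        ("/a.xml", "/root/item[1]", "x"),
        ("/b.xml", "/root/item[3]", "y"),
        ("/a.xml", "/root/item[2]", "z"),
    ]
    print(group_by_document(hits))
    # {'/a.xml': {'/root/item[1]': 'x', '/root/item[2]': 'z'}, '/b.xml': {'/root/item[3]': 'y'}}
```

Each inner map is one self-contained unit of work. It can be handed to its own transaction, or several of them can be combined into one larger task.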



This works almost perfectly until the xdmp:eval part is reached. All evals
are executed serially, not in parallel, which is causing the bad performance.

My question is: is there a way to execute those xdmp:evals in parallel
(xdmp:spawn is not an option for now)?

Also, is there a better way to do many updates to nodes across many
documents?



Thank you,

Tomo