[MarkLogic Dev General] How to improve performance of update query?

Tomo Simeonov tomosimeonov at yahoo.com
Thu Aug 30 12:59:36 PDT 2012

Hi Geert,

I need to know when it is finished, as other logic depends on it, and this is why I cannot use spawn for now.
Could you explain why there will be more disk writes if I use a separate transaction for each document?


 From: Geert Josten <geert.josten at dayon.nl>
To: Tomo Simeonov <tomosimeonov at yahoo.com>; MarkLogic Developer Discussion <general at developer.marklogic.com> 
Sent: Thursday, August 30, 2012 8:17 PM
Subject: RE: [MarkLogic Dev General] How to improve performance of update query?

Hi Tomo,
I think xdmp:spawn is indeed the way to go, certainly if you don't need to wait for a response from xdmp:eval. Spawned tasks are typically processed in parallel in bunches of 16 or 32. I'd also recommend not creating a separate task for each document, but doing so for batches of, say, 100. That reduces the total number of disk writes. You might also want to push calculation work into the spawned tasks, doing as little as possible in the main task. That makes sure as much as possible is done in parallel.
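The batching idea above could be sketched roughly like this (a minimal sketch: the task module path /tasks/update-batch.xqy, the external variable name, and the semicolon-joined URI encoding are all illustrative assumptions, not an existing API):

```
(: Sketch: split the affected URIs into batches of 100 and spawn one
   task per batch. /tasks/update-batch.xqy is a hypothetical module
   that declares $uris external, splits it on ";", and performs the
   node-replaces for those documents. :)
let $uris := cts:uris((), (), $my-value-query)  (: $my-value-query: your existing indexed query :)
let $batch-size := 100
let $batch-count := xs:integer(fn:ceiling(fn:count($uris) div $batch-size))
for $i in 1 to $batch-count
let $batch := $uris[(($i - 1) * $batch-size + 1) to ($i * $batch-size)]
return
  xdmp:spawn(
    "/tasks/update-batch.xqy",
    (xs:QName("uris"), fn:string-join($batch, ";"))
  )
```

Each spawned task then runs as its own transaction, so one batch equals one set of disk writes instead of one per document.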
Kind regards,
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On behalf of Tomo Simeonov
Sent: Thursday, August 30, 2012 21:02
To: General at developer.marklogic.com
Subject: [MarkLogic Dev General] How to improve performance of update query?
I have been working on a query for updating elements in the database.
I am providing many values, and for the elements that have any of these values I am doing a node-replace. These elements can be in any of the documents in the database, not in just one.
It doesn't run very fast if I want to update more than 100 elements in one go, and this is just a test; later on I could end up updating thousands.
What I am doing:
1) Search for elements that have one of those values - this has good performance as it uses indexes.
2) I am creating a map in which I am storing maps - each stored map contains the nodes that need to be changed in one specific document.
3) For each document to be updated I am creating a separate transaction with xdmp:eval and providing the correct map; in the transaction I am doing the node-replaces, as I do not want to lock all the documents I am searching through.
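Step 3 looks roughly like this (a sketch under assumptions: the element name `id`, the external variable names, and the shape of the inner map - old value mapped to new value - are illustrative, not the actual code):

```
(: Sketch of step 3: one xdmp:eval per document, isolated in its own
   transaction so only that document is locked. $doc-maps is the outer
   map from step 2: document URI -> inner map of replacements. :)
for $uri in map:keys($doc-maps)
return
  xdmp:eval(
    'declare variable $uri as xs:string external;
     declare variable $updates as map:map external;
     for $old in map:keys($updates)
     for $node in fn:doc($uri)//id[. = $old]
     return xdmp:node-replace($node, <id>{map:get($updates, $old)}</id>)',
    (xs:QName("uri"), $uri,
     xs:QName("updates"), map:get($doc-maps, $uri)),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>
  )
```

Each eval blocks the caller until it commits, which is why the loop runs the updates serially.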
This works almost perfectly until the xdmp:eval part is reached. All evals are executed serially, not in parallel, which is causing the bad performance.
My question is - is there a way to execute those xdmp:evals in parallel (xdmp:spawn is not an option for now)?
Also, is there a better way to do many updates to nodes across many documents?
Thank you, 