[MarkLogic Dev General] Need help with mass updates

Danny Sokolsky Danny.Sokolsky at marklogic.com
Mon May 6 13:53:13 PDT 2013


You can also use xdmp:spawn to update a batch at a time.  That takes two modules: a worker module, invoked through xdmp:spawn, which typically declares an external variable for passing in the URIs to process, and a driver module that works out the batches and hands each one to the worker.
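
A rough sketch of that two-module layout, applied to the query from the original post, might look like the following. Several things here are assumptions, not part of the original: the worker path /update-batch.xqy, the variable name $URIS, the local:rename helper, the batch size of 100, the URI lexicon being enabled (cts:uris needs it), and the choice to pass each batch as one newline-delimited string, which only works if no URI contains a newline.

```xquery
xquery version "1.0-ml";

(: Driver sketch: reads the URIs once, then queues one task per batch.
   Assumes the URI lexicon is enabled and that the worker below is
   installed at the hypothetical path /update-batch.xqy. :)
let $batch-size := 100
let $uris := cts:uris((), (), cts:collection-query('audit_history'))
let $batches := xs:integer(fn:ceiling(fn:count($uris) div $batch-size))
for $i in (1 to $batches)
let $batch := fn:subsequence($uris, ($i - 1) * $batch-size + 1, $batch-size)
return
  (: Each spawned task runs later on the task server, in its own transaction. :)
  xdmp:spawn('/update-batch.xqy',
             (xs:QName('URIS'), fn:string-join($batch, '&#10;')))
```

```xquery
xquery version "1.0-ml";
declare default element namespace 'http://developer.envisn.com/xmlns/envisn/netvisn/';

(: Worker sketch (/update-batch.xqy, a hypothetical path).
   $URIS is a newline-delimited list of URIs supplied by the driver. :)
declare variable $URIS as xs:string external;

(: Hypothetical helper: replaces each element with a renamed copy that
   keeps only its text, and does nothing when the element is absent. :)
declare function local:rename($old as element()*, $new-name as xs:string)
as empty-sequence()
{
  for $o in $old
  return xdmp:node-replace(
    $o, element { fn:QName(fn:namespace-uri($o), $new-name) } { $o/text() })
};

for $uri in fn:tokenize($URIS, '&#10;')
let $doc := fn:doc($uri)
let $lk := $doc/auditHistory/lookupInfo
return (
  local:rename($lk/parentDisplayPath, 'auditParentDisplayPath'),
  local:rename($lk/defaultName, 'auditDefaultName'),
  local:rename($lk/objectClass, 'auditObjectClass'),
  local:rename($doc/auditHistory//Action/user/username, 'auditUserName')
)
```

Because the helper skips elements that are already gone, a batch that gets spawned twice should not fail on documents that were updated the first time.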

-Danny

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Brent Hartwig
Sent: Monday, May 06, 2013 1:46 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Need help with mass updates

Hi, Gary,

As long as this all runs in one transaction, it doesn't matter how you break it up.  CORB (http://marklogic.github.io/corb/index.html) is built for this purpose.  You provide two queries.  One selects the documents to process.  The other processes the documents, one at a time.  Each document is processed in a transaction of its own.

For the first query, it's good to come up with a way to select only unprocessed documents, unless you wish to reprocess all.  That way the process can be interrupted and later pick up where it left off.
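
As an illustration only, a selection query along those lines could treat a document as unprocessed while it still contains the old parentDisplayPath element. That criterion is an assumption about Gary's data, as is the URI lexicon being enabled. CORB expects this query to return the count first, followed by the URIs.

```xquery
xquery version "1.0-ml";

(: Selection-query sketch for CORB: the count, then the URIs to process.
   Assumption: a document is "unprocessed" if it still has a
   parentDisplayPath element, so reruns skip documents already updated. :)
let $ns := 'http://developer.envisn.com/xmlns/envisn/netvisn/'
let $unprocessed :=
  cts:and-query((
    cts:collection-query('audit_history'),
    (: An empty and-query matches anything, so this tests only for presence. :)
    cts:element-query(fn:QName($ns, 'parentDisplayPath'), cts:and-query(()))
  ))
let $uris := cts:uris((), (), $unprocessed)
return (fn:count($uris), $uris)
```

The second query would then declare an external xs:string variable named $URI, which CORB sets to one URI per invocation, and apply the update to that single document.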

CORB is a Java program.  You get to configure the number of threads.

I couldn't say if there's now a standard feature that supersedes CORB.

-Brent

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Gary Larsen
Sent: Monday, May 06, 2013 4:36 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Need help with mass updates

Hi,

I have a query to update documents, but when there are many of them I get the dreaded XDMP-EXPNTREECACHEFULL error.  I've had luck avoiding this error when returning large result sets by processing the docs in segments [$start to $end], but that does not seem to help with the updates.

Is there a trick to performing mass updates?  Any advice would be appreciated.

xquery version "1.0-ml";
declare default element namespace 'http://developer.envisn.com/xmlns/envisn/netvisn/';

let $cq := cts:collection-query('audit_history')

let $incr := 100
let $size := xdmp:estimate(cts:search(doc(), $cq, 'unfiltered'))
let $segs := ceiling($size div $incr) return

for $x in (1 to $segs)
     let $start := (($x - 1) * $incr) + 1
     let $end := $start + $incr - 1

     for $d in cts:search(doc(), $cq, 'unfiltered')[$start to $end]
         let $lk := $d/auditHistory/lookupInfo
         let $loc := element auditParentDisplayPath { $lk/parentDisplayPath/text() },
             $name := element auditDefaultName { $lk/defaultName/text() },
             $class := element auditObjectClass { $lk/objectClass/text() }
         return

         (xdmp:node-replace($lk/parentDisplayPath, $loc),
          xdmp:node-replace($lk/defaultName, $name),
          xdmp:node-replace($lk/objectClass, $class),

          for $u in $d/auditHistory//Action/user
            let $uname := element auditUserName { $u/username/text() }
            return xdmp:node-replace($u/username, $uname)
          )

Thanks,

Gary Larsen
Envisn Inc.
508-259-6465
