[MarkLogic Dev General] CORB: Sleep during configurable hours and process 1 forest at a time

Hartwig, Brent (CL Tech Sv) Brent.Hartwig at cengage.com
Tue Sep 23 12:58:26 PDT 2008


Hi, Mike,

Thank you for the quick and informative response. I expect the XQuery that sorts URIs by forest will be very helpful. I'm not sure when I will be able to post an outcome, but I wanted to pass on my thanks.

-Brent

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Tuesday, September 23, 2008 12:32 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] CORB: Sleep during configurable hours and process 1 forest at a time

Brent,

Those are interesting ideas: I'll add them to my list of potential
enhancements to Corb. The Corb source code is fairly simple, and I
welcome patches.

Meanwhile, you can implement something like that per-forest idea without
changing any Java code: just provide your own uris-module, as mentioned
at http://developer.marklogic.com/svn/corb/trunk/README.html

(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)

I gather that you already know about this mechanism, but let's explore
it further. To limit to a forest, you could write:

(: process the first forest :)
let $forest-id := xdmp:database-forests(xdmp:database())[1]
let $uris := cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

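With this approach you would run corb once per forest, each time with a
uris-module that selects a different forest. Alternatively, a single
uris-module could return the uris grouped by forest, so that the forests
are processed in series: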

(: process forests in series :)
let $uris :=
   for $forest-id in xdmp:database-forests(xdmp:database())
   return cts:uris('', 'document', (), 1.0, $forest-id)
return (count($uris), $uris)

If you can formulate a query to select documents that haven't yet been
processed, you can include that as the third argument to cts:uris():
that should allow your code to resume processing quickly, if you
interrupt corb for peak-hour processing.
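
As a rough sketch of that resume idea, assuming the process module adds
each finished document to a collection (for example with
xdmp:document-add-collections), the uris-module could exclude that
collection. The collection name 'processed' is only a placeholder, and
cts:uris() needs the URI lexicon enabled on the database:

```xquery
(: resume sketch: skip documents already marked as processed :)
(: 'processed' is a hypothetical collection name, not a Corb default :)
let $todo := cts:not-query(cts:collection-query('processed'))
let $uris :=
   for $forest-id in xdmp:database-forests(xdmp:database())
   return cts:uris('', 'document', $todo, 1.0, $forest-id)
return (count($uris), $uris)
```

Each restart would then queue only the remaining documents, still
grouped by forest.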

-- Mike

Hartwig, Brent (CL Tech Sv) wrote:
> Hello,
>
> Has anyone extended Corb to sleep during configurable periods or process one forest at a time?
>
> We need to modify every object in our ML instance. Multiple merges are saturating the IO channel. To keep production stable and usable, we intend to put the job to sleep during peak hours and only process one forest at a time. Each processed URI will go into a collection, allowing us to verify all are processed. Preliminary approaches are described below. Your thoughts and experience are welcome. Thank you in advance.
>
> Sleep: Nothing too concerning here (but tried & true is always better). We're planning to work around backups and peak hours, and to allow time for system resources to recover before peak hours resume.
>
> Forest: Corb can obtain a list of forests from the specified database via Session.getContentbaseMetaData().getForestIds() and iterate over them serially. The queue would be populated once per forest by substituting the forest ID into the provided URIS-MODULE. The initial implementation may impose some usage constraints.
>
> -Brent
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general
