[MarkLogic Dev General] "Hot Swapping" large data sets.

Danny Sokolsky Danny.Sokolsky at marklogic.com
Wed Mar 17 13:52:14 PST 2010


You might consider creating a CPF process whose last step either updates the document to put it in a different collection (as Wayne suggested) or creates a new document (possibly in a different directory, since you are already using directories).  CPF handles a lot of the complexity of building a resilient content-processing application.

-Danny

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Lee, David
Sent: Wednesday, March 17, 2010 1:38 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] "Hot Swapping" large data sets.

Thanks.  I'm using directories currently ... I wish I could just rename them, but nope.
I could, however, make "which directory to use" a variable that I set in a document somewhere.
Good idea.
Not sure about collections ... I need to look into those more.
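
To make the "which directory to use" idea concrete, here is a rough sketch in Python, using a plain dict as a stand-in for the database. This is not MarkLogic code; the URIs and the helper names (`live_documents`, `publish`) are made up for illustration. The point is only that the long-running update writes into a second directory, and the swap itself is a single write to the pointer document.

```python
# Toy in-memory stand-in for a document database keyed by URI.
# All URIs and helper names below are hypothetical.
POINTER_URI = "/config/live-directory.txt"

store = {
    POINTER_URI: "/data/v1/",
    "/data/v1/a.xml": "<doc>old a</doc>",
    "/data/v1/b.xml": "<doc>old b</doc>",
}


def live_documents(db):
    """Return only the documents under the directory the pointer names."""
    live_dir = db[POINTER_URI]
    return {uri: body for uri, body in db.items() if uri.startswith(live_dir)}


def publish(db, new_dir):
    """Flip the pointer; this single write is the whole 'swap'."""
    db[POINTER_URI] = new_dir


# The slow update builds the new set in a separate directory.
store["/data/v2/a.xml"] = "<doc>new a</doc>"
store["/data/v2/b.xml"] = "<doc>new b</doc>"

before = live_documents(store)  # readers still see only /data/v1/
publish(store, "/data/v2/")
after = live_documents(store)   # readers now see only /data/v2/
```

Every query has to read the pointer first and constrain itself to that directory, so this only works if all readers go through something like `live_documents`.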

Point in time queries ... interesting, I thought about those but never tried them.

Thanks for the ideas!


From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Wayne Feick
Sent: Wednesday, March 17, 2010 3:54 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.

I'd suggest looking into collections or directories to constrain queries to one set or the other such that one is the live set you're serving up and the other is the set you're updating.

You might also consider using the Library Services API where the updates operate on the most recent version of each document while you serve up whatever version was most recent at some fixed point in time before the updates began.

The third approach would be to use point in time queries (you'll need to set the merge timestamp on the database's merge policy page) such that you're serving up content from a fixed commit timestamp before your changes while your update process is actively changing the database. We don't generally recommend people use point in time queries since there is almost always a better way to do what they want, but this particular case is the one situation where it makes sense to consider it.
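
As a rough illustration of the point-in-time idea, here is a toy multi-version store in Python. It is only a model of the concept, not MarkLogic's implementation or API; the class `VersionedStore` and the variable `serving_ts` are invented for this sketch. Readers are pinned to a fixed commit timestamp while the updater keeps writing, and the new content becomes visible only when the serving timestamp is moved forward.

```python
class VersionedStore:
    """Toy multi-version store: every write keeps the old version around."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # uri -> list of (commit_timestamp, body)

    def write(self, uri, body):
        """Commit a new version and return its commit timestamp."""
        self.clock += 1
        self.versions.setdefault(uri, []).append((self.clock, body))
        return self.clock

    def read(self, uri, at):
        """Return the newest version committed at or before timestamp `at`."""
        visible = [body for ts, body in self.versions.get(uri, []) if ts <= at]
        return visible[-1] if visible else None


db = VersionedStore()
db.write("/a.xml", "old a")
serving_ts = db.write("/b.xml", "old b")  # readers are pinned to this timestamp

# The long-running update commits new versions with later timestamps.
db.write("/a.xml", "new a")
db.write("/b.xml", "new b")

during_update = db.read("/a.xml", serving_ts)  # still the old version

serving_ts = db.clock                          # the "swap": advance the timestamp
after_swap = db.read("/a.xml", serving_ts)     # now the new version
```

The toy keeps every old version forever. A real database eventually reclaims old versions, so the older versions have to be retained for as long as readers are pinned to the earlier timestamp.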

Wayne.


On Wed, 2010-03-17 at 05:23 -0700, Lee, David wrote:
I need to update some largish (1G+) sets of documents fairly atomically. That is, I'd like to update all the documents and perform some operations like adding properties, etc., then make the updates visible all at once. The update process could take several hours.

Currently this document set shares the same forest as other document sets. It's not possible to split these up because the app needs to query across all the document sets.

Any suggestions on how to accomplish this?

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
dlee at epocrates.com
812-482-5224


