[MarkLogic Dev General] "Hot Swapping" large data sets.
Keith L. Breinholt
BreinholtKL at ldschurch.org
Thu Mar 18 08:34:11 PST 2010
Another way to load and update document sets, and make them visible only when you are done, is to load the content with a unique URI privilege that is assigned to your loader/enricher program.
Then, when the content is ready, add that privilege to the role of any users or applications that need to see it. Only completed content is ever visible, and it appears 'instantaneously' when the privilege is added to the role.
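A rough sketch of how this could look, under one assumption worth stating loudly: MarkLogic decides who can read a document from its role-based document permissions, so the sketch below realizes the idea with a dedicated role rather than a privilege as such. The role names "batch-reader", "loader" and "app-user", the URI, and the document content are all hypothetical. The loader inserts each document readable only through the dedicated role:

```xquery
xquery version "1.0-ml";

(: Run by the loader against the content database.
   Assumption: the roles "batch-reader" and "loader" already exist, and
   no end user holds "batch-reader" yet, so the document stays hidden. :)
xdmp:document-insert(
  "/batch/doc-1.xml",
  <doc>example</doc>,
  (xdmp:permission("batch-reader", "read"),
   xdmp:permission("loader", "update"))
)
```

When the whole batch is loaded and enriched, a single security update exposes it:

```xquery
xquery version "1.0-ml";

import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";

(: Run against the Security database.
   Assumption: "app-user" is the role your users or applications hold.
   After this call it inherits "batch-reader" and can read the batch. :)
sec:role-add-roles("app-user", "batch-reader")
```

The switch touches one role definition instead of every document, which is why the content seems to appear all at once.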
Keith L. Breinholt
breinholtkl at ldschurch.org
From: general-bounces at developer.marklogic.com On Behalf Of Jason Hunter
Sent: Thursday, March 18, 2010 12:10 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
I need to be updating some largish (1G+) sets of documents fairly atomically.
That is, I'd like to update all the documents and perform some operations, such as adding properties,
then all at once make the updates visible. The update process could take several hours.
Currently this document set shares the same forest as other document sets.
It's not possible to split these up because the app needs to query across all the document sets.
Any suggestions on how to accomplish this?
What happens if you try loading everything as part of a single XCC call passing the large array of files?
If you want to follow Wayne's advice on using collections, I suppose you'd want to put each batch of docs in a uniquely named collection. Then you can run your queries against fn:collection($seq), where $seq is the sequence of collections that have been loaded so far. Or, perhaps more simply, you can wrap cts:collection-query("latest") in a cts:not-query() and thus exclude the most recent batch while still matching all docs that were loaded before. That effectively keeps the new collection in the dark. It's handy and efficient, and if each batch gets its own ID then you can easily exclude any batch.
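As a minimal sketch of the cts:not-query() variant: the collection name "latest" comes from the text above, while the word query is only a stand-in for whatever the application would really search for.

```xquery
xquery version "1.0-ml";

(: Collection holding the batch that is still being loaded.
   Assumption: the loader puts every in-progress document in it. :)
declare variable $IN-PROGRESS as xs:string := "latest";

(: Hypothetical user query; any cts:query would do here. :)
declare variable $user-query as cts:query := cts:word-query("example");

(: Match the user query, but hide anything in the in-progress collection. :)
cts:search(
  fn:doc(),
  cts:and-query((
    $user-query,
    cts:not-query(cts:collection-query($IN-PROGRESS))
  ))
)
```

With one collection per batch, $IN-PROGRESS could instead hold several names, since cts:collection-query() accepts a sequence of collection URIs.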
Point-in-time would do something similar, and is suitable if you're always doing just one bulk load at a time. Then you can use the point in time to control the visibility.
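A sketch of the point-in-time idea, assuming the application records a timestamp just before the bulk load starts (for example by calling xdmp:request-timestamp() in a read-only query) and passes it in as the external variable $before-load, which is a hypothetical name. The inner query is a placeholder.

```xquery
xquery version "1.0-ml";

(: Assumption: supplied by the caller, captured before the load began. :)
declare variable $before-load as xs:unsignedLong external;

(: Evaluate the query as of that timestamp, so documents committed
   after it are not visible to this query. :)
xdmp:eval(
  'fn:count(fn:doc())',
  (),
  <options xmlns="xdmp:eval">
    <timestamp>{$before-load}</timestamp>
  </options>
)
```

Once the load is finished, the application would stop passing the old timestamp, or move it forward. Note that querying older versions of documents that the load updates relies on the database being configured to retain them (the merge timestamp setting), so check that before depending on this.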