[MarkLogic Dev General] "Hot Swapping" large data sets.
jhunter at marklogic.com
Wed Mar 17 22:09:55 PST 2010
On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
> I need to be updating some largish (1G+) sets of documents fairly atomically.
> That is, I'd like to update all the documents and perform some operations like adding properties etc,
> then all at once make the updates visible. The update process could take several hours.
> Currently this document set shares the same forest as other document sets.
> It's not possible to split these up because the app needs to cross-query across all the document sets.
> Any suggestions on how to accomplish this?
What happens if you try loading everything as part of a single XCC call passing the large array of files?
If you want to follow Wayne's advice on using collections, I suppose you'd want to put each batch of docs in a uniquely named collection. Then you can run your queries against fn:collection($seq), where $seq is the sequence of collections that have been loaded so far. Or, perhaps more simply, you can do a cts:not-query() against cts:collection-query("latest") and thus exclude the most recent batch while allowing all other docs that were loaded before. That effectively keeps the in-progress collection invisible. Handy, efficient, and if each batch gets its own ID you can easily exclude any batch.
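A rough XQuery sketch of both variants, assuming hypothetical collection names ("batch-1", "batch-2") and an in-progress batch tagged "latest" — adapt the names and query terms to your app:

```xquery
(: Sketch only: batch collection names and the "latest" tag are
   assumptions, not part of the original post. :)

(: Variant 1: query only the batches known to be complete. :)
let $seq := ("batch-1", "batch-2")  (: collections loaded so far :)
return fn:collection($seq)//article[title eq "foo"]

(: Variant 2: search everything except the in-progress batch. :)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:word-query("foo"),
    cts:not-query(cts:collection-query("latest"))
  ))
)
```

Once a batch finishes loading, removing the "latest" tag from its documents in a single transaction (e.g. with xdmp:document-remove-collections) would make the whole batch visible at once.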
Point-in-time queries would do something similar, and are suitable if you're always doing just one bulk load at a time. You can then use the point in time to control visibility.
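A minimal sketch of the point-in-time approach: capture the database timestamp before the bulk load starts, then run queries "as of" that timestamp via the eval timestamp option, so in-flight updates stay invisible until you stop pinning the timestamp. (Point-in-time queries also require the database to be configured to retain old fragment versions; the query string here is illustrative.)

```xquery
(: Capture the current database timestamp in a read-only request
   before the bulk load begins. :)
let $ts := xdmp:request-timestamp()
return
  xdmp:eval(
    'cts:search(fn:doc(), cts:word-query("foo"))',
    (),
    <options xmlns="xdmp:eval">
      <timestamp>{$ts}</timestamp>
    </options>
  )
```

When the load completes, drop the pinned timestamp (run queries normally) and the whole batch becomes visible together.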