[MarkLogic Dev General] Processing Large Documents?
mike at blakeley.com
Mon Feb 20 09:15:04 PST 2012
Ignore forests and stands for now. Those are physical storage artifacts, completely orthogonal to collections.
One difference from eXistDB that you may notice is that a document can be in many collections at the same time. As I understand it, eXistDB collections act somewhat like filesystem directories. MarkLogic treats them more like tags, and uses document URIs like '/a/b/c.xml' to provide hierarchies above the document level.
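To make the tags-versus-directories distinction concrete, here is a toy model in Python. It is only a sketch of the idea and is not MarkLogic's API: the `ToyStore` class, its method names, and the sample URIs are all made up for illustration.

```python
# Toy model only: this is NOT MarkLogic's API. Every name here is hypothetical.
from collections import defaultdict


class ToyStore:
    def __init__(self):
        self.docs = {}                       # URI -> document content
        self.collections = defaultdict(set)  # collection name -> set of URIs

    def insert(self, uri, content, collections=()):
        """Store a document and tag it with zero or more collections."""
        self.docs[uri] = content
        for name in collections:
            self.collections[name].add(uri)

    def in_collection(self, name):
        """Tag-style lookup: independent of where the URI sits in any hierarchy."""
        return sorted(self.collections.get(name, set()))

    def in_directory(self, prefix):
        """Directory-style lookup: hierarchy comes from the URI itself."""
        return sorted(uri for uri in self.docs if uri.startswith(prefix))


store = ToyStore()
# One document can carry several collection tags at once.
store.insert("/mydb/usr/1.xml", "<usr/>", collections=["mydb", "usr"])
store.insert("/mydb/xyz/7.xml", "<xyz/>", collections=["mydb"])
```

Here `store.in_collection("mydb")` finds both documents, while `store.in_directory("/mydb/usr/")` finds only the first, because the hierarchy lives in the URI rather than in the collection.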
Have you considered stepping back a bit and doing more denormalization work in SQL? Could you generate a relational view that represents your denormalized document structure, at least approximately? If so, you may find the resulting XML easier to import into MarkLogic, either with InfoStudio or with http://marklogic.github.com/recordloader/
MarkLogic does have support for includes: http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/dev_guide/mod-docs.xml - but I think you would be better served to export a denormalized view and work from that. SQL has fairly good syntax for denormalization, while the XQuery data model works best with XML that is already denormalized.
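As a rough sketch of what exporting a denormalized view could look like, the Python below joins two tables in SQL and emits one XML document per parent row. Everything in it is an assumption for illustration: the `usr` and `xyz` tables (named after the `/xyz/usr_id` example in the quoted message), the `usr_xyz` view, the in-memory SQLite database standing in for the real one, and the `/usr/{id}.xml` URI scheme. Loading the result into MarkLogic is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: xyz.usr_id is a foreign key to usr.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usr (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE xyz (id INTEGER PRIMARY KEY,
                      usr_id INTEGER REFERENCES usr(id),
                      note TEXT);
    INSERT INTO usr VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO xyz VALUES (10, 1, 'first'), (11, 1, 'second'), (12, 2, 'third');

    -- The denormalized view: one row per (usr, xyz) pair.
    CREATE VIEW usr_xyz AS
        SELECT u.id AS usr_id, u.name AS usr_name,
               x.id AS xyz_id, x.note AS xyz_note
        FROM usr u LEFT JOIN xyz x ON x.usr_id = u.id;
""")


def denormalize(conn):
    """Turn the flat view into one nested XML document per usr row."""
    roots = {}
    rows = conn.execute(
        "SELECT usr_id, usr_name, xyz_id, xyz_note "
        "FROM usr_xyz ORDER BY usr_id, xyz_id"
    )
    for usr_id, usr_name, xyz_id, xyz_note in rows:
        uri = f"/usr/{usr_id}.xml"  # hypothetical document URI scheme
        if uri not in roots:
            root = ET.Element("usr", id=str(usr_id))
            ET.SubElement(root, "name").text = usr_name
            roots[uri] = root
        if xyz_id is not None:  # LEFT JOIN yields NULLs for childless parents
            child = ET.SubElement(roots[uri], "xyz", id=str(xyz_id))
            child.text = xyz_note
    return {uri: ET.tostring(root, encoding="unicode")
            for uri, root in roots.items()}


docs = denormalize(conn)
```

The foreign key relationship ends up as element nesting, so each document is self-contained and no join is needed at query time.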
You might also find http://resources.marklogic.com/library/media/inside-marklogic helpful. Google has indexed a copy of what looks like the same paper at http://www.odbms.org/download/inside-marklogic-server.pdf too.
On 20 Feb 2012, at 07:53 , Todd Gochenour wrote:
> Day three. President's day. I will first chunk the data for each row as this will improve concurrency. I gather I will need to generate random document names for each chunk and put these documents in a collection using the name of the database as the folder name. I see the terms Forest and Stands. I assume this is new terminology for collections. With eXistDB, my queries work across collection/subcollection boundaries transparently. I'm assuming this is true with stands in a forest, so my document will be found no matter which stand it resides in.
> Second I will de-normalize the data so as to convert primary/foreign key relationships into structure. The database has a naming convention which I can exploit to automate this task (i.e. path /xyz/usr_id maps to /usr/id). For relationships between primary documents, I will research MarkLogic's linking. I never managed to get <xs:include/> to work like I wanted in eXistDB and so I had to perform these joins programmatically in XQuery. Perhaps MarkLogic has a clever way to do this automatically. We will see.
> Thanks Geert and Damon for keeping me honest.