[MarkLogic Dev General] Large documents

Thomas Goossens thomgooss at gmail.com
Wed Jun 30 10:58:58 PDT 2010


Thanks Geert, Danny, Michael, and Kelly for your answers.

I admit that it is not a good idea to handle huge XML documents that are
nothing more than a long list of entities, which could be stored in
individual documents. That is not what I had in mind.
There are, however, cases where one may want to store very large
documents, at least temporarily. For example, in aeronautics there are AMMs
(Aircraft Maintenance Manuals), which can reach a gigabyte. They are indeed
assembled from smaller parts, but one may still want to store such
documents and do some processing on them.
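
For the list-of-entities case, the split into individual documents can be
done before loading. Below is a minimal sketch, assuming Python and its
standard xml.etree.ElementTree module; the function name split_records and
the "item" tag are hypothetical, chosen only for illustration, and nothing
here is MarkLogic-specific. It streams the input, so the whole file is never
held in memory at once.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag):
    """Yield every direct child of the root named record_tag as its own XML string.

    source is a file name or a binary file object. The name and signature
    are illustrative assumptions, not part of any product's API.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 1                # depth 1 means we are directly inside the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1 and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that follows the record
            yield ET.tostring(elem, encoding="unicode")
            root.clear()      # release records that were already emitted


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file of repeated entities.
    sample = io.BytesIO(
        b"<site><item id='1'><name>a</name></item>"
        b"<item id='2'><name>b</name></item></site>"
    )
    for number, record in enumerate(split_records(sample, "item"), start=1):
        print(number, record)
```

Each yielded string can then be stored as a separate small document. This
does not help with a manual that has to stay in one piece, which is the case
I am actually asking about.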

Anyway, from your answers I understand that this is not an easy task with
ML. Fragments seem really too complex and limiting. And if one has to
tune some memory limit, I suppose that limit depends on the size of the
document [so in practice it can mean administrator overhead].

From all this I conclude that MarkLogic is a high-end CMS, rather than a
general-purpose XML database. Am I correct?

FYI, I have been able to store the 110 MB XMark document without
difficulty in the following products: BaseX, Qizx, eXist, MonetDB, Sedna,
and even BDB XML.

Cheers

On Wed, Jun 30, 2010 at 7:52 AM, Geert Josten <Geert.Josten at daidalos.nl> wrote:

> Hi Thomas,
>
> The question of how has already been sufficiently answered by the Mark Logic
> experts. But it might be worthwhile to elaborate a bit more on *why*
> MarkLogic Server works this way.
>
> You should know that MarkLogic Server is highly focussed on searching.
> Michael mentions that MarkLogic Server is document-oriented. What he means
> by that is that when you do a search, MarkLogic Server can return results
> quickest if you have documents as search results. To be more accurate, the
> search indexes are fragment-based, so that is why you can use fragmentation
> to optimize searching without necessarily needing to physically split your
> large doc into smaller ones. It also loads one fragment at a time into
> memory if it needs further processing for rendering. That is the reason why
> you don't want to put a 30 MB doc as a single fragment into its database. In
> short: it is always best to fragment your content to match the way(s)
> you want to search through your content, and to keep fragments small to keep
> rendering of search results light and quick.
>
> Kind regards,
> Geert
>
>