[MarkLogic Dev General] Large documents

Michael Blakeley michael.blakeley at marklogic.com
Wed Jun 30 11:05:33 PDT 2010


MarkLogic Server is a general-purpose database.

You can store 110-MB documents in MarkLogic Server. You can store 
documents that are even larger than that. There is nothing particularly 
difficult about it.

-- Mike

On 2010-06-30 10:58, Thomas Goossens wrote:
> Thanks Geert, Danny, Michael, and Kelly for your answers,
>
> I admit that it is not a good idea to handle huge XML documents that are no more than a long list of entities, which could be stored as individual documents. That is not what I had in mind.
> There are, however, some cases where one may want to store very large documents, at least temporarily. For example, in aeronautics there are AMMs (Aircraft Maintenance Manuals), which can reach a gigabyte. They are indeed assembled from smaller parts, but one may still want to store such documents and do some processing on them.
>
> Anyway, from your answers I understand that it is not an easy task to do that with ML. Fragments seem really too complex and limiting. And if one has to tune some memory limit, I suppose it means that this limit depends on the size of the document [so in practice it can mean administrator overhead].
>
> From all this I conclude that MarkLogic is a high-end CMS, rather than a general-purpose XML database. Am I correct?
>
> FYI, I have been able to store the 110 MB XMark document without difficulty in the following products: BaseX, Qizx, eXist, MonetDB, Sedna, and even BDB XML.
>
> Cheers
>
> On Wed, Jun 30, 2010 at 7:52 AM, Geert Josten <Geert.Josten at daidalos.nl> wrote:
> Hi Thomas,
>
> The question of how has already been sufficiently answered by the Mark Logic experts. But it might be worthwhile to elaborate a bit more on *why* MarkLogic Server works this way.
>
> You should know that MarkLogic Server is highly focused on searching. Michael mentions that MarkLogic Server is document-oriented. What he means is that when you do a search, MarkLogic Server returns results quickest when the search results are documents. To be more accurate, the search indexes are fragment-based, which is why you can use fragmentation to optimize searching without physically splitting a large document into smaller ones. The server also loads one fragment at a time into memory when it needs further processing for rendering. That is why you don't want to put a 30 MB document into the database as a single fragment. In short: it is always best to fragment your content to match the way(s) you want to search it, and to keep fragments small so that rendering search results stays light and quick.
>
> Kind regards,
> Geert
>
>


