[MarkLogic Dev General] Loading xml files in mark logic server
Jason Hunter
jhunter at marklogic.com
Tue Apr 26 10:00:10 PDT 2011
The extra space is for the indexes.
-jh-
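
For reference, Damon's suggestion quoted below (break the work up into smaller chunks) could be sketched roughly as follows. This is an untested outline, not anyone's production loader: the batch size of 1000 is an arbitrary assumption, "/filePath/" is the placeholder from the original post, and running each batch through xdmp:eval with different-transaction isolation is just one way to give every chunk its own transaction.

```xquery
xquery version "1.0-ml";

(: Sketch only: load a directory in batches, one transaction per batch.
   $batch-size is an assumed value to tune, not a recommendation. :)
declare namespace dir = "http://marklogic.com/xdmp/directory";

declare variable $batch-size as xs:integer := 1000;

let $entries := xdmp:filesystem-directory("/filePath/")/dir:entry
let $batch-count := xs:integer(fn:ceiling(fn:count($entries) div $batch-size))
for $i in 1 to $batch-count
(: Entries for batch $i: positions ($i - 1) * size + 1 onward, at most $batch-size of them :)
let $batch := fn:subsequence($entries, ($i - 1) * $batch-size + 1, $batch-size)
return
  (: Each eval runs and commits as its own transaction :)
  xdmp:eval('
    xquery version "1.0-ml";
    declare namespace dir = "http://marklogic.com/xdmp/directory";
    declare variable $batch as element(batch) external;
    for $entry in $batch/dir:entry
    return
      xdmp:document-load($entry/dir:pathname,
        <options xmlns="xdmp:document-load">
          <uri>{fn:string($entry/dir:filename)}</uri>
          <permissions>{xdmp:default-permissions()}</permissions>
          <format>xml</format>
          <repair>none</repair>
        </options>)
    ',
    (xs:QName("batch"), <batch>{$batch}</batch>),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
```

The outer request still holds the full directory listing and runs for the whole load, so for millions of files a dedicated tool such as RecordLoader, as Damon recommends, is probably the more practical route.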
On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:
> Hi Damon,
>
> Using RecordLoader, I was able to load the million XML documents successfully. The documents total 40 MB, but the forest has grown to 70 MB.
>
> Any idea why the forest is nearly double the size of the source files?
>
> Thanks and Regards
>
> Rajesh Govindan
>
> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman <Damon.Feldman at marklogic.com> wrote:
> Rajesh,
>
> Each module invocation, such as yours below, runs as a single transaction with all of the data held in memory. For thousands of XML documents, you should break the work up into smaller chunks.
>
> The InformationStudio flows available in version 4.2 will do this automatically, and also provide a nice GUI for viewing progress, unloading the data later, and checking on errors.
>
> Also, the Java-based RecordLoader utility (http://developer.marklogic.com/code/recordloader, http://marklogic.github.com/recordloader/tutorial.html) will insert documents in smaller chunks. It does not provide all the power of InformationStudio, but can be faster in some instances.
>
> Yours,
> Damon
>
> From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Rajesh Marklogic [rajesh.marklogic at gmail.com]
> Sent: Tuesday, April 19, 2011 1:03 PM
> To: general at developer.marklogic.com
> Subject: [MarkLogic Dev General] Loading xml files in mark logic server
>
> Hi
>
> We are trying to load 14 million XML files into a MarkLogic database. The xdmp:document-load script below can load at most 5000 XML files at a time. Anything more than 5000 files throws memory exceptions.
>
> xquery version "1.0-ml";
>
> let $files := xdmp:filesystem-directory("/filePath/")
> for $filepath in $files//dir:entry[1 to 5000]
> return
>   xdmp:document-load($filepath//dir:pathname,
>     <options xmlns="xdmp:document-load">
>       <uri>{$filepath//dir:filename/text()}</uri>
>       <permissions>{xdmp:default-permissions()}</permissions>
>       <format>xml</format>
>       <repair>none</repair>
>     </options>)
>
>
> Are any configuration changes required in the admin settings to load all 14 million XML files in 3 to 4 hours? The total size of the content will be around 4 GB, and we have a Unix server with 250 GB of memory (RAM).
>
> It would be great if you could suggest the best approach for loading all 14 million XML files within 3 to 4 hours.
>
> Thanks and Regards
>
> Rajesh
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general