[MarkLogic Dev General] Loading xml files in mark logic server

Jason Hunter jhunter at marklogic.com
Tue Apr 26 10:00:10 PDT 2011


The extra space is for the indexes.

-jh-

On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:

> Hi Damon,
> 
> Using RecordLoader, I was able to load the million XML documents successfully. The total size of the documents is 40 MB, but the forest grew to 70 MB.
> 
> Any idea why the forest is nearly double the size of the source files?
> 
> Thanks and Regards
> 
> Rajesh Govindan
> 
> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman <Damon.Feldman at marklogic.com> wrote:
> Rajesh,
>  
> Each module invocation, such as yours below, runs as a single transaction with all of its data held in memory. For thousands of XML documents, you should break the work up into smaller chunks.
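A minimal sketch of that chunking, assuming MarkLogic's `xdmp:spawn` and the Task Server: a driver query lists the directory once and queues one task per batch, and each spawned task loads its batch in its own transaction. The module path `/load-batch.xqy`, the external variable `$PATHS`, and the batch size of 1000 are hypothetical choices, not part of the original thread. The spawned module path is resolved against the app server's modules root, and with 14 million files the Task Server queue size and the cost of listing the directory in one call may also need attention.

```xquery
xquery version "1.0-ml";
(: ===== File 1 (hypothetical): /load-batch.xqy =====
   Loads one batch of files; runs as one transaction per spawned task. :)

(: Newline-separated absolute file paths for this batch (hypothetical name). :)
declare variable $PATHS as xs:string external;

for $path in fn:tokenize($PATHS, "\n")
return xdmp:document-load($path,
  <options xmlns="xdmp:document-load">
    <uri>{fn:tokenize($path, "/")[fn:last()]}</uri>
    <permissions>{xdmp:default-permissions()}</permissions>
    <format>xml</format>
    <repair>none</repair>
  </options>)
```

```xquery
xquery version "1.0-ml";
(: ===== File 2 (hypothetical): run once to queue all the batches ===== :)

declare variable $SOURCE-DIR as xs:string := "/filePath/";
declare variable $BATCH-SIZE as xs:integer := 1000; (: assumption; tune it :)

(: Collect the absolute paths of regular files only (skip subdirectories). :)
let $paths :=
  for $entry in xdmp:filesystem-directory($SOURCE-DIR)//dir:entry
  where $entry/dir:type eq "file"
  return fn:string($entry/dir:pathname)
let $batches := xs:integer(fn:ceiling(fn:count($paths) div $BATCH-SIZE))
for $i in 1 to $batches
let $batch := fn:subsequence($paths, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
(: Queue one Task Server job per batch, passing its paths as one string. :)
return xdmp:spawn("/load-batch.xqy",
  (xs:QName("PATHS"), fn:string-join($batch, fn:codepoints-to-string(10))))
```

Because each spawned task commits independently, a failure in one batch does not roll back the others; the batch size trades per-transaction overhead against the memory each transaction needs.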
>  
> The InformationStudio flows available in version 4.2 will do this automatically, and also provide a nice GUI for viewing progress, unloading the data later, and checking on errors.
>  
> Also, the Java-based RecordLoader utility (http://developer.marklogic.com/code/recordloader, http://marklogic.github.com/recordloader/tutorial.html) will insert documents in smaller chunks. It does not provide all the power of InformationStudio, but can be faster in some instances.
>  
> Yours,
> Damon
>  
> From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Rajesh Marklogic [rajesh.marklogic at gmail.com]
> Sent: Tuesday, April 19, 2011 1:03 PM
> To: general at developer.marklogic.com
> Subject: [MarkLogic Dev General] Loading xml files in mark logic server
> 
> Hi 
> 
> We are trying to load 14 million XML files into a MarkLogic database. The xdmp:document-load script below can load at most 5000 XML files at a time; anything more than 5000 files throws memory exceptions.
> 
> xquery version "1.0-ml";
> 
> (: List the source directory; each dir:entry describes one file. :)
> let $files := xdmp:filesystem-directory("/filePath/")
> (: Load only the first 5000 entries; larger batches ran out of memory. :)
> for $filepath in $files//dir:entry[1 to 5000]
> return xdmp:document-load(fn:string($filepath//dir:pathname),
>   <options xmlns="xdmp:document-load">
>     <uri>{fn:string($filepath//dir:filename)}</uri>
>     <permissions>{xdmp:default-permissions()}</permissions>
>     <format>xml</format>
>     <repair>none</repair>
>   </options>)
> 
> 
> Are any configuration changes required in the Admin settings to load all 14 million XML files in 3 to 4 hours? The total size of the content will be around 4 GB, and we have a Unix server with 250 GB of memory (RAM).
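The figures in this paragraph imply a rough target rate. A back-of-the-envelope check, written as plain XQuery 1.0 so it runs in any XQuery processor, and assuming "4GB" means 4 × 10^9 bytes:

```xquery
xquery version "1.0";

(: Figures from the message: 14 million files, about 4 GB, a 3 to 4 hour window. :)
let $docs := 14000000
let $bytes := 4000000000 (: assumes "4GB" means 4 x 10^9 bytes :)
return (
  fn:round($docs div (4 * 3600)), (: documents per second to finish in 4 hours :)
  fn:round($docs div (3 * 3600)), (: documents per second to finish in 3 hours :)
  fn:round($bytes div $docs)      (: average bytes per document :)
)
(: returns 972 1296 286 :)
```

Roughly 1,000 documents per second, each only about 300 bytes, suggests that per-transaction overhead rather than raw data volume is the main cost, which is one reason to commit documents in batches rather than one per transaction or millions in a single transaction.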
> 
> It would be great if you could suggest the best approach to load all 14 million XML files within 3-4 hours.
> 
> Thanks and Regards
> 
> Rajesh 
> 
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 