[MarkLogic Dev General] Loading xml files in mark logic server

Rajesh Marklogic rajesh.marklogic at gmail.com
Tue Apr 26 09:47:05 PDT 2011


Hi Danny,

I was able to load the documents using info:load, and it worked fine the first
time. But the second time it stopped after loading 100,000 records, and when I
tried again it stopped after loading 65K records.

Each time, I cleared the forest before loading the documents.

Can you help me figure out what is going wrong? I couldn't find the log
file in opt/marklogic/logs (I am working on a Unix server).

Thanks and Regards

Rajesh

On Tue, Apr 19, 2011 at 11:31 PM, Danny Sokolsky <
Danny.Sokolsky at marklogic.com> wrote:

> You might also try using info:load, which loads things in batches.
>
>
>
>
> http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/apidoc/info.xml&category=Information%20Studio&function=info:load
>
>
>
> -Danny
>
>
>
> *From:* general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] *On Behalf Of *Damon Feldman
> *Sent:* Tuesday, April 19, 2011 10:59 AM
> *To:* General MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Loading xml files in mark logic
> server
>
>
>
> Rajesh,
>
>
>
> Each module invocation such as yours below runs as a single transaction with
> all the data in memory. For thousands of XML documents, you should break the
> work up into smaller chunks.
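>
> A minimal sketch of that chunking idea, assuming a driver module that calls
> xdmp:eval once per batch so that each batch commits in its own transaction.
> The batch size of 1000 and the names $dir, $batch-size and $load-batch are
> placeholders for illustration, not values from this thread, and the sketch
> has not been tuned or tested against a data set of this size.
>
> ```xquery
> xquery version "1.0-ml";
>
> (: Placeholder values; adjust to your environment. :)
> declare variable $dir := "/filePath/";
> declare variable $batch-size := 1000;
>
> (: Query text run once per batch. It receives the batch as one element. :)
> declare variable $load-batch := '
>   xquery version "1.0-ml";
>   declare variable $batch as element() external;
>   for $entry in $batch/dir:entry
>   return xdmp:document-load($entry/dir:pathname,
>     <options xmlns="xdmp:document-load">
>       <uri>{$entry/dir:filename/text()}</uri>
>       <permissions>{xdmp:default-permissions()}</permissions>
>       <format>xml</format>
>       <repair>none</repair>
>     </options>)
> ';
>
> (: List the directory once and keep only plain files. :)
> let $entries := xdmp:filesystem-directory($dir)/dir:entry[dir:type eq "file"]
> (: Start positions 1, 1 + $batch-size, 1 + 2 * $batch-size, ... :)
> for $start in (1 to fn:count($entries))[(. - 1) mod $batch-size eq 0]
> let $batch := <batch>{fn:subsequence($entries, $start, $batch-size)}</batch>
> return
>   (: different-transaction: each batch is committed before the next starts. :)
>   xdmp:eval($load-batch, (xs:QName("batch"), $batch),
>     <options xmlns="xdmp:eval">
>       <isolation>different-transaction</isolation>
>     </options>)
> ```
>
> Note that the driver still holds the whole directory listing in memory and
> runs the batches one after another, so for millions of files you would
> probably also want to split the input across several directories or runs.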
>
>
>
> The InformationStudio flows available in version 4.2 will do this
> automatically, and also provide a nice GUI for viewing progress, unloading
> the data later, and checking on errors.
>
>
>
> Also, the Java-based RecordLoader utility (
> http://developer.marklogic.com/code/recordloader,
> http://marklogic.github.com/recordloader/tutorial.html) will insert
> documents in smaller chunks. It does not provide all the power of
> InformationStudio, but can be faster in some instances.
>
>
>
> Yours,
>
> Damon
>
>
> ------------------------------
>
> *From:* general-bounces at developer.marklogic.com [
> general-bounces at developer.marklogic.com] On Behalf Of Rajesh Marklogic [
> rajesh.marklogic at gmail.com]
> *Sent:* Tuesday, April 19, 2011 1:03 PM
> *To:* general at developer.marklogic.com
> *Subject:* [MarkLogic Dev General] Loading xml files in mark logic server
>
> Hi
>
>
>
> We are trying to load 14 million XML files into a MarkLogic database. The
> xdmp:document-load script below could load at most 5000 XML files at a time.
> Anything more than 5000 XML files threw memory exceptions.
>
>
>
> xquery version "1.0-ml";
>
> let $files := xdmp:filesystem-directory("/filePath/")
> for $filepath in $files//dir:entry[1 to 5000]
> return (xdmp:document-load($filepath//dir:pathname,
>     <options xmlns="xdmp:document-load">
>        <uri>{$filepath//dir:filename/text()}</uri>
>        <permissions>{xdmp:default-permissions()}</permissions>
>        <format>xml</format>
>        <repair>none</repair>
>     </options>))
>
>
>
>
>
> Are any configuration changes required in the admin settings to load all
> 14 million XML files in 3 to 4 hours? The total size of the content
> will be around 4 GB, and we have a Unix server with 250 GB of memory (RAM).
>
>
>
> It would be great if you could suggest the best approach to load all 14
> million XML files within 3 to 4 hours.
>
>
>
> Thanks and Regards
>
>
>
> Rajesh
>