[MarkLogic Dev General] xqsync throughput

Hsiao Su Hsiao.Su at marklogic.com
Wed Mar 14 15:01:13 PDT 2012


Yes, xqsync does retrieve all of the URIs using a dedicated thread.  It's not completely done up front, though: the other threads start doing the actual reading/writing once enough URIs have been queued.  But reading/writing is much slower than retrieving URIs.  So if you have a lot of URIs, they'd all end up stored in the JVM's memory, and may overwhelm the garbage collector.
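
One way to check whether garbage collection is the problem is the '-verbose:gc' JVM option suggested earlier in the thread. A minimal sketch, reusing the variables and options from the original setup quoted below (trimmed for brevity; adjust to your own command line):

```shell
# Same invocation as in the original message, with GC logging turned on.
# $BIN, $SRCDB and $DSTDB are the variables from the original setup.
java -verbose:gc \
  -cp "$BIN/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar" \
  -Xmx1024m \
  -DINPUT_CONNECTION_STRING="$SRCDB" \
  -DOUTPUT_CONNECTION_STRING="$DSTDB" \
  -DTHREADS=8 \
  com.marklogic.ps.xqsync.XQSync
```

If the output shows frequent "Full GC" lines that reclaim very little memory, the JVM is probably spending most of its time collecting rather than copying documents.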

You can try a newer version of xqsync here:

http://marklogic.github.com/xqsync/

Newer versions of xqsync store URIs in a temporary file (in TMP_DIR, or in the file specified via URI_QUEUE_FILE).  This should help with memory pressure, if that's your bottleneck.
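
As a rough sketch only: this assumes URI_QUEUE_FILE is passed as a -D system property like the other options in the original setup, and that it takes a file path. The path shown is a hypothetical example, and a newer xqsync build may need a different classpath than the one below.

```shell
# Assumes a newer xqsync build that supports URI_QUEUE_FILE.
# /var/tmp/xqsync-uris.queue is a made-up path; use a disk with enough free space.
java -cp "$BIN/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar" \
  -Xmx1024m \
  -DINPUT_CONNECTION_STRING="$SRCDB" \
  -DOUTPUT_CONNECTION_STRING="$DSTDB" \
  -DURI_QUEUE_FILE=/var/tmp/xqsync-uris.queue \
  -DTHREADS=8 \
  com.marklogic.ps.xqsync.XQSync
```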

Hsiao "Shao" Su
Senior Performance Engineer
MarkLogic Corporation
Hsiao.Su at marklogic.com
Phone: +1 650 287 2545 
www.marklogic.com


-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Mike Sokolov
Sent: Wednesday, March 14, 2012 1:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] xqsync throughput

Thanks for the suggestions, Mike.  I discovered that

-DINPUT_QUERY_CACHABLE

wasn't actually set to true, so I am trying that now; the process kept failing to retrieve URIs, so maybe it will help if we fetch them all up front?

I looked at the networking a bit: pings are ~0.15 ms and I am seeing sustained transfer rates as high as 84MB/s using scp (I think I'd get more with larger files).  Also, the servers don't seem busy.  I am running xqsync on the destination box, which I suppose might not be ideal, but it uses less network anyway.  It is maxing out one of the CPUs during the initial fetch of all the URIs (over 10 million of them) now that cachable=true.  Maybe there is a problem with deep paging into the cts:uris query when it is not cached?
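
For scale, the figures quoted in this thread can be compared directly. A small sketch using only numbers from the thread (100Mb/s link, 84MB/s scp, ~30kB/sec xqsync); the variable names are illustrative and decimal units are assumed:

```shell
# Figures taken from the thread; integer arithmetic, decimal units.
link_mbit=100      # stated minimum LAN speed, Mb/s
scp_mb=84          # sustained scp rate, MB/s
observed_kb=30     # observed xqsync rate, kB/s

link_kb=$(( link_mbit * 1000 / 8 ))            # 12500 kB/s
ratio_link=$(( link_kb / observed_kb ))        # 416
ratio_scp=$(( scp_mb * 1000 / observed_kb ))   # 2800

echo "link ceiling: ${link_kb} kB/s, about ${ratio_link}x the observed xqsync rate"
echo "scp rate: about ${ratio_scp}x the observed xqsync rate"
```

Either way the gap is two to three orders of magnitude, so raw bandwidth is unlikely to be the limiting factor.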

I'll report back once the data actually starts transferring.

-Mike


On 03/14/2012 10:14 AM, Michael Blakeley wrote:
> I would expect better than that. What is the document rate?
>
> You may not have enough client threads to keep the servers busy. What does the utilization look like on both sides?
>
> You may also be memory-limited in the JVM at some point, especially if the documents are big. If so, the JVM will spend a lot of time running the garbage collector. You can check that idea with the '-verbose:gc' option.
>
> Could there be a network limitation other than bandwidth? You might check that by exporting to packages instead, and see what that performance looks like. I have seen some cases where there was a slow hop on the network, or where a firewall was limiting performance.
>
> -- Mike
>
> On 14 Mar 2012, at 13:38, Mike Sokolov wrote:
>
>    
>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>> be expected using xqsync to transfer a database from one node to
>> another.  I have two quite beefy servers on the same LAN (at least
>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec.  I was hoping to get
>> a few orders of magnitude more, but am I smoking crack?  Is there
>> something I could be doing or not doing that might be limiting the speed
>> somehow?
>>
>> This is my setup:
>>
>> java -cp ${BIN}/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
>> -Xmx1024m \
>>   -DINPUT_CONNECTION_STRING=$SRCDB \
>>   -DOUTPUT_CONNECTION_STRING=$DSTDB \
>>   -DSKIP_EXISTING=true \
>>   -DCOPY_COLLECTIONS=false \
>>   -DCOPY_PERMISSIONS=false \
>>   -DCOPY_PROPERTIES=true \
>>   -DCOPY_QUALITY=false \
>>   -DINPUT_BATCH_SIZE=10 \
>>   -DINPUT_QUERY_CACHABLE \
>>   -DTHREADS=8  \
>>      com.marklogic.ps.xqsync.XQSync
>>
>> These are the startup messages from the log:
>>
>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE Runtime Environment)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>> INFO: XCC version = 3.2-7
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: starting pool of 8 threads, queue size = 10000
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>> INFO: starting
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: output version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: input version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>> INFO: buffer size = 0, caching = false
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>> INFO: listing all documents (with uri lexicon)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>
>> The connector is a bit old: can I expect any substantial improvement
>> from updating it?
>>
>> -- 
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>>
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>>      

