[MarkLogic Dev General] xqsync throughput

Mike Sokolov sokolov at ifactory.com
Wed Mar 14 13:26:40 PDT 2012


Thanks for the suggestions, Mike.  I discovered that INPUT_QUERY_CACHABLE wasn't actually true (I had passed -DINPUT_QUERY_CACHABLE without a value), so I am trying it with the flag set to true now.  The process kept failing to retrieve URIs, so maybe fetching them all up front will help.
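
For anyone following along, the change is just giving the property an explicit value.  As far as I understand it, a bare -DNAME on the java command line defines the property as an empty string, which would explain the "caching = false" line in the startup log.  A minimal dry-run sketch (the CACHE_OPT variable is only for illustration, and all the other options stay as in my original invocation):

```shell
# Only the caching flag changes; "=true" is now explicit.
# CACHE_OPT is a hypothetical variable name used for this sketch.
CACHE_OPT="-DINPUT_QUERY_CACHABLE=true"

# Dry run: print the relevant part of the command instead of launching xqsync.
echo "java ... $CACHE_OPT ... com.marklogic.ps.xqsync.XQSync"
```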

I looked at the networking a bit: pings are ~0.15 ms, and I am seeing sustained transfer rates as high as 84 MB/s using scp (I think I'd get more with larger files).  The servers don't seem busy either.  I am running xqsync on the destination box, which I suppose might not be ideal, but it uses less network that way.  It is maxing out one of the CPUs during the initial fetch of all the URIs (over 10 million of them) now that cachable=true.  Maybe there is a problem paging deep into the cts:uris query when it is not cached?
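
To put the numbers from this thread side by side, here is a rough back-of-the-envelope calculation.  It assumes decimal units (1 MB = 1000 kB) and ignores protocol overhead, so treat it as a sanity check rather than a measurement:

```shell
# Force the C locale so awk prints a decimal point, not a comma.
export LC_ALL=C

# Nominal 100 Mb/s link expressed in MB/s (8 bits per byte).
link_mb_per_s=$(awk 'BEGIN { printf "%.1f", 100 / 8 }')

# Ratio of the scp rate (84 MB/s) to the xqsync rate (~30 kB/s).
ratio=$(awk 'BEGIN { printf "%d", 84 * 1000 / 30 }')

echo "nominal 100 Mb/s link: ${link_mb_per_s} MB/s"
echo "scp is roughly ${ratio}x faster than xqsync"
```

Since scp already exceeds 12.5 MB/s, the link is evidently faster than 100 Mb/s, and raw bandwidth does not look like the limiting factor.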

I'll report back once the data actually starts transferring.

-Mike


On 03/14/2012 10:14 AM, Michael Blakeley wrote:
> I would expect better than that. What is the document rate?
>
> You may not have enough client threads to keep the servers busy. What does the utilization look like on both sides?
>
> You may also be memory-limited in the JVM at some point, especially if the documents are big. If so, the JVM will spend a lot of time running the garbage collector. You can check that idea with the '-verbose:gc' option.
>
> Could there be a network limitation other than bandwidth? You might check that by exporting to packages instead, and see what that performance looks like. I have seen some cases where there was a slow hop on the network, or where a firewall was limiting performance.
>
> -- Mike
>
> On 14 Mar 2012, at 13:38 , Mike Sokolov wrote:
>
>    
>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>> be expected using xqsync to transfer a database from one node to
>> another.  I have two quite beefy servers on the same LAN (at least
>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec.  I was hoping to get
>> a few orders of magnitude more, but am I smoking crack?  Is there
>> something I could be doing or not doing that might be limiting the speed
>> somehow?
>>
>> This is my setup:
>>
>> java -cp ${BIN}/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
>>   -Xmx1024m \
>>   -DINPUT_CONNECTION_STRING=$SRCDB \
>>   -DOUTPUT_CONNECTION_STRING=$DSTDB \
>>   -DSKIP_EXISTING=true \
>>   -DCOPY_COLLECTIONS=false \
>>   -DCOPY_PERMISSIONS=false \
>>   -DCOPY_PROPERTIES=true \
>>   -DCOPY_QUALITY=false \
>>   -DINPUT_BATCH_SIZE=10 \
>>   -DINPUT_QUERY_CACHABLE \
>>   -DTHREADS=8  \
>>      com.marklogic.ps.xqsync.XQSync
>>
>> These are the startup messages from the log:
>>
>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE Runtime Environment)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>> INFO: XCC version = 3.2-7
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: starting pool of 8 threads, queue size = 10000
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>> INFO: starting
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: output version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: input version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>> INFO: buffer size = 0, caching = false
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>> INFO: listing all documents (with uri lexicon)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>
>> The connector is a bit old: Can I expect any substantial improvement
>> from updating that?
>>
>> -- 
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>>
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>>      