[MarkLogic Dev General] xqsync throughput

Michael Blakeley mike at blakeley.com
Wed Mar 14 16:52:02 PDT 2012


If xqsync is failing at that point, I doubt INPUT_QUERY_CACHABLE will help. How many documents are you trying to process? 1 GB may not be enough JVM heap space.

-- Mike
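
A minimal sketch of a rerun that acts on the two points raised in this thread: more JVM heap, and the cache flag set explicitly. The 4 GB heap value, the placeholder jar path, and the `build_xqsync_cmd` helper name are assumptions for illustration and do not come from the thread. The property names, jar names, and main class are taken from the original command quoted further down. Passing `-DINPUT_QUERY_CACHABLE` with no value sets the Java system property to an empty string rather than "true", which is consistent with the "caching = false" line in the startup log. The sketch only prints the command so it can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch only: prints the java command, one argument per line, without running it.
# HEAP=4g is an assumed value, not a figure from the thread.
BIN=${BIN:-/path/to/xqsync/lib}   # hypothetical location of the jars
HEAP=${HEAP:-4g}

build_xqsync_cmd() {
  # Remaining -D options from the original command are omitted for brevity.
  printf '%s\n' \
    java \
    -cp "${BIN}/xqsync.jar:${BIN}/xcc.jar:${BIN}/xstream.jar:${BIN}/xpp3.jar" \
    "-Xmx${HEAP}" \
    -verbose:gc \
    "-DINPUT_CONNECTION_STRING=${SRCDB}" \
    "-DOUTPUT_CONNECTION_STRING=${DSTDB}" \
    -DINPUT_QUERY_CACHABLE=true \
    -DTHREADS=8 \
    com.marklogic.ps.xqsync.XQSync
}

build_xqsync_cmd
```

Setting `HEAP` in the environment before calling the function changes the `-Xmx` argument; the right value depends on how many URIs have to be held in memory and on the RAM available on the box running xqsync.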

On 14 Mar 2012, at 20:26 , Mike Sokolov wrote:

> Thanks for the suggestions, Mike.  I discovered that
> 
> -DINPUT_QUERY_CACHABLE
> 
> wasn't set to true, so I am trying that now. The process kept failing to retrieve URIs, so maybe it will help if we fetch them all up front?
> 
> I looked at the networking a bit: pings are ~0.15 ms and I am seeing sustained transfer rates as high as 84 MB/s using scp, and I think I'd get more with larger files.  The servers don't seem busy either.  I am running xqsync on the destination box, which I suppose might not be ideal, but it uses less network anyway.  It is maxing out one of the CPUs during the initial fetch of all the URIs (over 10 million of them) now that cachable=true.  Maybe there is a problem with deep paging into the cts:uris query when it is not cached?
> 
> I'll report back once the data actually starts transferring
> 
> -Mike
> 
> 
> On 03/14/2012 10:14 AM, Michael Blakeley wrote:
>> I would expect better than that. What is the document rate?
>> 
>> You may not have enough client threads to keep the servers busy. What does the utilization look like on both sides?
>> 
>> You may also be memory-limited in the JVM at some point, especially if the documents are big. If so, the JVM will spend a lot of time running the garbage collector. You can check that idea with the '-verbose:gc' option.
>> 
>> Could there be a network limitation other than bandwidth? You might check that by exporting to packages instead, and see what that performance looks like. I have seen some cases where there was a slow hop on the network, or where a firewall was limiting performance.
>> 
>> -- Mike
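
One way to act on the '-verbose:gc' suggestion above is to total the pause times the JVM reports. This is a sketch that assumes the classic HotSpot line format, for example `[GC 325407K->83000K(776768K), 0.2300771 secs]`; other JVMs or additional GC flags print differently, and the pattern would need adjusting. The `gc_seconds` function name and the sample lines are made up for illustration and are not output from this thread.

```shell
#!/usr/bin/env bash
# Sum the pause times from -verbose:gc lines such as
#   [GC 325407K->83000K(776768K), 0.2300771 secs]
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
# Lines that do not match (ordinary log output) are ignored.
gc_seconds() {
  awk '/\[(Full )?GC / && / secs\]$/ { total += $(NF-1); n++ }
       END { printf "%d collections, %.3f s paused\n", n, total }' "$@"
}

# Made-up sample input, not taken from the thread.
gc_seconds <<'EOF'
[GC 325407K->83000K(776768K), 0.2300771 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
INFO: starting
EOF
# prints "2 collections, 2.078 s paused"
```

If the paused total is a large share of the wall-clock run time, the JVM is probably short of heap. The function also accepts file names, so the output of an xqsync run captured to a log file can be passed as an argument.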
>> 
>> On 14 Mar 2012, at 13:38 , Mike Sokolov wrote:
>> 
>>   
>>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>>> be expected using xqsync to transfer a database from one node to
>>> another.  I have two quite beefy servers on the same LAN (at least
>>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec.  I was hoping to get
>>> a few orders of magnitude more, but am I smoking crack?  Is there
>>> something I could be doing or not doing that might be limiting the speed
>>> somehow?
>>> 
>>> This is my setup:
>>> 
>>> java -cp ${BIN}/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
>>> -Xmx1024m \
>>>  -DINPUT_CONNECTION_STRING=$SRCDB \
>>>  -DOUTPUT_CONNECTION_STRING=$DSTDB \
>>>  -DSKIP_EXISTING=true \
>>>  -DCOPY_COLLECTIONS=false \
>>>  -DCOPY_PERMISSIONS=false \
>>>  -DCOPY_PROPERTIES=true \
>>>  -DCOPY_QUALITY=false \
>>>  -DINPUT_BATCH_SIZE=10 \
>>>  -DINPUT_QUERY_CACHABLE \
>>>  -DTHREADS=8  \
>>>     com.marklogic.ps.xqsync.XQSync
>>> 
>>> These are the startup messages from the log:
>>> 
>>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE Runtime Environment)
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>>> INFO: XCC version = 3.2-7
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>> INFO: starting pool of 8 threads, queue size = 10000
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>>> INFO: starting
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>> INFO: output version info: client 3.2-7, server 4.1-11
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>> INFO: input version info: client 3.2-7, server 4.1-11
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>> INFO: buffer size = 0, caching = false
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>>> INFO: listing all documents (with uri lexicon)
>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>> 
>>> The connector is a bit old: Can I expect any substantial improvement
>>> from updating that?
>>> 
>>> -- 
>>> Michael Sokolov
>>> Engineering Director
>>> www.ifactory.com
>>> 
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 
>>>     
> 