[MarkLogic Dev General] xqsync throughput

Mike Sokolov sokolov at ifactory.com
Wed Mar 14 17:35:20 PDT 2012


Once I set INPUT_QUERY_CACHABLE=true, I ran out of memory (there are around 
11M docs).  Then I increased the max heap to 4000MB, which seems to be 
enough - no more failures.  Throughput increased to 88kB/s, which is 
better, but let's see how far we can take this.  At this point the 
things I have in mind to try are:

1) upgrade xqsync (it's not clear if that will help since the URI 
fetching seems to be working OK now with more memory, but maybe there is 
some other magic goodness in there?)
2) add more threads
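
For reference, here is a rough sketch of the revised invocation.  It only 
changes what is discussed in this thread: the heap goes to 4000MB, 
INPUT_QUERY_CACHABLE is explicitly set to true, and -verbose:gc is added 
to check for GC pressure.  The THREADS value of 16, the default BIN path, 
and the connection strings are placeholders I made up for illustration, 
not tested settings.  It prints the command by default and only runs it 
when RUN=1.

```shell
# Assumptions (hypothetical values, override via the environment):
BIN="${BIN:-./lib}"                                      # directory holding the jars
SRCDB="${SRCDB:-xcc://user:password@source-host:9000}"   # placeholder source
DSTDB="${DSTDB:-xcc://user:password@dest-host:9000}"     # placeholder destination
THREADS="${THREADS:-16}"                                 # untested guess, up from 8

# Changes from the original setup:
#   -Xmx4000m                     larger heap, needed once URIs are cached
#   -verbose:gc                   report garbage collection activity
#   -DINPUT_QUERY_CACHABLE=true   explicit value instead of a bare flag
CMD="java -cp $BIN/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
-Xmx4000m -verbose:gc \
-DINPUT_CONNECTION_STRING=$SRCDB \
-DOUTPUT_CONNECTION_STRING=$DSTDB \
-DSKIP_EXISTING=true \
-DCOPY_COLLECTIONS=false \
-DCOPY_PERMISSIONS=false \
-DCOPY_PROPERTIES=true \
-DCOPY_QUALITY=false \
-DINPUT_BATCH_SIZE=10 \
-DINPUT_QUERY_CACHABLE=true \
-DTHREADS=$THREADS \
com.marklogic.ps.xqsync.XQSync"

# Dry run by default: print the command so it can be reviewed first.
echo "$CMD"

# Execute only when explicitly requested (values must not contain spaces).
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

With the bare -DINPUT_QUERY_CACHABLE from my first attempt the log 
reported "caching = false", which is why the value is spelled out here.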

-Mike

On 3/14/2012 7:52 PM, Michael Blakeley wrote:
> If xqsync is failing at that point, I doubt INPUT_QUERY_CACHABLE will help. How many documents are you trying to process? 1 GB may not be enough JVM space.
>
> -- Mike
>
> On 14 Mar 2012, at 20:26 , Mike Sokolov wrote:
>
>> Thanks for the suggestions, Mike.  I discovered that
>>
>> -DINPUT_QUERY_CACHABLE
>>
>> wasn't true, so I am trying that now.  The process kept failing to retrieve URIs, so maybe it will help if we fetch them all up front?
>>
>> I looked at the networking a bit: pings are ~0.15 ms and I am seeing sustained transfer rates as high as 84MB/s using scp (I think I'd get more with larger files).  Also, the servers don't seem busy.  I am running xqsync on the destination box, which I suppose might not be ideal, but it uses less network anyway.  It is maxing out one of the CPUs during the initial fetch of all the URIs (over 10M of them) now that cachable=true.  Maybe there is a problem paging deep into the cts:uris query when it is not cached?
>>
>> I'll report back once the data actually starts transferring.
>>
>> -Mike
>>
>>
>> On 03/14/2012 10:14 AM, Michael Blakeley wrote:
>>> I would expect better than that. What is the document rate?
>>>
>>> You may not have enough client threads to keep the servers busy. What does the utilization look like on both sides?
>>>
>>> You may also be memory-limited in the JVM at some point, especially if the documents are big. If so, the JVM will spend a lot of time running the garbage collector. You can check that idea with the '-verbose:gc' option.
>>>
>>> Could there be a network limitation other than bandwidth? You might check that by exporting to packages instead, and see what that performance looks like. I have seen some cases where there was a slow hop on the network, or where a firewall was limiting performance.
>>>
>>> -- Mike
>>>
>>> On 14 Mar 2012, at 13:38 , Mike Sokolov wrote:
>>>
>>>
>>>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>>>> be expected using xqsync to transfer a database from one node to
>>>> another.  I have two quite beefy servers on the same LAN (at least
>>>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec.  I was hoping to get
>>>> a few orders of magnitude more, but am I smoking crack?  Is there
>>>> something I could be doing or not doing that might be limiting the speed
>>>> somehow?
>>>>
>>>> This is my setup:
>>>>
>>>> java -cp ${BIN}/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
>>>>   -Xmx1024m \
>>>>   -DINPUT_CONNECTION_STRING=$SRCDB \
>>>>   -DOUTPUT_CONNECTION_STRING=$DSTDB \
>>>>   -DSKIP_EXISTING=true \
>>>>   -DCOPY_COLLECTIONS=false \
>>>>   -DCOPY_PERMISSIONS=false \
>>>>   -DCOPY_PROPERTIES=true \
>>>>   -DCOPY_QUALITY=false \
>>>>   -DINPUT_BATCH_SIZE=10 \
>>>>   -DINPUT_QUERY_CACHABLE \
>>>>   -DTHREADS=8  \
>>>>      com.marklogic.ps.xqsync.XQSync
>>>>
>>>> These are the startup messages from the log:
>>>>
>>>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE Runtime Environment)
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>>>> INFO: XCC version = 3.2-7
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>> INFO: starting pool of 8 threads, queue size = 10000
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>>>> INFO: starting
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>> INFO: output version info: client 3.2-7, server 4.1-11
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>> INFO: input version info: client 3.2-7, server 4.1-11
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>>> INFO: buffer size = 0, caching = false
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>>>> INFO: listing all documents (with uri lexicon)
>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>>>
>>>> The connector is a bit old: Can I expect any substantial improvement
>>>> from updating that?
>>>>
>>>> -- 
>>>> Michael Sokolov
>>>> Engineering Director
>>>> www.ifactory.com
>>>>
>>>> _______________________________________________
>>>> General mailing list
>>>> General at developer.marklogic.com
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>
>>>>


