[MarkLogic Dev General] xqsync throughput

Mike Sokolov sokolov at ifactory.com
Thu Mar 15 14:55:14 PDT 2012


For anyone who's curious: I upgraded xqsync and the xcc connector and 
doubled the number of threads (to 16), and I'm now getting 100-115 
kB/s.  I'm leaving it at that.  Even if I could get it to go faster, I 
don't want to dominate the server too much.  Thanks for the help.
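
For reference, here is a rough sketch of an invocation with the
settings mentioned in this thread (16 threads, a 4000 MB heap,
INPUT_QUERY_CACHABLE=true).  The remaining options are carried over
from the original command as an assumption, the connection strings and
the BIN default are placeholders, and the xqsync_cmd helper is
hypothetical; it only prints the command so it can be reviewed first.

```shell
# Dry run: prints the java command instead of running it.
# The BIN, SRCDB and DSTDB defaults are placeholders, not values from this thread.
BIN=${BIN:-./lib}
SRCDB=${SRCDB:-xcc://user:password@source-host:9000}
DSTDB=${DSTDB:-xcc://user:password@dest-host:9000}

# Hypothetical helper: echoes the full command line.
xqsync_cmd() {
  echo java -cp "$BIN/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar" \
    -Xmx4000m \
    -DINPUT_CONNECTION_STRING="$SRCDB" \
    -DOUTPUT_CONNECTION_STRING="$DSTDB" \
    -DSKIP_EXISTING=true \
    -DCOPY_COLLECTIONS=false \
    -DCOPY_PERMISSIONS=false \
    -DCOPY_PROPERTIES=true \
    -DCOPY_QUALITY=false \
    -DINPUT_BATCH_SIZE=10 \
    -DINPUT_QUERY_CACHABLE=true \
    -DTHREADS=16 \
    com.marklogic.ps.xqsync.XQSync
}

xqsync_cmd
```

Dropping the leading echo inside the function runs the sync instead of
printing it.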

-Mike

On 03/14/2012 08:35 PM, Mike Sokolov wrote:
> Once I set INPUT_QUERY_CACHABLE=true, I ran out of memory (it's around
> 11M docs).  Then I increased the max heap to 4000MB, which seems to be
> enough - no more failures.  Throughput increased to 88kBps, which is
> better, but let's see how far we can take this.  At this point the
> things I have it in mind to try are:
>
> 1) upgrade xqsync (it's not clear if that will help since the URI
> fetching seems to be working OK now with more memory, but maybe there is
> some other magic goodness in there?)
> 2) add more threads
>
> -Mike
>
> On 3/14/2012 7:52 PM, Michael Blakeley wrote:
>    
>> If xqsync is failing at that point, I doubt INPUT_QUERY_CACHABLE will help. How many documents are you trying to process? 1 GB may not be enough JVM space.
>>
>> -- Mike
>>
>> On 14 Mar 2012, at 20:26 , Mike Sokolov wrote:
>>
>>      
>>> Thanks for the suggestions, Mike.  I discovered that
>>>
>>> -DINPUT_QUERY_CACHABLE
>>>
>>> wasn't true, so I am trying that now; the process kept failing to retrieve uris, so maybe if we fetch them all up front?
>>>
>>> I looked at the networking a bit - pings are ~ 0.15 ms and I am seeing sustained transfer rates as high as 84MB/s using scp - I think I'd get more with larger files.  Also the servers don't seem busy - I am running xqsync on the destination box, which I suppose might not be ideal, but uses less network anyway - it is maxing out one of the cpus during the initial fetch of all the uris (over 10m of them) now that cachable=true.  Maybe there is a problem deep paging into the cts:uris query when it is not cached?
>>>
>>> I'll report back once the data actually starts transferring
>>>
>>> -Mike
>>>
>>>
>>> On 03/14/2012 10:14 AM, Michael Blakeley wrote:
>>>        
>>>> I would expect better than that. What is the document rate?
>>>>
>>>> You may not have enough client threads to keep the servers busy. What does the utilization look like on both sides?
>>>>
>>>> You may also be memory-limited in the JVM at some point, especially if the documents are big. If so, the JVM will spend a lot of time running the garbage collector. You can check that idea with the '-verbose:gc' option.
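
One way to act on the '-verbose:gc' suggestion, as a sketch: add the
flag to the java command, capture the JVM output, and total the pause
times.  This assumes the Java 6 HotSpot log line format, where each
collection is reported as "[GC ..., N secs]" or "[Full GC ..., N secs]";
the gc_pause_total helper and the gc.log file name are hypothetical.

```shell
# Sums the pause times reported by -verbose:gc (assumed Java 6 HotSpot format).
# Reads log lines on stdin and prints a one-line summary.
gc_pause_total() {
  awk '
    /secs\]/ {
      # The number just before the "secs]" token is the pause in seconds.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^secs\]/) { total += $(i - 1); n++ }
      }
    }
    END { printf "%d collections, %.3f s paused\n", n, total }
  '
}

# Example with two sample log lines.
printf '%s\n' \
  '[GC 325407K->83000K(776768K), 0.2300771 secs]' \
  '[Full GC 267628K->83769K(776768K), 1.8479984 secs]' | gc_pause_total
```

With a captured log this would be run as "gc_pause_total < gc.log".  A
paused total that is a large fraction of the wall-clock run time would
point at the heap being too small.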
>>>>
>>>> Could there be a network limitation other than bandwidth? You might check that by exporting to packages instead, and see what that performance looks like. I have seen some cases where there was a slow hop on the network, or where a firewall was limiting performance.
>>>>
>>>> -- Mike
>>>>
>>>> On 14 Mar 2012, at 13:38 , Mike Sokolov wrote:
>>>>
>>>>
>>>>          
>>>>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>>>>> be expected using xqsync to transfer a database from one node to
>>>>> another.  I have two quite beefy servers on the same LAN (at least
>>>>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec.  I was hoping to get
>>>>> a few orders of magnitude more, but am I smoking crack?  Is there
>>>>> something I could be doing or not doing that might be limiting the speed
>>>>> somehow?
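
To put numbers on that gap, a quick back-of-the-envelope calculation,
assuming decimal units (1 Mb = 1000 kb) and ignoring protocol overhead;
the variable names are only for illustration.

```shell
link_mbit=100                          # LAN speed, megabits per second
observed_kB=30                         # observed xqsync rate, kB/s
link_kB=$(( link_mbit * 1000 / 8 ))    # link capacity in kB/s
ratio=$(( link_kB / observed_kB ))     # integer division
echo "link ~${link_kB} kB/s, observed ${observed_kB} kB/s, about ${ratio}x below capacity"
```

So the observed rate is between two and three orders of magnitude
below what the link could carry.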
>>>>>
>>>>> This is my setup:
>>>>>
>>>>> java -cp ${BIN}/xqsync.jar:${BIN}/xcc.jar:${BIN}/xstream.jar:${BIN}/xpp3.jar \
>>>>>    -Xmx1024m \
>>>>>    -DINPUT_CONNECTION_STRING=$SRCDB \
>>>>>    -DOUTPUT_CONNECTION_STRING=$DSTDB \
>>>>>    -DSKIP_EXISTING=true \
>>>>>    -DCOPY_COLLECTIONS=false \
>>>>>    -DCOPY_PERMISSIONS=false \
>>>>>    -DCOPY_PROPERTIES=true \
>>>>>    -DCOPY_QUALITY=false \
>>>>>    -DINPUT_BATCH_SIZE=10 \
>>>>>    -DINPUT_QUERY_CACHABLE \
>>>>>    -DTHREADS=8  \
>>>>>       com.marklogic.ps.xqsync.XQSync
>>>>>
>>>>> These are the startup messages from the log:
>>>>>
>>>>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE Runtime Environment)
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>>>>> INFO: XCC version = 3.2-7
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>>> INFO: starting pool of 8 threads, queue size = 10000
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>>>>> INFO: starting
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>>> INFO: output version info: client 3.2-7, server 4.1-11
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>>>>> INFO: input version info: client 3.2-7, server 4.1-11
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>>>> INFO: buffer size = 0, caching = false
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>>>>> INFO: listing all documents (with uri lexicon)
>>>>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager queueFromInputConnection
>>>>>
>>>>> The connector is a bit old: Can I expect any substantial improvement
>>>>> from updating that?
>>>>>
>>>>> -- 
>>>>> Michael Sokolov
>>>>> Engineering Director
>>>>> www.ifactory.com
>>>>>
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> General at developer.marklogic.com
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>>
>>>>>
>>>>>            