[MarkLogic Dev General] Performance issue (Merging) - XDMP:node-replace function
Michael Blakeley
michael.blakeley at marklogic.com
Wed Mar 18 07:57:57 PST 2009
Deshbir,
There isn't an exact answer to (1), but you can approximate by visiting
the forest status page in the admin UI. Divide the total on-disk size of
the forest by the total number of fragments. But this will always be an
approximation because there are several on-disk data structures which
have different overheads, grow at different rates, and interact in
complex ways.
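As a rough illustration of that division, here is a minimal Python sketch using the figures reported further down in this thread after fragmentation (29 MB, 811 fragments). The function name is hypothetical, and the sketch assumes the database consists of a single forest and that 1 MB = 1024 kB. The result is only the blended on-disk estimate described above, not an exact per-document size.

```python
def approx_on_disk_kb_per_fragment(forest_size_mb: float, fragment_count: int) -> float:
    """Estimate average on-disk kB per fragment: forest size / fragment count."""
    if fragment_count <= 0:
        raise ValueError("fragment_count must be positive")
    # Convert MB to kB (assuming 1 MB = 1024 kB), then divide by the fragment count.
    return forest_size_mb * 1024 / fragment_count


# Figures taken from the forest after the fragmentation rule was applied.
print(round(approx_on_disk_kb_per_fragment(29, 811), 1))  # prints 36.6
```

This figure includes index and other on-disk overhead, so it should not be read as the size of the XML alone.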
The fragment rule has split your documents into an average of 20
fragments per document, so your average expanded fragment size is
probably about 180 kB. That's a little large, but might be perfectly OK
for your application. Ultimately, fragmentation is all about aligning
the physical storage and indexing of your XML with your application needs.
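The arithmetic behind those two estimates is not spelled out here. One reading that matches the numbers in the thread is sketched below in Python. It assumes the 3.5 MB exported size of the largest document stands in for a typical document and that 1 MB = 1024 kB. Both are assumptions made for illustration, and the variable names are hypothetical.

```python
documents = 41          # document count reported in the thread
fragments_after = 811   # fragment count after the fragmentation rule
exported_doc_mb = 3.5   # largest exported document; used as a stand-in (assumption)

# Average number of fragments each document was split into.
fragments_per_doc = fragments_after / documents

# Rough size of one fragment if a 3.5 MB document is split evenly.
kb_per_fragment = exported_doc_mb * 1024 / round(fragments_per_doc)

print(round(fragments_per_doc, 1))  # prints 19.8
print(round(kb_per_fragment))       # prints 179
```

Under those assumptions the result lands near the "about 20 fragments" and "about 180 kB" figures, though the exported file size is not necessarily the same as the expanded size inside the database.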
thanks,
-- Mike
On 2009-03-18 04:27, Deshbir wrote:
> Mike,
>
> Thanks a lot for your input.
>
> You were right! The slow query performance (for xdmp update functions) was not related to the merges. We verified this by temporarily disabling merging, and found it had little or no impact on query performance.
>
> Your suggestion to fragment the documents (we used fragment parents) also worked very well. After applying a fragmentation rule (and re-indexing the database), the query performance (for xdmp update functions) improved significantly. We did not notice any negative impact on the data-loading queries.
>
> Before fragmentation (i.e. default MarkLogic settings)
> - Total size of database: 25 MB
> - No. of documents: 41
> - No. of fragments: 101
>
> After fragmentation
> - Total size of database: 29 MB (not sure why this changed)
> - No. of documents: 41
> - No. of fragments: 811
>
> Some more questions for you:
>
> 1. How does one accurately determine the size of a document (or an element) in MarkLogic? I presume that the size of an exported XML file on disk is not the same as the size of the same document in the MarkLogic database? In our application the maximum size of a document on disk (i.e. an exported XML file) is 3.5 MB.
>
> 2. Do we still need to consider breaking up our documents (currently 3.5 MB on disk) into smaller pieces? Or do fragment roots/parents have the same effect? Note that in our case the documents are dynamic, i.e. the application regularly creates and modifies documents in the MarkLogic database.
>
> Once again, thanks for all the help.
>
> Regards,
> Deshbir
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
> Sent: Monday, March 16, 2009 10:31 PM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Performance issue (Merging) - XDMP:node-replace function
>
> Deshbir,
>
> You can learn more about merges by reading our admin guide, available
> via http://developer.marklogic.com/pubs
>
> Merges are asynchronous with respect to queries, but they can compete
> with queries for system resources. I suspect that's a false trail, though.
>
> How large is the document on which node-replace is running? If you do
> see that ErrorLog extract after every node-replace, that suggests a
> document size of 5-20 MB. If so, you should consider breaking up your
> documents into smaller ones, or possibly use a fragment root or fragment
> parent (fragments are also discussed in the admin guide).
>
> -- Mike
>
> On 2009-03-16 04:06, deshbir.dugal at comprotechnologies.com wrote:
>> Hello,
>>
>> We are experiencing extremely slow XQuery performance for the xdmp:node-replace function. The following XQuery snippet consistently takes more than 5 seconds on one of our servers (MarkLogic 3.2).
>> ============================================
>> let $docbookNode := <p>hello</p>
>> let $path := doc(".....")/../../
>> return
>> xdmp:node-replace($path,$docbookNode)
>> ============================================
>> On another MarkLogic installation (3.2), the same code consistently takes less than 300 milliseconds!
>>
>> We've compared the server settings and they appear to be the same across both servers (they are probably the default MarkLogic installation settings).
>>
>> On examining the log folder, we found that every time xdmp:node-replace was executed, the following lines were added to the error log file:
>> ============================================
>> 2009-03-16 06:45:32.114 Info: Saving C:\Program Files\MarkLogic\Data\Forests\Documents\00000470
>> 2009-03-16 06:45:32.880 Info: Saved 15 MB in 1 sec at 15 MB/sec to C:\Program Files\MarkLogic\Data\Forests\Documents\00000470
>> 2009-03-16 06:45:33.036 Info: Merging 62 MB from C:\Program Files\MarkLogic\Data\Forests\Documents\0000046f and C:\Program Files\MarkLogic\Data\Forests\Documents\00000470 to C:\Program Files\MarkLogic\Data\Forests\Documents\00000471
>> 2009-03-16 06:45:37.661 Info: Merged 55 MB in 5 sec at 11 MB/sec to C:\Program Files\MarkLogic\Data\Forests\Documents\00000471
>> 2009-03-16 06:45:37.958 Info: Deleted C:\Program Files\MarkLogic\Data\Forests\Documents\0000046f
>> 2009-03-16 06:45:38.098 Info: Deleted C:\Program Files\MarkLogic\Data\Forests\Documents\00000470
>> ============================================
>>
>> What is going wrong here? What could be causing all the "merging" activity?
>>
>> Thanks in advance.
>>
>> Regards,
>> Deshbir
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://xqzone.com/mailman/listinfo/general