[MarkLogic Dev General] Advice on improving "join" on attribute performance

David Lee David.Lee at marklogic.com
Fri Mar 16 09:16:04 PDT 2012


As implied by Geert, but maybe not obvious:
If you have a single document containing things like this (tables of strings with IDs), the indexing isn't going to help much, if at all. Indexing maps Term -> Fragment. Since this is all in one document, all the index can say is "Yes, that ID is in the document" ... and since you're already using doc(), that doesn't add much value.
You could try making the elements individual fragments (via the fragment or fragment root properties on the server), which will make the indexing much more useful. OTOH all those small fragments have a cost too, but not much. You could also separate the doc out into lots of little docs, to much the same advantage. Fragments and small docs have about the same overhead.
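
To make the Term -> Fragment point concrete, here is a rough, language-neutral sketch in Python. It is not MarkLogic code and does not reflect how the server stores its indexes; the fragment names and the build_index helper are made up purely for illustration. It only shows why an index that resolves to fragments cannot narrow anything down when every ID lives in the same fragment.

```python
from collections import defaultdict


def build_index(fragments):
    """Map each term to the set of fragment names that contain it."""
    index = defaultdict(set)
    for fragment_name, terms in fragments.items():
        for term in terms:
            index[term].add(fragment_name)
    return index


# Hypothetical data: one big document holding every ID in a single fragment.
one_document = {"strings.xml": ["id1", "id2", "id3"]}

# The same data with one fragment per entry (names are illustrative only).
per_entry = {
    "strings.xml#1": ["id1"],
    "strings.xml#2": ["id2"],
    "strings.xml#3": ["id3"],
}

single_index = build_index(one_document)
split_index = build_index(per_entry)

# With one fragment, the index can only say "it is somewhere in this document".
whole_doc_hit = single_index["id2"]

# With one fragment per entry, the index points straight at the matching entry.
entry_hit = split_index["id2"]
```

In the first case the lookup still leaves the whole document to be scanned; in the second it resolves to a single small fragment.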


Or, in this case, since it does actually all fit in memory, using a map may be the best choice.
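
As a rough sketch of the map idea (again in Python rather than XQuery, with made-up entry names and helper functions), the "join" becomes one pass to build an id -> text map followed by one hash lookup per reference, instead of searching the whole table for each ID:

```python
def build_lookup(entries):
    """Build the id -> text map once, in a single pass over the entries."""
    return {entry_id: text for entry_id, text in entries}


def resolve(references, lookup):
    """Replace each referenced id with its text; unknown ids become None."""
    return [lookup.get(ref) for ref in references]


# Hypothetical language data standing in for the real string table.
language_entries = [("greeting", "Hello"), ("farewell", "Goodbye")]

lookup = build_lookup(language_entries)
resolved = resolve(["farewell", "greeting", "missing"], lookup)
```

The cost of building the map is paid once; after that each lookup is independent of the table size, which is why this tends to win when the data fits comfortably in memory.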

Lots of choices. Have fun, and let us know what works for you!

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee at marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com<http://www.marklogic.com/>

This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Geert Josten
Sent: Friday, March 16, 2012 12:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Advice on improving "join" on attribute performance

Hi Nick,

I guess David is referring to the xdmp:get-server-field() and xdmp:set-server-field() functions (http://community.marklogic.com/pubs/5.0/apidocs/AppServerBuiltins.html#xdmp:get-server-field). Make sure to check whether the field has been initialized. You could also insert the map:map into the database, but retrieval from the database might be slower. That would be beneficial if you need to share the info among hosts or if initialization is relatively slow, but perhaps that is not the case here.
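
The "check whether it is initialized" step is the usual check-then-populate caching pattern. A minimal Python sketch of that pattern follows; it does not call the xdmp functions, and the _server_fields dictionary, the field name "lang-en" and the loader are all hypothetical stand-ins for the server field store and the real map-building code.

```python
# Stand-in for the server's field storage; hypothetical, for illustration only.
_server_fields = {}


def get_language_map(field_name, loader):
    """Return the cached map, building and storing it on first use."""
    cached = _server_fields.get(field_name)
    if cached is None:
        # Not initialized yet: build the map once and remember it.
        cached = loader()
        _server_fields[field_name] = cached
    return cached


load_calls = []  # records each time the expensive load actually runs


def load_english():
    """Hypothetical loader standing in for building the map from the document."""
    load_calls.append("en")
    return {"greeting": "Hello"}


first = get_language_map("lang-en", load_english)
second = get_language_map("lang-en", load_english)
```

Only the first request pays for building the map; later requests reuse the stored one. The sketch ignores concurrency and invalidation when the source document changes, both of which a real setup would need to consider.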

I'm expecting ordering on $x/@id to be slow because there is no range index on it. Results might improve if you index the id attribute on elementa, elementb, etc. (You can supply multiple element names in a single index.) I also think the ordering might perform best if elementa, elementb, etc. are declared as fragment roots, or stored as individual documents (not sure that would fit your data approach).

Personally, I prefer to rely on the search and lexicon functions of MarkLogic explicitly. That helps you write your logic so that it makes optimal use of the indexes, and you don't depend so much on the optimizer to translate your code into index lookups.

Kind regards,
Geert


From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Nick Tuckett
Sent: Friday, March 16, 2012 16:28
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Advice on improving "join" on attribute performance

Many thanks, David - just tried creating and serialising a map version of the language data, and then using it inside my test query - massive speedup as you suggested; less than two seconds for the "experienced" user elapsed time, <0.24 seconds reported by the profiler and only a small transient memory hit (~1%).

Please would you point me at appropriate documentation on system global properties?

On 16 March 2012 14:26, David Lee <David.Lee at marklogic.com> wrote:
That is a perfect use case for maps.
If the file doesn't change often you could even set it as a system global property.

Let me know if you'd like some sample code


From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Nick Tuckett
Sent: Friday, March 16, 2012 10:26 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Advice on improving "join" on attribute performance

It's just shy of 1.5 MB for one language with just over 13,000 entries, so that might be feasible...?

We've got localised text for eight languages, so if used in production that would be about 12 MB total.
