[MarkLogic Dev General] How to give hints to MarkLogic on which condition is faster to check first?

semerau at hotmail.com semerau at hotmail.com
Fri Apr 20 20:54:43 PDT 2012


So could I do:

cts:search(/,
    cts:and-query((
        cts:inexpensive-query...,
        cts:expensive-query...
    ))
)

and MarkLogic will check the first condition (cts:inexpensive-query) first, and only check the second condition if the first is true?

CC: general at developer.marklogic.com
From: mike at blakeley.com
Date: Fri, 20 Apr 2012 19:49:17 -0700
To: general at developer.marklogic.com
Subject: Re: [MarkLogic Dev General] How to give hints to MarkLogic on which	condition is faster to check first?

Yes, boolean ops will short-circuit. You can test this for yourself using xdmp:sleep and xdmp:elapsed-time.
-- Mike
On Apr 20, 2012, at 19:15, "semerau at hotmail.com" <semerau at hotmail.com> wrote:





I may have some queries where the comparison is expensive. So what I'd like to do is add an extra element in each doc which is a "shortcut" to check for first before doing the expensive comparison.
For example, suppose I had data with random words ("boat", "alligator", "house") in an element called "words", with a random number of words in the element (say, 5 to 50). Other documents may have the same words but in a different order. I want to find the documents that have the same number of words and the same words, regardless of order.
So
1: <words>boat alligator house bandit flower</words>
and 2: <words>bandit alligator bandit flower boat</words> would be a match
but 3: <words>bandit alligator bandit flower boat island</words> would not be a match because it has an extra word
I thought that I can add a new element to each doc which represents the number of words (5, 6, 11, 20, etc) and I can first check that the doc has the right number of words before I check to see if it has the same words. I am thinking the extra check on the number of words would shortcut the query to not even bother checking individual words if the number of words doesn't match and save me some time. The time may add up if I have millions or tens of millions of docs to query against.
So if my thinking is correct, then I would have documents that look like this:
<doc>
    <words>bandit alligator bandit flower boat</words>
    <num-words>5</num-words>
</doc>
I could put a range index on the "num-words" element of type xs:int.
Then I'd like to write queries so that the num-words condition is checked first by the magical MarkLogic engine and only if that first condition is met would it check the rest.
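As a rough illustration of the "cheap check first" idea, here is a minimal in-memory Java sketch. It is not how MarkLogic resolves a query against its indexes; it only shows the logic. The class and method names (WordMatch, sameWords, counts, split) are hypothetical, and it assumes that "same words" means the same words with the same number of repeats.

```java
import java.util.HashMap;
import java.util.Map;

class WordMatch {
    // Splits on whitespace; an empty or blank string yields zero words.
    static String[] split(String text) {
        String t = text.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // The "expensive" part: build a word -> occurrence-count map.
    static Map<String, Integer> counts(String[] words) {
        Map<String, Integer> m = new HashMap<>();
        for (String w : words) {
            m.merge(w, 1, Integer::sum);
        }
        return m;
    }

    // Cheap length comparison first; the maps are only built
    // when the word counts already agree.
    static boolean sameWords(String a, String b) {
        String[] wa = split(a);
        String[] wb = split(b);
        return wa.length == wb.length && counts(wa).equals(counts(wb));
    }

    public static void main(String[] args) {
        System.out.println(sameWords("boat alligator house", "house boat alligator"));
        System.out.println(sameWords("boat alligator house", "house boat alligator island"));
    }
}
```

The length comparison plays the role of the num-words element: when it fails, the per-word comparison is never attempted.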
I know in Java that the runtime environment won't check the second condition if the first is false in a boolean statement. So:
if (1 == 0 && explode()) { ....
"explode()" will never be called because the first condition in the statement is false. But the order is important; "1 == 0" must be before "explode()" in the statement because that statement will be evaluated from left to right.
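To make the Java behavior concrete, here is a small self-contained sketch. The explode() body and the call counter are stand-ins invented for illustration; only the left-to-right, short-circuit behavior of && is the point.

```java
class ShortCircuitDemo {
    // Counts how many times the expensive check actually runs.
    static int expensiveCalls = 0;

    // Stand-in for the expensive condition (hypothetical body).
    static boolean explode() {
        expensiveCalls++;
        return true;
    }

    // '&&' evaluates its left operand first and skips the right
    // operand entirely when the left one is false.
    static boolean cheapFirst(boolean cheap) {
        return cheap && explode();
    }

    public static void main(String[] args) {
        // prints "false calls=0": explode() was never invoked
        System.out.println(cheapFirst(false) + " calls=" + expensiveCalls);
        // prints "true calls=1": explode() ran once
        System.out.println(cheapFirst(true) + " calls=" + expensiveCalls);
    }
}
```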
I don't know if XQuery or MarkLogic works that way (didn't see anything in the spec) and I know that MarkLogic has all sorts of optimizations, but how will it know that it's faster to check the "num-words" condition before the individual words? Can I write a cts:query that gives a hint to MarkLogic to give precedence to one condition over another to save time? *I* know that the num-words check is faster but how can *MarkLogic* know that? 
I suppose it could be argued that it doesn't really matter because MarkLogic runs fast anyway, but I'm talking about long running queries over massive data sets so even small amounts of time are important to me.
Thanks!
-Ryan
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
