[MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

Michael Blakeley mike at blakeley.com
Thu Aug 30 08:05:10 PDT 2012


These are URI locks, not fragment locks. The URI doesn't have to exist in order to create the lock. The point is to guarantee read-consistency for the update, so that the if-then-else expression operates reliably.

The case where the URI does exist would be vanishingly rare, since xdmp:random() returns a 64-bit pseudo-random unsigned long. You could test the cost by using a smaller random space, if you were interested. But you can't simply drop the read locks without sacrificing the guarantee of uniqueness. So if you do end up taking extra read locks, they are quite necessary.
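
As a rough sketch of that experiment, assuming a scratch database: the variant below adds a hypothetical $max parameter (not part of the original function) and passes it to xdmp:random(), so collisions become common once a few hundred documents exist. The local: prefix, the value 1000, and the <doc/> payload are placeholders for illustration only.

```xquery
xquery version "1.0-ml";

(: Hypothetical test variant of choose-uri. $max bounds the random
   space so that existing URIs are hit often. Note that it recurses
   forever if every URI in the space is already taken. :)
declare function local:choose-uri($max as xs:unsignedLong) as xs:string
{
  let $uri := fn:concat("/document-", xdmp:random($max), ".xml")
  return
    if (fn:exists(fn:doc($uri))) then local:choose-uri($max)
    else $uri
};

(: The insert makes this an update statement, so each fn:doc() call
   above takes a read lock on the URI it checks. :)
let $uri := local:choose-uri(xs:unsignedLong(1000))
return xdmp:document-insert($uri, <doc/>)
```

Running this repeatedly with profiling enabled should show how the cost grows as the small URI space fills up and more retries are needed.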

To put it more simply: how are you going to guarantee the uniqueness of the URI, if not by checking to see if it exists?

-- Mike

On 29 Aug 2012, at 22:25 , Geert Josten wrote:

> Hi Mike,
> 
> Not quite sure, but the conflict occurs when the uri doesn't exist yet, so
> there would be nothing to lock. Does that still create a read-lock?
> 
> And in case the uri does exist, wouldn't this create potentially a lot of
> unnecessary read-locks (in case it takes a lot of attempts to find an
> unused uri)?
> 
> Kind regards,
> Geert
> 
> -----Original message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On behalf of Michael Blakeley
> Sent: Wednesday 29 August 2012 21:35
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Performance of
> fn:exists(fn:doc($uri))
> 
> No, you can't do that safely because cts:uri-match() won't take a
> read-lock. You are opening yourself up to a race condition. And in some
> circumstances it will be slower than the recommended technique. There
> seems to be a popular idea that cts:uri-match() is always fastest, but
> that is not always true.
> 
> The recommended technique is probably the fastest way to guarantee a new,
> unique URI. If you are going through the process of inserting a new
> document, this technique adds very little extra work. The document-insert
> itself always has to look for an existing document, because it might be
> replacing an existing document or it might be inserting a new document. It
> always has to write-lock the URI. So the extra exists() call merely
> repeats the URI lookup, which is cheap because it will be cached for the
> xdmp:document-insert call, and also gets a read-lock before
> xdmp:document-insert gets the write lock. In the vanishingly rare event
> that xdmp:random() produces an existing URI, this extra work is repeated -
> but is still quite cheap.
> 
> -- Mike
> 
> On 29 Aug 2012, at 12:29 , William Merritt Sawyer wrote:
> 
>> If you have the uri-lexicon turned on you can use
> cts:uri-match(fn:concat("/document-", xdmp:random(), ".xml"))
>> 
>> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
>> Sent: Wednesday, August 29, 2012 12:33 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Performance of
> fn:exists(fn:doc($uri))
>> 
>> Thanks Geert.
>> 
>> I did try fn:exists(fn:doc($uri))  on CQ before your response came in
> and found it to be fast.
>> 
>> The locking / prevention of duplicate id's is discussed in
> http://markmail.org/message/mm5vtacpdzwfy44j  .
>> 
>> Regards,
>> Danny
>> 
>> On Wed, Aug 29, 2012 at 2:23 PM, Geert Josten <geert.josten at dayon.nl>
> wrote:
>> Hi Danny,
>> 
>> Performance should be easy to measure. Call the function from within
> QConsole x number of times and request profile output. Do the same while
> using xdmp:exists instead of fn:exists. That function works only on
> (partially) searchable expressions, because it doesn't retrieve the actual
> content. It won't create a read-lock either, but I'm not sure why you want
> one. It won't prevent duplicate IDs from being generated in concurrent
> requests.
>> 
>> Kind regards,
>> Geert
>> 
>> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On behalf of Danny Sinang
>> Sent: Wednesday 29 August 2012 19:11
>> To: general
>> Subject: [MarkLogic Dev General] Performance of
> fn:exists(fn:doc($uri))
>> 
>> Hi,
>> 
>> ML support suggested we do this to generate a unique ID for our
> documents :
>> 
>> declare function choose-uri() as xs:string
>>    {
>>       let $uri := fn:concat("/document-", xdmp:random(), ".xml")
>>       return if (fn:exists(fn:doc($uri))) then choose-uri() else $uri
>>    };
>> 
>> My question is, will the call to fn:exists(fn:doc($uri)) be fast,
> considering that we now have 8 million documents ?
>> 
>> The fn:exists(fn:doc($uri)) call is needed to obtain a read lock, which
> will be upgraded to a write lock when xdmp:document-insert is called.
>> 
>> Regards,
>> Danny
>> 
>> 
>> 
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>> 
> 
> 


