[MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

Michael Blakeley mike at blakeley.com
Thu Aug 30 08:05:06 PDT 2012


Why the reluctance to use a technique that is recommended by MarkLogic support?

That blog post uses xdmp:request. Care to guess how xdmp:request ids are generated? Yep, xdmp:random again. Oh, and generate-id? Random.

So this technique reduces to random + dateTime + random, and is still theoretically subject to collisions. It gets worse, because request ids and node ids aren't as unique as you might think. Request ids only have to be unique for a single host and app-server: note that xdmp:request-cancel takes a host id, a server id, *and* a request id. Node ids only have to be unique for a single request, and I'm not sure if that's even enforced.

In conclusion, you *can* use request + dateTime + generate-id, but collisions are still possible. So check for uniqueness using exists(doc($uri)), and recurse if necessary. Take the read lock: that's what it's there for.
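
A minimal sketch of that combination, assuming a main module run as an update transaction. The function name local:choose-uri, the "/document-" URI pattern (borrowed from the original question), and the <doc/> payload are illustrative only, not a definitive implementation:

```xquery
xquery version "1.0-ml";

(: Build a candidate URI from the request id plus a random number, then
   verify it with fn:exists(fn:doc($uri)). Because this module also calls
   xdmp:document-insert, it runs as an update, so the fn:doc lookup takes
   a read lock on the candidate URI. :)
declare function local:choose-uri() as xs:string
{
  let $uri := fn:concat("/document-", xdmp:request(), "-", xdmp:random(), ".xml")
  return
    if (fn:exists(fn:doc($uri)))
    then local:choose-uri()  (: collision: recurse and try a new candidate :)
    else $uri
};

(: Placeholder payload; the insert then takes the write lock on the same URI. :)
xdmp:document-insert(local:choose-uri(), <doc/>)
```

The extra identifier only makes a collision less likely; the exists() check is what actually guards against one.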

-- Mike

On 30 Aug 2012, at 06:47 , Ryan Dew wrote:

> It seems like it would be better to focus on using something more likely to be unique than xdmp:random rather than focusing on read locks. 
> 
> http://maxdewpoint.blogspot.com/2012/08/generate-unique-ids-for-collision.html
> 
> -Ryan Dew 
> 
> On Aug 29, 2012, at 11:25 PM, Geert Josten <geert.josten at dayon.nl> wrote:
> 
>> Hi Mike,
>> 
>> Not quite sure, but the conflict occurs when the uri doesn't exist yet, so
>> there would be nothing to lock. Does that still create a read-lock?
>> 
>> And in case the uri does exist, wouldn't this create potentially a lot of
>> unnecessary read-locks (in case it takes a lot of attempts to find an
>> unused uri)?
>> 
>> Kind regards,
>> Geert
>> 
>> -----Original message-----
>> From: general-bounces at developer.marklogic.com
>> [mailto:general-bounces at developer.marklogic.com] On behalf of Michael Blakeley
>> Sent: Wednesday, 29 August 2012 21:35
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Performance of
>> fn:exists(fn:doc($uri))
>> 
>> No, you can't do that safely because cts:uri-match won't take a
>> read-lock. You are opening yourself up to a race condition. And in some
>> circumstances it will be slower than the recommended technique. There
>> seems to be a popular idea that cts:uri-match() is always fastest, but
>> that is not always true.
>> 
>> The recommended technique is probably the fastest way to guarantee a new,
>> unique URI. If you are going through the process of inserting a new
>> document, this technique adds very little extra work. The document-insert
>> itself always has to look for an existing document, because it might be
>> replacing an existing document or it might be inserting a new document. It
>> always has to write-lock the URI. So the extra exists() call merely
>> repeats the URI lookup, which is cheap because it will be cached for the
>> xdmp:document-insert call, and also gets a read-lock before
>> xdmp:document-insert gets the write lock. In the vanishingly rare event
>> that xdmp:random() produces an existing URI, this extra work is repeated -
>> but is still quite cheap.
>> 
>> -- Mike
>> 
>> On 29 Aug 2012, at 12:29 , William Merritt Sawyer wrote:
>> 
>>> If you have the uri-lexicon turned on you can use
>>> cts:uri-match(fn:concat("/document-", xdmp:random(), ".xml"))
>>> 
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
>>> Sent: Wednesday, August 29, 2012 12:33 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Performance of
>>> fn:exists(fn:doc($uri))
>>> 
>>> Thanks Geert.
>>> 
>>> I did try fn:exists(fn:doc($uri)) on CQ before your response came in
>>> and found it to be fast.
>>> 
>>> The locking / prevention of duplicate IDs is discussed in
>>> http://markmail.org/message/mm5vtacpdzwfy44j
>>> 
>>> Regards,
>>> Danny
>>> 
>>> On Wed, Aug 29, 2012 at 2:23 PM, Geert Josten <geert.josten at dayon.nl>
>>> wrote:
>>> Hi Danny,
>>> 
>>> Performance should be easy to measure. Call the function from within
>>> QConsole x number of times and request profile output. Do the same while
>>> using xdmp:exists instead of fn:exists. That function works only on
>>> (partially) searchable expressions, because it doesn't retrieve the actual
>>> content. It won't create a read-lock either, but I'm not sure why you want
>>> one. It won't prevent duplicate IDs from being generated in concurrent
>>> requests.
>>> 
>>> Kind regards,
>>> Geert
>>> 
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com] On behalf of Danny Sinang
>>> Sent: Wednesday, 29 August 2012 19:11
>>> To: general
>>> Subject: [MarkLogic Dev General] Performance of
>>> fn:exists(fn:doc($uri))
>>> 
>>> Hi,
>>> 
>>> ML support suggested we do this to generate a unique ID for our
>>> documents:
>>> 
>>> declare function local:choose-uri() as xs:string
>>>    {
>>>       let $uri := fn:concat("/document-", xdmp:random(), ".xml")
>>>       return if (fn:exists(fn:doc($uri))) then local:choose-uri() else $uri
>>>    };
>>> 
>>> My question is, will the call to fn:exists(fn:doc($uri)) be fast,
>>> considering that we now have 8 million documents?
>>> 
>>> The fn:exists(fn:doc($uri)) call is needed to obtain a read lock, which
>>> will be upgraded to a write lock when xdmp:document-insert is called.
>>> 
>>> Regards,
>>> Danny
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general


