[MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

William Merritt Sawyer william.sawyer at ldschurch.org
Thu Aug 30 14:21:03 PDT 2012


I would expect cts:frequency to be faster. It would be something like this:

for $value in cts:element-values(xs:QName("assetId"))
let $frequency := cts:frequency($value)
where $frequency > 1
return fn:concat($value, ":", $frequency)
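A slightly extended sketch of the same idea follows. It assumes a string element range index on assetId, since cts:element-values reads from a range index and raises an error if none is configured. It also assumes each document is stored as a single fragment, so the fragment frequency that cts:frequency reports equals the number of documents containing the value. The "frequency-order" option returns the most frequent values first, so any duplicates appear at the top of the result. Because everything comes from one lexicon scan rather than one search per value, it should avoid the timeout seen with the per-value xdmp:estimate loop.

```xquery
xquery version "1.0-ml";

(: Assumes a string element range index on assetId; without one,
   cts:element-values raises an error. "fragment-frequency" (the
   default) makes cts:frequency count fragments containing each value,
   and "frequency-order" returns the most frequent values first. :)
for $value in cts:element-values(
                xs:QName("assetId"), (),
                ("frequency-order", "fragment-frequency"))
let $frequency := cts:frequency($value)
where $frequency gt 1
return fn:concat($value, ":", $frequency)
```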

-Will

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
Sent: Thursday, August 30, 2012 2:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

Hi Geert,

I was referring to an element value that appears in more than one document.

I tried this, but it took so long that it timed out:

for $assetId in (cts:element-values(xs:QName("assetId")))
let $count := xdmp:estimate(cts:search(/asset, cts:element-value-query(xs:QName("assetId"), $assetId)))
return
if ($count > 1) then
   fn:concat($assetId, ", ", $count)
else
   ()

Is there any way to improve on this?

Regards,
Danny
On Thu, Aug 30, 2012 at 3:21 PM, Geert Josten <geert.josten at dayon.nl> wrote:
Hi Danny,

Are you talking about duplicate URIs? That is normally not possible. If you mean an element value that occurs in more than one document, do something like this:

xdmp:estimate(cts:search(doc(), cts:element-value-query(xs:QName('myelem'), 'myid')))
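Wrapped as a reusable check for a single known id, that might look like the sketch below. The local:has-duplicates name and the assetId element are hypothetical. xdmp:estimate counts matching fragments from the indexes without filtering, so for a simple value query the count is usually accurate but is not guaranteed to be exact. Calling it once for every value in the database is essentially the loop Danny tried, which timed out; it is better suited to checking one id at a time.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: true if more than one fragment contains an
   assetId element whose value equals $id. xdmp:estimate is unfiltered,
   so it counts from the indexes without loading the documents. :)
declare function local:has-duplicates($id as xs:string) as xs:boolean
{
  xdmp:estimate(
    cts:search(fn:doc(), cts:element-value-query(xs:QName("assetId"), $id))
  ) gt 1
};

local:has-duplicates("myid")
```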

Kind regards,
Geert

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
Sent: Thursday, August 30, 2012 21:13
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

What I meant was: what's the fastest way to check for documents with duplicate IDs?
On Thu, Aug 30, 2012 at 3:00 PM, Danny Sinang <d.sinang at gmail.com> wrote:
Can anyone recommend a fast way to check for duplicate IDs?
On Thu, Aug 30, 2012 at 2:06 PM, Geert Josten <geert.josten at dayon.nl> wrote:
I didn't know that read locks are URI locks, not fragment locks. That sounds
excellent; I should have known earlier.

And now the code MarkLogic uses to generate IDs for all its internal
objects makes much more sense too.

Mike wrote:
> To put it more simply: how are you going to guarantee the uniqueness of
> the URI, if not by checking to see if it exists?

I can only think of one other way: taking a write lock on a fixed URI
(or several fixed URIs), for example by always doing an
xdmp:document-insert('/assets/lock', <x/>) before deriving a new URI. But
every creating transaction then has to wait for that same lock, which
likely slows down document creation more than the read-lock approach. :-/
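For comparison, a minimal sketch of the read-lock approach discussed in this thread: draw a random number, check fn:exists(fn:doc($uri)), and retry on a collision. The /assets/ URI pattern, the <asset/> root element, and the local:new-asset-uri function are hypothetical. The sketch relies on the locking behaviour described above (in an update transaction, fn:doc read-locks the URI even when no document exists there yet), so treat it as an untested illustration rather than a vetted implementation.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: returns a URI of the form /assets/<random>.xml
   that does not exist yet. Because this module calls
   xdmp:document-insert, it runs as an update, and in an update
   fn:doc($uri) read-locks $uri until the transaction ends. :)
declare function local:new-asset-uri() as xs:string
{
  let $uri := fn:concat("/assets/", xdmp:random(), ".xml")
  return
    if (fn:exists(fn:doc($uri)))
    then local:new-asset-uri()  (: collision: draw another number :)
    else $uri
};

let $uri := local:new-asset-uri()
return xdmp:document-insert($uri, <asset/>)
```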

That leaves perhaps only one thing that needs attention. The more documents
you already have, the more likely it is that the random number generator
comes up with an ID that already exists, so the average number of attempts
needed to find an unused number also grows over time. Luckily the range of
random is very large (20 digits), so you need a very large number of
documents to even get close to 1/100000 of the space.
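A rough calculation behind that remark, assuming IDs are drawn uniformly from a space of N = 10^20 values and n of them are already taken. Each attempt collides with probability n/N independently, so the number of attempts is geometrically distributed:

```latex
p = \frac{n}{N}, \qquad
\mathbb{E}[\text{attempts}] = \sum_{k \ge 1} k \, p^{k-1} (1 - p) = \frac{1}{1 - p}

% At 1/100000 of the space: n = 10^{-5} N = 10^{15} documents, and
\mathbb{E}[\text{attempts}] = \frac{1}{1 - 10^{-5}} \approx 1.00001
```

So even at a quadrillion documents, a retry would be needed only about once in every 100,000 inserts.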

:)

Grtz,
Geert
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


