[MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))

Danny Sinang d.sinang at gmail.com
Thu Aug 30 13:57:01 PDT 2012


Sorry about that.

Here's a more complete listing of the functions involved:


declare function xutils:buildUri($id as xs:string, $type as xs:string) as xs:string
{
  if ($type eq "asset") then fn:concat("/assets/", $id, ".xml")
  else if ($type eq "group") then fn:concat("/groups/", $id, ".xml")
  else ""
};

declare function xutils:choose-uri() as xs:string
{
   let $uri := xutils:buildUri(xs:string(xdmp:random($vars:ASSETS-INIT-ID)), "asset")
   return
          (: fn:doc() takes a read lock on $uri; retry if the URI is already in use :)
          if (fn:exists(fn:doc($uri))) then
              xutils:choose-uri()
          else
              $uri
};

declare function xutils:assets-uuid() {
(:
xutils:uuid($vars:ASSETS-INIT-ID, $vars:ASSETS-ID-FILE)
:)

    let $uri := xutils:choose-uri()
    (: strip the "/assets/" prefix and the ".xml" suffix to recover the bare id :)
    let $assetId := fn:replace(fn:replace($uri, "^/assets/", ""), "\.xml$", "")
    return $assetId

};

declare function addAsset($doc as element(asset), $user as xs:string) {
  let $assetId := xutils:assets-uuid()
  return
    updateAsset($assetId, $doc, $user)
};

declare function updateAsset($assetId as xs:string, $doc as element(asset), $user as xs:string) {
  let $isExistingUser := xutils:doesUserExist($user)
  let $assetUri := xutils:buildUri($assetId, "asset")

        ...

  let $insert := xdmp:document-insert($assetUri, $assetDoc,
      xdmp:default-permissions(), vars:getCollections("assets"), 0,
      vars:forest-ids("assets"))
  return $assetId
};
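
For comparison, here is a minimal sketch (not the production code) of the same flow with the prefix passed into choose-uri as an argument, so that the URI that gets checked is exactly the URI that gets inserted. The local: function names and the sample <asset> element are made up for illustration, and the permissions, collections, and forest arguments from the listing above are left out, so the defaults would apply.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch: every local: name here is illustrative only. :)

(: Pick an unused URI under $prefix. fn:doc() takes a read lock on the
   candidate URI, even when no document exists there yet. :)
declare function local:choose-uri($prefix as xs:string) as xs:string
{
  let $uri := fn:concat($prefix, xdmp:random(), ".xml")
  return
    if (fn:exists(fn:doc($uri))) then local:choose-uri($prefix)
    else $uri
};

(: Insert at the very URI that was checked, with no rebuilding in between,
   so the insert write-locks the same URI that was read-locked above. :)
declare function local:add-asset($doc as element(asset)) as xs:string
{
  let $uri := local:choose-uri("/assets/")
  return (xdmp:document-insert($uri, $doc), $uri)
};

local:add-asset(<asset><name>example</name></asset>)
```

Run as a single update request, this should return the new document's URI. The id can still be derived from that URI afterwards if it is needed.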



On Thu, Aug 30, 2012 at 4:06 PM, Michael Blakeley <mike at blakeley.com> wrote:

> Prefixing the URI? I think your sample code was edited down too much.
>
> If you are rewriting the output of choose-uri in any way, you are no
> longer checking and using the same URI. If you want some sort of dynamic
> prefix, add it as an argument to choose-uri.
>
> -- Mike
>
> On 30 Aug 2012, at 09:02 , Danny Sinang wrote:
>
> > Hi Mike,
> >
> > I cut out the code where $assetUri was defined. But yes, it's the output of choose-uri().
> >
> > The call to xutils:buildUri() prefixes the URI with "/assets/".
> >
> > Thanks for the link to directory assistance. I may need to set "directory creation" to manual.
> >
> > Regards,
> > Danny
> >
> > On Thu, Aug 30, 2012 at 11:47 AM, Michael Blakeley <mike at blakeley.com> wrote:
> > I say: the value of $assetUri is undefined.
> >
> > But if $assetUri is simply whatever comes back from choose-uri(), then yes. That's how updates work: any reads in an update take read locks, and writes will upgrade those locks as needed. http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/dev_guide/transactions.xml tries to explain this, especially http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/dev_guide/transactions.xml%2340227
> >
> > BTW, consider writing choose-uri to use something like 'asset/' + random. That way you can use 'asset/' as a directory prefix with xdmp:directory() or cts:directory-query(). Along those lines, http://blakeley.com/blogofile/2012/03/19/directory-assistance/ might interest you too.
> >
> > -- Mike
> >
> > On 30 Aug 2012, at 08:15 , Danny Sinang wrote:
> >
> > > Hi Mike,
> > >
> > > When I call addAsset (see code below), will the read lock that happens inside xutils:choose-uri() be upgraded to a write lock by the time xdmp:document-insert() is called inside updateAsset()?
> > >
> > > Regards,
> > > Danny
> > >
> > > declare function xutils:choose-uri() as xs:string
> > > {
> > >    let $uri := xutils:buildUri(xdmp:random(), "asset")
> > >    return
> > >           if (fn:exists(fn:doc($uri))) then
> > >               choose-uri()
> > >           else
> > >               $uri
> > > };
> > >
> > > declare function xutils:assets-uuid() {
> > >     let $uri := xutils:choose-uri()
> > >     let $assetId :=
> > >             fn:replace(fn:replace(
> > >             fn:base-uri($uri),
> > >             "/assets/","")
> > >             ,"\.xml","")
> > >     return $assetId
> > > };
> > >
> > > declare function addAsset($doc as element(asset), $user as xs:string) {
> > >       let $assetId := xutils:assets-uuid()
> > >       return
> > >               updateAsset($assetId, $doc, $user)
> > > };
> > >
> > > declare function updateAsset($assetId as xs:string, $doc as element(asset), $user as xs:string) {
> > >         (: cut out some code for brevity :)
> > >
> > >       let $insert := xdmp:document-insert($assetUri, $assetDoc, xdmp:default-permissions(), vars:getCollections("assets"), 0, vars:forest-ids("assets"))
> > >       return $assetId
> > > };
> > >
> > >
> > > On Thu, Aug 30, 2012 at 11:05 AM, Michael Blakeley <mike at blakeley.com> wrote:
> > > These are URI locks, not fragment locks. The URI doesn't have to exist in order to create the lock. The point is to guarantee read-consistency for the update, so that the if-then-else expression operates reliably.
> > >
> > > The case where the URI does exist would be vanishingly rare, since xdmp:random() returns a 64-bit pseudo-random unsigned long. You could test the cost by using a smaller random space, if you were interested. But you can't simply drop the read locks without sacrificing the guarantee of uniqueness. So if you do end up taking extra read locks, they are quite necessary.
> > >
> > > To put it more simply: how are you going to guarantee the uniqueness of the URI, if not by checking to see if it exists?
> > >
> > > -- Mike
> > >
> > > On 29 Aug 2012, at 22:25 , Geert Josten wrote:
> > >
> > > > Hi Mike,
> > > >
> > > > Not quite sure, but the conflict occurs when the URI doesn't exist yet, so there would be nothing to lock. Does that still create a read lock?
> > > >
> > > > And in case the URI does exist, wouldn't this potentially create a lot of unnecessary read locks (in case it takes a lot of attempts to find an unused URI)?
> > > >
> > > > Kind regards,
> > > > Geert
> > > >
> > > > -----Original message-----
> > > > From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
> > > > Sent: Wednesday, August 29, 2012 21:35
> > > > To: MarkLogic Developer Discussion
> > > > Subject: Re: [MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))
> > > >
> > > > No, you can't do that safely because cts:uri-match won't take a read lock. You are opening yourself up to a race condition. And in some circumstances it will be slower than the recommended technique. There seems to be a popular idea that cts:uri-match() is always fastest, but that is not always true.
> > > >
> > > > The recommended technique is probably the fastest way to guarantee a new, unique URI. If you are going through the process of inserting a new document, this technique adds very little extra work. The document-insert itself always has to look for an existing document, because it might be replacing an existing document or it might be inserting a new document. It always has to write-lock the URI. So the extra exists() call merely repeats the URI lookup, which is cheap because it will be cached for the xdmp:document-insert call, and also gets a read lock before xdmp:document-insert gets the write lock. In the vanishingly rare event that xdmp:random() produces an existing URI, this extra work is repeated, but is still quite cheap.
> > > >
> > > > -- Mike
> > > >
> > > > On 29 Aug 2012, at 12:29 , William Merritt Sawyer wrote:
> > > >
> > > >> If you have the URI lexicon turned on you can use cts:uri-match(fn:concat("/document-", xdmp:random(), ".xml"))
> > > >>
> > > >> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> > > >> Sent: Wednesday, August 29, 2012 12:33 PM
> > > >> To: MarkLogic Developer Discussion
> > > >> Subject: Re: [MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))
> > > >>
> > > >> Thanks Geert.
> > > >>
> > > >> I did try fn:exists(fn:doc($uri)) in CQ before your response came in and found it to be fast.
> > > >>
> > > >> The locking / prevention of duplicate IDs is discussed in http://markmail.org/message/mm5vtacpdzwfy44j .
> > > >>
> > > >> Regards,
> > > >> Danny
> > > >>
> > > >> On Wed, Aug 29, 2012 at 2:23 PM, Geert Josten <geert.josten at dayon.nl> wrote:
> > > >> Hi Danny,
> > > >>
> > > >> Performance should be easy to measure. Call the function from within QConsole a number of times and request profile output. Do the same while using xdmp:exists instead of fn:exists. That function works only on (partially) searchable expressions, because it doesn't retrieve the actual content. It won't create a read lock either, but I'm not sure why you want one. It won't prevent duplicate IDs from being generated in concurrent requests.
> > > >>
> > > >> Kind regards,
> > > >> Geert
> > > >>
> > > >> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> > > >> Sent: Wednesday, August 29, 2012 19:11
> > > >> To: general
> > > >> Subject: [MarkLogic Dev General] Performance of fn:exists(fn:doc($uri))
> > > >>
> > > >> Hi,
> > > >>
> > > >> ML support suggested we do this to generate a unique ID for our documents:
> > > >>
> > > >> declare function choose-uri() as xs:string
> > > >>    {
> > > >>       let $uri := fn:concat("/document-", xdmp:random(), ".xml")
> > > >>       return if (fn:exists(fn:doc($uri))) then choose-uri() else $uri
> > > >>    };
> > > >>
> > > >> My question is, will the call to fn:exists(fn:doc($uri)) be fast, considering that we now have 8 million documents?
> > > >>
> > > >> The fn:exists(fn:doc($uri)) call is needed to obtain a read lock, which will be upgraded to a write lock when xdmp:document-insert is called.
> > > >>
> > > >> Regards,
> > > >> Danny
> > > >>
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> General mailing list
> > > >> General at developer.marklogic.com
> > > >> http://developer.marklogic.com/mailman/listinfo/general