[MarkLogic Dev General] xpath string construction

Michael Blakeley michael.blakeley at marklogic.com
Tue Oct 14 09:34:41 PDT 2008


Eric,

My guess is that xdmp:request() is only unique at a given point in time. 
I'm basing this on the idea that, if xdmp:request is unique for all 
time, then the server would have to keep a record of all past request 
keys. I don't think the server does that.

I like to use xdmp:random() for ids. It isn't guaranteed to be unique, 
but it returns a 64-bit unsigned pseudo-random number so the chances of 
avoiding duplicates are quite good. Let's generate a million ids, and 
see if lightning strikes twice in the same place:

count(distinct-values(
   for $i in 1 to 1000 * 1000
   return xdmp:random()
))
=> 1000000

Looks good, and generating 150 ids takes 1-ms on my laptop.
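
For a rough sense of the odds, the usual birthday approximation puts the 
chance of at least one duplicate among n random 64-bit values at about 
n * (n - 1) / (2 * 2^64). The sketch below just evaluates that formula. 
It assumes xdmp:random() behaves like a uniform source, which is an 
assumption and not something the documentation promises, and the 
variable names are made up for illustration:

```xquery
(: Birthday-bound estimate of the collision probability for n ids.
   Assumes uniformly distributed 64-bit values; plain XQuery 1.0,
   no MarkLogic extensions needed. :)
let $n := 1000 * 1000
let $space := 1.8446744073709552E19 (: 2^64 written as an xs:double :)
return ($n * ($n - 1)) div (2 * $space)
```

For a million ids that works out to somewhere around 3 in 100 million, 
which fits the result of the test above.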

But if you don't quite trust xdmp:random(), it's simple to write a 
recursive function that generates an id with xdmp:random(), checks to 
make sure the id is not in use, and recurs if the id-check fails. This 
can be quite lightweight - there will be no extra transactional locking 
unless one or more random numbers are already in use, and that should be 
very very rare. Here's one for documents with hexadecimal uris:

declare function get-id()
  as xs:string
{
   let $id := xdmp:integer-to-hex(xdmp:random())
   return
     if (not(doc($id))) then $id else get-id()
};

Note that if you want to call get-id() multiple times in one 
transaction, you need to track the ids that have already been issued in 
this transaction (just in case of duplicate random numbers). The chances 
are still very small, but we're being paranoid, right? This time we'll 
use the 'local' namespace....

declare variable $IDS as xs:string* := ();

declare function local:get-id()
  as xs:string
{
   let $id := xdmp:integer-to-hex(xdmp:random())
   return
     if (not(doc($id)) and not($IDS = $id)) then (
       $id,
       xdmp:set($IDS, ($IDS, $id))
     ) else local:get-id()
};

count(distinct-values(
   for $i in 1 to 1000
   return local:get-id()
))
=> 1000

This can slow down the query somewhat, since we have to keep checking an 
ever-larger list of ids. On my laptop it takes 0.1-ms to generate 1 id, 
and 8-ms for 100 ids. Checking the list of issued ids takes pretty much 
all of the elapsed time.
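
If the list check becomes a problem, one option is to track the issued 
ids in a map instead of a sequence, so each lookup is by key. This is 
only a sketch: it assumes a server release that provides the map:map 
functions, the $ISSUED name is hypothetical, and I haven't timed it 
against the version above:

```xquery
xquery version "1.0-ml";

(: Ids handed out so far in this request, keyed by the id string.
   $ISSUED is a hypothetical name. :)
declare variable $ISSUED as map:map := map:map();

declare function local:get-id()
  as xs:string
{
   let $id := xdmp:integer-to-hex(xdmp:random())
   return
     (: Accept the id only if no document uses it as a uri and it has
        not already been issued during this request. :)
     if (empty(doc($id)) and empty(map:get($ISSUED, $id))) then (
       map:put($ISSUED, $id, true()),
       $id
     ) else local:get-id()
};

count(distinct-values(
   for $i in 1 to 1000
   return local:get-id()
))
```

A map is updated in place, so there is no need for xdmp:set() here.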

Just for completeness... you might also want the id to mean something. 
If so, generate it by concatenating or hashing (xdmp:hash64) the 
meaningful content.
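
As a minimal sketch of the hashing route, the example below builds a key 
from two attributes and hashes it. The city attribute and the '|' 
separator are assumptions made for illustration, not part of the 
original example data:

```xquery
xquery version "1.0-ml";

(: Hypothetical record; only the name attribute appears in the thread. :)
let $person := <person name="bob" city="Toronto"/>
(: Concatenate the meaningful fields with a separator so that
   ("ab", "c") and ("a", "bc") produce different keys. :)
let $key := concat($person/@name, '|', $person/@city)
return xdmp:integer-to-hex(xdmp:hash64($key))
```

The same content always hashes to the same id, so this only works when 
the chosen fields are themselves unique. Two different keys can also 
collide, though that should be about as unlikely as with xdmp:random().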

Many people ask about sequential ids. It is possible to model an id 
sequence as a database document. But as with RDBMS sequences, there are 
serialization penalties. I don't see the advantage of sequential ids, so 
I rarely, if ever, use this approach.
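
For anyone who wants to try it anyway, a sequence document might look 
roughly like the sketch below. The uri and the sequence element are 
hypothetical, and this is one possible arrangement, not a recommended 
implementation:

```xquery
xquery version "1.0-ml";

(: Hypothetical uri for the counter document. :)
declare variable $SEQ-URI as xs:string := '/sequences/person-id.xml';

declare function local:next-id()
  as xs:integer
{
   (: Read the current value, or start at 1 if the document is missing. :)
   let $current := doc($SEQ-URI)/sequence
   let $next := if ($current) then xs:integer($current) + 1 else 1
   return (
     (: Write the new value back; this makes the request an update. :)
     xdmp:document-insert($SEQ-URI, <sequence>{ $next }</sequence>),
     $next
   )
};

local:next-id()
```

Every caller has to take a write lock on the same document, which is 
where the serialization penalty comes from. The update is also not 
visible until commit, so this sketch can only be called once per 
transaction.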

-- Mike

Eric Palmitesta wrote:
> Hi Wayne,
> 
> Yes, I've been looking into generating unique identifiers to ease such 
> things as deletion.  I'm still new to the 'document' model, still 
> figuring out what's portable from my 'relational' model experience.
> 
> Is xdmp:request() guaranteed to be unique?  If so, that's a candidate to 
> use as a unique identifier when inserting a new node.
> 
> If there's a way to synchronize a particular block of code across all 
> sessions across all e-nodes, a hash of xdmp:request-timestamp() might 
> also work.
> 
> I'm sure some mailing-list-folk have needed to generate an identifier 
> which is guaranteed unique, anyone have suggestions / advice?
> 
> Much thanks,
> 
> Eric
> 
> Wayne Feick wrote:
>> Hi Eric,
>>
>> In 4.0, you can use xdmp:unpath() to do this
>>
>>     http://developer.marklogic.com/pubs/4.0/apidocs/Extension.html#xdmp:unpath
>>
>> However, in the example you've given I'd recommend changing the approach 
>> to use some sort of an id attribute on person (since there are duplicate 
>> names) rather than a positional XPath expression. With your current 
>> approach, two users could each intend to delete "bob" at index 3 when in 
>> fact the second attempt would actually delete "ryan".
>>
>> As a rule, exposing xpath expressions to a web app is dangerous since 
>> there is no guarantee they still refer to the same node from one 
>> transaction to the next.
>>
>> Wayne.
>>
>>
>> On Fri, 2008-10-10 at 14:43 -0400, Eric Palmitesta wrote:
>>> Is there a specific reason why one can't construct an xpath out of a string?
>>>
>>> For example,
>>>
>>> let $media := 'book' (: or 'journal', or 'article' :)
>>> return
>>>    doc('/path/to/file.xml')/path/to/$media/title
>>>
>>> Another use case, I want to display a list of items, and offer a 
>>> 'delete' link for each item.
>>>
>>> lets say /people.xml contained the following:
>>>    <people>
>>>      <person name="bob" />
>>>      <person name="jim" />
>>>      <person name="bob" />
>>>      <person name="ryan" />
>>>    </people>
>>>
>>> So I'd display something like:
>>>
>>> for $person in doc('/people.xml')/people/person
>>> return
>>>    <div>
>>>      { string($person/@name) }
>>>      <a href="delete.xqy?path={ xdmp:path($person) }">delete</a>
>>>    </div>
>>>
>>> This will give me nice delete links like 
>>> "delete.xqy?path=/people/person[1]", but in the supposed delete.xqy, I'd 
>>> want to do something similar to:
>>>
>>> let $file := '/people.xml'
>>> let $person := xdmp:get-request-field('path')
>>> return
>>>    xdmp:node-delete(doc($file)/$person)
>>>
>>> I can't, of course, the doc call will be fine but I can't construct 
>>> xpath with a string.  And the node-delete (and any other 
>>> node-manipulation function) requires actual nodes, not strings.
>>>
>>> I end up having to write eval-based utility functions:
>>>
>>> define function util:remove-element($uri as xs:string, $xpath as xs:string)
>>> {
>>> 	let $node := concat("doc('", $uri, "')", $xpath)
>>> 	return
>>> 		xdmp:eval(concat("xdmp:node-delete(", $node, ")"))
>>> }
>>>
>>> Please tell me I'm all wrong and there's a better way.
>>>
>>> Cheers,
>>>
>>> Eric
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com <mailto:General at developer.marklogic.com>
>>> http://xqzone.com/mailman/listinfo/general


