[MarkLogic Dev General] xpath string construction

Robert Koberg rob at koberg.com
Wed Oct 15 06:57:18 PDT 2008


Hi again,

To me, this is the same as locking the file, except that you are
possibly letting someone waste time editing a doc only to lose
their changes if it is not up to date. As you say, it is rare, but just
wait until you hear from someone who spent 10 minutes editing a file
only to see all the work lost.

best,
-Rob
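Rob's objection describes an optimistic scheme. Each editor works from a copy, and the save is refused if someone else changed the document in the meantime. Below is a minimal Python sketch of that check. It assumes a per-document version number. `DocStore`, `StaleWriteError`, and the version field are hypothetical names, not a MarkLogic API.

```python
from dataclasses import dataclass


class StaleWriteError(Exception):
    """Raised when a save is based on an out-of-date version (hypothetical)."""


@dataclass
class StoredDoc:
    content: str
    version: int = 0


class DocStore:
    """Hypothetical in-memory store that checks versions on save."""

    def __init__(self) -> None:
        self._docs = {}  # uri -> StoredDoc

    def create(self, uri, content):
        self._docs[uri] = StoredDoc(content)

    def load(self, uri):
        """Return (content, version) so the editor remembers its starting version."""
        doc = self._docs[uri]
        return doc.content, doc.version

    def save(self, uri, new_content, base_version):
        doc = self._docs[uri]
        if doc.version != base_version:
            # Someone saved after this editor loaded: reject, and the edits are lost.
            raise StaleWriteError(
                f"{uri}: edit based on version {base_version}, current is {doc.version}"
            )
        doc.content = new_content
        doc.version += 1
        return doc.version


# Two editors open the same document at version 0.
store = DocStore()
store.create("/doc.xml", "<doc/>")
_, alice_version = store.load("/doc.xml")
_, bob_version = store.load("/doc.xml")
store.save("/doc.xml", "<doc>alice</doc>", alice_version)  # succeeds, now version 1
try:
    store.save("/doc.xml", "<doc>bob</doc>", bob_version)
except StaleWriteError as err:
    print("rejected:", err)
```

Here the second editor's save is rejected. That is the lost-work case Rob describes. A pessimistic lock would instead have stopped that editor before any work started.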


On Oct 15, 2008, at 9:41 AM, Eric Palmitesta wrote:

> Good morning all!  Sorry to cause such a stir.  Upon reading your
> responses, I feel you've gotten the wrong idea, which is probably
> due to a communication failure on my part.
>
> My idea of sequential ids is one 'special' document, for example
> /id.xml, which contains nothing but <id>42</id>, and an id() function
> which exclusive-locks the file, yanks 42 out, increments it,
> replaces the text node with 43, and unlocks the file.  My
> environment is read-heavy, write-light, so although write operations
> which require a unique id would touch this file, I don't think it
> would be an awful bottleneck.  This guarantees unique ids without
> ever having to worry about collisions.
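Eric's counter document can be modelled as a single value guarded by one exclusive lock. In the minimal Python sketch below, a `threading.Lock` stands in for the document lock. `CounterDocument` and `next_id` are hypothetical names. The sketch assumes the caller uses the value it read (42) and leaves the incremented value (43) in the document.

```python
import threading


class CounterDocument:
    """Hypothetical stand-in for a single /id.xml document containing <id>N</id>."""

    def __init__(self, start: int = 42) -> None:
        self._value = start
        # Plays the role of the exclusive lock on the counter document.
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Lock, read the stored value, write back value + 1, unlock.
        # Every caller serializes here, which is where the bottleneck comes from.
        with self._lock:
            current = self._value
            self._value = current + 1
            return current


counter = CounterDocument(42)
allocated = []
allocated_lock = threading.Lock()


def writer(n: int) -> None:
    for _ in range(n):
        new_id = counter.next_id()
        with allocated_lock:
            allocated.append(new_id)


threads = [threading.Thread(target=writer, args=(500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(allocated), len(set(allocated)))
```

No duplicates are possible, because the read and the increment happen under the same lock. The price is that all writers needing an id queue up on that one lock.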
>
> Of course, the counter-argument is that since it's a write-light
> environment, the chances of using random() and lightning striking
> twice, as Michael put it, are infinitesimally small.  I don't truly
> have a problem with using random ids; I'm just saying it's worth
> noting that it is *impossible* for lightning to strike twice with
> sequential ids.
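How small "infinitesimally small" is depends on the id width and on how many ids are issued. The standard birthday-problem product gives the probability that n uniformly random ids drawn from 2^b possible values contain at least one duplicate. The sketch assumes random() yields uniform b-bit integers, which the thread does not specify. `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n_ids: int, id_bits: int) -> float:
    """P(at least one duplicate among n_ids uniform random ids of id_bits bits)."""
    space = 2 ** id_bits
    if n_ids > space:
        return 1.0  # pigeonhole: a duplicate is certain
    # P(all distinct) = prod_{k=0}^{n-1} (1 - k/space), summed in log space for accuracy.
    log_all_distinct = sum(math.log1p(-k / space) for k in range(n_ids))
    return -math.expm1(log_all_distinct)


for n in (10_000, 1_000_000):
    print(n, collision_probability(n, 64))
```

For small probabilities the result is close to n(n-1)/2^(b+1). For a million 64-bit ids, that is about 2.7e-8.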
>
> Eric
>
> Wayne Feick wrote:
>> Hi Eric,
>> A disadvantage of sequential ids is that you can end up read  
>> locking all of your documents in order to find the current max id.  
>> You can address this partially by moving the next id into a  
>> separate document, but that document can still become a bottleneck  
>> if you have a high insertion rate. You could also address this by  
>> creating a range index on the id and using cts:element-values() or  
>> cts:element-attribute-values() to find the max.
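The difference between scanning every document and asking a range index for the maximum can be sketched in Python. A sorted list stands in for the index, and plain dicts stand in for documents. `RangeIndex` and `max_id_by_scan` are hypothetical names. In MarkLogic the lookup would go through cts:element-values() or cts:element-attribute-values(), as Wayne says; their exact options are not shown here.

```python
import bisect


class RangeIndex:
    """Hypothetical stand-in for a range index: id values kept in sorted order."""

    def __init__(self) -> None:
        self._values = []

    def add(self, value: int) -> None:
        bisect.insort(self._values, value)

    def max_value(self):
        # The maximum is the last entry; no document needs to be opened or locked.
        return self._values[-1] if self._values else None


def max_id_by_scan(documents):
    # The pattern Wayne warns about: every document is read (and, per Wayne, read-locked).
    return max((doc["id"] for doc in documents), default=None)


docs = [{"id": 5}, {"id": 17}, {"id": 3}]
index = RangeIndex()
for doc in docs:
    index.add(doc["id"])
print(index.max_value(), max_id_by_scan(docs))
```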
>> By switching to random ids, you get better parallelism since our  
>> indexes can quickly determine if the id is already in use and will  
>> lock at most one document (or 0 if your existing id search is  
>> unfiltered). There is still a vanishingly small probability that  
>> two competing threads would allocate the same random id at the same  
>> moment in time, but that is improbable enough to be ignored.
>> Wayne.
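Wayne's random-id approach amounts to three steps: draw an id, check whether it is already in use, and retry on the rare hit. Below is a minimal Python sketch. It assumes 64-bit ids and uses a set as a stand-in for the index lookup. `allocate_random_id` and its parameters are hypothetical.

```python
import secrets


def allocate_random_id(in_use: set, bits: int = 64, max_attempts: int = 10) -> int:
    """Pick a random id, check it against the ids in use, retry on collision."""
    for _ in range(max_attempts):
        candidate = secrets.randbits(bits)
        if candidate not in in_use:  # stand-in for an index lookup of the id
            in_use.add(candidate)
            return candidate
    raise RuntimeError("could not find a free id; the id space may be too small")


used = set()
ids = [allocate_random_id(used) for _ in range(1000)]
print(len(ids), len(used))
```

The check and the insert are two separate steps. Two concurrent callers could therefore, in principle, draw the same id between them. That is the race Wayne calls improbable enough to ignore.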
>> On Tue, 2008-10-14 at 13:07 -0400, Eric Palmitesta wrote:
>>> Wow, thanks for the reply, Michael.  I'll probably be using some  
>>> variation of one of your examples.
>>>
>>> Michael Blakeley wrote:
>>> > Many people ask about sequential ids. It is possible to model an id
>>> > sequence as a database document. But as with RDBMS sequences, there
>>> > are serialization penalties. I don't see the advantage of sequential
>>> > ids, so I rarely, if ever, use this approach.
>>>
>>> Assuming the recursive check isn't feasible (it doesn't scale
>>> well), the advantage of sequential ids is being able to sleep at
>>> night knowing collisions are simply impossible, rather than
>>> relying on a 'good-enough' random() function.  I'm nit-picking, of
>>> course; I'm sure random() is fine.  :)
>>>
>>> Cheers,
>>>
>>> Eric
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com <mailto:General at developer.marklogic.com>
>>> http://xqzone.com/mailman/listinfo/general