[MarkLogic Dev General] xpath string construction

Eric Palmitesta eric.palmitesta at utoronto.ca
Wed Oct 15 07:09:09 PDT 2008


Rob,

I think so far we're talking about insertion, not editing.  What you're 
referring to is a whole other can of worms.  I've implemented something 
like a lock-less editor before (Java-based website, nothing to do with 
XQuery) which, upon saving an edited document, would check whether the 
timestamp on the document had changed while you were editing.  If so, 
it would hold onto the data and say "Hey, someone edited and saved the 
doc you're editing and trying to save now.  I've recovered your data 
though, we can proceed from here".  This was for a relatively 
low-traffic app, though.
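
Roughly, the save-time check looks like the sketch below.  It is in Python, 
not the original Java (which isn't reproduced here), and everything in it is 
made up for illustration: the Document class, save_edit(), StaleEditError, 
and the in-memory counter standing in for a real modification timestamp.

```python
import itertools
from dataclasses import dataclass

# Hypothetical stand-in for a real "last modified" timestamp: a counter
# that only ever goes up, so two saves never share a value.
_clock = itertools.count(1)


@dataclass
class Document:
    content: str
    modified: int


class StaleEditError(Exception):
    """Raised when the document was saved by someone else after we loaded it."""

    def __init__(self, recovered: str, current: str):
        super().__init__("document was modified by someone else")
        self.recovered = recovered  # the user's unsaved text, kept for them
        self.current = current      # what is stored right now


def new_document(content: str) -> Document:
    return Document(content=content, modified=next(_clock))


def save_edit(doc: Document, loaded_at: int, new_content: str) -> None:
    """Save only if nobody has saved since `loaded_at`.

    On a mismatch nothing is written; the caller gets the edited text
    back via StaleEditError so the work is not lost.
    """
    if doc.modified != loaded_at:
        raise StaleEditError(recovered=new_content, current=doc.content)
    doc.content = new_content
    doc.modified = next(_clock)


if __name__ == "__main__":
    doc = new_document("v1")
    first_loaded = doc.modified    # first editor opens the doc
    second_loaded = doc.modified   # second editor opens the same version
    save_edit(doc, first_loaded, "first editor's text")
    try:
        save_edit(doc, second_loaded, "second editor's text")
    except StaleEditError as err:
        print("conflict; recovered:", err.recovered)
```

Note that the compare and the write above are two separate steps; in a real 
multi-user app they would have to happen inside one transaction (or under a 
short lock) for the check to mean anything.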

I think someone described something similar to this not too long ago on 
this mailing list, although I can't find that email now.

Eric

Robert Koberg wrote:
> Hi again,
> 
> To me, this is the same as locking the file, except that you are 
> possibly letting someone spend wasted time editing a doc only to lose 
> their changes if not up-to-date. As you say it is rare, but just wait 
> till you hear from someone who spends 10 minutes editing a file only to 
> see all the work lost.
> 
> best,
> -Rob
> 
> 
> On Oct 15, 2008, at 9:41 AM, Eric Palmitesta wrote:
> 
>> Good morning all!  Sorry to cause such a stir.  Upon reading your 
>> responses, I feel you've gotten the wrong idea, which is probably due 
>> to communication failure on my part.
>>
>> My idea of sequential ids is one 'special' document, for example 
>> /id.xml, which contains nothing but <id>42</id>, and an id() function 
>> which exclusive-locks the file, yanks 42 out, increments it, replaces 
>> the text node with 43, and unlocks the file.  My environment is 
>> read-heavy, write-light, so although write operations which require a 
>> unique id would touch this file, I don't think it would be an awful 
>> bottleneck.  This guarantees unique ids without having to ever worry 
>> about collisions.
>>
>> Of course, the counter-argument is that since it's a write-light 
>> environment, the chances of using random() and lightning striking 
>> twice, as Michael put it, are infinitesimally small.  I don't truly 
>> have a problem with using random ids, I'm just saying it's worth 
>> noting that it is *impossible* for lightning to strike twice with 
>> sequential ids.
>>
>> Eric
>>
>> Wayne Feick wrote:
>>> Hi Eric,
>>> A disadvantage of sequential ids is that you can end up read locking 
>>> all of your documents in order to find the current max id. You can 
>>> address this partially by moving the next id into a separate 
>>> document, but that document can still become a bottleneck if you have 
>>> a high insertion rate. You could also address this by creating a 
>>> range index on the id and using cts:element-values() or 
>>> cts:element-attribute-values() to find the max.
>>> By switching to random ids, you get better parallelism since our 
>>> indexes can quickly determine if the id is already in use and will 
>>> lock at most one document (or 0 if your existing id search is 
>>> unfiltered). There is still a vanishingly small probability that two 
>>> competing threads would allocate the same random id at the same 
>>> moment in time, but that is improbable enough to be ignored.
>>> Wayne.
>>> On Tue, 2008-10-14 at 13:07 -0400, Eric Palmitesta wrote:
>>>> Wow, thanks for the reply, Michael.  I'll probably be using some 
>>>> variation of one of your examples.
>>>>
>>>> Michael Blakeley wrote:
>>>> > Many people ask about sequential ids. It is possible to model an
>>>> > id sequence as a database document. But as with RDBMS sequences,
>>>> > there are serialization penalties. I don't see the advantage of
>>>> > sequential ids, so I rarely, if ever, use this approach.
>>>>
>>>> Assuming the recursive check isn't feasible (it doesn't scale well), 
>>>> the advantage of sequential ids is being able to sleep at night 
>>>> knowing collisions are simply impossible, and are not reliant on a 
>>>> 'good-enough' random() function.  I'm nit-picking of course, I'm 
>>>> sure random() is fine.  :)
>>>>
>>>> Cheers,
>>>>
>>>> Eric
>>>> _______________________________________________
>>>> General mailing list
>>>> General at developer.marklogic.com 
>>>> <mailto:General at developer.marklogic.com>
>>>> http://xqzone.com/mailman/listinfo/general
> 

