[MarkLogic Dev General] xpath string construction

Robert Koberg rob at koberg.com
Wed Oct 15 07:50:49 PDT 2008


Ahhh, OK

On save you could use ML's new XSL processor to perform an identity  
transform with the exception that each element without an ID would use  
the XSL function generate-id to guarantee uniqueness for the newly  
added ID attribute. (this still has the problem of a deletion and  
reuse of an old ID as I mentioned before)

Wait, ML doesn't have an XSL processor? :)

thanks, I'll be here all week, don't forget to tip your waitress,
-Rob




On Oct 15, 2008, at 10:34 AM, Eric Palmitesta wrote:

> Oh, sorry that wasn't clear to begin with...
>
> The problem exposes itself when allowing a user to delete nodes  
> within a document.  You'd like to show the user a list of what's  
> currently there, and offer something to click which will delete that  
> node.  Something like:
>
> Eric - <a href="...">delete</a>
> Rob - <a href="...">delete</a>
> Mike - <a href="...">delete</a>
>
> Using the xdmp:path of the node you wish to delete won't work,
> because if they've got two browsers open, and they delete
> /path/to/node[14] in the 1st browser, then in the 2nd browser
> /path/to/node[20] will actually be blowing away what the user sees as
> the 21st item, not the 20th (because the document order has shifted by
> 1 due to the first deletion).
>
> How to mitigate?  Tack an id="blah" attribute on user-deletable  
> nodes. Something like:
>
> <people>
>  <person id="5gb4t" name="Eric" />
>  <person id="by54e" name="Rob" />
>  <person id="vj942" name="Mike" />
> </people>
>
> Now one doesn't have to rely on xdmp:path (a bad idea to begin  
> with). The ensuing barrage of email had only to do with how one  
> would generate this unique id, as MarkLogic provides no facility for  
> this.
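The difference between positional and id-based deletion can be sketched outside MarkLogic. The following is a minimal Python illustration, not XQuery and not anyone's actual implementation: the helper names (delete_by_position, delete_by_id, names) are hypothetical, and an in-memory ElementTree stands in for the stored document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mirroring the <people> example above.
DOC = """<people>
  <person id="5gb4t" name="Eric" />
  <person id="by54e" name="Rob" />
  <person id="vj942" name="Mike" />
</people>"""


def delete_by_position(root, position):
    """Delete the person at a 1-based position, like /people/person[position]."""
    people = root.findall("person")
    if 1 <= position <= len(people):
        root.remove(people[position - 1])
        return True
    return False


def delete_by_id(root, person_id):
    """Delete the person whose id attribute matches; do nothing if it is gone."""
    for person in root.findall("person"):
        if person.get("id") == person_id:
            root.remove(person)
            return True
    return False


def names(root):
    """List the remaining names in document order."""
    return [p.get("name") for p in root.findall("person")]


# Two browsers rendered the same list, and both users click "delete" on Rob.

# Positional links: the second click removes whoever shifted into slot 2.
by_position = ET.fromstring(DOC)
delete_by_position(by_position, 2)  # browser 1 deletes Rob
delete_by_position(by_position, 2)  # browser 2's stale link deletes Mike
positional_result = names(by_position)

# Id links: the second click finds nothing and leaves Mike alone.
by_id = ET.fromstring(DOC)
delete_by_id(by_id, "by54e")                 # browser 1 deletes Rob
second_click = delete_by_id(by_id, "by54e")  # browser 2's stale link
id_result = names(by_id)

print(positional_result, id_result, second_click)
```

With ids, a stale link becomes a harmless no-op instead of deleting the wrong node.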
>
> Hope that helps,
>
> Eric
>
> Robert Koberg wrote:
>> On Oct 15, 2008, at 10:09 AM, Eric Palmitesta wrote:
>>> Rob,
>>>
>>> I think so far we're talking about insertion, not editing.
>> OK, that wasn't what I was understanding. And sorry to keep coming  
>> back, but I want to understand what I am missing.
>> Assuming you don't mean inserting in an existing document (which I  
>> understand to be editing), and you are just inserting a new  
>> document, how would you have an ID to compare against? And, why  
>> isn't a URI good enough?
>> best,
>> -Rob
>>> What you're referring to is a whole other can of worms.  I've  
>>> implemented something like a lock-less editor before (java-based  
>>> website, nothing to do with xquery) which, upon saving an edited  
>>> document, would check to see if the timestamp on the document has  
>>> changed while your editing was taking place.  If so, it would hold  
>>> onto the data and say "Hey, someone edited and saved the doc  
>>> you're editing and trying to save now.  I've recovered your data  
>>> though, we can proceed from here".  This was for a relatively
>>> low-traffic app, though.
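A lock-less save check of the kind described here might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the Document class, the integer version standing in for the document timestamp, and the StaleEditError name are hypothetical, and a real system would need to do the compare and the write inside one transaction.

```python
from dataclasses import dataclass


@dataclass
class Document:
    content: str
    version: int = 0  # stands in for the last-modified timestamp


class StaleEditError(Exception):
    """Raised when the document changed after the editor loaded it."""

    def __init__(self, recovered):
        super().__init__("document was modified by someone else")
        self.recovered = recovered  # the user's unsaved text, kept for them


def save(doc, new_content, loaded_version):
    """Write new_content only if doc is unchanged since loaded_version."""
    # Not atomic as written: a real store must compare and update together.
    if doc.version != loaded_version:
        raise StaleEditError(recovered=new_content)
    doc.content = new_content
    doc.version += 1


doc = Document("original")
alice_version = doc.version  # both editors load the same version
bob_version = doc.version

save(doc, "alice's edit", alice_version)  # first save wins
try:
    save(doc, "bob's edit", bob_version)  # second save is stale
    recovered = None
except StaleEditError as err:
    recovered = err.recovered  # Bob's text is handed back, not lost

print(doc.content, recovered)
```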
>>>
>>> I think someone described something similar to this not too long  
>>> ago on this mailing list, although I can't find that email now.
>>>
>>> Eric
>>>
>>> Robert Koberg wrote:
>>>> Hi again,
>>>> To me, this is the same as locking the file, except that you are  
>>>> possibly letting someone spend wasted time editing a doc only to  
>>>> lose their changes if not up-to-date. As you say it is rare, but  
>>>> just wait till you hear from someone who spends 10 minutes  
>>>> editing a file only to see all the work lost.
>>>> best,
>>>> -Rob
>>>> On Oct 15, 2008, at 9:41 AM, Eric Palmitesta wrote:
>>>>> Good morning all!  Sorry to cause such a stir.  Upon reading  
>>>>> your responses, I feel you've gotten the wrong idea, which is  
>>>>> probably due to communication failure on my part.
>>>>>
>>>>> My idea of sequential ids is one 'special' document, for  
>>>>> example /id.xml, which contains nothing but <id>42</id>, and an  
>>>>> id() function which exclusive-locks the file, yanks 42 out,  
>>>>> increments it, replaces the text node with 43, and unlocks the  
>>>>> file.  My environment is read-heavy, write-light, so although  
>>>>> write operations which require a unique id would touch this  
>>>>> file, I don't think it would be an awful bottleneck.  This  
>>>>> guarantees unique ids without having to ever worry about
>>>>> collisions.
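The counter-document idea can be sketched as follows. This is a Python stand-in under stated assumptions, not MarkLogic code: the SequentialIdSource class is hypothetical, an integer in memory plays the role of /id.xml, and a threading.Lock plays the role of the exclusive document lock. The sketch returns the value it read (42 first), which is one reading of the description above.

```python
import threading


class SequentialIdSource:
    """In-memory stand-in for the single /id.xml counter document."""

    def __init__(self, start=42):
        self._value = start
        self._lock = threading.Lock()  # plays the role of the exclusive lock

    def next_id(self):
        # Lock, read the current value, write back value + 1, unlock.
        with self._lock:
            current = self._value
            self._value = current + 1
            return current


source = SequentialIdSource(start=42)
issued = []
issued_lock = threading.Lock()


def worker(count):
    """Simulate one writer requesting several ids."""
    for _ in range(count):
        new_id = source.next_id()
        with issued_lock:
            issued.append(new_id)


threads = [threading.Thread(target=worker, args=(250,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every writer serializes on the one lock, so no id is issued twice.
print(len(issued), len(set(issued)))
```

The same lock that rules out collisions is also the serialization point that every writer has to pass through.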
>>>>>
>>>>> Of course, the counter-argument is that since it's a write-light  
>>>>> environment, the chances of using random() and lightning striking
>>>>> twice, as Michael put it, are infinitesimally small.  I don't
>>>>> truly have a problem with using random ids, I'm just saying it's
>>>>> worth noting that it is *impossible* for lightning to strike
>>>>> twice with sequential ids.
>>>>>
>>>>> Eric
>>>>>
>>>>> Wayne Feick wrote:
>>>>>> Hi Eric,
>>>>>> A disadvantage of sequential ids is that you can end up read  
>>>>>> locking all of your documents in order to find the current max  
>>>>>> id. You can address this partially by moving the next id into a  
>>>>>> separate document, but that document can still become a  
>>>>>> bottleneck if you have a high insertion rate. You could also  
>>>>>> address this by creating a range index on the id and using  
>>>>>> cts:element-values() or cts:element-attribute-values() to find  
>>>>>> the max.
>>>>>> By switching to random ids, you get better parallelism since  
>>>>>> our indexes can quickly determine if the id is already in use  
>>>>>> and will lock at most one document (or 0 if your existing id  
>>>>>> search is unfiltered). There is still a vanishingly small  
>>>>>> probability that two competing threads would allocate the same  
>>>>>> random id at the same moment in time, but that is improbable  
>>>>>> enough to be ignored.
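A random id with an "already in use?" check might be sketched like this. It is a Python illustration under assumptions, not the MarkLogic approach itself: new_random_id is a hypothetical helper, a Python set stands in for the index lookup, and the 5-character lowercase alphanumeric format is only guessed from ids such as "5gb4t" elsewhere in the thread.

```python
import secrets
import string

# Assumed id alphabet, chosen to resemble ids like "5gb4t".
ALPHABET = string.ascii_lowercase + string.digits


def new_random_id(in_use, length=5, max_attempts=100):
    """Return a random id not present in in_use, and record it there."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # The set lookup stands in for an indexed "is this id taken?" query.
        if candidate not in in_use:
            in_use.add(candidate)
            return candidate
    raise RuntimeError("could not find a free id; consider a longer id")


existing = {"5gb4t", "by54e", "vj942"}
fresh = new_random_id(existing)
print(fresh)
```

The check and the add are not atomic across threads in this sketch, which corresponds to the small residual race mentioned above.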
>>>>>> Wayne.
>>>>>> On Tue, 2008-10-14 at 13:07 -0400, Eric Palmitesta wrote:
>>>>>>> Wow, thanks for the reply, Michael.  I'll probably be using  
>>>>>>> some variation of one of your examples.
>>>>>>>
>>>>>>> Michael Blakeley wrote:
>>>>>>> > Many people ask about sequential ids. It is possible to model an id
>>>>>>> > sequence as a database document. But as with RDBMS sequences, there are
>>>>>>> > serialization penalties. I don't see the advantage of sequential ids, so
>>>>>>> > I rarely, if ever, use this approach.
>>>>>>>
>>>>>>> Assuming the recursive check isn't feasible (it doesn't scale  
>>>>>>> well), the advantage of sequential ids is being able to sleep  
>>>>>>> at night knowing collisions are simply impossible, and are not  
>>>>>>> reliant on a 'good-enough' random() function.  I'm nit-picking  
>>>>>>> of course, I'm sure random() is fine.  :)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Eric
>>>>>>> _______________________________________________
>>>>>>> General mailing list
>>>>>>> General at developer.marklogic.com <mailto:General at developer.marklogic.com>
>>>>>>> http://xqzone.com/mailman/listinfo/general