[MarkLogic Dev General] xpath string construction

Michael Blakeley michael.blakeley at marklogic.com
Tue Oct 14 11:08:24 PDT 2008


I don't understand why you say that random ids don't scale well. I'd 
argue that sequential ids don't scale at all, and that the recursive 
approach scales pretty well. Maybe we mean different things when we talk 
about scaling?

For small numbers of ids per transaction, where small is less than a few 
hundred, the recursive random approach will perform well and will very 
rarely lock any extra documents at all. The queries to see if a document 
already exists will be indexed, and will never return more than 1 
fragment. So I would expect this approach to scale at O(log n) with the 
number of documents in the database.

The sequential approach can do better with large number of ids per 
transaction. But this approach requires a lock on the sequence document, 
for every transaction, effectively serializing everything. You can 
mitigate this with multiple sequence documents, but then you don't, 
strictly speaking, have a sequence anymore.

I suppose it depends on the workload you need to optimize: the random 
approach is better for highly-parallel workloads, while the sequential 
approach might do better for workloads that need thousands of ids per 
transaction and don't mind being single-threaded.

In today's world, I'd rather use a solution that is capable of a high 
degree of parallelism. So I rarely, if ever, use sequential ids.

To generate thousands of ids per transaction (say, 1 for each element in 
a document), I might start with a single random id. Any check would be 
highly unlikely to take more than zero iterations. Then I'd concatenate 
my own, in-memory sequence onto it, for each descendant element.

So the root element might have an id "123af097324-1", its first child 
would be "123af097324-2", etc (apply string-pad if you want the ids to 
have a fixed length). This approach would preserve the scalability of 
random ids, without requiring extra work to check any but the first id.

-- Mike

Eric Palmitesta wrote:
> Wow, thanks for the reply, Michael.  I'll probably be using some 
> variation of one of your examples.
> Michael Blakeley wrote:
>> Many people ask about sequential ids. It is possible to model an id 
>> sequence as a database document. But as with RDBMS sequences, there are 
>> serialization penalties. I don't see the advantage of sequential ids, so 
>> I rarely, if ever, use this approach.
> Assuming the recursive check isn't feasible (it doesn't scale well), the 
> advantage of sequential ids is being able to sleep at night knowing 
> collisions are simply impossible, and are not reliant on a 'good-enough' 
> random() function.  I'm nit-picking of course, I'm sure random() is 
> fine.  :)
> Cheers,
> Eric
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general

More information about the General mailing list