[MarkLogic Dev General] xpath string construction
michael.blakeley at marklogic.com
Tue Oct 14 11:08:24 PDT 2008
I don't understand why you say that random ids don't scale well. I'd
argue that sequential ids don't scale at all, and that the recursive
approach scales pretty well. Maybe we mean different things when we talk
For small numbers of ids per transaction, where small is less than a few
hundred, the recursive random approach will perform well and will very
rarely lock any extra documents at all. The queries to see if a document
already exists will be indexed, and will never return more than 1
fragment. So I would expect this approach to scale at O(log n) with the
number of documents in the database.
The sequential approach can do better with large number of ids per
transaction. But this approach requires a lock on the sequence document,
for every transaction, effectively serializing everything. You can
mitigate this with multiple sequence documents, but then you don't,
strictly speaking, have a sequence anymore.
I suppose it depends on the workload you need to optimize: the random
approach is better for highly-parallel workloads, while the sequential
approach might do better for workloads that need thousands of ids per
transaction and don't mind being single-threaded.
In today's world, I'd rather use a solution that is capable of a high
degree of parallelism. So I rarely, if ever, use sequential ids.
To generate thousands of ids per transaction (say, 1 for each element in
a document), I might start with a single random id. Any check would be
highly unlikely to take more than zero iterations. Then I'd concatenate
my own, in-memory sequence onto it, for each descendant element.
So the root element might have an id "123af097324-1", its first child
would be "123af097324-2", etc (apply string-pad if you want the ids to
have a fixed length). This approach would preserve the scalability of
random ids, without requiring extra work to check any but the first id.
Eric Palmitesta wrote:
> Wow, thanks for the reply, Michael. I'll probably be using some
> variation of one of your examples.
> Michael Blakeley wrote:
>> Many people ask about sequential ids. It is possible to model an id
>> sequence as a database document. But as with RDBMS sequences, there are
>> serialization penalties. I don't see the advantage of sequential ids, so
>> I rarely, if ever, use this approach.
> Assuming the recursive check isn't feasible (it doesn't scale well), the
> advantage of sequential ids is being able to sleep at night knowing
> collisions are simply impossible, and are not reliant on a 'good-enough'
> random() function. I'm nit-picking of course, I'm sure random() is
> fine. :)
> General mailing list
> General at developer.marklogic.com
More information about the General