[MarkLogic Dev General] Difficulty modeling our data in MarkLogic
Damon.Feldman at marklogic.com
Wed Apr 25 14:32:02 PDT 2012
Ideally, your data will use the "eXtensible" feature of XML to handle new tags. Your data structure is a kind of meta-XML, where the XML generically describes XML:
It might more simply be represented as:
This approach could not work in a relational DB, because you would need new columns for every new key, but it's fine to add new XML elements in most contexts. You will need to add range indexes for each custom field, but they will only "reindex" documents that contain the fields in question.
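As a rough sketch of the contrast described above (not the original examples from this thread), the snippet below rewrites a generic name/value/type structure into one element per custom tag. The element names `object`, `tag`, `name`, `type`, and `value` and the sample values are assumptions made for illustration; only the tag names `dateTag`, `stringTag`, and `realTag` come from the question. It uses Python's standard library only and does not call any MarkLogic API.

```python
import xml.etree.ElementTree as ET

# Hypothetical "meta-XML" shape: each custom tag is a generic
# name/type/value triple. All element names here are illustrative.
GENERIC = """
<object>
  <tag><name>dateTag</name><type>date</type><value>2012-04-25</value></tag>
  <tag><name>stringTag</name><type>string</type><value>alpha</value></tag>
  <tag><name>realTag</name><type>real</type><value>3.14</value></tag>
</object>
"""


def flatten(generic_xml: str) -> str:
    """Rewrite each <tag> triple as an element named after the tag itself."""
    src = ET.fromstring(generic_xml)
    out = ET.Element(src.tag)
    for tag in src.findall("tag"):
        # The tag name becomes the element name; the value becomes its text.
        # The declared type is dropped here, on the assumption that a range
        # index defined on the element would carry the scalar type instead.
        child = ET.SubElement(out, tag.findtext("name"))
        child.text = tag.findtext("value")
    return ET.tostring(out, encoding="unicode")


print(flatten(GENERIC))
```

The result is a single `<object>` element whose children are `<dateTag>`, `<stringTag>`, and `<realTag>`, each holding its value as text. The sketch assumes every tag name is already a legal XML element name; client-supplied names containing spaces or other disallowed characters would need to be escaped or mapped first.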
There are other possible approaches, but clean, simple data modeling is ideal if you can manage it.
From: general-bounces at developer.marklogic.com On Behalf Of Fullbright, Faron
Sent: Wednesday, April 25, 2012 10:04 AM
To: 'general at developer.marklogic.com'
Subject: [MarkLogic Dev General] Difficulty modeling our data in MarkLogic
We are evaluating the potential to use MarkLogic for indexing and storage of content and have come across a use case that doesn't seem to map well to the MarkLogic indexing model.
Just wanted to describe the data model we are using (or at least that section of it that applies to this case), and see if we're potentially overlooking something.
Our primary requirement for indexing revolves around custom tags that we allow clients to associate with objects. These custom tags are name/value pairs, and the values can have various types (string, date, datetime, real, int, etc.).
We need to be able to support fast range queries (that account for data type), fast ordering, and fast aggregation of distinct values across these tags. Each of these operations needs to consider the tag name and value and the value's type.
I believe this would be a nice fit for pre-defined Range Indexes in MarkLogic if we had a finite, predetermined set of tag names and could create distinct elements for each tag name and could predefine a Range Index for each. But since the set of potential tag names is unlimited, and since one tag name could be potentially associated with values that have multiple types, we can't really predefine anything.
Based on the documentation we've seen, we might be able to get the functionality described above to work using XPath queries against the standard indexes that MarkLogic builds when importing an XML document. Our concern is that, in the absence of Range Indexes, this would not scale: we need fast performance across a large number of objects, each of which would have a large number of tags.
Is there some way to work around this with Range Indexes?
An example fragment of data:
Note: we would need dateTag values to have type date, stringTag values to have type string, and realTag values to have type real for purposes of filtering, sorting, etc.