[MarkLogic Dev General] Simple search question
Michael Blakeley
mike at blakeley.com
Wed Jun 20 14:00:54 PDT 2012
A map could work, if you keep it in a server field. Update the server field whenever you update the lookup document, which should be fairly rare. You could also configure a trigger on the database-online event, so the map is rebuilt after a restart. This would probably be the fastest solution, because the map would always be available in memory. One possible drawback is that map updates would not be atomic: you might update the in-memory map before the database document has committed, or vice versa. That may not be a problem for your application, though.
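As a rough sketch of the server-field idea, the function below builds the map from the lookup document on first use and caches it. The field name 'lookup-map' and the document URI '/lookups/names.xml' are made up for illustration, and the update path (resetting the field when the document changes) is left out.

```xquery
xquery version "1.0-ml";

(: Assumed names, not from the original thread:
   server field 'lookup-map', lookup document '/lookups/names.xml'. :)
declare function local:lookup-map() as map:map
{
  let $cached := xdmp:get-server-field('lookup-map')
  return
    if (exists($cached)) then $cached
    else
      (: First request since the field was last cleared: build the map. :)
      let $m := map:map()
      let $_ :=
        for $item in doc('/lookups/names.xml')/list/item
        return map:put($m, string($item/id), string($item/value))
      (: xdmp:set-server-field returns the value it stored. :)
      return xdmp:set-server-field('lookup-map', $m)
};

(: Look up the value for one id. :)
map:get(local:lookup-map(), '123-1')
```

Whatever code updates the lookup document would also need to call xdmp:set-server-field with a fresh map, which is where the non-atomic window comes from.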
If that doesn't appeal, I think the best approach is to model each item as a document, with a URI something like /lists/{list-name}/{item-id}. The lookup then becomes doc(concat('/lists/', $list-name, '/', $item-id)), which is highly efficient. The value could be text, or could be a root 'value' element.
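A minimal sketch of the item-per-document layout, assuming a list named 'names' and a root 'value' element (both chosen only for illustration). The two statements are separated by a semicolon so the insert commits before the lookup runs.

```xquery
xquery version "1.0-ml";

(: Load step: one document per item, URI built from list name and item id.
   The list name 'names' is an assumption for this example. :)
xdmp:document-insert(
  concat('/lists/', 'names', '/', '123-1'),
  <value>Able</value>)
;

xquery version "1.0-ml";

(: Lookup step: a direct doc() call by URI, no search needed. :)
let $list-name := 'names'
let $item-id := '123-1'
return doc(concat('/lists/', $list-name, '/', $item-id))/value/string()
```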
If you don't want to break up the lists, I think walking the tree to find a matching item will be the bottleneck. Using a fully specified XPath with a limiting predicate will help: '(/list/item[id eq $id])[1]' rather than '//item'. The predicate [1] lets the evaluator bail out after it finds the first match, so it doesn't have to walk the entire tree.
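For reference, the single-document lookup might look like this as a complete query. The id value is only sample data from the original question.

```xquery
xquery version "1.0-ml";

declare variable $id as xs:string := '123-2';

(: Explicit path from the root plus [1], so evaluation can stop
   at the first matching item instead of scanning every item. :)
(/list/item[id eq $id])[1]/value/string()
```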
Or you could fragment on item and use '(//item[id eq $id])[1]', but I would prefer the item-document approach if possible. The cheapest fragment lookup possible is a simple doc() call.
Someone will likely mention co-occurrences, and you could try that with range indexes on both id and value. Again you would have to fragment on 'item', or store each item in its own document. But the database lookup and index join for co-occurrence is fairly complex, and this might end up taking more time than a single-item doc() call would.
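A hedged sketch of the co-occurrence variant, assuming string range indexes already exist on both 'id' and 'value' (no namespace, with a collation matching the query's default) and that each item is its own fragment or document. Treat it as a starting point to measure, not a recommendation.

```xquery
xquery version "1.0-ml";

declare variable $id as xs:string := '123-1';

(: Ask for id/value pairs that co-occur in the same fragment,
   constrained to fragments whose id range index matches $id. :)
let $pairs :=
  cts:element-value-co-occurrences(
    xs:QName('id'),
    xs:QName('value'),
    (),
    cts:element-range-query(xs:QName('id'), '=', $id))
(: Each cts:co-occurrence holds two cts:value children: id, then value. :)
return $pairs[cts:value[1] eq $id][1]/cts:value[2]/string()
```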
So the server field will almost certainly be the fastest option, but it does introduce some extra complexity. Storing lists as directories and items as documents is probably the next fastest, since you can construct the URI with just the list name and the item id.
-- Mike
On 20 Jun 2012, at 13:29 , Tim Meagher wrote:
> Hi Folks,
>
> I have a couple of documents with thousands of ID to name lookups that look
> like this:
>
> <list>
> <item>
> <id>123-1</id>
> <value>Able</value>
> </item>
> <item>
> <id>123-2</id>
> <value>Adam</value>
> </item>
> ...
> </list>
>
> I want to simply search these documents to find the value for a given id
> (there are no duplicate ids) that needs to get added to a document. There
> are a couple of ways to do this, i.e. via an xpath expression or a
> cts:search, but I'm wondering what the search experts would suggest. I
> don't need to know how to perform the document update, I just want to know
> how to optimize the search. I have the flexibility to add a namespace and
> to set up indexes. Note that each update is independent of the other, i.e.,
> I would not necessarily want to load a map:map for each lookup.
>
> Thank you!
>
> Tim Meagher