[MarkLogic Dev General] Document/Root Node Indexing
Jason Hunter
jhunter at marklogic.com
Thu Jan 18 22:03:32 PST 2007
Patrick Force wrote:
> We have a directory that includes policy data, each in a separate
> document, with the root node name InsurancePolicy. I'm trying to figure
> out the best way to optimize querying /InsurancePolicy. The documents
> in this directory are currently at about 20,000 and are quickly
> growing. I'm currently trying to run queries for data cleaning and
> viewing data in general such as:
>
> for $unattachedpolicy in /InsurancePolicy
> where empty(/InsuranceProduct[@id = $unattachedpolicy/@productId and
> @version = $unattachedpolicy/@productVersion]/@id)
> return <unattachedpolicy id={$unattachedpolicy/@id}
> productId={$unattachedpolicy/@productId}
> productVersion={$unattachedpolicy/@productVersion}/>
Hi Patrick,
First let me confirm I understand what you're trying to do. You're
looking for /InsurancePolicy documents where either there isn't a
corresponding /InsuranceProduct or where if there is a corresponding
/InsuranceProduct then it's lacking an id attribute. The definition of
"corresponding" is based on comparing two attribute values.
I suspect the "or lacking an id attribute" really isn't necessary
although the code is written that way right now. So basically you're
looking for policies that lack a corresponding product.
Second, can I ask how long is "very long"? With this schema, because
the id and version together act as kind of a pseudo primary key, I don't
think you'll be able to do full resolution out of indexes. The best
algorithm will be O(n) where n is the number of policies.
for $policy in /InsurancePolicy
let $id := $policy/@productId
let $ver := $policy/@productVersion
where xdmp:estimate(/InsuranceProduct[@id = $id][@ver = $version]) = 0
return
<unattachedpolicy>{$policy/@*}</unattachedpolicy>
This will result in a disk bound query, taking perhaps 20 seconds for
20k fragment loads of all /InsurancePolicy root fragments. This is
probably faster than what you're seeing no because no /InsuranceProduct
fragments need be loaded.
Hope this helps. Maybe someone else has a better idea?
-jh-
More information about the General
mailing list