[MarkLogic Dev General] Document/Root Node Indexing

Jason Hunter jhunter at marklogic.com
Thu Jan 18 22:03:32 PST 2007


Patrick Force wrote:
> We have a directory that includes policy data, each in a separate 
> document, with the root node name InsurancePolicy.  I'm trying to figure 
> out the best way to optimize querying /InsurancePolicy.  The documents 
> in this directory are currently at about 20,000 and are quickly 
> growing.  I'm currently trying to run queries for data cleaning and 
> viewing data in general such as:
> 
> for $unattachedpolicy in /InsurancePolicy
> where empty(/InsuranceProduct[@id = $unattachedpolicy/@productId and 
> @version = $unattachedpolicy/@productVersion]/@id)
> return <unattachedpolicy id={$unattachedpolicy/@id} 
> productId={$unattachedpolicy/@productId} 
> productVersion={$unattachedpolicy/@productVersion}/>

Hi Patrick,

First let me confirm I understand what you're trying to do.  You're 
looking for /InsurancePolicy documents where either there isn't a 
corresponding /InsuranceProduct or where if there is a corresponding 
/InsuranceProduct then it's lacking an id attribute.  The definition of 
"corresponding" is based on comparing two attribute values.

I suspect the "or lacking an id attribute" really isn't necessary 
although the code is written that way right now.  So basically you're 
looking for policies that lack a corresponding product.

Second, can I ask how long is "very long"?  With this schema, because 
the id and version together act as kind of a pseudo primary key, I don't 
think you'll be able to do full resolution out of indexes.  The best 
algorithm will be O(n) where n is the number of policies.

for $policy in /InsurancePolicy
let $id := $policy/@productId
let $ver := $policy/@productVersion
where xdmp:estimate(/InsuranceProduct[@id = $id][@ver = $version]) = 0
return
<unattachedpolicy>{$policy/@*}</unattachedpolicy>

This will result in a disk bound query, taking perhaps 20 seconds for 
20k fragment loads of all /InsurancePolicy root fragments.  This is 
probably faster than what you're seeing no because no /InsuranceProduct 
fragments need be loaded.

Hope this helps.  Maybe someone else has a better idea?

-jh-



More information about the General mailing list