[MarkLogic Dev General] How to get accurate fragment counts using xdmp:estimate when the QName is a variable
Geert Josten
Geert.Josten at daidalos.nl
Wed Sep 15 08:44:54 PDT 2010
Darin,
I think David hits the nail on the head. From your story I get the impression you are expecting some kind of node count. But if several nodes matching your criterion exist within the same document fragment, that fragment still contributes exactly one to the estimate. The estimate will therefore most likely be lower than you expect.
Apart from that, using node-name() to filter on node names might not perform very well. You never know, but I wouldn't expect the optimizer to recognize that indexes could improve performance here. Eval has a slight overhead, but it might still save time, because a more straightforward expression such as //my:elem[@my:attr eq 'myval'] seems easier to optimize, at least to me. Using indexes explicitly will most likely be quickest and safest, though. Either way, I think David is right that you should use count(); you will need it even with index functions like cts:element-attribute-value-query.
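A minimal sketch of the eval route, assuming MarkLogic's 1.0-ml dialect: build a path with literal names from the two QName variables, then run it through xdmp:eval with the value passed as an external variable, so the optimizer sees something like the explicit query that already works. The prefixes e and a, the helper local:name-test, and the sample rdf:about value are hypothetical, and the sketch assumes the namespace URIs contain no quote or ampersand characters, because they are spliced into the query text.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs; in practice these would come from the caller. :)
declare variable $eQName as xs:QName :=
  fn:QName("http://www.w3.org/2004/02/skos/core#", "skos:ConceptScheme");
declare variable $aQName as xs:QName :=
  fn:QName("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:about");
declare variable $value as xs:string := "http://example.org/scheme/1";

(: Hypothetical helper. Returns (namespace declaration, name test) for a
   QName; a no-namespace name needs no declaration, so the first item is
   then the empty string. Assumes the URI contains no " or & characters. :)
declare function local:name-test($name as xs:QName, $prefix as xs:string)
  as xs:string+
{
  let $uri := fn:namespace-uri-from-QName($name)
  let $local := fn:local-name-from-QName($name)
  return
    if ($uri eq "")
    then ("", $local)
    else (fn:concat('declare namespace ', $prefix, ' = "', $uri, '"; '),
          fn:concat($prefix, ':', $local))
};

let $e := local:name-test($eQName, "e")
let $a := local:name-test($aQName, "a")
(: For the sample inputs: //e:ConceptScheme[@a:about = $value] :)
let $path := fn:concat("//", $e[2], "[@", $a[2], " = $value]")
let $query := fn:concat(
  'xquery version "1.0-ml"; ',
  $e[1], $a[1],
  'declare variable $value as xs:string external; ',
  '(xdmp:estimate(', $path, '), fn:count(', $path, '))')
(: Returns two integers: matching fragments, then matching elements. :)
return xdmp:eval($query, (xs:QName("value"), $value))
```

Passing the value as an external variable rather than splicing it into the string keeps arbitrary attribute values from breaking the generated query.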
HTH!
Kind regards,
Geert
drs. G.P.H. (Geert) Josten
Consultant
Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk
T +31 (0)10 850 1200
F +31 (0)10 850 1199
mailto:geert.josten at daidalos.nl
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this e-mail message originates from Daidalos BV and is intended solely for the addressee. If you have received this message unintentionally, please delete it. No rights can be derived from this message.
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of
> Lee, David
> Sent: Wednesday, 15 September 2010 17:21
> To: General Mark Logic Developer Discussion; Mary Holstege
> Subject: Re: [MarkLogic Dev General] How to get accurate
> fragment counts using xdmp:estimate when the QName is a variable
>
> Just a guess ... but xdmp:estimate estimates fragments, not
> matching nodes.
> Getting the correct nodes back from the path has nothing to
> do with what xdmp:estimate will return.
> How is your DB fragmented? If each node is in a separate
> fragment, then xdmp:estimate should work (and return the same
> as count()), but if you have one big XML file, or if, say, the
> results span several fragments, then xdmp:estimate will
> return the number of fragments that have any matches, not the
> number of matches.
>
> You could try just using count() instead of xdmp:estimate
> (but it will be slower)
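To see the difference David describes, a sketch that puts both numbers side by side for the explicit skos:ConceptScheme path from this thread, which Darin reports as resolving correctly from the indexes. The first value counts fragments; the second counts elements and is the slower of the two.

```xquery
xquery version "1.0-ml";
declare namespace skos = "http://www.w3.org/2004/02/skos/core#";

(
  (: Fragments containing at least one skos:ConceptScheme, from the indexes. :)
  xdmp:estimate(//skos:ConceptScheme),
  (: Every skos:ConceptScheme element; several in one fragment each count. :)
  fn:count(//skos:ConceptScheme)
)
```

If each ConceptScheme sits in its own fragment, the two numbers should agree; if several share a fragment, the estimate will be the smaller one.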
>
> -David
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> dlee at epocrates.com
> 812-482-5224
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of
> McBeath, Darin W (ELS-STL)
> Sent: Wednesday, September 15, 2010 11:03 AM
> To: Mary Holstege; General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get accurate
> fragment counts using xdmp:estimate when the QName is a variable
>
> Thanks Mary. That was pretty dumb on my part. But I still
> have a question based on my more complex example.
>
> Basically, I want to get fragment counts where a fragment
> contains a given element, attribute, and value for that
> attribute. I was thinking that I could wrap an XPath expression
> such as the one below in xdmp:estimate:
>
> xdmp:estimate(//*[node-name(.) = $eQName and
>   string(./@*[node-name(.) = $aQName]) = $value])
>
> But, this always seems to return me the number of fragments in the DB.
> I did verify that
>
> //*[node-name(.) = $eQName and
>   string(./@*[node-name(.) = $aQName]) = $value]
>
> only returns the nodes I want, so, unlike last time, the
> query at least appears to be written correctly.
> Perhaps what I'm trying to do with XPath and an accurate
> estimate is not currently possible. I believe I have
> the necessary indexes enabled to support an accurate
> estimate.
>
> I did verify that something explicit such as the following
> returns the correct result:
>
> xdmp:estimate(//skos:ConceptScheme[@rdf:about=$value])
>
> This makes me believe that my indexes are configured correctly.
>
> I will likely fall back to a
> cts:element-attribute-value-query, or to an xdmp:eval of an XPath
> expression such as the one above, but I'm curious whether
> I'm still doing something wrong above or whether this is
> really not possible. I also can't really create range
> indexes on the element/attribute, as I'm trying to make this a
> fairly generic solution whereby one could query on any
> element/attribute.
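A sketch of the cts route Darin mentions, assuming the QNames and value are held in variables (the names and sample value are hypothetical). cts:element-attribute-value-query takes QNames directly, so nothing has to be spelled out in the query text, and an exact value query like this does not need a range index. The first result is a fragment estimate; for a node count, the sketch narrows with the same query and then counts the matching elements, as Geert suggests.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs; any element/attribute QNames would do. :)
declare variable $eQName as xs:QName :=
  fn:QName("http://www.w3.org/2004/02/skos/core#", "skos:ConceptScheme");
declare variable $aQName as xs:QName :=
  fn:QName("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:about");
declare variable $value as xs:string := "http://example.org/scheme/1";

let $query := cts:element-attribute-value-query($eQName, $aQName, $value)
return (
  (: Number of fragments the indexes say contain a match. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Number of matching elements: narrow with the same query, then
     filter the elements themselves and count them. :)
  fn:count(
    cts:search(fn:doc(), $query)
      //*[fn:node-name(.) eq $eQName]
         [@*[fn:node-name(.) eq $aQName] = $value])
)
```

One caveat: by default the cts query is case-insensitive when the value is all lowercase, while the XPath = comparison is always exact, so the node count can be the stricter of the two.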
>
> Thanks.
>
> Darin.
>
> -----Original Message-----
> From: Mary Holstege [mailto:mary.holstege at marklogic.com]
> Sent: Tuesday, September 14, 2010 4:22 PM
> To: General Mark Logic Developer Discussion; McBeath, Darin W
> (ELS-STL)
> Subject: Re: [MarkLogic Dev General] How to get accurate
> fragment counts using xdmp:estimate when the QName is a variable
>
> On Tue, 14 Sep 2010 13:05:58 -0700, McBeath, Darin W
> (ELS-STL) <D.McBeath at elsevier.com> wrote:
> ...
> > The following query returns me the value I would expect.
> >
> > xdmp:estimate(//skos:ConceptScheme)
> >
> > However, if I have a variable $eQName which is essentially the QName
> > for skos:ConceptScheme,
> >
> > xdmp:estimate(//$eQName)
> >
> > returns me every fragment in the DB.
>
> I think the problem is that your query isn't doing what you
> think it is. It is equivalent to //"skos:ConceptScheme",
> the value of which is the string "skos:ConceptScheme" repeated
> for every element in the database. So the estimate is
> correct, but it isn't what you want.
>
> The only way to get the result you want from the path is
> something like //*[fn:node-name(.)=$eQName].
>
> //Mary
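For the element-only case, a sketch of an index-friendly alternative to the node-name() predicate when the QName is held in a variable. It relies on the common idiom of wrapping cts:and-query(()), which matches everything, in cts:element-query to mean "contains this element"; the variable's value is a hypothetical example.

```xquery
xquery version "1.0-ml";

(: Hypothetical variable holding the element name. :)
declare variable $eQName as xs:QName :=
  fn:QName("http://www.w3.org/2004/02/skos/core#", "skos:ConceptScheme");

(
  (: Mary's path form: returns the intended elements, but a plain
     node-name() filter may not be resolved from the indexes. :)
  fn:count(//*[fn:node-name(.) eq $eQName]),
  (: Fragments containing at least one such element, from the indexes. :)
  xdmp:estimate(cts:search(fn:doc(),
    cts:element-query($eQName, cts:and-query(()))))
)
```

Note that the two values measure different things: the first counts elements, the second counts fragments.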
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>