[MarkLogic Dev General] xdmp:estimate() and fn:distinct-values()

Ron Hitchens ron at ronsoft.com
Sat Aug 4 04:46:44 PDT 2012

   Put an element range index on both userId and productId.
Then you can do (also untested):

    fn:count (cts:element-values (xs:QName("userId"), (), (),
       cts:element-value-query (xs:QName("productId"), $myBooks, "exact")))

   This fn:count should be fast because it only counts the userId
values held in the range index, restricted to the fragments that
survive the productId filter.  That filter can itself be resolved
from the indexes, without opening any documents.
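
   If you want the productId filter itself to come out of the
productId range index, a cts:element-range-query is one way to
express it.  This is only a sketch (untested), assuming productId
has a string range index and that $myBooks holds strings; the
sample IDs are made up for illustration:

    (: hypothetical product IDs, stand-ins for the real $myBooks :)
    let $myBooks := ("book-1", "book-2")
    return
      fn:count (cts:element-values (xs:QName("userId"), (), (),
         cts:element-range-query (xs:QName("productId"), "=", $myBooks)))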

   The slowdown comes when a query cannot answer the question
you're asking from the indexes and has to look inside the documents
to test the values.  Range indexes store the unique values in the
index and correlate them back to the fragment those values occur in.

   Just be careful that you define the proper type when creating
the element range indexes and, if the indexes are strings, that
your queries use the same collation the indexes were created with.
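
   As an illustration, the type and collation can be spelled out
in the options so the call resolves against the index you intended.
This sketch (untested) assumes userId is indexed as a string with
the codepoint collation; substitute whatever the index was actually
created with.  The sample IDs are made up:

    (: hypothetical product IDs, stand-ins for the real $myBooks :)
    let $myBooks := ("book-1", "book-2")
    return
      fn:count (cts:element-values (xs:QName("userId"), (),
         ("type=string",
          "collation=http://marklogic.com/collation/codepoint"),
         cts:element-value-query (xs:QName("productId"), $myBooks, "exact")))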

   You may also get a boost from creating appropriate dateTime
range indexes and applying similar filter queries for those. 
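
   A sketch of what such a filter might look like for the unexpired
count (untested).  It assumes endDate holds epoch seconds and has a
range index of type long, which is a guess based on the original
query; the sample values are made up.  The cts:element-query wrapper
is there to keep productId and endDate inside the same reg element,
as the original path predicate does:

    (: hypothetical values, stand-ins for the real variables :)
    let $myBooks := ("book-1", "book-2")
    let $current-epoch-time := xs:long (1344080804)
    let $unexpired :=
      cts:element-query (xs:QName("reg"),
        cts:and-query ((
          cts:element-value-query (xs:QName("productId"), $myBooks, "exact"),
          cts:or-query ((
            (: endDate = 0 is treated as unexpired in the original :)
            cts:element-range-query (xs:QName("endDate"), "=", xs:long (0)),
            cts:element-range-query (xs:QName("endDate"), ">=",
               $current-epoch-time)
          ))
        )))
    return
      fn:count (cts:element-values (xs:QName("userId"), (), (), $unexpired))

   Lexicon calls are resolved from the indexes without filtering,
so how strictly the same-reg constraint is honored may depend on
your index settings (positions, for instance).  Check the count
against a filtered query on a small sample before relying on it.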

On Aug 4, 2012, at 11:28 AM, David Lee wrote:

> Untested suggestion.
> Put userId into an element range index, then use estimate (cts:values())
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> dlee at marklogic.com
> Phone: +1 650-287-2531
> Cell:  +1 812-630-7622
> www.marklogic.com
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> Sent: Friday, August 03, 2012 10:38 PM
> To: general
> Subject: [MarkLogic Dev General] xdmp:estimate() and fn:distinct-values()
> Hello,
>
> The query below runs quite fast (i.e. below 1 second).
>
>     let $totalCount := xdmp:estimate(/user[reg/productId=$myBooks]/userId)
>     let $numUnexpired := xdmp:estimate(/user[reg[productId=$myBooks and (endDate = 0 or endDate >= $current-epoch-time)]]/userId)
>     return ($totalCount, $numUnexpired, xdmp:elapsed-time())
>
> The problem is that what I really need is the number of distinct values of "userId".
>
> Doing xdmp:estimate(fn:distinct-values()) results in an XDMP:UNSEARCHABLE error.
>
> Using fn:count() instead of xdmp:estimate() works, but takes too long (i.e. 30 seconds).
>
> Is there a workaround for this?
>
> Regards,
> Danny

Ron Hitchens {mailto:ron at ronsoft.com}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown
