[MarkLogic Dev General] xdmp:estimate() and fn:distinct-values()

Danny Sinang d.sinang at gmail.com
Sat Aug 4 06:17:48 PDT 2012


Hi Ron,

Thanks.

I have element range indexes for userId and productId now, but I'm not sure
I explained well what I needed.

I'm trying to get :

1. all the unique userId's of users who have registrations for particular
books.
2. all the unique userId's of users whose book registrations have not
expired yet.

Each user is represented like this :

<user>
    <userId>12345</userId>
    ...
    <registeredBooks>
                    <registration>
                             <productId>ABCDEFG</productId>
                             <startDate></startDate>
                             <endDate></endDate>
                             ...
                    </registration>

                    <registration>
                             <productId>TUVWXY</productId>
                             <startDate></startDate>
                             <endDate></endDate>
                             ...
                    </registration>
    </registeredBooks>
</user>

As you can see, a user can have more than 1 book registration, and he can
also have more than 1 book registration for the same book (i.e. his
previous registration expired and he bought some more time to read it
again).
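
Because one user can hold several registrations, the expiry test has to apply to the same registration as the productId match. A minimal, untested sketch of that constraint follows. It assumes endDate holds epoch seconds with 0 meaning "never expires", that an element range index of type long exists on endDate, and that the product IDs and the current time are placeholders.

```xquery
xquery version "1.0-ml";

(: Assumption: placeholder product IDs and a placeholder "now" in epoch seconds. :)
let $myBooks := ("ABCDEFG", "TUVWXY")
let $current-epoch-time := xs:long(1344086268)

(: Both conditions are wrapped in one element-query so that they are
   scoped to a single <registration> element. :)
let $unexpired :=
  cts:element-query(xs:QName("registration"),
    cts:and-query((
      cts:element-value-query(xs:QName("productId"), $myBooks, "exact"),
      cts:or-query((
        (: Assumption: an endDate of 0 means the registration never expires. :)
        cts:element-range-query(xs:QName("endDate"), "=", xs:long(0)),
        cts:element-range-query(xs:QName("endDate"), ">=", $current-epoch-time)
      ))
    )))

(: Counts matching user fragments from the indexes, without filtering. :)
return xdmp:estimate(cts:search(/user, $unexpired))
```

Since xdmp:estimate() is resolved from the indexes without filtering, the same-registration scoping may only be approximate here, depending on the database's index settings, so it is worth checking the count against a filtered search on a small sample.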

So given the above business rules, my queries (to give me all users who
registered for specific books) can return the same userId more than once.
That's why I need to get the distinct values of the userId's returned.

The queries I showed earlier (simplified versions of the actual query) work
fast but don't eliminate the duplicate userId's returned by the queries.

Your suggested query returns the unique userId's in the index, but not the
unique userId's returned by the query.

I'm pretty new to cts stuff so I'd really appreciate all the assistance I
could get. First off, how do I express the query
/user[registeredBooks/registration/productId=$myBooks]/userId in cts? Next, how
do I get the distinct userId's returned?
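
A minimal sketch of how the XPath predicate above might be written as a cts query, and how the distinct userId values for that query could be read from the range index. It is untested against this data and assumes an element range index on userId (with a matching type and, for strings, collation), one user document per fragment, and placeholder product IDs in $myBooks.

```xquery
xquery version "1.0-ml";

(: Assumption: placeholder product IDs. :)
let $myBooks := ("ABCDEFG", "TUVWXY")

(: Roughly equivalent to the predicate
   registeredBooks/registration/productId = $myBooks :)
let $query :=
  cts:element-query(xs:QName("registeredBooks"),
    cts:element-query(xs:QName("registration"),
      cts:element-value-query(xs:QName("productId"), $myBooks, "exact")))

(: Lexicon lookup: distinct userId values taken from the fragments that
   match $query. Requires the element range index on userId. :)
let $distinct := cts:element-values(xs:QName("userId"), (), (), $query)

return fn:count($distinct)
```

Because the values come from the range index, each userId appears once no matter how many matching registrations a user has. The search form, cts:search(/user, $query)/userId, returns the userId elements themselves rather than distinct values.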

Regards,
Danny

On Sat, Aug 4, 2012 at 7:46 AM, Ron Hitchens <ron at ronsoft.com> wrote:

>
>    Put an element range index on both userId and productId.
> Then you can do (also untested):
>
>     fn:count (cts:element-values (xs:QName("userId"), (), (),
>        cts:element-value-query (xs:QName("productId"), $myBooks, "exact")))
>
>    This fn:count should be fast because it will only count the
> values in the range index (those that survive the filter that
> selects matching productId's, which can be resolved from the
> range index on productId).
>
>    The slowdown comes when a query cannot answer the question
> you're asking from the indexes and has to look inside the documents
> to test the values.  Range indexes store the unique values in the
> index and correlate them back to the fragment those values occur in.
>
>    Just be careful that you define the proper type when creating
> the element range indexes and that you provide the same collation
> if the indexes are strings.
>
>    You may also get a boost from creating appropriate dateTime
> range indexes and applying similar filter queries for those.
>
> On Aug 4, 2012, at 11:28 AM, David Lee wrote:
>
> > Untested suggestion.
> > Put userId into an element range index, then use estimate (cts:values())
> >
> >
> >
> -----------------------------------------------------------------------------
> > David Lee
> > Lead Engineer
> > MarkLogic Corporation
> > dlee at marklogic.com
> > Phone: +1 650-287-2531
> > Cell:  +1 812-630-7622
> > www.marklogic.com
> >
> >
> > From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> > Sent: Friday, August 03, 2012 10:38 PM
> > To: general
> > Subject: [MarkLogic Dev General] xdmp:estimate() and fn:distinct-values()
> >
> > Hello,
> >
> > The query below runs quite fast (i.e. below 1 second).
> >
> > let $totalCount := xdmp:estimate(/user[reg/productId=$myBooks]/userId)
> > let $numUnexpired := xdmp:estimate(/user[reg[productId=$myBooks and (endDate = 0 or endDate >= $current-epoch-time)]]/userId)
> > return ($totalCount, $numUnexpired, xdmp:elapsed-time())
> >
> > Problem is, what I really need is to get the number of distinct values of "userId".
> >
> > Doing xdmp:estimate(fn:distinct-values()) results in an XDMP:UNSEARCHABLE error.
> >
> > Using fn:count() instead of xdmp:estimate() works, but takes too long (i.e. 30 seconds).
> >
> > Is there a workaround for this ?
> >
> > Regards,
> > Danny
> >
> >
> > _______________________________________________
> > General mailing list
> > General at developer.marklogic.com
> > http://developer.marklogic.com/mailman/listinfo/general
>
> ---
> Ron Hitchens {mailto:ron at ronsoft.com}   Ronsoft Technologies
>      +44 7879 358 212 (voice)          http://www.ronsoft.com
>      +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
>