[MarkLogic Dev General] "Joins" in search:search or cts:search

Lee, David dlee at epocrates.com
Thu Nov 17 11:41:15 PST 2011


I suspect the answer is "no" ... but I'm just picking the brains out there.

For good or bad, I use this archetype.

I have many "summary" documents, say "/logs/1.xml" and "/logs/2.xml", which belong to the collection "/summaries".

There can be many (100k+)

Each summary document lists references to external URLs (in this case Amazon S3) from which data could be loaded.
If I load the data, I put each group into a collection named by the URL of the summary.
So say I have 10,000 XML documents referenced by doc("/logs/1.xml"). If I choose to load them, they will end up in the collection
"/logs/1.xml". The summaries themselves are in the collection, say, "/summaries".

The reason for this is the ability to easily bulk-delete blocks of documents based on their summaries.
I can list the summaries and, by a simple
                exists( collection( $url) )

can tell whether any actual log documents have been loaded.


NOW: I want to be able to delete all records by summary, but only if the documents have been loaded.
Suppose I had 100k summary URLs; I could do

                (: iterate over the summary URIs, not the summary documents themselves :)
                for $url in collection("/summaries")/document-uri(.)
                return
                    if (exists(collection($url))) then
                        xdmp:collection-delete($url)
                    else ()


This works and all ... but suppose I want something more efficient.
Overall, only about 1% of the summary documents may actually be loaded. Furthermore, if there were LOTS of them loaded, the above would time out.

So I spawn a thread to delete, say, [1 to 10] of every summary collection ...
but if I have 100k collections, most of the threads do nothing.
So I have to revert to the above and first check whether the collection has anything in it before spawning a thread.
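
Roughly, the check-then-spawn approach looks like the sketch below. The task module path "/tasks/delete-collection.xqy" is hypothetical; it is assumed to declare an external variable $url and call xdmp:collection-delete($url).

```xquery
xquery version "1.0-ml";

(: Sketch only. "/tasks/delete-collection.xqy" is a hypothetical task module
   assumed to declare an external variable $url and delete that collection. :)
for $url in collection("/summaries")/document-uri(.)
(: skip summaries whose log documents were never loaded :)
where exists(collection($url))
return
  xdmp:spawn(
    "/tasks/delete-collection.xqy",
    (xs:QName("url"), $url)
  )
```

This still pays for one existence check per summary, which is the cost I would like to avoid.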

Question: Is there a cts:search option which can do a collection query based on the results of the search itself?
That is (pseudo code),
in one cts:search:

    for $c in collection("x")/document-uri(.)
    where exists(collection($c))
    return $c

Doing this in FLWOR is very slow ...
but it's what I'm resorting to ...











----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
dlee at epocrates.com<mailto:dlee at epocrates.com>
812-482-5224
