[MarkLogic Dev General] cts search - Frequency calculation
Michael Blakeley
michael.blakeley at marklogic.com
Wed Feb 24 15:12:02 PST 2010
This sounds like a frequency-order facet, which the server can calculate
for you using a string range-index on the element. In the long run you
are probably better off with the high-level search API
(http://developer.marklogic.com/pubs/4.1/apidocs/SearchAPI.html) rather
than using cts:search() and cts:element-values() directly. Start with
section 2.5 of
http://developer.marklogic.com/pubs/4.1/books/search-dev-guide.pdf
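
For reference, here is a minimal sketch of the lower-level lexicon call, not a definitive implementation. It assumes a string element range index already exists on my:bbc. The namespace URI below is a placeholder because the original question does not give one, and the choice of "item-frequency" (count every occurrence) over the default fragment-based count is also an assumption about what is wanted.

```xquery
xquery version "1.0-ml";

(: Placeholder namespace URI (assumption); use the URI actually bound to "my". :)
declare namespace my = "http://example.com/my";

(: Collection name supplied by the caller, e.g. "/abc/xyz/1234/". :)
declare variable $collection as xs:string external;

(: Read distinct my:bbc values straight from the range index, most frequent
   first, restricted to documents in the given collection. No documents are
   loaded from disk; the counts come from the index. :)
for $value in cts:element-values(
    xs:QName("my:bbc"),
    (),
    ("frequency-order", "item-frequency"),
    cts:collection-query($collection))
return
  <value frequency="{cts:frequency($value)}">{$value}</value>
```

This reports every my:bbc value with its count. Restricting the output to values that also occur in my:crv would need a further step, for example a second lexicon call on my:crv (which needs its own range index).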
-- Mike
On 2010-02-24 12:33, Anurag Saxena wrote:
> Hi, I am trying to do something that is very time-consuming and has a significant performance impact.
> I am doing it with for loops and cts:search, which takes minutes to return a result on a small set of documents, say 30000.
> Below are the details of what I am trying to achieve.
>
> Can somebody please suggest a way to meet the requirement below without a performance impact? If it can be achieved on millions of documents in 10 to 20 minutes, I am fine with that and can run it as a batch process instead of doing it at run time.
>
> I am not very familiar with the MarkLogic APIs, but I have a basic understanding and am comfortable using them by following examples.
>
>
> I have a collection containing millions of documents (for example, 3 million). I want to read a particular element value from each document, and then search for that value under a different element (which can occur multiple times in a single document) in the same set of documents, to calculate the frequency of each value read. Once I have the frequency for each element value, I need to sort them to identify the most frequently occurring values. I will supply the collection name as a parameter.
>
> Details:
>
> Let's say I have a collection, which will be passed as a parameter:
>
> collection name: /abc/xyz/1234/
> all the document names are: pqr.xml
> I need to check/iterate over all the documents for:
> element name whose values' frequency I want to know - (my:crv)
> element name under which I want to search for the above-mentioned value - (my:bbc)
> Once I have a list of values with their frequencies, I need to sort it by frequency to get the highest-frequency values.
>
> Any help/quick response is highly appreciated.
>
> Thanks
> Anu
>
>
>