[MarkLogic Dev General] cts search - Frequency calculation

Anurag Saxena anurag24k at gmail.com
Wed Feb 24 12:33:28 PST 2010


Hi, I am trying to do something that is very time-consuming and has a
significant performance impact.
I am currently doing it with for loops and cts:search, which takes
minutes to return a result on a small set of documents, say 30,000.
Below are the details of what I am trying to achieve.

Can somebody please suggest a way to meet the requirement below
without the performance impact? If it can be achieved on millions
of documents in 10 to 20 minutes, then I am fine with it and can run it as
a batch process instead of doing it at run time.

I am not very familiar with the MarkLogic APIs, but I have a basic
understanding and am comfortable using them by following examples.


I have a collection containing millions of documents (for example,
3 million). I want to read a particular element value from each document, and
then search for that value under a different element (which can occur
multiple times in a single document) across the same set of documents to
calculate the frequency of that value. Once
I have the frequency for each element value read from the
documents, I need to sort them to identify the values that occur most
often. I will be supplying the collection name as a parameter.

Details:

Let's say I have a collection, which will be passed as a parameter.

Collection name: /abc/xyz/1234/
All the documents are named: pqr.xml
I need to iterate over all the documents for:
the element whose values' frequency I want to know: my:crv
the element under which I want to search for those values: my:bbc
Once I have the list of values with their frequencies, I need to sort it by
frequency to get the highest-frequency values.
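
To make the requirement concrete, here is a small sketch of the computation I am after. It is plain Python, not MarkLogic code, and it runs on made-up in-memory data: the element names my:crv and my:bbc are from above, but the sample values and the function name crv_frequencies are hypothetical, and I am assuming each document has exactly one my:crv value. It counts every my:bbc occurrence once and then looks up each my:crv value, rather than running a separate search per value as my for loops do.

```python
from collections import Counter

# Hypothetical sample data standing in for the documents in the collection.
# Each dict is one document: one my:crv value and zero or more my:bbc values.
documents = [
    {"crv": "A", "bbc": ["B", "C"]},
    {"crv": "B", "bbc": ["A", "A", "C"]},
    {"crv": "C", "bbc": ["A", "B"]},
]


def crv_frequencies(docs):
    """Return (value, frequency) pairs, most frequent first."""
    # One pass over the data: count every my:bbc occurrence in all documents.
    bbc_counts = Counter(value for doc in docs for value in doc["bbc"])
    # Distinct my:crv values whose frequency is wanted.
    crv_values = {doc["crv"] for doc in docs}
    # A Counter returns 0 for a value that never appears under my:bbc.
    pairs = [(value, bbc_counts[value]) for value in crv_values]
    # Sort by descending frequency; break ties by value for a stable order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


for value, frequency in crv_frequencies(documents):
    print(value, frequency)
```

What I am looking for is the equivalent of this inside MarkLogic, so that the counting happens on the server across the whole collection instead of in nested loops.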

Any help or a quick response would be highly appreciated.

Thanks
Anu