[MarkLogic Dev General] cts:cluster

Damon Feldman Damon.Feldman at marklogic.com
Tue May 21 06:17:47 PDT 2013


Varun,

cts:cluster groups similar documents based on the important terms in them, including words, element/word pairs, and the like. If you build a separate document containing only the people, you may be able to group them with cts:cluster. However, cluster is intended for moderate-sized sets of returned values, such as search results, rather than entire databases. You can also look at cts:similar-query(), again using a document that contains only people.

The cluster and similar functions use the same scores that search uses: tf-idf scores for terms. That is why, if you want them to focus on people, you need to put the person elements in a separate document. If you want a more direct count of how often other people occur in the same document as a given person, you can use cts:element-value-co-occurrences. Alternatively, call cts:element-values() with a query constraint on the particular person you are checking, then count the documents mentioning each other person by calling cts:frequency on each returned value.
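To make the per-person counting concrete, here is a minimal plain-Python sketch of the same logic. It is not MarkLogic code. It assumes a hypothetical in-memory list `docs` in which each document is reduced to the set of person names it mentions. The `if person in names` check plays the role of the query constraint, and counting one hit per document plays the role of cts:frequency. The function name `co_mention_counts` is illustrative only. In MarkLogic the value functions also need a range index on the person element.

```python
from collections import Counter


def co_mention_counts(docs, person):
    """For one person, count how many documents also mention each other person.

    `docs` is a hypothetical stand-in for the database: a list of sets,
    one set of person names per document.
    """
    counts = Counter()
    for names in docs:
        # Query constraint: only look at documents that mention `person`.
        if person in names:
            # Each document contributes at most one count per other person,
            # like a per-document (fragment) frequency.
            counts.update(names - {person})
    return counts


docs = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Bob", "Carol"},
    {"Alice", "Dave"},
]
print(dict(co_mention_counts(docs, "Alice")))
```

Running this once per person yields every person's neighbours. That works for small data, but it repeats a query for each person, which is why the co-occurrence function is worth trying first.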

Also consider cts:element-value-co-occurrences if you want to focus on the most commonly paired people.
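For the graph in the original question, the useful output is each pair of people with the number of documents they share. As I understand it, that is the information cts:element-value-co-occurrences combined with cts:frequency provides, given range indexes on the person element. The following is a plain-Python sketch of that pair counting plus the threshold step, not MarkLogic code. `docs` is again a hypothetical list of name sets, and `co_occurrence_edges` is an illustrative name. The threshold is strict ("more than"), following the wording of the question.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(docs, threshold):
    """Return person pairs that appear together in more than `threshold` documents.

    `docs` is a hypothetical list of sets of person names, one per document.
    The result maps (name_a, name_b) -> number of shared documents.
    """
    pair_counts = Counter()
    for names in docs:
        # sorted() gives each unordered pair one canonical (a, b) order,
        # so ("Bob", "Alice") and ("Alice", "Bob") are counted together.
        for pair in combinations(sorted(names), 2):
            pair_counts[pair] += 1
    # Keep only pairs strictly above the threshold; these become graph edges.
    return {pair: n for pair, n in pair_counts.items() if n > threshold}


docs = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Bob", "Carol"},
    {"Alice", "Dave"},
]
print(co_occurrence_edges(docs, threshold=1))
```

Each returned pair is an edge, and its count can serve as an edge weight. On a real database you would let the server compute the pairs and frequencies and apply only the threshold filter on the client side.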

Yours,
Damon

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Varun Varunesh
Sent: Monday, May 20, 2013 3:41 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:cluster

Hi All,

I need some quick help. I have not explored MarkLogic's cts:cluster yet, but my problem sounds like a clustering problem.

My problem is that I have many documents in the database, and each document contains one or more person names. I need to build a relationship graph of these persons: if a set of persons appears together in more than a threshold number of documents, their names should be connected by edges.

I am using MarkLogic 5.0.

Please suggest how I could solve this problem using MarkLogic.

Thanks,
Varun Varunesh