[MarkLogic Dev General] cts:cluster
Damon.Feldman at marklogic.com
Tue May 21 06:17:47 PDT 2013
cts:cluster groups similar documents based on the important terms they contain, such as words and element/word pairs. If you build a separate document containing only the people, you may be able to group them using cts:cluster, but it is intended for moderate-sized sets of returned results rather than entire databases. You can also look at cts:similar-query(), again using a document with only people in it.
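As a rough sketch of the two calls mentioned above, assuming the people-only documents have been loaded into a collection named "people-only" and that one of them has the URI "/people-only/doc-1.xml" (both names are hypothetical, not from the original post):

```xquery
xquery version "1.0-ml";

(: Assumption: "people-only" is a hypothetical collection holding the
   derived documents that contain only person elements. :)
let $docs := fn:collection("people-only")[1 to 200]

(: Assumption: hypothetical URI of one people-only document to use as
   the model for "find documents similar to this one". :)
let $model := fn:doc("/people-only/doc-1.xml")

return (
  (: Cluster a moderate-sized set of nodes; returns a cts:clustering element. :)
  cts:cluster($docs),

  (: The ten people-only documents that score as most similar to the model. :)
  cts:search(fn:collection("people-only"), cts:similar-query($model))[1 to 10]
)
```

The limit of 200 nodes is arbitrary; it only reflects the advice to keep the clustered set moderate in size.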
The cluster and similar functions use the same scores that searching uses (tf-idf scores for terms), which is why you need to put the person elements in a separate document if you want them to focus on people. If you want a more straightforward count of the number of times other people occur in the same document as a given person, you can use cts:element-value-co-occurrences, or cts:element-values() with a query constraining the results to the particular person you are checking. Then count the number of documents mentioning each other person by calling cts:frequency on each returned value.
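The cts:element-values() approach might look roughly like the following. This is only a sketch: the element name "person", the sample name "Jane Doe", and the threshold of 5 are hypothetical, and it assumes a string element range index has been configured on the person element, which lexicon functions require.

```xquery
xquery version "1.0-ml";

(: Assumptions: hypothetical element name, person, and threshold. :)
declare variable $PERSON := xs:QName("person");
declare variable $TARGET := "Jane Doe";
declare variable $THRESHOLD := 5;

(: List person values, restricted to documents that mention $TARGET. :)
for $name in cts:element-values(
               $PERSON, (), ("frequency-order"),
               cts:element-value-query($PERSON, $TARGET))
(: Number of matching fragments in which this value occurs. :)
let $count := cts:frequency($name)
where $name ne $TARGET and $count ge $THRESHOLD
return <edge from="{$TARGET}" to="{$name}" documents="{$count}"/>
```

Each returned edge element connects the target person to another person who appears with them in at least the threshold number of documents. Running this once per person would produce the whole graph, at the cost of one lexicon call per person.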
Also consider cts:element-value-co-occurrences if you want to focus on the most commonly paired people.
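A minimal sketch of the co-occurrence approach, under the same assumptions as before: a hypothetical "person" element with a string element range index, and an arbitrary threshold of 5 documents.

```xquery
xquery version "1.0-ml";

(: Assumptions: hypothetical element name and threshold. :)
declare variable $PERSON := xs:QName("person");
declare variable $THRESHOLD := 5;

(: Pair every person value with every other person value that occurs
   in the same fragment. :)
for $pair in cts:element-value-co-occurrences($PERSON, $PERSON)
let $count := cts:frequency($pair)
let $a := fn:string($pair/cts:value[1])
let $b := fn:string($pair/cts:value[2])
(: Skip self-pairs and pairs below the threshold. :)
where $a ne $b and $count ge $THRESHOLD
order by $count descending
return <edge from="{$a}" to="{$b}" documents="{$count}"/>
```

This covers all pairs in a single lexicon call. Depending on the options used, the same pair may come back in both orders, so the edges may need de-duplicating before the graph is drawn.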
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Varun Varunesh
Sent: Monday, May 20, 2013 3:41 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:cluster
I need some quick help. I have not yet explored MarkLogic cts:cluster, but my problem sounds like a clustering problem.
I have lots of documents in the database. Each document contains one or more person names. I have to create a relationship graph of these persons, i.e. if some set of persons appears together in more than a threshold number of documents, then connect those names with edges.
I am using MarkLogic 5.0.
Please suggest a way to solve this problem using MarkLogic.