cts:cluster groups similar documents based on the important terms in them, including words, element/word pairs, and similar. If you build a separate document with only the people, you may be able to group them using cts:cluster, but cts:cluster is intended for moderately sized sets of nodes, such as returned search results, rather than entire databases. You can also look at cts:similar-query(), again using a document with only people in it.
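
As a rough sketch of those two calls, assuming the people-only documents are stored in a collection called "people-only" and that one of them has the URI /people-only/doc-1.xml (both names are made up for illustration, and the sample size of 200 is arbitrary):

```xquery
xquery version "1.0-ml";

(: Assumption: "people-only" is a hypothetical collection holding the
   derived documents that contain nothing but the person elements. :)
let $sample := fn:collection("people-only")[1 to 200]
return (
  (: Cluster a moderately sized sample rather than the whole database. :)
  cts:cluster($sample),

  (: Ten documents most similar to one hypothetical people-only document. :)
  cts:search(fn:collection("people-only"),
    cts:similar-query(fn:doc("/people-only/doc-1.xml")))[1 to 10]
)
```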

The cluster and similar functions use the same scores that searching uses: tf-idf scores for terms. That is why, if you want them to focus on people, you need to put the person elements in a separate document. If you want a more straightforward count of the number of times other people occur in the same document as a given person, you can use cts:element-value-co-occurrences, or cts:element-values() with a query constraining the results to the particular person you are checking, and then count the number of documents mentioning each other person by calling cts:frequency on each returned value.
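
A minimal sketch of the cts:element-values() approach might look like the following. It assumes the names are marked up in an element called person with a string element range index on it (using the default collation), and the person name and threshold are placeholder values:

```xquery
xquery version "1.0-ml";

(: Assumptions: <person> is a hypothetical element name with a string
   element range index; the name and threshold below are made up. :)
let $person := "Jane Smith"
let $threshold := 5
for $other in cts:element-values(
    xs:QName("person"), (), ("frequency-order"),
    (: Only look at documents that mention the person being checked. :)
    cts:element-value-query(xs:QName("person"), $person, "exact"))
(: Number of matching fragments (documents, if unfragmented) with this value. :)
let $count := cts:frequency($other)
where $other ne $person and $count ge $threshold
return <edge from="{$person}" to="{$other}" documents="{$count}"/>
```

Running this once per person of interest gives the edges for that person; the edge element is just an illustrative output shape.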

Also consider cts:element-value-co-occurrences if you want to focus on the most commonly paired people.
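
Something along these lines could list the most frequent pairs across the whole database. As before, the person element name and the threshold are assumptions, a range index on that element is required, and you should check how your server version reports same-element pairs (for example whether both A/B and B/A come back) before relying on the counts:

```xquery
xquery version "1.0-ml";

(: Assumptions: <person> is a hypothetical element name with a string
   element range index; the threshold is a made-up value. :)
let $threshold := 5
for $pair in cts:element-value-co-occurrences(
    xs:QName("person"), xs:QName("person"), ("frequency-order"))
let $a := fn:string($pair/cts:value[1])
let $b := fn:string($pair/cts:value[2])
let $count := cts:frequency($pair)
(: Skip a name paired with itself and pairs below the threshold. :)
where $a ne $b and $count ge $threshold
return <edge from="{$a}" to="{$b}" documents="{$count}"/>
```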


Hi All,

I need some quick help. I have not yet explored MarkLogic's cts:cluster, but my problem sounds like a clustering problem.

My problem is this: I have a lot of documents in the database, and each document contains one or more person names. I need to create a relationship graph of these people, i.e. if some set of people appears together in more than a threshold number of documents, connect those names with edges.

I am using MarkLogic 5.0.

Please suggest a way to solve this problem using MarkLogic.

Varun  Varunesh
