[MarkLogic Dev General] Collation Lexicon Frequency
Paul M
pjmaip at yahoo.com
Wed Dec 17 13:58:20 PST 2008
Hi:
I have the following docs:
doc1
<elem1>dear sir</elem1>
doc2
<elem1>dear sir</elem1>
doc3
<elem1>dear sir </elem1>
The three values differ only in the amount of white space they contain. I am using lib-search, specifically these functions:
cts:element-values($element-qname, "", $options, $base-query) (: the three values above are returned :)
cts:frequency($value) (: elem1 has three facets associated with $base-query, each with a frequency of 1 :)
Each doc contains elem1, and each elem1 has a unique value. There does not seem to be a simple way to make the frequency function treat the three elements above as "the same". (They likely hash to different values?)
Is the only easy method to normalize the data by stripping white space from the documents themselves?
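For illustration, a query-time alternative might look like the rough sketch below. It is untested and rests on several assumptions: an element range index exists on elem1, the cts:and-query(()) base query is only a hypothetical stand-in for the real $base-query, and the <facet> wrapper element is made up for display. It groups the lexicon values by their fn:normalize-space form and sums cts:frequency within each group.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in that matches every document;
   substitute the real $base-query here. :)
let $base-query := cts:and-query(())

(: Assumes an element range index on elem1. :)
let $values := cts:element-values(xs:QName("elem1"), "", (), $base-query)

(: One key per distinct whitespace-normalized value. :)
for $key in fn:distinct-values(
              for $v in $values return fn:normalize-space($v))
return
  (: <facet> is an illustrative wrapper, not part of lib-search. :)
  <facet value="{$key}">{
    fn:sum(
      for $v in $values
      where fn:normalize-space($v) eq $key
      return cts:frequency($v))
  }</facet>
```

This leaves the stored documents untouched, but it repeats the grouping on every request, so normalizing at load time may still be the cheaper choice for large lexicons.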