[MarkLogic Dev General] Collation Lexicon Frequency
Danny Sokolsky
dsokolsky at marklogic.com
Wed Dec 17 14:39:53 PST 2008
One approach is to use a whitespace-insensitive collation for the range
index. The values would then compare as equal. Here is a simple example:
xquery version "1.0-ml";
declare default collation "http://marklogic.com/collation/en/S1/AS";
"hello there" = "hello     there"
(: returns true: whitespace is ignored under this collation :)
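
Applied to the lexicon call from the original question, the same idea
might look like the sketch below. It assumes an element range index of
type xs:string already exists on elem1 with this collation; the element
name comes from the sample docs, while the <facet> wrapper is purely
illustrative. It has not been run against the poster's data.

```xquery
xquery version "1.0-ml";

(: Assumption: a string range index on elem1 is configured with the
   collation below. Without it, the lexicon call raises an error. :)
let $collation := "http://marklogic.com/collation/en/S1/AS"
for $value in cts:element-values(
    xs:QName("elem1"), "",
    (fn:concat("collation=", $collation), "frequency-order"))
(: <facet> is a hypothetical wrapper, used only to display the result :)
return <facet count="{cts:frequency($value)}">{$value}</facet>
```

If the collation treats the three values as equal, the lexicon should
return one value for them with a frequency of 3 rather than three values
with a frequency of 1 each.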
-Danny
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Paul M
Sent: Wednesday, December 17, 2008 1:58 PM
To: general at developer.marklogic.com
Subject: [MarkLogic Dev General] Collation Lexicon Frequency
Hi:
I have the following docs:
doc1
<elem1>dear sir</elem1>
doc2
<elem1>dear sir</elem1>
doc3
<elem1>dear sir </elem1>
All have a variable amount of white space characters. Using lib-search,
specifically these functions:
cts:element-values($element-qname, "", $options, $base-query)
(: the above three docs are returned :)
cts:frequency($value)
(: elem1 has three facets associated with $base-query, each with a
frequency of 1 :)
Each doc contains elem1, each with a unique value. There does not seem
to be a simple way for the frequency function to consider the above
three elements as "the same". (They likely hash to different values?)
Is the only easy method to normalize the data by stripping white-space
from the documents themselves?