[MarkLogic Dev General] Word Boundaries in Chinese?
Marc Moskowitz
mmoskowitz at ifactory.com
Wed May 7 14:31:16 PDT 2008
I'm seeing some odd behavior when searching for text in Chinese. It
seems that the server is making decisions about word boundaries based on
some internal criteria.
This XQuery:
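let $q := '意',
    (: four test elements, each containing the character 意 :)
    $doc := (
      <yo>好意思</yo>,
      <yo>意料</yo>,
      <yo>好意</yo>,
      <yo>词不达達意</yo>)
for $d in $doc
(: wrap each match of $q in a <hey> element :)
let $h := cts:highlight($d, $q, <hey>{$cts:text}</hey>)
(: number of highlighted matches, followed by the highlighted element :)
return (count($h//hey), $h)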
produces this output (for each element, the number of highlighted matches followed by the highlighted element):
0
<yo>好意思</yo>
1
<yo><hey>意</hey>料</yo>
0
<yo>好意</yo>
1
<yo>词不达達<hey>意</hey></yo>
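For comparison (an illustration only, not part of the original test): a plain substring check with the standard XQuery function fn:contains, which does no word segmentation, finds 意 in all four strings. The differences in the cts:highlight output above therefore presumably come from how the server segments the Chinese text into words, not from whether the character is present.

```xquery
xquery version "1.0";
(: Plain substring test on the same four strings; no word segmentation is involved. :)
let $q := '意'
for $s in ('好意思', '意料', '好意', '词不达達意')
return contains($s, $q)
```

Evaluated as written, this returns true for each of the four strings.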
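In other words, 意 is highlighted as a separate word in 意料 and 词不达達意, but not in 好意思 or 好意, even though the character appears in all four. Is there a way to control where these boundaries are placed, or to turn this behavior on or off entirely?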
-Marc