[MarkLogic Dev General] Word Boundaries in Chinese?

Marc Moskowitz mmoskowitz at ifactory.com
Wed May 7 14:31:16 PDT 2008


I'm seeing some odd behavior when searching for text in Chinese. The
server appears to be deciding where word boundaries fall based on some
internal criteria, so a single-character query matches in some strings
but not in others.

This XQuery:
let $q := '意',
    $doc := (
      <yo>好意思</yo>,
      <yo>意料</yo>,
      <yo>好意</yo>,
      <yo>词不达達意</yo>)
for $d in $doc
let $h := cts:highlight($d, $q, <hey>{$cts:text}</hey>)
return (count($h//hey), $h)

produces this result:

0
<yo>好意思</yo>
1
<yo><hey>意</hey>料</yo>
0
<yo>好意</yo>
1
<yo>词不达達<hey>意</hey></yo>
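
To check whether tokenization is what drives these results, the tokens for each string could be inspected directly. This is only a minimal sketch, assuming cts:tokenize is available in the server version in use; the bracket formatting is just for illustration and is not part of any API.

```xquery
(: Sketch: print each test string as a list of bracketed tokens.
   Assumes cts:tokenize exists on this server version. :)
for $s in ('好意思', '意料', '好意', '词不达達意')
return string-join(
  (: wrap every token the server produces in [ ] so boundaries are visible :)
  for $t in cts:tokenize($s)
  return concat('[', $t, ']'),
  ' ')
```

If 意 comes back as its own token only in the strings where the highlight matched, that would support the word-boundary explanation.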


Is there some way to control where these boundaries are placed, or to
turn this behavior fully on or off?
-Marc


