[MarkLogic Dev General] Word Boundaries in Chinese?

Michael Sokolov sokolov at ifactory.com
Wed May 7 15:39:02 PDT 2008


Marc - it looks as if all the useful information in your e-mail got stripped
out by a mail daemon.

Also - I suggest you take this one up with support instead, or in addition.

-Mike

> -----Original Message-----
> From: general-bounces at developer.marklogic.com 
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of 
> Marc Moskowitz
> Sent: Wednesday, May 07, 2008 5:31 PM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Word Boundaries in Chinese?
> 
> I'm seeing some odd behavior when searching for text in 
> Chinese. It seems that the server is making decisions about 
> word boundaries based on some internal criteria.
> 
> This XQuery:
> let $q := '?',
> $doc := (
> <yo>???</yo>,
> <yo>??</yo>,
> <yo>??</yo>,
> <yo>?????</yo>)
> for $d in $doc
> let $h := cts:highlight($d, $q, <hey>{$cts:text}</hey>) 
> return (count($h//hey), $h)
> 
> produces this result:
> 
> 0
> <yo>???</yo>
> 1
> <yo><hey>?</hey>?</yo>
> 0
> <yo>??</yo>
> 1
> <yo>????<hey>?</hey></yo>
> 
> 
> Is there some way of affecting where these boundaries are 
> placed? Or of turning this functionality fully on or off?
> -Marc
> 


