[MarkLogic Dev General] Word Boundaries in Chinese?
Michael Sokolov
sokolov at ifactory.com
Wed May 7 15:39:02 PDT 2008
Marc - it looks as if all the useful information in your e-mail (the Chinese
characters, which show up here only as "?") got stripped out by a mail daemon.
Also - I suggest you take this one up with support instead, or in addition.
-Mike
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of
> Marc Moskowitz
> Sent: Wednesday, May 07, 2008 5:31 PM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Word Boundaries in Chinese?
>
> I'm seeing some odd behavior when searching for text in
> Chinese. It seems that the server is making decisions about
> word boundaries based on some internal criteria.
>
> This XQuery:
> let $q := '?',
>     $doc := (
>       <yo>???</yo>,
>       <yo>??</yo>,
>       <yo>??</yo>,
>       <yo>?????</yo>)
> for $d in $doc
> let $h := cts:highlight($d, $q, <hey>{$cts:text}</hey>)
> return (count($h//hey), $h)
>
> produces this result:
>
> 0
> <yo>???</yo>
> 1
> <yo><hey>?</hey>?</yo>
> 0
> <yo>??</yo>
> 1
> <yo>????<hey>?</hey></yo>
>
>
> Is there some way of affecting where these boundaries are
> placed? Or of turning this functionality fully on or off?
> -Marc