[MarkLogic Dev General] Sorting pinyin text?
Mary Holstege
mary.holstege at marklogic.com
Tue Apr 1 09:12:08 PST 2008
On Tue, 01 Apr 2008 08:31:12 -0700, Marc Moskowitz
<mmoskowitz at ifactory.com> wrote:
> The standard zh collation sorts Chinese characters correctly, but I'm
> trying to sort the pinyin transliterations. For example, this XQuery:
>
> default collation="http://marklogic.com/collation/zh"
> let $words := ('fù-bèi shòu dí','fùdi','fùgǎo','fūzi','fùtòng','fùxiè',
> 'fù-mu')
> for $x in $words
> order by $x
> return $x
>
> returns
>
> fù-bèi shòu dí
> fù-mu
> fùdi
> fùgǎo
> fùtòng
> fùxiè
> fūzi
>
> which is in codepoint order, instead of the correct order:
>
> fūzi (1st tone comes before 4th)
> fù-bèi shòu dí
> fùdi
> fùgǎo
> fù-mu (hyphens should be ignored)
> fùtòng
> fùxiè
>
> Am I correct that the supported way to sort this text is to create a
> sortable form for each of these strings at document load time?
> -Marc
Ah, right. I missed the key word "transliteration". Yes, I think what
you want to do is compute a sort key for each string at document load
time, store it alongside the text, and order by that key instead of by
the pinyin itself.
//Mary
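A minimal sketch of such a load-time sort key follows. It is written in the XQuery 1.0 (`1.0-ml`) syntax of later MarkLogic releases, not the older prolog syntax in Marc's query. Everything in it is an assumption for illustration and not a MarkLogic API: the helper `local:pinyin-sort-key`, the `entry`/`word`/`sort-key` element names, and the key format. That format drops hyphens and spaces and rewrites each tone-marked vowel as its plain vowel followed by a tone digit. For example, `fùgǎo` becomes `fu4ga3o` and `fūzi` becomes `fu1zi`. Ordering these keys with the codepoint collation puts the seven words in the order Marc expects.

```xquery
xquery version "1.0-ml";

(: Hypothetical lookup tables: tone-marked vowels grouped in fours by tone
   (1-4), with the plain vowel at the same position in $PLAIN. :)
declare variable $TONED as xs:string := "āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ";
declare variable $PLAIN as xs:string := "aaaaeeeeiiiioooouuuuüüüü";
declare variable $CODEPOINT as xs:string := "http://marklogic.com/collation/codepoint";

(: Hypothetical helper, not a built-in. Builds a codepoint-sortable key:
   spaces and hyphens are dropped, and each tone-marked vowel becomes its
   plain vowel followed by the tone digit. Other characters pass through. :)
declare function local:pinyin-sort-key($s as xs:string) as xs:string
{
  string-join(
    for $cp in string-to-codepoints(lower-case(normalize-unicode($s, "NFC")))
    let $ch := codepoints-to-string($cp)
    return
      if ($cp = (32, 45)) then ""   (: space, hyphen :)
      else if (contains($TONED, $ch, $CODEPOINT)) then
        let $pos := string-length(substring-before($TONED, $ch, $CODEPOINT)) + 1
        return concat(substring($PLAIN, $pos, 1), string(($pos - 1) mod 4 + 1))
      else $ch,
    "")
};

(: At load time each word would be stored with its precomputed key;
   here the entries are built in memory to show the resulting order. :)
let $entries :=
  for $w in ('fù-bèi shòu dí','fùdi','fùgǎo','fūzi','fùtòng','fùxiè','fù-mu')
  return <entry><word>{$w}</word><sort-key>{local:pinyin-sort-key($w)}</sort-key></entry>
for $e in $entries
order by string($e/sort-key) collation "http://marklogic.com/collation/codepoint"
return string($e/word)
```

In a real load, each `entry` element would be what gets passed to `xdmp:document-insert`, so queries can order by the stored `sort-key` instead of recomputing it. Placing the tone digit right after its vowel only approximates syllable-by-syllable comparison. Where neutral-tone (unmarked) syllables land relative to toned ones falls out of codepoint order rather than any dictionary convention. A production version might need a proper pinyin syllable segmenter.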