[MarkLogic Dev General] Sorting pinyin text?

Mary Holstege mary.holstege at marklogic.com
Tue Apr 1 09:12:08 PST 2008


On Tue, 01 Apr 2008 08:31:12 -0700, Marc Moskowitz  
<mmoskowitz at ifactory.com> wrote:
> The standard zh collation sorts Chinese characters correctly, but I'm  
> trying to sort the pinyin transliterations. For example, this XQuery:
>
> default collation="http://marklogic.com/collation/zh"
> let $words := ('fù-bèi shòu dí','fùdi','fùgǎo','fūzi','fùtòng','fùxiè',  
> 'fù-mu')
> for $x in $words
> order by $x
> return $x
>
> returns
>
> fù-bèi shòu dí
> fù-mu
> fùdi
> fùgǎo
> fùtòng
> fùxiè
> fūzi
>
> which is in codepoint order, instead of the correct order:
>
> fūzi (1st tone comes before 4th)
> fù-bèi shòu dí
> fùdi
> fùgǎo
> fù-mu (hyphens should be ignored)
> fùtòng
> fùxiè
>
> Am I correct that the supported way to sort this text is to create a  
> sortable form for each of these strings at document load time?
> -Marc

Ah, right. I missed the key word "transliteration". Yes, I think what
you want to do is create a sort key for each string at document load
time, and then order by that key rather than by the raw text.
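
As a rough illustration of what such a key could look like (a minimal
sketch, not a tested MarkLogic recipe): decompose each string to NFD so the
tone mark becomes a separate combining character, turn the four tone marks
into the digits 1-4, and drop hyphens and spaces. The function name
local:pinyin-sort-key is made up for this example, and the syntax is
standard XQuery 1.0, so it may need adjusting for the dialect used in the
query above. It also assumes fn:normalize-unicode is available with "NFD"
support.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds a plain-ASCII-ish sort key from pinyin. :)
declare function local:pinyin-sort-key($s as xs:string) as xs:string
{
  (: NFD splits each toned vowel into base letter + combining mark. :)
  let $nfd := fn:normalize-unicode(fn:lower-case($s), "NFD")
  (: macron, acute, caron, grave become 1, 2, 3, 4.
     Hyphen and space have no counterpart in the third argument,
     so fn:translate removes them. :)
  return fn:translate($nfd, "&#x304;&#x301;&#x30C;&#x300;- ", "1234")
};

let $words := ('fù-bèi shòu dí','fùdi','fùgǎo','fūzi','fùtòng','fùxiè',
               'fù-mu')
for $x in $words
(: Compare the keys by codepoint so no language collation interferes. :)
order by local:pinyin-sort-key($x)
  collation "http://www.w3.org/2005/xpath-functions/collation/codepoint"
return $x
```

With the sample words this gives keys such as fu1zi, fu4di and fu4mu, which
sort into the order listed as correct above. Note the simplification: the
tone digit sits right after the vowel that carries the mark, not at the end
of the syllable, so this is not a true syllable-by-syllable ordering (for
instance fān would sort before fà). Doing better needs syllable
segmentation, and ü is not given any special handling here. At load time
the same function could compute the key once and store it in an element or
attribute next to the text.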

//Mary

