[MarkLogic Dev General] Unicode flattening for non combined characters.
Peter Hickman
peter.hickman at semantico.com
Mon Jul 21 06:52:22 PDT 2008
Our client has data such as "Jacob Ørn" that they want to search for.
They are expecting that searching for "orn" would match "Ørn" as they see
"Ø" as an accented character. According to the Unicode Standard 4.0
(always a good read :)) U+00D8 "Latin Capital Letter O With Stroke" is
not a combined character (it has no decomposition into "O" plus a
combining mark) and therefore is not matched by "O" when doing a
case- and diacritic-insensitive search. This is what I expect and
understand as a developer.
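
To illustrate the point outside MarkLogic, here is a minimal Python
sketch using the standard unicodedata module. The helper name
strip_diacritics is hypothetical, and this is not how MarkLogic
implements its diacritic-insensitive matching; it only shows that
decomposition-based accent stripping handles "É" but leaves "Ø" alone.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# U+00C9 (E with acute) has a canonical decomposition: base letter + mark.
print(repr(unicodedata.decomposition("\u00c9")))  # '0045 0301'

# U+00D8 (O with stroke) has none, so there is no mark to strip.
print(repr(unicodedata.decomposition("\u00d8")))  # ''

print(strip_diacritics("\u00c9rn"))  # Ern
print(strip_diacritics("\u00d8rn"))  # Ørn (unchanged)
```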
Is there some way of getting the client's expected behaviour? I suspect
that the "Ø" is only one of several characters that have this problem,
such as the "Ł" (U+0141) in "Łodz".
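
One way the desired behaviour could be approximated is an explicit
folding table applied to both the indexed text and the query before
comparison. The following Python sketch is only an assumption about
what such a step might look like; the names EXTRA_FOLDS and
fold_for_search are hypothetical, the table covers just the two letters
mentioned here, and nothing in it is a MarkLogic feature.

```python
import unicodedata

# Hypothetical table for letters with no Unicode decomposition.
# Only the characters named in this message are listed; a real table
# would need to be extended to whatever the client's data contains.
EXTRA_FOLDS = str.maketrans({
    "\u00d8": "O",  # Ø  Latin capital letter O with stroke
    "\u00f8": "o",  # ø  Latin small letter o with stroke
    "\u0141": "L",  # Ł  Latin capital letter L with stroke
    "\u0142": "l",  # ł  Latin small letter l with stroke
})


def fold_for_search(text: str) -> str:
    """Fold case, strip combining marks, and apply the extra table."""
    text = text.translate(EXTRA_FOLDS)
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()


print(fold_for_search("Jacob \u00d8rn"))  # jacob orn
print(fold_for_search("\u0141odz"))       # lodz
print(fold_for_search("orn") in fold_for_search("Jacob \u00d8rn"))  # True
```

Folding both sides the same way keeps the comparison symmetric, so a
query typed as "Ørn" would still match data stored as "Orn".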
--
Peter Hickman.
Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: peter.hickman at semantico.com
w: www.semantico.com