[MarkLogic Dev General] Diacritical insensitive search
Peter Hickman
peter.hickman at semantico.com
Thu Aug 30 06:26:23 PDT 2007
We have some data that contains the text "Łódź". The "Ł" is U+0141.
However, searching for "lodz" does not match the entries with "Ł".
"Łodz" does match, indicating that the "ó" and "ź" are being handled
correctly by the diacritic-insensitive search.
Am I correct in assuming that the "Ł" does not decompose to an L with a
slash and is therefore not covered by a diacritic-insensitive search?
Looking at the Unicode book it would seem that "Ł" has no decomposition
into a base "L" plus a combining mark.
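
To illustrate the point outside MarkLogic, here is a minimal Python sketch using the standard unicodedata module. It is not MarkLogic code and says nothing about how MarkLogic implements its search; the helper name strip_diacritics is made up for this example. It only shows that a decomposition-based approach removes the accents from "ó" and "ź" but leaves "Ł" alone, because U+0141 has no decomposition in the Unicode data.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD, then drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# U+0141 (Ł) has no decomposition mapping, so this is an empty string.
print(repr(unicodedata.decomposition("\u0141")))

# U+00F3 (ó) decomposes to "o" plus U+0301 (combining acute accent).
print(repr(unicodedata.decomposition("\u00f3")))

# The accents are stripped, but the stroke on the L survives.
print(strip_diacritics("Łódź"))
```

That matches what we see in the search: "Łodz" is reachable by stripping combining marks, "Lodz" is not.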
If this is the case, is there a way to add extra translations such as
U+0141 => U+004C?
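
The kind of rule I have in mind could be sketched like this in Python. This is only an illustration of the mapping applied as a preprocessing step on the application side; it is not a MarkLogic feature or API, and the names EXTRA_FOLDS and fold are hypothetical.

```python
import unicodedata

# Hypothetical extra translations for letters that have no Unicode
# decomposition: U+0141 => U+004C and U+0142 => U+006C.
EXTRA_FOLDS = str.maketrans({"\u0141": "L", "\u0142": "l"})


def fold(text):
    """Strip combining marks, then apply the extra translations."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(EXTRA_FOLDS)


print(fold("Łódź"))          # Lodz
print(fold("Łódź").lower())  # lodz
```

Applying the same fold to both the indexed text and the query string would let "lodz" match "Łódź", at the cost of maintaining the extra table by hand.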
Also, and I have not checked this, will the case-insensitive search
for "Ł" match "ł" (the lowercase version)?
Again, if not, can we add a rule?
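
For what it is worth, Unicode itself does define "Ł" (U+0141) and "ł" (U+0142) as a case pair, which a short Python check confirms. Whether MarkLogic's case-insensitive search follows the same mapping is exactly what I have not verified; the function name same_ignoring_case is made up for this sketch.

```python
def same_ignoring_case(a, b):
    """Compare two strings using Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


upper_l_stroke = "\u0141"  # Ł
lower_l_stroke = "\u0142"  # ł

print(upper_l_stroke.lower() == lower_l_stroke)            # True
print(same_ignoring_case(upper_l_stroke, lower_l_stroke))  # True
print(same_ignoring_case("Łódź", "łódź"))                  # True
```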
--
Peter Hickman.
Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: peter.hickman at semantico.com
w: www.semantico.com