[MarkLogic Dev General] Diacritical insensitive search

Peter Hickman peter.hickman at semantico.com
Thu Aug 30 06:26:23 PDT 2007


We have some data that contains the text "Łódź". The "Ł" is U+0141.

However, searching for "lodz" does not match the entries with 
"Ł". "Łodz", however, does match, indicating that the "ó" and "ź" are 
being handled correctly by the diacritic-insensitive search.

Am I correct in assuming that the "Ł" does not decompose to an L with a 
combining slash and is therefore not covered by a diacritic-insensitive 
search? Looking at the Unicode book, it would seem that "Ł" has no 
decomposition into a base letter plus a combining mark.
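
The assumption can be checked outside MarkLogic against the Unicode 
character database. The sketch below uses Python's standard unicodedata 
module; the strip_diacritics helper is a hypothetical name, and it only 
approximates what a decomposition-based diacritic-insensitive search 
might do, not how MarkLogic itself implements it.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# U+00F3 (ó) has a canonical decomposition: base letter o + combining acute.
o_acute = unicodedata.decomposition("\u00f3")

# U+0141 (Ł) has no decomposition at all, so the result is an empty string.
l_stroke = unicodedata.decomposition("\u0141")

# Stripping marks removes the accents on ó and ź but leaves Ł untouched.
folded = strip_diacritics("Łódź")
```

Here folded is "Łodz", which is consistent with "Łodz" matching while 
"lodz" does not.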

If this is the case, is there a way to add extra translations such as 
U+0141 => U+004C?
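
One way to picture such an extra translation is as a small mapping 
applied after the usual decomposition step. The following is only a 
sketch of that idea in Python, for example as a preprocessing step on 
queries and indexed text; EXTRA_FOLDS and fold are hypothetical names, 
and nothing here is a MarkLogic configuration option.

```python
import unicodedata

# Hypothetical extra rules for letters that have no canonical decomposition.
# U+0141 (Ł) => U+004C (L), U+0142 (ł) => U+006C (l).
EXTRA_FOLDS = str.maketrans({"\u0141": "L", "\u0142": "l"})


def fold(text: str) -> str:
    """Strip combining marks via NFD, then apply the extra translations."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(EXTRA_FOLDS)


# Fold both the stored text and the query, then compare case-insensitively.
stored = fold("Łódź").casefold()
query = fold("lodz").casefold()
is_match = stored == query
```

With the extra rule in place, the folded forms of "Łódź" and "lodz" 
compare equal.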

Also, and I have not checked this, but will the case-insensitive search 
for "Ł" match "ł" (the lowercase version)?

Again, if not, can we add a rule?
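
For reference, Unicode itself does define U+0141 and U+0142 as an 
upper/lower case pair, so a search that follows the standard case 
mappings would be expected to treat them as equal. The Python sketch 
below shows only the Unicode behaviour; matches_ignoring_case is a 
hypothetical helper, and whether MarkLogic applies the same mapping is 
not confirmed here.

```python
upper_l = "\u0141"  # Ł
lower_l = "\u0142"  # ł


def matches_ignoring_case(query: str, text: str) -> bool:
    """Compare two strings using Unicode case folding."""
    return query.casefold() == text.casefold()


# Ł and ł fold to the same character.
same_letter = matches_ignoring_case(upper_l, lower_l)

# Case folding alone does not turn Ł into a plain l.
plain_l = matches_ignoring_case(upper_l, "l")
```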

-- 
Peter Hickman.

Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: peter.hickman at semantico.com
w: www.semantico.com


