[MarkLogic Dev General] Fuzzy and/or phonetic searching

Steve Mallen Steve.Mallen at semantico.com
Wed May 14 07:53:49 PDT 2008


Hi folks,

I've been looking through the developer docs to try to find out if I can 
do fuzzy searching or any type of phonetic searching in XQuery with Mark 
Logic.

Does anyone know if there are any functions to determine similarity and 
distance between strings - e.g. Soundex, Levenshtein, Metaphone?

Specifically, I'd like to be able to do Lucene-style fuzzy searches 
based on Levenshtein distance (for example, in Lucene, a search for 
"roam~" will find words like "foam" and "roams").  The spellcheck module 
looks like it does something similar, but I'm not sure what the 
implementation is based on.  How does it find words from a dictionary 
that are spelt similarly to the search term?  Is there any developer 
control over this?
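
To make the kind of matching I mean concrete, here is a minimal sketch 
of edit-distance filtering, in Python rather than XQuery.  It is only an 
illustration of the technique: the function names and the word list are 
made up for the example, and it is not how Lucene or the spellcheck 
module is actually implemented.

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, dictionary, max_distance=1):
    """Hypothetical helper: keep the dictionary words that lie within
    max_distance edits of term, in their original order."""
    return [w for w in dictionary if levenshtein(term, w) <= max_distance]


words = ["foam", "roams", "physics", "road"]
print(fuzzy_matches("roam", words))
```

With a threshold of one edit, "roam" picks up "foam", "roams" and 
"road" but not "physics".  Scanning a whole dictionary like this is 
linear in its size, which is why I'm hoping for something index-backed.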

I'd also like to be able to do phonetic searches, so that, for example, 
a search for "fiziks" would match "physics" since they are phonetically 
similar.  A few relational databases support Soundex searches, and 
Solr supports the use of various phonetic transcription algorithms.  I 
guess that I could create an index of phonetic transcriptions during 
content load, and do lookups based on that, but it would be good if 
there were something I could use 'out-of-the-box'.
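
The index-at-load-time idea could look roughly like the Python sketch 
below.  Everything in it is an assumption made for illustration: 
phonetic_key is a toy, Soundex-like key (it rewrites "ph" as "f" and 
codes every consonant, including the first), not standard Soundex, 
which keeps the first letter verbatim and so would put "fiziks" and 
"physics" under different keys.  A real system would more likely use 
Metaphone or similar.

```python
from collections import defaultdict

# Soundex-style consonant classes.  Vowels and h, w, y get no code.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def phonetic_key(word):
    """Toy phonetic key (hypothetical, not standard Soundex)."""
    text = word.lower().replace("ph", "f")
    key = []
    previous = ""
    for ch in text:
        code = _CODES.get(ch, "")
        # Collapse runs of the same code; an uncoded letter ends a run.
        if code and code != previous:
            key.append(code)
        previous = code
    return "".join(key)


def build_index(words):
    """Map each phonetic key to the set of words that produce it.
    This is the step that would run once, during content load."""
    index = defaultdict(set)
    for word in words:
        index[phonetic_key(word)].add(word)
    return index


def phonetic_lookup(index, term):
    """Return the indexed words sharing the search term's key, sorted."""
    return sorted(index.get(phonetic_key(term), set()))


index = build_index(["physics", "chemistry", "biology"])
print(phonetic_lookup(index, "fiziks"))
```

Here both "fiziks" and "physics" reduce to the same key, so the lookup 
is a single exact match on the precomputed index rather than a scan.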

Could anyone shed any light on this?

Many thanks,
-Steve


