[MarkLogic Dev General] Determining stems for proper nouns?

Mary Holstege mary.holstege at marklogic.com
Sat Mar 17 08:42:21 PDT 2012


On Fri, 16 Mar 2012 19:46:48 -0700, David Sewell <dsewell at virginia.edu> wrote:

>
> If I had some clue as to the set of words like "Quakers" and "Whigs"
> that do not stem to singular nouns, I could create a custom dictionary
> to handle such cases. Are MarkLogic's decisions here based on an
> internal dictionary? algorithms? both?

Both, but unfortunately I can't tell you what is in that dictionary
or the exact circumstances under which the rules get applied, because
stemming is licensed from Inxight (or one of their successors or
assigns) and we don't have much in the way of details.

I think you have two options. One is to run a little test over the
word lexicon to determine which words need special handling in a
custom dictionary, and maybe repeat the experiment from time to time
to see whether the dictionary needs adjustment. The other is to
accept the lack of precision and run case-insensitive.
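
A rough sketch of what that little test might look like, written in
Python only to show the logic. Everything here is an assumption rather
than a description of how the stemmer works: find_unstemmed_plurals and
fake_stem are hypothetical names, the "strip a trailing s" rule is a
deliberately naive guess at the singular, and fake_stem is a stand-in
for the real stemmer. On the server the word list would presumably come
from the word lexicon (cts:words) and the stems from cts:stem.

```python
from typing import Callable, Dict, Iterable, List


def find_unstemmed_plurals(
    words: Iterable[str],
    stem: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Map each capitalized word ending in 's' to its guessed singular
    when none of the stems returned for it match that singular."""
    overrides: Dict[str, str] = {}
    for word in words:
        # Only look at capitalized candidates such as "Quakers".
        if len(word) < 3 or not word[0].isupper() or not word.endswith("s"):
            continue
        singular = word[:-1]  # naive assumption: plural = singular + "s"
        stems = {s.lower() for s in stem(word)}
        if singular.lower() not in stems:
            overrides[word] = singular
    return overrides


def fake_stem(word: str) -> List[str]:
    """Hypothetical stand-in stemmer; the table is invented for the demo."""
    table = {
        "Quakers": ["Quakers"],   # pretend this one is not reduced
        "Whigs": ["Whigs"],       # pretend this one is not reduced
        "Friends": ["friend"],    # pretend this one stems as expected
    }
    return table.get(word, [word])


candidates = find_unstemmed_plurals(
    ["Quakers", "Whigs", "Friends", "dogs"], fake_stem
)
for plural, singular in sorted(candidates.items()):
    print(plural, "->", singular)
```

The resulting list is only a set of candidates for a custom dictionary.
With a rule this naive, a name like "Paris" that the stemmer leaves
alone would also be flagged, so the output needs a review by hand
before anything goes into the dictionary.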

Sorry I can't be more help here.

//Mary

Mary Holstege
Principal Engineer
Mark Logic Corporation

