[MarkLogic Dev General] Determining stems for proper nouns?
David Sewell
dsewell at virginia.edu
Sat Mar 17 06:02:51 PDT 2012
On Sat, 17 Mar 2012, Mike Sokolov wrote:
> It looks as if it just doesn't "know" that there is such a thing as a Quaker
> or a Whig, and doesn't apply rule-based stemming to unknown capitalized
> words, which is sensible, because how could it know whether (for example):
>
> Barsoomians is a plural noun that could be stemmed or simply a name (David
> Barsoomians) that should not.
>
> Just a guess, and I have no clue what the MarkLogic word list is, but I
> suppose you could derive it from exhaustive search...
Right, the brute-force fallback would be to pull every capitalized word
out of the database's word lexicon and run each one through the stemmer
to see which come back changed. I'm hoping to avoid that, though.
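
For what it's worth, here is a minimal sketch of that brute-force pass,
written in Python rather than XQuery so it stands alone. Everything in it
is an assumption for illustration: `stems_for_capitalized` and `toy_stem`
are hypothetical names, the word list is made up, and the toy stemmer's
one-entry dictionary says nothing about what MarkLogic actually knows. On
the server the word list would presumably come from a word lexicon (I
believe via cts:words) and the stemmer would be cts:stem, but I have not
tried that here.

```python
from typing import Callable, Dict, Iterable, List


def stems_for_capitalized(
    words: Iterable[str],
    stem: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return the capitalized words that the stemmer actually changes."""
    changed: Dict[str, List[str]] = {}
    for word in words:
        if not word[:1].isupper():
            continue  # only capitalized words are of interest
        stems = stem(word)
        # Keep the word if any stem differs from it, ignoring case.
        if any(s.lower() != word.lower() for s in stems):
            changed[word] = stems
    return changed


def toy_stem(word: str) -> List[str]:
    """Hypothetical stand-in stemmer: unknown words come back unchanged."""
    known = {"Indians": ["indian"]}  # made-up entry, not MarkLogic data
    return known.get(word, [word])


# Made-up lexicon contents for the demonstration.
lexicon = ["Indians", "Quakers", "Whigs", "history"]
result = stems_for_capitalized(lexicon, toy_stem)
print(result)
```

The words missing from the result are the ones the stemmer does not
"know", which is the list one would then have to handle some other way.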
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/