[MarkLogic Dev General] Determining stems for proper nouns?
dsewell at virginia.edu
Sat Mar 17 06:02:51 PDT 2012
On Sat, 17 Mar 2012, Mike Sokolov wrote:
> It looks as if it just doesn't "know" that there is such a thing as a Quaker
> or a Whig, and doesn't apply rule-based stemming to unknown capitalized
> words, which is sensible, because how could it know whether (for example)
> "Barsoomians" is a plural noun that could be stemmed or simply a name (David
> Barsoomians) that should not be?
> Just a guess, and I have no clue what the MarkLogic word list is, but I
> suppose you could derive it from exhaustive search...
Right, the brute-force fallback would be processing a lexicon list of
all the capitalized words in the database. I'm sort of hoping to avoid
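
For illustration, the brute-force pass over a list of capitalized words could be sketched roughly as below. This is only a sketch under stated assumptions: the word list and the `stem` callable are hypothetical stand-ins (in MarkLogic they would presumably come from a word lexicon and a call such as cts:stem), and `toy_stem` is invented purely to make the example runnable.

```python
from typing import Callable, Dict, Iterable, List


def capitalized_words_with_stems(
    words: Iterable[str],
    stem: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each capitalized word to the stems that differ from the word itself."""
    result: Dict[str, List[str]] = {}
    for word in words:
        if not word[:1].isupper():
            continue  # only capitalized lexicon entries are of interest
        # Compare case-insensitively so "Quakers" -> "quakers" is not counted as stemming.
        stems = [s for s in stem(word) if s.lower() != word.lower()]
        result[word] = stems
    return result


def toy_stem(word: str) -> List[str]:
    """Hypothetical stemmer for illustration only; it knows just two plurals."""
    known = {"Dogs": ["dog"], "Houses": ["house"]}
    return known.get(word, [word])


# Hypothetical sample standing in for the database's word list.
report = capitalized_words_with_stems(["Dogs", "Quakers", "whigs", "Houses"], toy_stem)

# Capitalized words for which the stemmer produced no distinct stem.
unstemmed = sorted(w for w, stems in report.items() if not stems)
```

Words that end up in `unstemmed` are the ones the stemmer apparently does not recognize, which is the list one would then have to review by hand.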
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu Tel: +1 434 924 9973