[MarkLogic Dev General] Determining stems for proper nouns?

David Sewell dsewell at virginia.edu
Sat Mar 17 06:02:51 PDT 2012

On Sat, 17 Mar 2012, Mike Sokolov wrote:

> It looks as if it just doesn't "know" that there is such a thing as a Quaker 
> or a Whig, and doesn't apply rule-based stemming to unknown capitalized 
> words, which is sensible, because how could it know whether, for example,
> "Barsoomians" is a plural noun that could be stemmed or simply a name (David
> Barsoomians) that should not be?
> Just a guess, and I have no clue what the MarkLogic word list is, but I 
> suppose you could derive it from exhaustive search...

Right, the brute-force fallback would be to run every capitalized word
in the database's word lexicon through the stemmer. I'm sort of hoping
to avoid that, though.
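
For what it's worth, the brute-force pass could be sketched roughly as below.
This is only an illustration in Python, not MarkLogic code: the word list and
the toy_stem function are hypothetical stand-ins for whatever the server's
word lexicon and stemmer would return (in MarkLogic, presumably something
along the lines of cts:words and cts:stem), and the names
capitalized_stem_report and unstemmed are made up for the example.

```python
from typing import Callable, Dict, Iterable, List


def capitalized_stem_report(
    words: Iterable[str], stem: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Map each capitalized word to the stems the stemmer returns for it."""
    report: Dict[str, List[str]] = {}
    for word in words:
        # word[:1] is "" for an empty string, so empty entries are skipped.
        if word[:1].isupper():
            report[word] = stem(word)
    return report


def unstemmed(report: Dict[str, List[str]]) -> List[str]:
    """Words the stemmer left alone (every stem equals the word, ignoring case)."""
    return sorted(
        word
        for word, stems in report.items()
        if all(s.lower() == word.lower() for s in stems)
    )


# Hypothetical stand-in for the real stemmer: it only "knows" the words in
# this toy table and returns anything else unchanged (lowercased).
TOY_STEMS = {"Presidents": ["president"]}


def toy_stem(word: str) -> List[str]:
    return TOY_STEMS.get(word, [word.lower()])


# Hypothetical stand-in for the words pulled from the database's lexicon.
lexicon_words = ["Quakers", "Whigs", "Presidents", "votes"]

report = capitalized_stem_report(lexicon_words, toy_stem)
for word in unstemmed(report):
    print(word, "was not reduced by the stemmer")
```

The output of unstemmed() would be the list to review by hand, since the
code cannot tell a plural like "Quakers" from a surname.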

David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
