[MarkLogic Dev General] Determining stems for proper nouns?

David Sewell dsewell at virginia.edu
Sat Mar 17 06:02:51 PDT 2012


On Sat, 17 Mar 2012, Mike Sokolov wrote:

> It looks as if it just doesn't "know" that there is such a thing as a Quaker 
> or a Whig, and doesn't apply rule-based stemming to unknown capitalized 
> words. That is sensible: how could it know whether, for example, 
> "Barsoomians" is a plural noun that could be stemmed or simply a name (David 
> Barsoomians) that should not be?
>
> Just a guess, and I have no clue what the MarkLogic word list is, but I 
> suppose you could derive it from exhaustive search...

Right, the brute-force fallback would be to pull every capitalized word 
out of the database's word lexicon and check each one for a stem. I'm 
sort of hoping to avoid that, though.
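
For what it's worth, the brute-force pass itself would be small. Here is a 
minimal sketch in Python, assuming the capitalized words have already been 
dumped from the word lexicon into a list and that a stemmer can be called 
one word at a time. The names stemmed_capitalized_words and toy_stem are 
hypothetical, and toy_stem is an invented stand-in for a call to the 
server's stemmer (presumably something wrapping cts:stem); its lookup table 
is made-up data and says nothing about what MarkLogic actually returns.

```python
from typing import Callable, Dict, Iterable, List


def stemmed_capitalized_words(
    words: Iterable[str],
    stem: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return the capitalized words that the stemmer maps to a different form."""
    report: Dict[str, List[str]] = {}
    for word in words:
        # Skip empty strings and anything that does not start with an uppercase letter.
        if not word[:1].isupper():
            continue
        stems = stem(word)
        # Keep the word only if at least one stem differs from it (ignoring case).
        if any(s.lower() != word.lower() for s in stems):
            report[word] = stems
    return report


def toy_stem(word: str) -> List[str]:
    """Hypothetical stemmer: knows one plural, returns everything else unchanged."""
    known = {"Americans": ["American"]}  # invented sample data
    return known.get(word, [word])


if __name__ == "__main__":
    sample = ["Americans", "Whigs", "running", "Barsoomians"]
    print(stemmed_capitalized_words(sample, toy_stem))
```

The words missing from the result are the ones the stemmer leaves alone, 
which is the list I would actually be after. The cost is one stemmer call 
per capitalized word in the lexicon, which is why I'd rather not go this 
route.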

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

