[MarkLogic Dev General] Determining stems for proper nouns?
geert.josten at dayon.nl
Sat Mar 17 10:22:07 PDT 2012
Or, to extend Mike's idea, add two query terms, one case-sensitive and
one case-insensitive, and give the latter a lower weight.
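A minimal sketch of what that could look like, assuming the standard cts:word-query options "case-sensitive" and "case-insensitive" and its optional weight argument. The search term and the two weight values are made up for illustration and would need tuning against real data:

```xquery
xquery version "1.0-ml";

(: Hypothetical search term; not taken from the thread's data. :)
let $term := "Young"
let $query :=
  cts:or-query((
    (: Exact-case match carries the higher weight (value is an assumption). :)
    cts:word-query($term, ("case-sensitive"), 2.0),
    (: Case-insensitive match still counts, but contributes less to the score. :)
    cts:word-query($term, ("case-insensitive"), 0.5)
  ))
(: Return the URIs of the ten highest-scoring documents. :)
for $doc in cts:search(fn:doc(), $query)[1 to 10]
return xdmp:node-uri($doc)
```

Documents containing the exact-case form match both clauses, so they should rank above documents that only match case-insensitively.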
> -----Original message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On behalf of Mike Sokolov
> Sent: Saturday, 17 March 2012 16:09
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Determining stems for proper nouns?
> On 3/17/2012 9:02 AM, David Sewell wrote:
> > On Sat, 17 Mar 2012, Mike Sokolov wrote:
> >> It looks as if it just doesn't "know" that there is such a thing as a
> >> or a Whig, and doesn't apply rule-based stemming to unknown
> >> words, which is sensible, because how could it know whether (for example)
> >> Barsoomians is a plural noun that could be stemmed or simply a name
> >> Barsoomians) that should not.
> >> Just a guess, and I have no clue what the MarkLogic word list is, but I
> >> suppose you could derive it from exhaustive search...
> > Right, the brute-force fallback would be processing a lexicon list of
> > all the capitalized words in the database. I'm sort of hoping to avoid
> > that, though.
> Have you considered a two-pass search where you widen by lower-casing
> all terms when no results are found? The result wouldn't be as precise
> as it could be if you knew which terms were in the stemming dict, but
> would enable you to find Young as a name (or at the start of a sentence)
> without matching young, and also match Quaker->Quakers.
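The two-pass search Mike describes might be sketched as follows. This is only an illustration under assumptions: the search term is hypothetical, and whether the second pass also picks up stemmed forms depends on the database's stemming configuration:

```xquery
xquery version "1.0-ml";

(: Hypothetical search term; not taken from the thread's data. :)
let $term := "Quaker"
(: Pass 1: case-sensitive, so a name like "Young" would not match "young". :)
let $strict :=
  cts:search(fn:doc(), cts:word-query($term, ("case-sensitive")))
return
  if (fn:exists($strict))
  then $strict
  (: Pass 2: only when pass 1 finds nothing, widen by lower-casing the term. :)
  else cts:search(fn:doc(),
         cts:word-query(fn:lower-case($term), ("case-insensitive")))
```

The fallback trades precision for recall, so it runs only when the strict pass comes back empty.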