[MarkLogic Dev General] Stemmed relevance scoring

Caraliza Fonseca-Ensor c.m.fonseca at gmail.com
Tue Jun 8 02:06:45 PDT 2010


Hi Danny

Thanks for getting back on this.

I think you are probably correct - in most cases we won't want to give
preference to exact matches over stemmed matches. We were really just
investigating whether it is possible, and will now go ahead and test
with our dataset.

Thanks again,
Cara


On 7 June 2010 17:17, Danny Sokolsky <Danny.Sokolsky at marklogic.com> wrote:

>  Hi Cara,
>
>
>
> If you need to do this, you can enable word searches and do an or-query of
> the stemmed search and the unstemmed search (you will have to specify the
> query options “stemmed” and “unstemmed” in the respective cts:query
> constructors).  That should let the exact-match query contribute to the
> score as well.
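>
> As a rough sketch of that or-query against the example quoted below
> (assuming word searches are enabled on the database so the "unstemmed"
> option can resolve; this is illustrative and not tested against a real
> corpus):
>
> ```xquery
> xquery version "1.0-ml";
> (: Match "ran" both by stem and as the exact word. A document that
>    contains the exact word satisfies both branches of the or-query,
>    so both branches can contribute to its score. :)
> cts:search(/test,
>   cts:element-query(xs:QName("test"),
>     cts:or-query((
>       cts:word-query("ran", "stemmed"),
>       cts:word-query("ran", "unstemmed")
>     ))))
> ```
>
> How far this actually lifts the exact match above the stemmed matches
> depends on the corpus, so it is worth checking the scores on real data.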
>
>
>
> Think about whether that is really what you want to do, though.  Especially
> when you end up with a large corpus of documents, I am not sure how much
> that will change the score, and stemming is really about increasing search
> recall.  So question your assumption that a document that contains “ran”
> is more relevant than one that contains “run”.  In many cases, that is not
> true.
>
>
>
> -Danny
>
>
>
> *From:* general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] *On Behalf Of *Caraliza
> Fonseca-Ensor
> *Sent:* Monday, June 07, 2010 5:30 AM
> *To:* general at developer.marklogic.com
> *Subject:* [MarkLogic Dev General] Stemmed relevance scoring
>
>
>
> Hi
>
> Is it possible to run a search which assigns lower scores to documents
> containing stems of the search terms, so that they have a lower relevance
> than documents which contain the exact search term?
>
> e.g.
> Document 1:
> <test>the word is run</test>
>
> Document 2:
> <test>the word is ran</test>
>
> Document 3:
> <test>the word is running</test>
>
> Query:
> cts:search(/test, cts:element-query(xs:QName("test"), "ran"))
>
>
> Is there a way to ensure that Document 2 is given the highest relevance
> score, as it is the only document containing the exact search term?
>
> Thanks,
> Cara