[MarkLogic Dev General] Custom dictionary for stemming

Rhodes, David (LNG-CON) david.rhodes at lexisnexis.com
Wed Jul 22 08:02:33 PDT 2015


I am trying to use a custom dictionary to extend the set of stemmed words.

I am using MarkLogic 7.0 and have been following Chapters 17 and 18 of the Search Developer's Guide:
http://docs.marklogic.com/7.0/guide/search-dev/stemming
http://docs.marklogic.com/7.0/guide/search-dev/custom-dictionaries

I noted that there are two ways to check whether words resolve to their stems:

cts:stem(word) returns the stems of word

and

cts:contains(word, stem) returns true if these two terms resolve to the same stem

I confirmed that both of these work for terms that are in the default dictionary (e.g., run and running, bite and bitten).

I have added a custom dictionary with an entry that maps the word "Int'l" to the stem "International":

cdict:dictionary-write("en",$dict)
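For reference, the call above can be expanded into a self-contained sketch. The module namespace, library path, and the cdict:entry / cdict:word / cdict:stem element names below are assumptions based on the custom-dictionaries guide; this is not the exact $dict used in this post.

```xquery
xquery version "1.0-ml";

(: Assumed namespace and location of the custom-dictionary library module :)
import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: One entry mapping the word "Int'l" to the stem "International".
   Element names follow the guide's dictionary format (assumption). :)
let $dict :=
  <cdict:dictionary>
    <cdict:entry>
      <cdict:word>Int'l</cdict:word>
      <cdict:stem>International</cdict:stem>
    </cdict:entry>
  </cdict:dictionary>

(: Install it as the custom stemming dictionary for English :)
return cdict:dictionary-write("en", $dict)
```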

With that dictionary installed as the custom dictionary for English, cts:stem works but cts:contains does not:
cts:stem("Int'l") returns International
cts:contains("Int'l", "International") returns false
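One way to narrow this down is to run both checks side by side with the stemming option and the language stated explicitly instead of relying on defaults. The second argument of cts:contains is a query, and a bare string is treated as a word query; the sketch below spells that query out with cts:word-query and the "stemmed" and "lang=en" options. It is a hypothetical diagnostic, not a confirmed fix.

```xquery
xquery version "1.0-ml";

(: Compare the stemmer's view of each term with an explicitly stemmed word-query match :)
let $word := "Int'l"
let $stem := "International"
return (
  (: stems of each term, computed with the English stemmer :)
  cts:stem($word, "en"),
  cts:stem($stem, "en"),
  (: the same containment test as above, with stemming and language made explicit :)
  cts:contains($word, cts:word-query($stem, ("stemmed", "lang=en")))
)
```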

I reindexed my database, since my understanding is that, with this dictionary entry, all documents containing "Int'l" should now be indexed under the stem "International".

After reindexing, cts:contains("Int'l", "International") still returns false.
Furthermore, in my actual search workflow, searches for "Int'l" do not return documents containing "International" (but searches for "bitten" do return documents containing "bite").
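The search side can be checked in a similar way by counting the documents matched by a stemmed word query for each form. This sketch assumes it runs against the content database in question and searches all documents via fn:doc(); xdmp:estimate returns an unfiltered, index-resolved count. If the custom entry were being applied at index time, one would expect the "Int'l" and "International" counts to agree, just as "bitten" and "bite" should.

```xquery
xquery version "1.0-ml";

(: For each term, report how many documents a stemmed English word query matches :)
for $term in ("Int'l", "International", "bitten", "bite")
return fn:concat(
  $term, ": ",
  (: unfiltered count resolved from the indexes :)
  xdmp:estimate(
    cts:search(fn:doc(), cts:word-query($term, ("stemmed", "lang=en")))
  )
)
```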

My database index settings are Stemmed Searches = Basic and Word Searches = False.

I think stemming can be a powerful feature for my workflow, if I can just get it to work. Thank you for any advice you can offer.

David