[MarkLogic Dev General] Custom dictionary for stemming
Rhodes, David (LNG-CON)
david.rhodes at lexisnexis.com
Wed Jul 22 08:02:33 PDT 2015
I am trying to use a custom dictionary to extend the set of stemmed words.
I am using MarkLogic 7.0, and have been following the documentation guides in Chapters 17 and 18:
I noted that there are two ways to see if words are resolving to their stems:
cts:stem(word) returns the stems of word
cts:contains(word, stem) returns true if these two terms resolve to the same stem
I confirmed that both of these work for terms that are in the default dictionary (e.g., run and running, bite and bitten).
I have added a custom dictionary that adds "Int'l" as a word with "International" as its stem.
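A minimal sketch of installing such an entry, assuming the cdict library module shipped with MarkLogic and "en" as the language code (the single entry is the one described above):

```xquery
xquery version "1.0-ml";

import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
  at "/MarkLogic/custom-dictionary.xqy";

(: One entry: the word "Int'l" with "International" as its stem. :)
let $dict :=
  <dictionary xmlns="http://marklogic.com/xdmp/custom-dictionary">
    <entry>
      <word>Int'l</word>
      <stem>International</stem>
    </entry>
  </dictionary>
(: Write $dict as the custom dictionary for English. :)
return cdict:dictionary-write("en", $dict)
```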
With that dictionary added as the custom dictionary for English, cts:stem works but cts:contains does not.
cts:stem("Int'l") returns International
cts:contains("Int'l", "International") returns false
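The two checks above can also be written with the language and the stemmed option made explicit. This is only a sketch: passing "en" and wrapping the term in cts:word-query with the "stemmed" option are assumptions about how the checks might be run, not a different test.

```xquery
xquery version "1.0-ml";

(
  (: Stems of the custom word, evaluated as English. :)
  cts:stem("Int'l", "en"),
  (: Does the text match a stemmed English word query for the stem? :)
  cts:contains(text { "Int'l" },
    cts:word-query("International", ("stemmed", "lang=en")))
)
```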
I reindexed my database, since I understand that my dictionary entry means that all documents containing "Int'l" should now be indexed under "International".
cts:contains("Int'l", "International") still returns false
Furthermore, in my real search workflow, searches for "Int'l" do not return documents containing "International" (but searches for "bitten" do return documents containing "bite").
My database indexes are set to Stemmed Searches = Basic, and Word Searches = False.
I think that stemming can be a powerful feature for my workflow, if I can just get it to work. Thank you for any advice you can offer.