[MarkLogic Dev General] pre-processing and filtering out common words
Michael Blakeley
michael.blakeley at marklogic.com
Wed May 21 12:18:54 PDT 2008
It's common practice to remove "stop words"
(http://en.wikipedia.org/wiki/Stopwords) from queries, but also to
provide some syntax for exceptions. For example, there should be a way
to find Hamlet's soliloquy by searching for "to be or not to be". One
technique is to remove individual query terms that are stop words, but
to leave quoted phrases intact.
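
As a rough illustration of that technique, here is a minimal Python sketch (not MarkLogic-specific). The stop-word list, the function name filter_query, and the double-quote phrase syntax are all assumptions made for the example; a real application would pick its own.

```python
import re

# Illustrative stop-word list (an assumption); applications choose their own.
STOP_WORDS = {"a", "an", "and", "be", "i", "not", "or", "the", "to"}

# Matches either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def filter_query(query):
    """Drop bare stop words from a query; keep quoted phrases intact."""
    kept = []
    for match in TOKEN.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            # Quoted phrase: keep as-is, even if every word is a stop word.
            kept.append('"' + phrase + '"')
        elif word.lower() not in STOP_WORDS:
            kept.append(word)
    return kept


print(filter_query('the soliloquy "to be or not to be" and Hamlet'))
```

With this list, the unquoted query to be or not to be would lose every term, which is why the quoted form is worth supporting.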
Other common practices are to lower-case individual query terms, to
remove some or all punctuation, and to remove singular possessives
(trailing "'s"). But not every application will implement all of these
techniques: requirements vary.
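
A minimal Python sketch of those normalization steps, applied to one query term at a time. The function name normalize_term is hypothetical, and the choices here (ASCII punctuation only, straight apostrophes only, all punctuation removed) are assumptions; an application might keep hyphens or handle typographic quotes differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_term(term):
    """Lower-case a term, drop a trailing 's, then remove punctuation."""
    # Lower-case and trim punctuation at the edges so that a trailing
    # comma or period does not hide a possessive.
    term = term.lower().strip(string.punctuation)
    # Remove a singular possessive (trailing "'s").
    if term.endswith("'s"):
        term = term[:-2]
    # Remove any punctuation left inside the term.
    return term.translate(_STRIP_PUNCT)


print([normalize_term(t) for t in ["Hamlet's", "Soliloquy,", "well-known"]])
```

Each step is independent, so an application that only wants lower-casing, for example, can drop the other two.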
-- Mike
Paul M wrote:
> Do you pre-process your search queries, so that common words are removed, such as (and, the, to, I, an, a, etc...)? Does this speed search results noticeably? (Many fragments are returned when common words are used as search terms, correct?)
>
> thank you