[MarkLogic Dev General] case sensitivity and search:search

Jakob Fix jakob.fix at gmail.com
Wed Apr 11 15:51:35 PDT 2012


Thank you all for your contributions. I seem to have a deficiency in
the field of collations, and I'm sorry for flooding the list in an
attempt to fill this void, but the query below still misbehaves:

xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<search:options xmlns="http://marklogic.com/appservices/search">
  <default-suggestion-source>
    <word>
      <field name="suggest-field"
             collation="http://marklogic.com/collation/en/S1"/>
    </word>
  </default-suggestion-source>
  <term>
    <term-option>case-insensitive</term-option>
  </term>
</search:options>
return search:suggest("health", $options, 5)

==>
Health
healthcare
healthier
healthiness
healthrelated

Two problems: "Health" comes back capitalized, and the hyphen in
"health-related" is ignored.

For the latter problem, I added the collation
http://marklogic.com/collation/en/S4 to the field specification, so the
field now has two collations, /en/S1 and /en/S4. Unfortunately this
still returns healthrelated rather than health-related.

Also, I have no clue as to why it still returns Health instead of
health.  How can I specify two collations in my query in order to have
results returned that are both case-insensitive and respect
punctuation?

(OK, I have now switched to the non-language-specific //S1 and //S4
following Mary's suggestion. The problems persist.)
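
One way to narrow this down might be to query the field word lexicon
directly, once per collation, and compare what each one actually
stores. This is only a minimal sketch: it reuses cts:field-word-match,
the call that appears in the XDMP-FIELDLXCNNOTFOUND error quoted
below, and it assumes that "suggest-field" has a field word lexicon
for every collation listed. The two /en/ collation URIs are taken from
this message; substitute whichever collations are really configured on
the field.

```xquery
xquery version "1.0-ml";

(: Diagnostic sketch, not a fix: list the words each lexicon holds.
   Assumption: "suggest-field" has a field word lexicon for every
   collation listed here. A collation without a lexicon raises
   XDMP-FIELDLXCNNOTFOUND, as seen earlier in this thread. :)
for $collation in ("http://marklogic.com/collation/en/S1",
                   "http://marklogic.com/collation/en/S4")
return (
  (: print the collation URI as a heading for its word list :)
  $collation,
  cts:field-word-match(
    "suggest-field",
    "health*",
    (fn:concat("collation=", $collation), "limit=10"))
)
```

If the two word lists differ, the difference would come from the
lexicons themselves rather than from the search:suggest options.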

feeling dafter by the minute ...
Jakob.



On Thu, Apr 12, 2012 at 00:26, Jakob Fix <jakob.fix at gmail.com> wrote:
> Will, Colleen,
>
> ah, I knew somehow that collations might have something to do with it!
> The search dev guide indicates that
> http://marklogic.com/collation/en/S1 is the one to choose for case and
> diacritic insensitive searches.
>
> I replaced the previous default collation with this collation (it's
> re-indexing as I write). Now that it has finished re-indexing, I'm
> getting a
>
> XDMP-FIELDLXCNNOTFOUND:
> cts:field-word-match(xs:NCName("suggest-field"), "env*", "document",
> (), xs:double("1"), ()) -- No field word lexicon for suggest-field
> http://marklogic.com/collation
>
> error, but the new collation is the only one attached to this field.
> Where does the error message take the default collation from?  Or do I
> need both collations?  I added the collation attribute to
> <default-suggestion-source
> collation="http://marklogic.com/collation/en/S1"> and also added the
> line
>
> declare default collation "http://marklogic.com/collation/en/S1";
>
> to the start of the query, all to no avail.
>
> And then I simply tried the (undocumented?) @collation attribute for
> the <field> element ... tada!  This worked:
>
> env
> envahissantes
> envahisseur
> envejece
> envejecen
>
> Well, this opens another question: the collation is for English, but I
> seem to have multiple languages (French at least, and the last two
> words look "different"), so I guess I would just add more collations
> for all languages we may have (provided a license key?).  For the
> single purpose of rendering different case irrelevant, would this one
> collation be sufficient?
>
>
> Thanks for prodding me in the right direction.
>
> cheers,
> Jakob.
>
>
>
> On Wed, Apr 11, 2012 at 23:55, Colleen Whitney
> <Colleen.Whitney at marklogic.com> wrote:
>> I think a case-insensitive collation might accomplish this....
>>
>> Colleen Whitney
>> MarkLogic Corporation
>>
>> Phone +1 650 655 2366
>> email  colleen.whitney at marklogic.com
>> web    www.marklogic.com
>>
>> ________________________________________
>> From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Jakob Fix [jakob.fix at gmail.com]
>> Sent: Wednesday, April 11, 2012 2:38 PM
>> To: General Mark Logic Developer Discussion
>> Subject: [MarkLogic Dev General] case sensitivity and search:search
>>
>> Hello,
>>
>> my goal is to search a couple of elements for the type-ahead (aka
>> search:suggest) and to return suggestions. I've created a word field
>> index (called "suggest-field") based on the two elements.
>>
>> The search has to be case insensitive, i.e. currently I get results like this:
>> env
>> ENV
>> envahissantes
>> envahisseur
>> envejece
>>
>> for the first two, I want only "env", not "ENV" or a potential "Env".
>> I've disabled "fast case sensitive searches" on the database level as
>> otherwise this was inherited by the field configuration.  I have the
>> basic collation (http://marklogic.com/collation/). Also, I'm using a
>> term-option set to "case-insensitive" (see sample query below). But
>> none of these options makes the search treat case differences as
>> irrelevant.
>>
>> Which knob do I have to twiddle?
>>
>> let $options :=
>> <search:options xmlns="http://marklogic.com/appservices/search">
>>  <default-suggestion-source>
>>    <word>
>>      <field name="suggest-field"/>
>>    </word>
>>  </default-suggestion-source>
>>  <term>
>>    <term-option>case-insensitive</term-option>
>>  </term>
>> </search:options>
>> return
>>
>> search:suggest("env", $options, 5)
>>
>> cheers,
>> Jakob.
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general

