[MarkLogic Dev General] Reg: Diacritic-insensitive lexicons

Gajanan Chinchwadkar Gajanan.Chinchwadkar at marklogic.com
Wed Sep 7 23:37:37 PDT 2011


Here is what I am thinking:

If you set "fast diacritic sensitive searches" to false, then you will have a diacritic-insensitive value index. A search query may then work, but I am not sure whether cts:frequency will work correctly on that output with the desired performance. That's what you need to try out. Something like this:
cts:frequency(
  cts:search(
    fn:doc(),
    cts:element-value-query(xs:QName("name"), "Annie*", ("diacritic-insensitive", "wildcarded")),
    "unfiltered")
)
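If what you need is the number of matching documents rather than per-value lexicon frequencies, a variant worth trying is to let xdmp:estimate count the fragments matched by the same query. This is a sketch that assumes each document holds a single "name" element, so fragment counts and value counts coincide; it has not been run against your data.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes one "name" element per document, so the number of
   matching fragments equals the number of matching name values.
   xdmp:estimate resolves the query from the indexes without filtering. :)
xdmp:estimate(
  cts:search(
    fn:doc(),
    cts:element-value-query(
      xs:QName("name"),
      "Annie*",
      ("diacritic-insensitive", "wildcarded")
    )
  )
)
```

Because the counting happens in the query rather than in the lexicon, Annie, Ánnie and Ànnie are all covered by the single diacritic-insensitive value query.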

The other alternative, as you mentioned earlier, is to try a different collation and continue using element-value-match(). I think the "Base English" collation will flatten all the diacritic characters in the value lexicon.
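A sketch of the collation approach follows. It assumes a string range index on "name" built with a strength-1 English collation; the URI http://marklogic.com/collation/en/S1 is an assumption standing in for the "Base English" collation mentioned above, so check it against the collation documentation for your server version. At strength 1 only base letters are compared, so Annie, Ánnie and Ànnie should share one lexicon entry. The collation named in the options must match an existing range index, otherwise the call fails.

```xquery
xquery version "1.0-ml";

(: Assumed collation URI: strength 1 (S1) ignores case and diacritics.
   A string range index on "name" must already exist with this collation. :)
declare variable $collation as xs:string :=
  "http://marklogic.com/collation/en/S1";

(: Each returned value stands for every variant that collapses into it,
   so its frequency already includes Annie, Ánnie and Ànnie together. :)
for $value in cts:element-value-match(
  xs:QName("name"),
  "Annie*",
  fn:concat("collation=", $collation)
)[1 to 10]
return fn:concat($value, ": ", cts:frequency($value))
```

Note that only one surface form is returned for each collapsed entry; which variant appears depends on the index, not on the query.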

See if this helps,

Rgds,

Gajanan
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of ambika arumugam
Sent: Wednesday, September 07, 2011 11:04 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Reg: Diacritic-insensitive lexicons

Hi Gajanan,

The value of "fast diacritic sensitive searches" setting in the database is set to true.
Do you mean that if it is set to false, the indexes will be regrouped as required to give the results I need?

Regards,
Ambika

On Thu, Sep 8, 2011 at 11:07 AM, Gajanan Chinchwadkar <Gajanan.Chinchwadkar at marklogic.com> wrote:
What's the value of the "fast diacritic sensitive searches" setting on your database? If it is true, would setting it to false affect other applications?

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of ambika arumugam
Sent: Wednesday, September 07, 2011 10:31 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Reg: Diacritic-insensitive lexicons

Hi all,

I am running the cts:element-value-match query. I understand that a separate index entry is created for each unique value in the element 'name'. But is it possible to customize the indexing, for example by creating the same index entry for Annie, Ánnie and Ànnie? Then I could perform a cts:element-value-match query like

cts:element-value-match(xs:QName("name"), "Annie*") [1 to 10]

Let's say I have
3 matches for Annie in element 'name',
1 match for Ánnie in element 'name', and
2 matches for Ànnie in element 'name' in the database.

cts:frequency(cts:element-value-match(xs:QName("name"), "Annie*") [1 to 10])

Then the above query should return 6 (the sum of the individual counts: Annie (3), Ánnie (1), Ànnie (2)). I also tried the options of the cts:element-value-match query, but without any changes to the indexes I am not able to get this result.
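One thing worth checking in the query above: as far as I know, cts:frequency takes a single lexicon value, not a sequence, so a total over several matched values has to be summed one value at a time. The sketch below assumes a string range index on "name" in the default collation and relies on the "diacritic-insensitive" option to make the pattern "Annie*" match Ánnie and Ànnie as well; it has not been run against your data.

```xquery
xquery version "1.0-ml";

(: Sum the per-value frequencies of every lexicon value matching the
   pattern. The options relax case and diacritics for the pattern match
   only; the lexicon still keeps Annie, Ánnie and Ànnie as separate values. :)
fn:sum(
  for $value in cts:element-value-match(
    xs:QName("name"),
    "Annie*",
    ("case-insensitive", "diacritic-insensitive")
  )
  return cts:frequency($value)
)
```

By default the lexicon reports fragment frequency (how many fragments contain the value); adding the "item-frequency" option counts individual occurrences instead, which matters if one document can contain the same name more than once.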

Thanks in advance,

Regards,
Ambika

On Tue, Sep 6, 2011 at 11:23 AM, Gajanan Chinchwadkar <Gajanan.Chinchwadkar at marklogic.com> wrote:
You mention that you are trying to count element names. But the query you are using seems to find all the elements named "name" whose value starts with "Annie".

Please clarify: what do you want to count exactly?

Also, do you have a range index of type "string" on the element named "name"? The function element-value-match() simply reads all the values in the range index that match the pattern "Annie*".

Thanks,

Gajanan
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of ambika arumugam
Sent: Monday, September 05, 2011 9:55 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Reg: Diacritic-insensitive lexicons

Hi all,


I am trying to get the count of element names using the query

cts:element-value-match(xs:QName("name"), "Annie*",("case-insensitive","collation=http://marklogic.com/collation/","diacritic-insensitive") )[1 to 10]
I am using cts:frequency of the above query to get the results.

For this I want the values Ánnie and Ànnie to match the query, using the diacritic-insensitive option in the third parameter of the element-value-match query. But I am not getting the expected results for this query.
Should the collation be changed from the root collation to a Unicode collation to get this done?

Regards
Ambika

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general




