[MarkLogic Dev General] Wildcard Searches | MultiLingual Data | Cts Queries

Mary Holstege mary.holstege at marklogic.com
Tue Jan 12 09:49:34 PST 2016


There are two things going on here:
(1) Language only applies to stemmed searches, and wildcarded searches are
not stemmed. So your lang=zh is irrelevant.
(2) Even if this were a non-wildcarded search, your lang=zh would still not
work as you expect in this case, because MarkLogic performs some basic
script/language detection.  If your query were just "Caustic"
(unwildcarded), then since "Caustic" consists of Latin characters it is not
processed as Chinese. What language it is processed as is determined by
some configuration rules (not user serviceable), which in this case say it
should be processed as English. So your lang=zh is having no effect at all.
If you used lang=fr (a language which shares the same script as English),
then it would be processed as French, your lang=fr would have an effect,
and it would produce matches only in text marked as French.
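
As a rough illustration of point (2), here is an unwildcarded version of
the query with lang=fr. This is only a sketch: it assumes stemmed searches
are enabled on the database, and the use of cts:element-word-query here is
my choice for the example, not something taken from the original question.

```xquery
xquery version "1.0-ml";

(: Unwildcarded query, so the language option can take effect.
   "Caustic" is Latin script, so lang=fr is honored and the term is
   treated as French. :)
cts:search(
  fn:doc(),
  cts:element-word-query(xs:QName("test"), "Caustic", ("lang=fr")),
  "unfiltered")
```

Per the explanation above, this should match only text marked as French,
so the sample document (whose "Caustic Soda" is marked xml:lang="en")
would not be expected to come back. Swapping in lang=zh would fall back to
English processing instead.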

I think your best bet here is to create language-specific fields, each
with "test" as an included element restricted to one value of the xml:lang
attribute ("zh", "en", etc.).  That works as long as you are consistent
about the level at which the xml:lang is marked.
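
A minimal sketch of that field setup using the Admin API, for the English
case. The field name "test-en" and the database name "Documents" are
placeholders I made up for the example. The field's own wildcard index
settings are not shown and would still need to be enabled for unfiltered
wildcard queries to resolve well.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" and "test-en" are hypothetical names for this sketch. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()

(: Create a field that does not include the document root. :)
let $config := admin:database-add-field(
  $config, $db, admin:database-field("test-en", fn:false()))

(: Include <test> only where its xml:lang attribute equals "en".
   The element is in no namespace; xml:lang is in the XML namespace. :)
let $included := admin:database-included-element(
  "", "test", 1.0,
  "http://www.w3.org/XML/1998/namespace", "lang", "en")
let $config := admin:database-add-field-included-element(
  $config, $db, "test-en", $included)

return admin:save-configuration($config)
```

Once the field exists and the content is reindexed, the wildcard query can
be aimed at the field rather than the element, with no lang option:

```xquery
xquery version "1.0-ml";

(: Searches only the text of <test xml:lang="en"> elements. :)
cts:search(
  fn:doc(),
  cts:field-word-query("test-en", "Caus*", ("wildcarded")),
  "unfiltered")
```

A matching "test-zh" field (attribute value "zh") would cover the Chinese
side in the same way.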

//Mary


On Tue, 12 Jan 2016 09:21:19 -0800, Rahul Gupta <rahul.gupta at nagarro.com>  
wrote:

> Hi Team,
>
> I am working on wildcard searches with multilingual data. I have enabled  
> 3-character searches,  3 character word-positions, 2-character searches,  
> trailing wildcard searches on database.
>
> I have the following document:
>
> <doc>
>        <test xml:lang="zh">烧碱</test>
>        <test xml:lang="en">Caustic Soda</test>
> </doc>
>
> The problem is that I want to get this document only if I search for
> “*Caus*” in “en” and not in “zh”.  ML returns the document even
> if I perform the following queries, where “lang=zh” has been used. I also
> need to understand what this “lang” option actually does. Or is it a bug
> in MarkLogic?
>
> cts:search(
>   fn:doc(),
>   cts:element-value-query(xs:QName("test"), "Caus*",
>     ("wildcarded", "lang=zh")),
>   "unfiltered")
>
> cts:search(
>   fn:doc(),
>   cts:element-query(xs:QName("test"),
>     cts:word-query("Caus* *", ("wildcarded", "lang=zh"))),
>   "unfiltered")
>
> I can’t use path range queries since they don’t support wildcards. Any
> help would be much appreciated. Thanks in advance.
>
> Thanks,
> Rahul



