[MarkLogic Dev General] Wildcard Searches | MultiLingual Data | Cts Queries
mary.holstege at marklogic.com
Tue Jan 12 09:49:34 PST 2016
There are two things going on here:
(1) Language only applies to stemmed searches, and wildcarded searches are
not stemmed. So your lang=zh is irrelevant.
(2) Even if this were a non-wildcarded search, your lang=zh would still not
work as you expect in this case, because MarkLogic performs some basic
script/language detection. If your query were just "Caustic"
(unwildcarded), then since "Caustic" consists of Latin characters it is
not processed as Chinese. What language it is processed as is determined
by some configuration rules (not user serviceable), which in this case say
it should be processed as English. So your lang=zh is having no effect at
all. If you used lang=fr (a language which shares the same script as
English), then it would be processed as French and your lang=fr would have
an effect and would produce matches only in text marked as French.
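To make the unwildcarded case concrete, here is a rough, untested sketch of the two queries being compared. It uses the "test" element from your sample document; the behavior noted in the comments is just the behavior described above, not something verified against your database.

```xquery
xquery version "1.0-ml";

(: "Caustic" is in Latin script, so per the explanation above the
   lang=zh option is not honored for this query. :)
cts:search(fn:doc(),
  cts:element-word-query(xs:QName("test"), "Caustic",
    ("unwildcarded", "lang=zh"))),

(: French shares the Latin script, so lang=fr does take effect and
   only text marked as French can match. :)
cts:search(fn:doc(),
  cts:element-word-query(xs:QName("test"), "Caustic",
    ("unwildcarded", "lang=fr")))
```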
I think your best bet here is to create language-specific fields with
"test" as an included element if the xml:lang attribute equals "zh"
(etc.). That works as long as you are consistent about the level at which
the xml:lang is marked.
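Something along these lines might work for the English field, using the Admin API. This is an untested sketch: the database name "Documents" and the field name "test-en" are placeholders I made up, and the weight of 1.0 is just the usual default. It assumes xml:lang sits directly on the "test" element, as in your sample.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumptions: "Documents" and "test-en" are placeholder names. :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
(: Create the field without including the document root, so that only
   the elements listed below contribute to it. :)
let $config := admin:database-add-field($config, $db,
  admin:database-field("test-en", fn:false()))
(: Include <test> (no namespace) only when it carries xml:lang="en". :)
let $config := admin:database-add-field-included-element($config, $db,
  "test-en",
  admin:database-included-element(
    "", "test", 1.0,
    "http://www.w3.org/XML/1998/namespace", "lang", "en"))
return admin:save-configuration($config)
```

The wildcard search would then go against the field rather than the element, and a second field (say "test-zh", again a made-up name) would cover the Chinese text in the same way:

```xquery
cts:search(fn:doc(),
  cts:field-word-query("test-en", "*Caus*", "wildcarded"))
```

You would probably also need to enable the relevant wildcard index settings on the field itself and let the database reindex before this returns anything useful.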
On Tue, 12 Jan 2016 09:21:19 -0800, Rahul Gupta <rahul.gupta at nagarro.com> wrote:
> Hi Team,
> I am working on wildcard searches with multilingual data. I have enabled
> 3-character searches, 3 character word-positions, 2-character searches,
> trailing wildcard searches on database.
> I have the following document:
> <test xml:lang="zh">烧碱</test>
> <test xml:lang="en">Caustic Soda</test>
> Problem is that I want to get this document only if I search for
> “*Caus*” in “en” and not in “zh”. ML is returning me the document even
> if I perform the following queries where “lang=zh” has been used. Also
> need to understand what this “lang” option actually does? Or is it a bug
> in MarkLogic?
> cts:element-value-query(xs:QName("test"), "Caus*",
> cts:word-query("Caus* *", ("wildcarded","lang=zh"))),
> Can’t use path range queries since it doesn’t support wildcards. Any
> help would be much appreciated. Thanks in advance.