[MarkLogic Dev General] hyphens and cts:element-value-query

Gary Larsen gary.larsen at envisn.com
Tue Feb 28 11:49:14 PST 2017


Learning more than usual today :)

 

"collation=URI" is an option on element-range-query(), but not on
element-value-query().  Looks like creating a range index would be useful
for elements which may have spaces or punctuation and need exact matching.
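
A minimal sketch of that range-index approach, assuming a string element
range index has already been configured on the element with the codepoint
collation. The element name "ename" and the value are placeholders taken
from the original question further down the thread, and fn:doc() as the
searched expression is an assumption:

```xquery
xquery version "1.0-ml";

(: Assumption: a string range index exists on <ename> with the
   codepoint collation; without it this query raises an error. :)
cts:search(
  fn:doc(),
  cts:element-range-query(
    xs:QName("ename"),   (: placeholder element name :)
    "=",                 (: exact value comparison :)
    "value-1",
    "collation=http://marklogic.com/collation/codepoint"),
  "unfiltered")
```

Because the range index stores whole values rather than word tokens, this
should tell "value-1" apart from "value 1" without filtering.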

 

Thanks, Gary

 

From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Geert Josten
Sent: Tuesday, February 28, 2017 2:28 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] hyphens and cts:element-value-query

 

In defense of Andreas, Mary does write this too:

 

"At the boundary, where you specify exact unstemmed value
queries or exact range queries with a codepoint collation,
the results will line up. For exact queries there are universal
index entries for the value that include punctuation and
whitespace, but we don't index those tokens otherwise."

 

E.g. it might work if you select codepoint collation
("collation=http://marklogic.com/collation/codepoint") together with the
"exact" option. MarkLogic defaults to using its own root collation.

 

From: <general-bounces at developer.marklogic.com> on behalf of James Kerr
<James.Kerr at marklogic.com>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Tuesday, February 28, 2017 at 7:34 PM
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] hyphens and cts:element-value-query

 

From Mary's response: "Word tokens may be stemmed and punctuation and space
tokens are not indexed" (emphasis my own).

 

The fact that punctuation and space tokens are not indexed is why you cannot
do punctuation-sensitive or whitespace-sensitive, unfiltered word or value
queries.

 

Depending on what you are trying to accomplish, custom tokenization
(https://docs.marklogic.com/guide/search-dev/custom-tokenization) may be a
good option for you.

 

On a side note, can you share what you are doing for your predicate check?
By adding a check like this, you are essentially just implementing your own
filtered search so it's unclear what the benefit would be over just using
the "filtered" search option.

 

-James 

 

 

From: <general-bounces at developer.marklogic.com> on behalf of Gary Larsen
<gary.larsen at envisn.com>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Tuesday, February 28, 2017 at 1:12 PM
To: 'MarkLogic Developer Discussion' <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] hyphens and cts:element-value-query

 

Geert and Andreas,

 

Thanks for pointing out tokens vs. values that I wasn't understanding.

 

Using 'filtered' in cts:search works, but I've always tried to avoid that
for performance reasons.  In this case I've added a predicate check in the
result instead.

 

But to Andreas's point, it seems that 'exact' or 'punctuation-sensitive'
should be able to match, or maybe I'm not understanding the documentation
for cts:element-value-query.  If it did work I guess there would be extra
work un-tokenizing?

 

I'm using ML version 8.0-6.

 

Thanks for any clarification,

Gary

 

 

From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Andreas Hubmer
Sent: Tuesday, February 28, 2017 8:23 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] hyphens and cts:element-value-query

 

Hi Geert,

 

As far as I know there are index entries for "exact" queries in the
universal index, that include punctuation and whitespace. Thus, Gary's value
queries should work unfiltered.

 

There is an email by Mary Holstege supporting my assumption:
http://developer.marklogic.com/pipermail/general/2013-March/012552.html

 

Cheers,

Andreas

 

 

 

2017-02-28 13:58 GMT+01:00 Geert Josten <Geert.Josten at marklogic.com>:

Hi Gary,

 

Sounds like you are running an unfiltered search. Either enable filtering to
get rid of false positives, or switch to using element-range-query (which
requires a range index). Keep in mind that value-queries don't use range
indexes (even if available), but rely on the universal index, which contains
tokens, not values.
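
A sketch of the first option, reusing the query from Gary's message with
the "filtered" option on cts:search. The quoted element name "ename" and
fn:doc() as the searched expression are assumptions made for illustration:

```xquery
xquery version "1.0-ml";

(: "filtered" makes the server re-check each candidate document
   against the query, dropping index-level false positives such as
   "value 1" matching a search for "value-1". :)
cts:search(
  fn:doc(),
  cts:element-value-query(xs:QName("ename"), "value-1", "exact"),
  "filtered")
```

Filtering costs extra work per candidate document, which is the trade-off
against the range-index alternative.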

 

Cheers,

Geert

 

From: <general-bounces at developer.marklogic.com> on behalf of Gary Larsen
<gary.larsen at envisn.com>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Monday, February 27, 2017 at 10:01 PM
To: 'General MarkLogic Developer Discussion'
<general at developer.marklogic.com>
Subject: [MarkLogic Dev General] hyphens and cts:element-value-query

 

I'm trying to get this cts query to treat hyphens as text:

 

cts:element-value-query(xs:QName(ename), 'value 1', 'exact')

cts:element-value-query(xs:QName(ename), 'value-1', 'exact')

 

Even though the ename value 'value-1' does not exist, a match is found.

 

Thanks,

Gary

