[MarkLogic Dev General] Is a cts:query looking at specific attributes only within the same element possible?

Dan Meyers Dan.Meyers at bbc.co.uk
Thu Jul 16 02:12:06 PDT 2015


Thanks to both Chris and Mary for their responses :). With their help I’ve now got something that appears to work with my small test dataset, and am just hoping to confirm my understanding. I’ve altered the structure of our documents such that the ‘inherited’ attribute is always present, and have inserted the id attribute as text within each relationship as well, as follows:

Doc 1:
<test uuid="X" >
<relationships>
<relationship id="one" inherited="false">one</relationship>
<relationship id="two" inherited="true">two</relationship>
</relationships>
</test>

Doc 2:
<test uuid="Y">
<relationships>
<relationship id="one" inherited="false">one</relationship>
<relationship id="three" inherited="false">three</relationship>
</relationships>
</test>

Doc 3:
<test uuid="Z">
<relationships>
<relationship id="one" inherited="true">one</relationship>
<relationship id="four" inherited="false">four</relationship>
</relationships>
</test>

This document structure means I can now do a cts:and-query on the id I want and inherited being false, rather than an and-not-query to account for a missing inherited attribute being treated as false. When I wrap that and-query in an element-query, it returns the correct results on my test dataset.
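
For concreteness, the query described above might look something like the following sketch. The id value "one" and the use of fn:doc() as the search scope are assumptions for illustration, not necessarily what runs in production. With the default filtered search, this should select Doc 1 and Doc 2 from the test documents, since the "one" relationship in Doc 3 is inherited.

```xquery
xquery version "1.0-ml";

(: Sketch only: id "one" and inherited "false" must both occur
   on the same relationship element instance. :)
let $query :=
  cts:element-query(xs:QName("relationship"),
    cts:and-query((
      cts:element-attribute-value-query(
        xs:QName("relationship"), xs:QName("id"), "one", "exact"),
      cts:element-attribute-value-query(
        xs:QName("relationship"), xs:QName("inherited"), "false", "exact"))))
(: Default (filtered) search; return the uuid of each matching document. :)
for $doc in cts:search(fn:doc(), $query)
return $doc/test/@uuid/fn:string()
```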

From Mary’s update, if I am understanding correctly, inserting the id as text within the relationship element also allows this query to run more efficiently, as long as the word position index is turned on for this database. Our largest documents currently have a little under 4000 relationships (most are under 1000), and the database does not currently have the word position index turned on, so we’ll have to evaluate how long reindexing would take, how much extra disk space it would use, what the performance impact elsewhere would be, and so on. We’ll also see how well the query performs with our larger documents.
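
One rough way to check whether the indexes alone resolve the query is to compare filtered and unfiltered result counts, as in the sketch below. The query and the fn:doc() scope are illustrative assumptions. If the two counts differ, the unfiltered (index-only) resolution is letting through false positives and the filter is still doing real work. Equal counts on a small test dataset do not prove the indexes will be accurate on larger documents.

```xquery
xquery version "1.0-ml";

let $query :=
  cts:element-query(xs:QName("relationship"),
    cts:and-query((
      cts:element-attribute-value-query(
        xs:QName("relationship"), xs:QName("id"), "one", "exact"),
      cts:element-attribute-value-query(
        xs:QName("relationship"), xs:QName("inherited"), "false", "exact"))))
(: Filtered: candidates from the indexes are checked against the documents. :)
let $filtered := fn:count(cts:search(fn:doc(), $query))
(: Unfiltered: whatever the indexes select, with no verification pass. :)
let $unfiltered := fn:count(cts:search(fn:doc(), $query, "unfiltered"))
return
  <counts filtered="{$filtered}" unfiltered="{$unfiltered}"/>
```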

Thanks to all

Dan

From: Mary Holstege <Mary.Holstege at marklogic.com>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Wednesday, 15 July 2015 14:59
To: "general at developer.marklogic.com" <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Is a cts:query looking at specific attributes only within the same element possible?


There are a couple things going on here:
(1) Queries do matching per fragment, so if you do an and-query of two value queries or range queries, nothing constrains the two matches to the same relationship element instance within the fragment.

(2) Wrapping an element-query on relationship around the and-query constrains both matches to the same instance.

cts:element-query(xs:QName("relationship"),
  cts:and-query((
    cts:element-attribute-value-query(xs:QName("relationship"),xs:QName("id"),"one","exact"),
    cts:element-attribute-value-query(xs:QName("relationship"),xs:QName("inherited"),"true","exact"))))
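
To see the difference between (1) and (2), the sketch below runs the bare and-query and the wrapped version side by side. It assumes the documents are searched via fn:doc() and that filtering is left on. A fragment holding one relationship with id="one" inherited="false" and another with id="two" inherited="true" would be counted by the first search but not by the second, because no single relationship instance carries both values.

```xquery
xquery version "1.0-ml";

(: Per-fragment match: the two attribute values may sit on different
   relationship elements in the same fragment. :)
let $same-fragment :=
  cts:and-query((
    cts:element-attribute-value-query(
      xs:QName("relationship"), xs:QName("id"), "one", "exact"),
    cts:element-attribute-value-query(
      xs:QName("relationship"), xs:QName("inherited"), "true", "exact")))
(: Per-instance match: both values must occur within one relationship element. :)
let $same-instance :=
  cts:element-query(xs:QName("relationship"), $same-fragment)
return (
  fn:count(cts:search(fn:doc(), $same-fragment)),
  fn:count(cts:search(fn:doc(), $same-instance))
)
```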

If you have the right positions enabled, the indexes can resolve this without the filter (although if you have a lot of relationship instances in a document this can be quite expensive)... except:

(3) Empty element positions are problematic. Positions are word positions, and the position of an element spans from the word position of the first word at the start of the element to the word position of the first word after the element ends. Positions of attributes are the positions of their element. If everything is an empty element, you have no words and everything has the same position: 0 to 1. Positions then cannot discriminate between what is happening in one instance of relationship and the next, and you have to rely on filtered search to get the answer, at which point expressing this as an XPath is a lot less verbose. You can force positions to work by making sure there is always at least one word, somehow, between relationship elements.
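
As an illustration of the XPath alternative, a sketch along these lines expresses the same-instance constraint directly. The /test/relationships/relationship path is an assumption about the document layout and should be adjusted to the real structure.

```xquery
xquery version "1.0-ml";

(: Documents containing a single relationship element that carries
   both attribute values. :)
fn:doc()/test[relationships/relationship[@id = "one"][@inherited = "true"]]
```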

//Mary