[MarkLogic Dev General] Search result estimation: issue with json array structure

APEL Holger APEL at iso.org
Fri Sep 1 00:28:24 PDT 2017


Thank you, James, for pointing me in the right direction. I suspected there would be an index option to eliminate my false positives. With positions turned on, the counts are all correct now.

Regards,
Holger

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of James Kerr
Sent: 2017-08-31 18:18
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Search result estimation: issue with json array structure

I’m not sure whether you have figured this out yet, but you are likely running into a filtered vs. unfiltered issue now. The Java search client runs queries unfiltered by default, because it is best practice to be able to resolve queries from the indexes alone, without filtering.

Since you are using container queries, though, you will need positions turned on for your indexes so that the nested structure can be resolved without filtering.

You will want to turn on “word positions”, “element word positions” and “element value positions” to support resolving these types of queries unfiltered. See this knowledgebase article https://help.marklogic.com/knowledgebase/article/View/245/0/queries-constrained-to-elements as well as the “Usage Notes” for https://docs.marklogic.com/cts.elementQuery and https://docs.marklogic.com/cts:json-property-scope-query for details.
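To make the false positives concrete, here is a toy model in plain Node.js (no MarkLogic involved, and not how MarkLogic's indexes are actually implemented). It assumes only this much: without positions, an unfiltered search can tell that "CURRENT" and 9999 each occur somewhere under "stages" in a document; with positions, it can tell whether they occur inside the same "stages" item. The function names are hypothetical; the data mirrors Holger's test documents.

```javascript
// Toy model: rebuild the 20 test documents from the original post.
function makeDocs() {
  const docs = [];
  for (let i = 0; i < 10; i++) {
    docs.push({ uri: "/a" + i + ".json", stages: [
      { stageId: 9999, status: "CURRENT" },
      { stageId: 9999, status: "CLOSED" }] });
    docs.push({ uri: "/b" + i + ".json", stages: [
      { stageId: 9998, status: "CURRENT" },
      { stageId: 9999, status: "CLOSED" }] });
  }
  return docs;
}

// Without positions: each value only needs to appear in SOME stage,
// so the two values may come from different stages (false positives).
function matchesWithoutPositions(doc) {
  const hasCurrent = doc.stages.some(s => s.status === "CURRENT");
  const has9999 = doc.stages.some(s => s.stageId === 9999);
  return hasCurrent && has9999;
}

// With positions: both values must fall inside the same "stages" item,
// modelled here as the same array element.
function matchesWithPositions(doc) {
  return doc.stages.some(s => s.status === "CURRENT" && s.stageId === 9999);
}

const docs = makeDocs();
console.log(docs.filter(matchesWithoutPositions).length); // 20
console.log(docs.filter(matchesWithPositions).length);    // 10
```

The "/b" documents are the false positives: they contain both values, but in different stages, which is exactly the distinction positions (or filtering) make possible.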

-James

From: <general-bounces at developer.marklogic.com> on behalf of APEL Holger <APEL at iso.org>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Tuesday, August 22, 2017 at 4:05 AM
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Search result estimation: issue with json array structure

Ah yes, I got the cts.andQuery parameter wrong. But what I really want to do is use the Java API to query my POJOs:

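import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.DatabaseClientFactory.DigestAuthContext;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.query.StructuredQueryDefinition;

// Both value constraints must match inside the same "stages" container
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition q = qb.containerQuery(qb.jsonProperty("stages"),
    qb.and(
        qb.value(qb.jsonProperty("status"), "CURRENT"),
        qb.value(qb.jsonProperty("stageId"), 9999)
    ));

DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8082, new DigestAuthContext("admin", "admin"));

// The Java search client runs this search unfiltered by default
SearchHandle result = client.newQueryManager().search(q, new SearchHandle());
logger.info("query: {}", q.serialize());
logger.info("returned: {}", result.getTotalResults());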

The serialized query is:
<query xmlns="http://marklogic.com/appservices/search">
  <container-query>
    <json-property>stages</json-property>
    <and-query>
      <value-query type="string">
        <json-property>status</json-property>
        <text>CURRENT</text>
      </value-query>
      <value-query type="number">
        <json-property>stageId</json-property>
        <text>9999</text>
      </value-query>
    </and-query>
  </container-query>
</query>

And result.getTotalResults() returns 20 instead of the expected 10.

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of James Kerr
Sent: 2017-08-19 05:58
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Search result estimation: issue with json array structure

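cts.andQuery expects an array of queries. You are using () instead of [] around your sub-queries; in JavaScript, (a, b) is the comma operator and evaluates to b alone, so your and-query only received the stageId constraint. This should work: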

fn.count(
  cts.search(
      cts.jsonPropertyScopeQuery("stages",
         cts.andQuery([
            cts.jsonPropertyValueQuery("status", "CURRENT"),
            cts.jsonPropertyValueQuery("stageId", 9999)
         ])
      )
  , 'filtered')
);
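To see why the parentheses matter, here is a plain Node.js snippet (no MarkLogic needed). The two strings are hypothetical stand-ins for the cts sub-queries; they are only there to show which value actually reaches cts.andQuery.

```javascript
// Hypothetical stand-ins for the two cts sub-queries.
const statusQuery = "status = CURRENT";
const stageIdQuery = "stageId = 9999";

// Parentheses: the comma operator evaluates both operands and yields only
// the last one, so this is just stageIdQuery.
const withParens = (statusQuery, stageIdQuery);

// Square brackets: an array literal that keeps both sub-queries.
const withBrackets = [statusQuery, stageIdQuery];

console.log(withParens);          // stageId = 9999
console.log(withBrackets.length); // 2
```

With the parentheses, the original query was effectively a scope query on the stageId constraint alone, and all 20 test documents have a stage with stageId 9999, which is why even the filtered search returned 20.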


From: <general-bounces at developer.marklogic.com> on behalf of APEL Holger <APEL at iso.org>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Friday, August 18, 2017 at 5:05 AM
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: [MarkLogic Dev General] Search result estimation: issue with json array structure

Hello community,

I stumbled over a use case where the total result count of a query is wrong.

Here is our set of test data:

declareUpdate();
for (let i = 0; i < 10; i++) {
  // "a" documents: a single stage has both stageId 9999 and status CURRENT
  xdmp.documentInsert(
    "/a" + i + ".json",
    {
      "project": {
        "stages": [{"stageId": 9999, "status": "CURRENT"},
                   {"stageId": 9999, "status": "CLOSED"}]
      }
    }
  );
  // "b" documents: 9999 and CURRENT both occur, but in different stages
  xdmp.documentInsert(
    "/b" + i + ".json",
    {
      "project": {
        "stages": [{"stageId": 9998, "status": "CURRENT"},
                   {"stageId": 9999, "status": "CLOSED"}]
      }
    }
  );
}

fn.count(
  cts.search(
      cts.jsonPropertyScopeQuery("stages",
         cts.andQuery(
            (cts.jsonPropertyValueQuery("status", "CURRENT"),
             cts.jsonPropertyValueQuery("stageId", 9999))
         )
      )
  , 'filtered')
);

returns 20, but in XQuery

fn:count(
  cts:search(/,
      cts:json-property-scope-query("stages",
         cts:and-query(
            (cts:json-property-value-query("status", "CURRENT"),
             cts:json-property-value-query("stageId", 9999))
         )
     )
  )
)

correctly returns 10

I guess cts.search uses the search:* module behind the scenes, because search:resolve gives me the same result for an equivalent query. So it seems the problem is result estimation. I know search:resolve uses xdmp:estimate and xdmp:remainder, and replacing fn:count with xdmp:estimate

xdmp:estimate(
  cts:search(/,
      cts:json-property-scope-query("stages",
         cts:and-query(
            (cts:json-property-value-query("status", "CURRENT"),
             cts:json-property-value-query("stageId", 9999))
         )
     )
  )
)

also gives me 20.

In our use case the data set is rather small, so the wrong estimates are very noticeable and not acceptable.
So my question: is there a way to get the right count, for example by

· tuning some indexes
· using additional query options
· changing our query, or even our data model, if there is no other way

Any hint is welcome.

holger apel
software manager | information technology and electronic services | iso central secretariat<http://www.iso.org/iso/contact_iso>