[MarkLogic Dev General] Query times out, and page-limit constraints

Michael Blakeley mike at blakeley.com
Wed Mar 21 09:35:42 PDT 2012


Like Damon, I suspect that the /item... bit is the expensive part, probably because of the database lookup inside the predicate.

But why use search:search() at all? It seems to me that you aren't really using any of the search:search features, and would be better off passing your cts:query to cts:search() directly.

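xquery version "1.0-ml";

(: $dateTime as in Dean's query. The range query needs an xs:dateTime
   element-attribute range index on journal_cite/@datetimeRecevied. :)
let $dateTime := xs:dateTime('2012-02-01T18:43:30.728')
for $n in cts:search(
  fn:doc(),
  cts:and-query(
    (cts:element-attribute-range-query(
       xs:QName("journal_cite"),
       xs:QName("datetimeRecevied"), ">", $dateTime),
     cts:collection-query('/citation/type/journal_cite'))))[1 to 100]
let $doi := $n/citation/target_doi/data(.)
return element result {
  $n,
  /item[@doi = $doi] }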
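For your second question, one option is to keep the page size fixed and page through the results over several requests, rather than pulling everything back in one call. The sketch below assumes the same query and range index as above. $page and $page-length are hypothetical values chosen for illustration (in a real app you might read them from the request, e.g. with xdmp:get-request-field()), and xdmp:estimate() supplies a cheap, index-based total.

```xquery
xquery version "1.0-ml";

(: Hypothetical paging values, for illustration only. :)
declare variable $page as xs:integer := 2;
declare variable $page-length as xs:integer := 100;

let $dateTime := xs:dateTime('2012-02-01T18:43:30.728')
let $query :=
  cts:and-query((
    cts:element-attribute-range-query(
      xs:QName("journal_cite"),
      xs:QName("datetimeRecevied"), ">", $dateTime),
    cts:collection-query('/citation/type/journal_cite')))
(: 1-based position of the first result on the requested page :)
let $start := ($page - 1) * $page-length + 1
return (
  (: index-based estimate of the total number of matches :)
  element page-info {
    attribute page { $page },
    attribute total { xdmp:estimate(cts:search(fn:doc(), $query)) } },
  (: fetch only this page's documents :)
  for $n in fn:subsequence(cts:search(fn:doc(), $query), $start, $page-length)
  let $doi := $n/citation/target_doi/data(.)
  return element result { $n, /item[@doi = $doi] }
)
```

Each request then stays bounded by $page-length, and the caller can keep asking for the next page until it has covered the estimated total.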

If you only wanted the URIs, you could also use cts:uris(), but since you are fetching the documents anyway, cts:search() is probably a little more efficient.
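A minimal sketch of the cts:uris() variant, assuming the URI lexicon is enabled on the database (cts:uris() requires it) and the same xs:dateTime range index as above. The "limit=100" option caps the number of URIs returned.

```xquery
xquery version "1.0-ml";

(: Assumes the URI lexicon is enabled and an xs:dateTime range index
   exists on journal_cite/@datetimeRecevied. :)
let $dateTime := xs:dateTime('2012-02-01T18:43:30.728')
let $query :=
  cts:and-query((
    cts:element-attribute-range-query(
      xs:QName("journal_cite"),
      xs:QName("datetimeRecevied"), ">", $dateTime),
    cts:collection-query('/citation/type/journal_cite')))
(: no start value, and at most 100 URIs :)
return cts:uris((), "limit=100", $query)
```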

-- Mike

On 21 Mar 2012, at 06:02 , Damon Feldman wrote:

> Dean,
>  
>  
> Try removing sub-expressions until you have a minimal expression that shows the problem. E.g. does just the search:search() call work, without any other code?
>  
> I suspect your XPath /item[@doi = <expr>] may be the problem. If <expr> is not properly formed (e.g. a missing namespace or a misspelled element name), you could retrieve many items rather than just one, and that lookup runs for every result.
>  
> search:search() deliberately does not let you retrieve all results, to avoid huge queries that would time out. If your DB has only, say, 10,000 items and you think they can all come back in a single call, set the page length to 10,000. If you think you can get 20 million in one shot, use that limit, and so on. The idea is to force the developer to think hard about how big the result set can be and to set the limits explicitly.
>  
> Yours,
> Damon
>  
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Dean Pullen
> Sent: Wednesday, March 21, 2012 7:37 AM
> To: general at developer.marklogic.com
> Subject: [MarkLogic Dev General] Query times out, and page-limit constraints
>  
> Hi all,
> 
> I'm relatively new to MarkLogic and have an example query to debug (at the end of this email).
> 
> I've got two questions:
> 1) Why does this time out? We have a lot of data, but the query is constrained by a page limit of 100 and by the given $dateTimeStr variable. Should I be searching some other way, for example with a constraint? You'll notice that the first parameter of search:search is '', i.e. blank.
> 
> 2) Once question 1 is 'fixed', how can I retrieve ALL results rather than just the number specified by the page limit?
> 
> 
> Many thanks,
> 
> Dean.
> 
> 
> QUERY:
> 
> 
> xquery version "1.0-ml";
> 
> import module namespace search = "http://marklogic.com/appservices/search"
> at "/MarkLogic/appservices/search/search.xqy";
> 
> declare variable $dateTimeStr as xs:string := '2012-02-01T18:43:30.728';
> 
> 
> declare function local:retrieveiteminfo($result as element(citation)) as element(search:result) {
> 
>     <search:result>
>         {/item[@doi = $result/target_doi]}
>     </search:result>
> };
> 
> let $dateTime := xs:dateTime($dateTimeStr)
> 
> let $top-citation-results := search:search('',
>     <options xmlns="http://marklogic.com/appservices/search">
>         <additional-query>{
>             cts:and-query((
>                 cts:element-attribute-range-query(xs:QName("journal_cite"),
>                     xs:QName("datetimeRecevied"), ">", $dateTime),
>                 cts:collection-query('/citation/type/journal_cite')
>             ))
>         }</additional-query>
>     </options>,
>     1,
>     100)
> 
> 
> for $uri in $top-citation-results/search:result/@uri
> return
>     <search:result>{fn:doc($uri)}{/item[@doi = fn:doc($uri)/citation/target_doi]}</search:result>
> 
> 
> 
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general


