[MarkLogic Dev General] Struggling with Query Time Out (I'm satisfied)

Treskon, Matthew Matthew.Treskon at ARS.USDA.GOV
Tue Mar 20 11:44:46 PDT 2012


Betty,

If you have any flexibility in your data (you surely don't need precision down to the millisecond), maybe you could create a separate date-only element. I'd think that queries against a range index holding only 365*[years] distinct values would be a lot more efficient than queries against a range index with as many 'rows' (quote unquote) as you have records.
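
A rough sketch of the date-only idea, reusing the directory and the ns1 prefix from the query quoted below. The ns1:ModifiedDay element, its xs:date range index, and the namespace URI are assumptions made for illustration; none of them exists in the data as described in this thread.

```xquery
xquery version "1.0-ml";
(: Placeholder namespace URI; substitute the real one. :)
declare namespace ns1 = "whatever";

(: Assumption: every ns1:ACE also carries a hypothetical ns1:ModifiedDay
   element of type xs:date, for example xs:date(xs:dateTime(ns1:ModifiedDate))
   computed at load time, with an xs:date element range index defined on it. :)
for $ACE in cts:search(
  xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE,
  cts:and-query((
    cts:element-range-query(xs:QName('ns1:ModifiedDay'), '>=', xs:date('2011-04-01')),
    cts:element-range-query(xs:QName('ns1:ModifiedDay'), '<', xs:date('2011-05-01'))
  ))
)
return $ACE/ns1:ACEId
```

Whether this actually runs faster would have to be measured; the point is only that the index would hold one value per calendar day rather than one per timestamp.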


Matthew




-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Betty Harvey
Sent: Tuesday, March 20, 2012 2:35 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Struggling with Query Time Out (I'm satisfied)

The raw command is now down to 31 minutes and brought back 74,447
results. I can live with that since this is going to be a monthly report.
It will be interesting to see whether the time increases further when I add
more variables and include the Excel vocabulary.

This is the final query:

for $ACE in
cts:search(xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
     cts:and-query((
       cts:element-range-query(xs:QName('ns1:ModifiedDate'), '>',
         xs:dateTime('2011-04-01T16:00:00.00')),
       cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<',
         xs:dateTime('2011-05-01T16:00:00.00'))
     )))
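
The query above shows only the search itself. A hedged sketch of how it might be completed into a runnable report follows; the return clause is modeled on the earlier test code in this thread, and the namespace URI is a placeholder rather than the real one.

```xquery
xquery version "1.0-ml";
(: Placeholder namespace URI; substitute the real one. :)
declare namespace ns1 = "whatever";

for $modified in cts:search(
  xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
  cts:and-query((
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '>',
      xs:dateTime('2011-04-01T16:00:00.00')),
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<',
      xs:dateTime('2011-05-01T16:00:00.00'))
  ))
)
(: $modified is an ns1:ModifiedDate element, so its parent is the ns1:ACE. :)
return
  <a>
    { $modified/parent::ns1:ACE/ns1:ACEId }
    { $modified }
  </a>
```

This assumes ns1:ACEId is a direct child of ns1:ACE, as the earlier test code suggests.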

Thanks everyone for your help and advice!!!

Betty

> It may be frustrating, but I'd say you are making progress. The old query
> might have taken 8 hours and this one might take 90 minutes, for example.
> Both might time out, but the new query is searchable and that's an
> improvement.
>
> How many documents will this query return, and what are you trying to do
> with them? You can get the match count via xdmp:estimate(cts:search(...))
> around your cts:search below.
>
> If you want to prove out your query and see how long it takes for a subset
> of your inputs, you could add a positional predicate outside the
> cts:search call:
>
> for $ACE in cts:search(
>   xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
>   cts:element-range-query(
>     xs:QName('ns1:ModifiedDate'), '<=',
>     xs:dateTime('2011-04-01T16:00:00.00')))[1 to 10]
> ...
>
> But it may be that the '...' is the important part. Are you simply trying
> to get all the values? If so, you can read them directly from the range
> index:
>
> cts:element-values(
>   xs:QName('ns1:ModifiedDate'),
>   (),
>   (),
>   cts:and-query(
>     (cts:directory-query('/opt/MOR/ACE/', 'infinity'),
>      cts:element-query(
>        xs:QName('ns1:ACE'),
>        cts:element-range-query(
>          xs:QName('ns1:ModifiedDate'),
>          '<=', xs:dateTime('2011-04-01T16:00:00.00'))))))
>
> -- Mike
>
> On 20 Mar 2012, at 08:32 , Betty Harvey wrote:
>
>> Hi Evan:
>>
>> This is a great tool. I ran the command and the predicate doesn't work.
>> I decided to try another approach and use another element that is indexed
>> but occurs only once in each object. There can be up to 20 Events in an
>> object. I have tried running it in both CQ and an HTTP application, and
>> both time out.
>>
>> The xdmp:plan command says it is fully searchable. I am obviously doing
>> something wrong.
>>
>> for $ACE in
>> cts:search(xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
>>       cts:element-range-query (xs:QName('ns1:ModifiedDate'), '<=',
>> xs:dateTime('2011-04-01T16:00:00.00') ) )
>>
>>
>> Thanks again!
>>
>> Betty
>>
>>> As a quick tip, Betty, you can easily check whether a given expression is
>>> searchable or not by using Query Console. I just ran this:
>>>
>>> declare namespace ns1="whatever";
>>> xdmp:plan(
>>> collection()/descendant::ns1:ACE/ns1:EventSet/ns1:GeneralEvent[1]
>>> )
>>>
>>> Whose output included this:
>>>  <qry:info-trace>Step 4 is unsearchable:
>>> ns1:GeneralEvent[1]</qry:info-trace>
>>>
>>> This tells me where the problem is, and verifies Mike's suspicion.
>>>
>>> Evan
>>>
>>>
>>> On 3/19/12 4:16 PM, "Michael Blakeley" <mike at blakeley.com> wrote:
>>>
>>> Betty, I think it's the '[1]' that makes that expression unsearchable.
>>> Normally the XPath indexes simply record the presence of elements, not
>>> their position.
>>>
>>> -- Mike
>>>
>>> On 16 Mar 2012, at 15:03 , Betty Harvey wrote:
>>>
>>> Thanks!!!
>>> I set an element range index on the main database and have apparently run
>>> out of disk space - I will deal with that issue later. It is running on a
>>> VM.
>>> I also set a range index on EventDate in the 'documents' database for test
>>> purposes. I rewrote the query to use cts:search, and on the 'documents'
>>> database it comes back with "Expression is unsearchable". I am not sure
>>> what this error message means, but I think it might not be recognizing
>>> the range index.
>>> Am I missing something significant? The documents have 3 namespaces.
>>> The EventDate is in the 'ns1' namespace. I only used one
>>> cts:element-range-query as a test.
>>> Revised test code:
>>> for $ACE in
>>> cts:search(collection()/descendant::ns1:ACE/ns1:EventSet/ns1:GeneralEvent[1],
>>>     cts:element-range-query (xs:QName('EventDate'), '<',
>>> xs:dateTime('2011-03-01T00:00:00') ) )
>>> let $ACEId := $ACE/ancestor::ns1:ACE/ns1:ACEId
>>> let $EventDate := $ACE/ns1:EventDate
>>> return
>>> <a>
>>> {$ACEId}
>>> {$EventDate}
>>> <time>{xdmp:elapsed-time()}</time>
>>> </a>
>>> Hi Betty,
>>> Using a cts:search as David suggests could indeed speed things up
>>> considerably. I believe you can use xdmp:directory as the searchable
>>> expression, but you can also add it to the query part using
>>> cts:directory-query.
>>> Note, though, that if you rewrite the date predicates as
>>> cts:element-range-queries, it may make a lot of difference whether ACE is
>>> a fragment root or not. If you include /descendant::ACE in your
>>> searchable path, the end result is filtered to make sure each ACE matches
>>> the query, but there could be a lot of false positives (and hence
>>> xdmp:estimate could return too high a value).
>>> Kind regards,
>>> Geert
>>> -----Original Message-----
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com]
>>> On Behalf Of David Lee
>>> Sent: Friday, March 16, 2012 19:54
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Struggling with Query Time Out
>>> First off, cts:search is exactly what you want for this.
>>> Second, you are doing string compares against dateTime values. To help
>>> with this you may need to create a range index on EventDate and compare
>>> against xs:dateTime('xxxxxx').
>>> Thirdly, you're doing a directory search, which you might not actually
>>> need if these documents are in known namespaces.
>>> But hold off on that until you get the first two worked out.
>>> cts:search() is really your friend in this case, but you do want to make
>>> a range index so that the system knows the values are dates; otherwise
>>> "gt" will do string, not date, comparisons.
>>> Once you get both of those working, your searches should be nearly instant.
>>> --------------------------------------------------------------------------
>>> ---
>>> David Lee
>>> Lead Engineer
>>> MarkLogic Corporation
>>> dlee at marklogic.com
>>> Phone: +1 650-287-2531
>>> Cell:  +1 812-630-7622
>>> www.marklogic.com
>>> -----Original Message-----
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com]
>>> On Behalf Of Betty Harvey
>>> Sent: Friday, March 16, 2012 3:17 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] Struggling with Query Time Out
>>> I have been unable to get this query to run successfully without timing
>>> out. To make sure my logic was correct, I placed 100 documents in the
>>> 'documents' database, and the query runs successfully and very quickly.
>>> In the large database (1.7 million objects) the query always times out.
>>> I am not sure cts:search will help. I played around with it without
>>> success. The goal of the query is to gather information for a particular
>>> month based on when the document was created. Below is the code:
>>> for $ACE in xdmp:directory('opt/MOR/ACE/')/descendant::ACE
>>>   [EventSet/GeneralEvent[1]/EventDate gt '2011-03-01T00:00:00']
>>>   [EventSet/era:GeneralEvent[1]/EventDate lt '2011-04-01T00:00:00']
>>> let $ACEId := $ACE/ACEId
>>> let $EventDate := $ACE/EventSet/era:GeneralEvent[1]/era:EventDate
>>> return
>>> <a>
>>> {$ACEId}
>>> {$EventDate}
>>> </a>
>>> Any ideas are appreciated!
>>> Betty
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
>>> Betty Harvey                         | Phone:  410-787-9200  FAX: 9830
>>> Electronic Commerce Connection, Inc. |
>>> harvey at eccnet.com                    | Washington,DC XML Users Grp
>>> URL:  http://www.eccnet.com          | http://www.eccnet.com/xmlug
>>> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\/\/
>>> Member of XML Guild (www.xmlguild.org)









