[MarkLogic Dev General] Struggling with Query Time Out (I'm satisfied)

Betty Harvey harvey at eccnet.com
Tue Mar 20 11:35:19 PDT 2012


The raw command is now down to 31 minutes and brought back 74,447
results. I can live with that, since this is going to be a monthly report.
It will be interesting to see whether the time increases further when I
add more variables and include the Excel vocabulary.

This is the final query:

for $ACE in
  cts:search(
    xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
    cts:and-query((
      cts:element-range-query(xs:QName('ns1:ModifiedDate'), '>',
        xs:dateTime('2011-04-01T16:00:00.00')),
      cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<',
        xs:dateTime('2011-05-01T16:00:00.00'))
    )))
(: the return clause was not included in the message; returning the matches as a placeholder :)
return $ACE

Thanks everyone for your help and advice!!!

Betty
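Since this runs as a monthly report, one way to avoid hand-editing both dates each month is to compute the upper bound from the lower one. The following is a minimal sketch, assuming MarkLogic's 1.0-ml dialect, the xs:dateTime range index on ns1:ModifiedDate used in the final query, and a placeholder namespace URI (the real ns1 URI is not given in this thread). The helper name local:month-query is hypothetical.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: the real ns1 namespace is not given in this thread. :)
declare namespace ns1 = "http://example.com/ns1";

(: Hypothetical helper: matches ModifiedDate values in the one-month
   window starting at $start. Lower bound inclusive, upper bound exclusive. :)
declare function local:month-query($start as xs:dateTime) as cts:query
{
  cts:and-query((
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '>=', $start),
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<',
      $start + xs:yearMonthDuration('P1M'))
  ))
};

(: Same searchable expression as the final query, April 2011 window. :)
cts:search(
  xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
  local:month-query(xs:dateTime('2011-04-01T16:00:00')))
```

Using '>=' rather than '>' for the lower bound is a deliberate choice in this sketch: with '>' and '<', a document stamped exactly at a boundary (for example 2011-05-01T16:00:00) would fall into neither month's report.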

> It may be frustrating, but I'd say you are making progress. The old query
> might have taken 8 hours and this one might take 90 minutes, for example.
> Both might time out, but the new query is searchable and that's an
> improvement.
>
> How many documents will this query return, and what are you trying to do
> with them? You can get the match count by wrapping xdmp:estimate() around
> your cts:search below: xdmp:estimate(cts:search(...)).
>
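Wrapping xdmp:estimate around the search might look like the sketch below. It assumes the same ns1:ModifiedDate range index and uses a placeholder namespace URI. xdmp:estimate answers from the indexes without filtering, so, as Geert notes further down the thread, it can overcount when the searchable path steps below the fragment root.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: the real ns1 namespace is not given in this thread. :)
declare namespace ns1 = "http://example.com/ns1";

(: Unfiltered, index-only count of candidate matches. This can return
   quickly even when retrieving the matches themselves times out. :)
xdmp:estimate(
  cts:search(
    xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
    cts:element-range-query(
      xs:QName('ns1:ModifiedDate'), '<=',
      xs:dateTime('2011-04-01T16:00:00.00'))))
```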
> If you want to prove out your query and see how long it takes for a subset
> of your inputs, you could add a positional predicate outside the
> cts:search call:
>
> for $ACE in cts:search(
>   xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
>   cts:element-range-query(
>     xs:QName('ns1:ModifiedDate'), '<=',
>     xs:dateTime('2011-04-01T16:00:00.00')))[1 to 10]
> ...
>
> But it may be that the '...' is the important part. Are you simply trying
> to get all the values? If so, you can read them directly from the range
> index:
>
> cts:element-values(
>   xs:QName('ns1:ModifiedDate'),  (: range index to read values from :)
>   (),                            (: no start value :)
>   (),                            (: default options :)
>   cts:and-query(
>     (cts:directory-query('/opt/MOR/ACE/', 'infinity'),
>      cts:element-query(
>        xs:QName('ns1:ACE'),
>        cts:element-range-query(
>          xs:QName('ns1:ModifiedDate'),
>          '<=', xs:dateTime('2011-04-01T16:00:00.00'))))))
>
> -- Mike
>
> On 20 Mar 2012, at 08:32 , Betty Harvey wrote:
>
>> Hi Evan:
>>
>> This is a great tool. I ran the command, and the predicate doesn't work.
>> I decided to try another approach and use another element that is
>> indexed but occurs only once in each object; there can be up to 20
>> Events in an object. I have tried running it both in CQ and in an HTTP
>> application, and both time out.
>>
>> The xdmp:plan command says it is fully searchable. I am obviously doing
>> something wrong.
>>
>> for $ACE in
>>   cts:search(
>>     xdmp:directory('/opt/MOR/ACE/')/descendant::ns1:ACE/ns1:ModifiedDate,
>>     cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<=',
>>       xs:dateTime('2011-04-01T16:00:00.00')))
>>
>>
>> Thanks again!
>>
>> Betty
>>
>>> As a quick tip, Betty, you can easily check whether a given expression is
>>> searchable or not by using Query Console. I just ran this:
>>>
>>> declare namespace ns1="whatever";
>>> xdmp:plan(
>>> collection()/descendant::ns1:ACE/ns1:EventSet/ns1:GeneralEvent[1]
>>> )
>>>
>>> Its output included this:
>>>  <qry:info-trace>Step 4 is unsearchable: ns1:GeneralEvent[1]</qry:info-trace>
>>>
>>> This tells me where the problem is, and verifies Mike's suspicion.
>>>
>>> Evan
>>>
>>>
>>> On 3/19/12 4:16 PM, "Michael Blakeley" <mike at blakeley.com> wrote:
>>>
>>> Betty, I think it's the '[1]' that makes that expression unsearchable.
>>> Normally the XPath indexes simply record the presence of elements, not
>>> their position.
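One way to keep the expression searchable is to leave the positional predicate out of the searchable path and apply it afterwards, in ordinary XQuery, to the much smaller set of candidates the indexes return. This is a sketch only, assuming an xs:dateTime range index on ns1:EventDate, one EventDate per GeneralEvent, and a placeholder namespace URI. The indexes narrow the search to ACE elements with any matching EventDate; the where clause then checks the first GeneralEvent only.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: the real ns1 namespace is not given in this thread. :)
declare namespace ns1 = "http://example.com/ns1";

let $cutoff := xs:dateTime('2011-03-01T00:00:00')
(: Searchable: no positional predicate in the path, so the range index
   can resolve the query. This matches every ACE where ANY EventDate is
   before $cutoff. :)
for $ace in cts:search(
    collection()/descendant::ns1:ACE,
    cts:element-range-query(xs:QName('ns1:EventDate'), '<', $cutoff))
(: The [1] is applied here, after the search, so it no longer affects
   searchability; keep only ACEs whose FIRST event is before $cutoff. :)
let $first := $ace/ns1:EventSet/ns1:GeneralEvent[1]
where xs:dateTime($first/ns1:EventDate) lt $cutoff
return
  <a>
    {$ace/ns1:ACEId}
    {$first/ns1:EventDate}
  </a>
```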
>>>
>>> -- Mike
>>>
>>> On 16 Mar 2012, at 15:03 , Betty Harvey wrote:
>>>
>>> Thanks!!!
>>> I set an element range index on the main database and have apparently
>>> run out of disk space; I will deal with that issue later. It is running
>>> on a VM.
>>> I also set a range index on EventDate in the 'documents' database for
>>> test purposes. I rewrote the query to use cts:search, and on the
>>> 'documents' database it comes back with "Expression is unsearchable".
>>> I am not sure what this error message means, but I think it might not
>>> be recognizing the range index.
>>> Am I missing something significant? The documents have 3 namespaces.
>>> The EventDate is in the 'ns1' namespace. I only used one
>>> cts:element-range-query as a test.
>>> Revised test code:
>>> for $ACE in
>>>   cts:search(
>>>     collection()/descendant::ns1:ACE/ns1:EventSet/ns1:GeneralEvent[1],
>>>     cts:element-range-query(xs:QName('ns1:EventDate'), '<',
>>>       xs:dateTime('2011-03-01T00:00:00')))
>>> let $ACEId := $ACE/ancestor::ns1:ACE/ns1:ACEId
>>> let $EventDate := $ACE/ns1:EventDate
>>> return
>>> <a>
>>> {$ACEId}
>>> {$EventDate}
>>> <time>{xdmp:elapsed-time()}</time>
>>> </a>
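For reference, an element range index like the ones described above can also be created from Query Console with the Admin API rather than through the Admin Interface. This is a sketch under stated assumptions: the target database is named "Documents" and the ns1 URI is a placeholder. Saving the configuration triggers reindexing, and the index itself consumes disk space.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: dateTime range index on {ns1}EventDate. The namespace URI is a
   placeholder; non-string indexes take an empty collation; false()
   means no range-value positions. :)
let $index := admin:database-range-element-index(
  "dateTime", "http://example.com/ns1", "EventDate", "", fn:false())
(: "Documents" is an assumption; use the name of the real target database. :)
let $config := admin:database-add-range-element-index(
  $config, xdmp:database("Documents"), $index)
return admin:save-configuration($config)
```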
>>> Hi Betty,
>>> Using a cts:search like David suggests could indeed speed things up
>>> considerably. You can use xdmp:directory as a searchable expression, I
>>> believe, but you can also add it to the query part using
>>> cts:directory-query.
>>> Note, though, that if you rewrite the date predicates as
>>> cts:element-range-query calls, it may make a lot of difference whether
>>> ACE is a fragment root or not. If you include /descendant::ACE in your
>>> searchable path, then the end result is filtered to make sure each ACE
>>> matches the query, but there could be a lot of false positives (and
>>> hence xdmp:estimate could return too high a value).
>>> Kind regards,
>>> Geert
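Moving the directory restriction from the searchable path into the query, as Geert suggests, might look like the sketch below. It assumes the ns1:ModifiedDate range index used later in the thread and a placeholder namespace URI, and reuses the month bounds from Betty's final query. Depth "1" matches only documents directly in that directory, like xdmp:directory with its default depth; "infinity" would include subdirectories. Because the searchable path still steps down to ns1:ACE, the filtering and estimate caveats above apply here too.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: the real ns1 namespace is not given in this thread. :)
declare namespace ns1 = "http://example.com/ns1";

cts:search(
  collection()/descendant::ns1:ACE,
  cts:and-query((
    (: Restrict to documents directly in this directory :)
    cts:directory-query('/opt/MOR/ACE/', '1'),
    (: Typed dateTime comparisons resolved from the range index :)
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '>',
      xs:dateTime('2011-04-01T16:00:00.00')),
    cts:element-range-query(xs:QName('ns1:ModifiedDate'), '<',
      xs:dateTime('2011-05-01T16:00:00.00'))
  )))
```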
>>> -----Original Message-----
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com] On Behalf Of David Lee
>>> Sent: Friday, March 16, 2012 19:54
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Struggling with Query Time Out
>>> First off, cts:search is exactly what you want for this.
>>> Second, you are doing string compares against dateTime values. To help
>>> with this you may need to create a range index on EventDate and compare
>>> against xs:dateTime('xxxxxx').
>>> Third, you are doing a directory search, which you might not actually
>>> need if these documents are in known namespaces.
>>> But hold off on that until you get the first two worked out.
>>> cts:search() is really your friend in this case, but you do want to
>>> make a range index so that the system knows the values are dates;
>>> otherwise "gt" will do string comparisons, not date comparisons.
>>> Once you get both of those working, your searches should be nearly
>>> instant.
>>> -----------------------------------------------------------------------------
>>> David Lee
>>> Lead Engineer
>>> MarkLogic Corporation
>>> dlee at marklogic.com<mailto:dlee at marklogic.com>
>>> Phone: +1 650-287-2531
>>> Cell:  +1 812-630-7622
>>> www.marklogic.com
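David's point about string comparisons can be seen in plain XQuery, with no database involved. The two sample timestamps below are made up for illustration: compared as strings, the -05:00 value sorts first, but as xs:dateTime values it is the later instant (09:00-05:00 is 14:00 UTC).

```xquery
(: The same two timestamps, compared as strings and as typed xs:dateTime values. :)
let $a := '2011-03-01T09:00:00-05:00'   (: 14:00 UTC :)
let $b := '2011-03-01T13:00:00Z'        (: 13:00 UTC :)
return (
  $a lt $b,                           (: true: character-by-character :)
  xs:dateTime($a) lt xs:dateTime($b)  (: false: compares the instants :)
)
```

Uniformly formatted timestamps in a single timezone happen to sort correctly as strings, which is one reason a small test set can look fine; comparing typed xs:dateTime values removes that dependence on formatting.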
>>> -----Original Message-----
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Betty Harvey
>>> Sent: Friday, March 16, 2012 3:17 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] Struggling with Query Time Out
>>> I have been unable to get this query to run successfully without timing
>>> out. To make sure my logic was correct, I placed 100 documents in the
>>> 'documents' database, and the query runs successfully and very quickly.
>>> In the large database (1.7 million objects) the query always times out.
>>> I am not sure cts:search will help; I played around with it without
>>> success. The goal of the query is to gather information for a
>>> particular month based on when the document was created. Below is the
>>> code:
>>> for $ACE in xdmp:directory('opt/MOR/ACE/')/descendant::ACE
>>>   [EventSet/GeneralEvent[1]/EventDate gt '2011-03-01T00:00:00']
>>>   [EventSet/era:GeneralEvent[1]/EventDate lt '2011-04-01T00:00:00']
>>> let $ACEId := $ACE/ACEId
>>> let $EventDate := $ACE/EventSet/era:GeneralEvent[1]/era:EventDate
>>> return
>>> <a>
>>> {$ACEId}
>>> {$EventDate}
>>> </a>
>>> Any ideas are appreciated!
>>> Betty
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
>>> Betty Harvey                         | Phone:  410-787-9200  FAX: 9830
>>> Electronic Commerce Connection, Inc. |
>>> harvey at eccnet.com                    | Washington,DC XML Users Grp
>>> URL:  http://www.eccnet.com          | http://www.eccnet.com/xmlug
>>> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\/\/
>>> Member of XML Guild (www.xmlguild.org)
>>>
>>
>>
>>
>
>

