[MarkLogic Dev General] Inconsistent results between facets and search results using search:search

Roland Priborsky broukpitlik at gmail.com
Fri May 17 07:26:03 PDT 2013


If I can help you somehow, let me know.


2013/5/17 Mauricio Valderrama Fonseca <mauvafo at gmail.com>

> Thanks, Michael!
>
> I just ran a quick test: using an element-word-query instead of an
> element-query/word-query, the facets are calculated correctly.
>
>
> On Thu, May 16, 2013 at 9:17 PM, Michael Blakeley <mike at blakeley.com> wrote:
>
>> In between search:parse and search:resolve, *you* are in control of the
>> query. You can do anything you like. To pick one possibility at random, you
>> can rewrite the query to be more efficient. Depending on the situation you
>> might need to transform the query in fairly complex ways, but you have the
>> right tools to do that.
>>
>> You may find it helpful to know how to convert back and forth between
>> cts:query items and cts:query XML using 'document { ... }/*' in one
>> direction, and 'cts:query(...xml...)' in the other. If you prefer you can
>> work with queries entirely as XML, but I usually stick with the cts:query
>> API.
>>
>> Here is a crude example, which could be improved and extended:
>>
>> import module namespace search = "http://marklogic.com/appservices/search"
>>      at "/MarkLogic/appservices/search/search.xqy";
>> import module namespace dls = "http://marklogic.com/xdmp/dls"
>>                   at "/MarkLogic/dls.xqy";
>> declare namespace ft="http://thecompany/facet" ;
>>
>> declare function local:rewrite(
>>   $q as cts:query)
>> as cts:query
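>> {
>>   (: cts:or-query-queries returns a sequence of queries; with 1.0-ml
>>      function mapping, local:rewrite is applied to each one in turn :)
>>   typeswitch ($q)
>>   case cts:or-query return cts:or-query(
>>     local:rewrite(cts:or-query-queries($q)))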
>>   case cts:word-query return cts:element-word-query(
>>     xs:QName('ft:author'),
>>     cts:word-query-text($q),
>>     cts:word-query-options($q))
>>   default return $q
>> };
>>
>> <cts:and-query xmlns:cts="http://marklogic.com/cts">
>>     { dls:documents-query() }
>>     <cts:not-query>
>>       <cts:collection-query>
>>         <cts:uri>error</cts:uri>
>>         <cts:uri>collection-A</cts:uri>
>>         <cts:uri>collection-B</cts:uri>
>>       </cts:collection-query>
>>     </cts:not-query>
>>     <cts:collection-query>
>>       <cts:uri>collection-C</cts:uri>
>>       <cts:uri>collection-D</cts:uri>
>>     </cts:collection-query>
>>     <cts:properties-query>
>> {
>>   document { local:rewrite(cts:query(search:parse('foo OR bar'))) }/*
>> }
>>     </cts:properties-query>
>>   </cts:and-query>
>> =>
>> <cts:and-query xmlns:cts="http://marklogic.com/cts">
>>   <cts:properties-query>
>>     <cts:registered-query>
>>       <cts:id>3813349421842441358</cts:id>
>>     </cts:registered-query>
>>   </cts:properties-query>
>>   <cts:not-query>
>>     <cts:collection-query>
>>       <cts:uri>error</cts:uri>
>>       <cts:uri>collection-A</cts:uri>
>>       <cts:uri>collection-B</cts:uri>
>>     </cts:collection-query>
>>   </cts:not-query>
>>   <cts:collection-query>
>>     <cts:uri>collection-C</cts:uri>
>>     <cts:uri>collection-D</cts:uri>
>>   </cts:collection-query>
>>   <cts:properties-query>
>>     <cts:or-query>
>>       <cts:element-word-query>
>>         <cts:element xmlns:ft="http://thecompany/facet">ft:author</cts:element>
>>         <cts:text xml:lang="en">foo</cts:text>
>>       </cts:element-word-query>
>>       <cts:element-word-query>
>>         <cts:element xmlns:ft="http://thecompany/facet">ft:author</cts:element>
>>         <cts:text xml:lang="en">bar</cts:text>
>>       </cts:element-word-query>
>>     </cts:or-query>
>>   </cts:properties-query>
>> </cts:and-query>
>>
>> -- Mike
>>
>> On 16 May 2013, at 14:51 , Mauricio Valderrama Fonseca <mauvafo at gmail.com> wrote:
>>
>> > I forgot to mention the version: I'm using 5.0-5.
>> >
>> > Sorry for the confusion, from the initial $query:
>> >
>> > let $query := <cts:and-query xmlns:cts="http://marklogic.com/cts">
>> >     { dls:documents-query() }
>> >     <cts:not-query>
>> >       <cts:collection-query>
>> >       <cts:uri>error</cts:uri>
>> >       <cts:uri>collection-A</cts:uri>
>> >       <cts:uri>collection-B</cts:uri>
>> >       </cts:collection-query>
>> >     </cts:not-query>
>> >     <cts:collection-query>
>> >       <cts:uri>collection-C</cts:uri>
>> >       <cts:uri>collection-D</cts:uri>
>> >     </cts:collection-query>
>> >     <cts:properties-query>
>> >       <cts:element-query>
>> >       <cts:element xmlns:ft="http://thecompany/facet">ft:author</cts:element>
>> >       <cts:word-query qtextref="cts:text" xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> >         <cts:text>fiber</cts:text>
>> >         <cts:option>case-insensitive</cts:option>
>> >         <cts:option>diacritic-insensitive</cts:option>
>> >         <cts:option>punctuation-insensitive</cts:option>
>> >         <cts:option>whitespace-insensitive</cts:option>
>> >         <cts:option>stemmed</cts:option>
>> >         <cts:option>wildcarded</cts:option>
>> >       </cts:word-query>
>> >       </cts:element-query>
>> >     </cts:properties-query>
>> >   </cts:and-query>
>> >
>> > The bolded part (the word-query) is the result of a search:parse, like this:
>> >
>> > let $options := <options xmlns="http://marklogic.com/appservices/search">
>> >  <term>
>> >      <term-option>case-insensitive</term-option>
>> >      <term-option>diacritic-insensitive</term-option>
>> >      <term-option>punctuation-insensitive</term-option>
>> >      <term-option>whitespace-insensitive</term-option>
>> >      <term-option>stemmed</term-option>
>> >      <term-option>wildcarded</term-option>
>> >    </term>
>> >  </options>
>> >
>> > return search:parse("fiber", $options)
>> >
>> > The query text could be more complex, like "fiber OR silicon". In that case
>> > the parsed result is no longer a single word-query, which is why I can't use
>> > the element-word-query you suggested. The user enters the text to find using
>> > the default grammar (AND, OR, parentheses, double quotes) and can limit the
>> > search to one or more elements. The element-queries are then created
>> > dynamically, and each one wraps the parsed query text.
>> >
>> >
>> >
>> > On Thu, May 16, 2013 at 4:03 PM, Michael Blakeley <mike at blakeley.com> wrote:
>> > I don't see it, unless maybe it's a release-specific bug. With 6.0-3:
>> >
>> > import module namespace search = "http://marklogic.com/appservices/search"
>> >      at "/MarkLogic/appservices/search/search.xqy";
>> >
>> > search:parse(
>> > 'author:foo',
>> > <options xmlns="http://marklogic.com/appservices/search">
>> >    <return-results>true</return-results>
>> >    <constraint name="author">
>> >      <range type="xs:string">
>> >         <element ns="http://thecompany/facet" name="author"/>
>> >         <fragment-scope>properties</fragment-scope>
>> >         <facet-option>concurrent</facet-option>
>> >         <facet-option>item-order</facet-option>
>> >         <facet-option>ascending</facet-option>
>> >         <facet-option>limit=6</facet-option>
>> >      </range>
>> >    </constraint>
>> >    <term>
>> >      <term-option>case-insensitive</term-option>
>> >      <term-option>diacritic-insensitive</term-option>
>> >      <term-option>punctuation-insensitive</term-option>
>> >      <term-option>whitespace-insensitive</term-option>
>> >      <term-option>stemmed</term-option>
>> >      <term-option>wildcarded</term-option>
>> >    </term>
>> >  </options>)
>> > =>
>> > <cts:properties-query qtextref="schema-element(cts:query)" xmlns:cts="http://marklogic.com/cts" xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> >   <cts:element-range-query qtextpre="author:" qtextref="cts:annotation" operator="=">
>> >     <cts:element xmlns:_1="http://thecompany/facet">_1:author</cts:element>
>> >     <cts:annotation qtextref="following-sibling::cts:value"/>
>> >     <cts:value xsi:type="xs:string" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">foo</cts:value>
>> >   </cts:element-range-query>
>> > </cts:properties-query>
>> >
>> > I copied those options verbatim from your email, but the query output is
>> > different: I see an element-range-query, not an element-query.
>> >
>> > Are you sure those options match up with your test case?
>> >
>> > -- Mike
>> >
>> > On 16 May 2013, at 11:57 , Mauricio Valderrama Fonseca <mauvafo at gmail.com> wrote:
>> >
>> > > Hi Michael,
>> > >
>> > > The <cts:word-query..> in the script is the result of a search:parse
>> > > execution, so I cannot use the element-word-query. I need to limit the
>> > > matches to a specific element (in the document or in the properties),
>> > > and using the element-query retrieves the right results. I tried using a
>> > > constraint, and it worked fine when the element was in the document, but
>> > > when the element is in the properties I could not make it work. Is there
>> > > any other way to make it accurate other than using the "element word
>> > > positions" index?
>> > >
>> > > Thanks
>> > >
>> > >
>> > > On Thu, May 16, 2013 at 12:14 PM, Michael Blakeley <mike at blakeley.com> wrote:
>> > > It looks to me like you could use an element-word-query instead of a
>> > > nested element-query/word-query combination. That will be more efficient.
>> > >
>> > > http://docs.marklogic.com/cts:element-word-query
>> > >
>> > > Accurate results with element-query alone will often require element
>> > > positions, for example the "element word positions" index. However,
>> > > positions require extra disk space and CPU time, so unless they are
>> > > necessary I prefer to avoid them.
>> > >
>> > > http://docs.marklogic.com/guide/admin/text_index
>> > >
>> > > -- Mike
>> > >
>> > > On 16 May 2013, at 07:50 , Mauricio Valderrama Fonseca <mauvafo at gmail.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I'm facing a problem using the search:resolve function. This is the
>> > > > XQuery:
>> > > >
>> > > > xquery version "1.0-ml";
>> > > > import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
>> > > > import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
>> > > >
>> > > > let $query := <cts:and-query xmlns:cts="http://marklogic.com/cts">
>> > > >     { dls:documents-query() }
>> > > >     <cts:not-query>
>> > > >       <cts:collection-query>
>> > > >       <cts:uri>error</cts:uri>
>> > > >       <cts:uri>collection-A</cts:uri>
>> > > >       <cts:uri>collection-B</cts:uri>
>> > > >       </cts:collection-query>
>> > > >     </cts:not-query>
>> > > >     <cts:collection-query>
>> > > >       <cts:uri>collection-C</cts:uri>
>> > > >       <cts:uri>collection-D</cts:uri>
>> > > >     </cts:collection-query>
>> > > >     <cts:properties-query>
>> > > >       <cts:element-query>
>> > > >       <cts:element xmlns:ft="http://thecompany/facet">ft:author</cts:element>
>> > > >       <cts:word-query qtextref="cts:text" xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> > > >         <cts:text>fiber</cts:text>
>> > > >         <cts:option>case-insensitive</cts:option>
>> > > >         <cts:option>diacritic-insensitive</cts:option>
>> > > >         <cts:option>punctuation-insensitive</cts:option>
>> > > >         <cts:option>whitespace-insensitive</cts:option>
>> > > >         <cts:option>stemmed</cts:option>
>> > > >         <cts:option>wildcarded</cts:option>
>> > > >       </cts:word-query>
>> > > >       </cts:element-query>
>> > > >     </cts:properties-query>
>> > > >   </cts:and-query>
>> > > >
>> > > > let $options := <options xmlns="http://marklogic.com/appservices/search">
>> > > >     <return-results>true</return-results>
>> > > >     <constraint name="author">
>> > > >       <range type="xs:string">
>> > > >       <element ns="http://thecompany/facet" name="author"/>
>> > > >       <fragment-scope>properties</fragment-scope>
>> > > >       <facet-option>concurrent</facet-option>
>> > > >       <facet-option>item-order</facet-option>
>> > > >       <facet-option>ascending</facet-option>
>> > > >       <facet-option>limit=6</facet-option>
>> > > >       </range>
>> > > >     </constraint>
>> > > >     <term>
>> > > >       <term-option>case-insensitive</term-option>
>> > > >       <term-option>diacritic-insensitive</term-option>
>> > > >       <term-option>punctuation-insensitive</term-option>
>> > > >       <term-option>whitespace-insensitive</term-option>
>> > > >       <term-option>stemmed</term-option>
>> > > >       <term-option>wildcarded</term-option>
>> > > >     </term>
>> > > >   </options>
>> > > >
>> > > > return search:resolve($query, $options)
>> > > >
>> > > >
>> > > > The result is this one:
>> > > >
>> > > > <search:response total="32" start="1" page-length="10" xmlns="" xmlns:search="http://marklogic.com/appservices/search">
>> > > >   <search:facet name="author">
>> > > >     <search:facet-value name="author 1" count="1">author 1</search:facet-value>
>> > > >     <search:facet-value name="author 2" count="1">author 2</search:facet-value>
>> > > >     <search:facet-value name="author 3" count="1">author 3</search:facet-value>
>> > > >     <search:facet-value name="author 4" count="1">author 4</search:facet-value>
>> > > >     <search:facet-value name="author 5" count="1">author 5</search:facet-value>
>> > > >     <search:facet-value name="author 6" count="1">author 6</search:facet-value>
>> > > >   </search:facet>
>> > > >   <search:metrics>
>> > > >     <search:query-resolution-time>PT0.007S</search:query-resolution-time>
>> > > >     <search:facet-resolution-time>PT0.002S</search:facet-resolution-time>
>> > > >     <search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
>> > > >     <search:total-time>PT0.01S</search:total-time>
>> > > >   </search:metrics>
>> > > > </search:response>
>> > > >
>> > > >
>> > > >
>> > > > In that case the facet is computed from an element range index on
>> > > > <element ns="http://thecompany/facet" name="author"/>. This element is in
>> > > > the properties, but the result is also inconsistent if we use an element
>> > > > in the document. It is important to note that the facet and the results
>> > > > are CONSISTENT until we include a <cts:element-query> in the query.
>> > > >
>> > > > Looking at the source, I noticed that the functions cts:element-values
>> > > > and cts:frequency are used to create the search:facet element. I then
>> > > > tried them directly and got the same result:
>> > > >
>> > > >
>> > > > xquery version "1.0-ml";
>> > > > import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
>> > > >
>> > > > let $facet-options := ("type=string", "properties", "descending", "frequency-order", "limit=6")
>> > > > let $query := cts:document-fragment-query(
>> > > >   cts:and-query((
>> > > >     dls:documents-query(),
>> > > >     cts:not-query(cts:collection-query(("error", "collection-A", "collection-B")), 1),
>> > > >     cts:collection-query(("collection-C", "collection-D")),
>> > > >     cts:properties-query(
>> > > >       cts:element-query(
>> > > >         fn:QName("http://thecompany/facet", "author"),
>> > > >         cts:word-query(
>> > > >           "fiber",
>> > > >           ("case-insensitive","diacritic-insensitive","punctuation-insensitive","whitespace-insensitive","stemmed","wildcarded","lang=en"),
>> > > >           1
>> > > >         ),
>> > > >         ()
>> > > >       )
>> > > >     )
>> > > >   ), ())
>> > > > )
>> > > > return cts:element-values(fn:QName("http://thecompany/facet","author"),(),($facet-options,"concurrent"),$query,1,())
>> > > >
>> > > >
>> > > > The response was:
>> > > >
>> > > > author 1
>> > > > author 2
>> > > > author 3
>> > > > author 4
>> > > > author 5
>> > > > author 6
>> > > >
>> > > > Now, executing a cts:search with the same $query value:
>> > > >
>> > > > xquery version "1.0-ml";
>> > > > import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
>> > > >
>> > > > let $query := cts:document-fragment-query(
>> > > >   cts:and-query((
>> > > >     dls:documents-query(),
>> > > >     cts:not-query(cts:collection-query(("error", "collection-A", "collection-B")), 1),
>> > > >     cts:collection-query(("collection-C", "collection-D")),
>> > > >     cts:properties-query(
>> > > >       cts:element-query(
>> > > >         fn:QName("http://thecompany/facet", "author"),
>> > > >         cts:word-query(
>> > > >           "fiber",
>> > > >           ("case-insensitive","diacritic-insensitive","punctuation-insensitive","whitespace-insensitive","stemmed","wildcarded","lang=en"),
>> > > >           1
>> > > >         ),
>> > > >         ()
>> > > >       )
>> > > >     )
>> > > >   ), ())
>> > > > )
>> > > > return cts:search(fn:collection(),$query)
>> > > >
>> > > >
>> > > > I got an empty sequence. So the questions are: why am I getting these
>> > > > inconsistencies when there is an element-query in the search, and how can
>> > > > I avoid them?
>> > > >
>> > > > Thanks!
>> > > >
>> > > > _______________________________________________
>> > > > General mailing list
>> > > > General at developer.marklogic.com
>> > > > http://developer.marklogic.com/mailman/listinfo/general
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > Sincerely,
>> > > Mauricio Valderrama Fonseca
>> >
>> >
>> >
>> >
>> > --
>> > Sincerely,
>> > Mauricio Valderrama Fonseca
>>
>>
>
>
>
> --
> Sincerely,
> Mauricio Valderrama Fonseca
>
>
>