[MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
Danny Sokolsky
Danny.Sokolsky at marklogic.com
Thu Nov 10 10:20:17 PST 2011
I recommend moving to 4.2-7 first, if possible.
-Danny
-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Murray, Gregory
Sent: Thursday, November 10, 2011 10:15 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
Good point -- although we're planning to move to 5 soon, so we were holding off until then.
Thanks,
Greg
On Nov 10, 2011, at 1:13 PM, Danny Sokolsky wrote:
> Also, you should consider upgrading to 4.2-7. There are have been many bug fixes in the last year+ since 4.2-1 was released, including at least one regarding counts.
>
> -Danny
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Damon Feldman
> Sent: Thursday, November 10, 2011 9:55 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>
> Thanks, Greg, this helps understand your issue.
>
> Facets report on Fragments (which are identical to documents 95% of the time) rather than individual elements in the searchable expression.
>
> I think your data model may be grouping too many things into a single document. In MarkLogic a documents are more like relational DB rows and less like relational DB tables, so one doc per article is more natural than putting a large group of articles into one document. In your sample data the document seems to contain data for different items such as articles titled " Some Thoughts on Doing Theology " and " Death Threat: I Corinthians."
>
> If you break those up into separate documents many operations may become more natural, including facet counts.
>
> Yours,
> Damon
>
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Murray, Gregory
> Sent: Thursday, November 10, 2011 11:18 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>
> Damon,
>
> Whether the document counts are "correct" I don't know. I suppose they are, but they are not what I was expecting. Here's the situation in detail:
>
> When I insert two documents into new, empty test database, like so...
>
> xquery version "1.0-ml";
> let $doc1 :=
> <METS:mets xmlns:METS="http://www.loc.gov/METS/" xmlns:dc="http://purl.org/dc/elements/1.1/">
> <!-- metadata section follows -->
> <METS:metsHdr>
> <!-- ... -->
> </METS:metsHdr>
> <!-- data for journal as a whole -->
> <METS:dmdSec ID="dmd001">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>The Princeton Seminary Bulletin 28:3</dc:title>
> <dc:date>2007</dc:date>
> <dc:type>Journal</dc:type>
> <dc:source>The Princeton Seminary Bulletin</dc:source>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> <!-- data for first article follows -->
> <METS:dmdSec ID="dmd002">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>Some Thoughts on Doing Theology in Public</dc:title>
> <dc:creator>John R. Bowlin</dc:creator>
> <dc:date>2007</dc:date>
> <dc:type>Article</dc:type>
> <dc:source>The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007)</dc:source>
> <dc:coverage>235-243</dc:coverage>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> <!-- data for second article follows -->
> <METS:dmdSec ID="dmd003">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>Death Threat: I Corinthians 11:17-34a</dc:title>
> <dc:creator>Luke A. Powery</dc:creator>
> <dc:date>2007</dc:date>
> <dc:type>Article</dc:type>
> <dc:source>The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007)</dc:source>
> <dc:coverage>244-250</dc:coverage>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> </METS:mets>
>
> let $doc2 :=
> <METS:mets xmlns:METS="http://www.loc.gov/METS/" xmlns:dc="http://purl.org/dc/elements/1.1/">
> <!-- metadata section follows -->
> <METS:metsHdr>
> <!-- ... -->
> </METS:metsHdr>
> <!-- data for journal as a whole -->
> <METS:dmdSec ID="dmd001">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>The Princeton Seminary Bulletin 28:2</dc:title>
> <dc:date>2007</dc:date>
> <dc:type>Journal</dc:type>
> <dc:source>The Princeton Seminary Bulletin</dc:source>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> <!-- data for first article follows -->
> <METS:dmdSec ID="dmd002">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>The Unexpected Future</dc:title>
> <dc:creator>Iain R. Torrance</dc:creator>
> <dc:date>2007</dc:date>
> <dc:type>Article</dc:type>
> <dc:source>The Princeton Seminary Bulletin 28:2 (2007)</dc:source>
> <dc:coverage>119-122</dc:coverage>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> <!-- data for second article follows -->
> <METS:dmdSec ID="dmd003">
> <METS:mdWrap MIMETYPE="text/xml">
> <METS:xmlData>
> <dc:title>The Excellence of Ministry</dc:title>
> <dc:creator>Daniel L. Migliore</dc:creator>
> <dc:date>2007</dc:date>
> <dc:type>Article</dc:type>
> <dc:source>The Princeton Seminary Bulletin 28:2 (2007)</dc:source>
> <dc:coverage>123-128</dc:coverage>
> </METS:xmlData>
> </METS:mdWrap>
> </METS:dmdSec>
> </METS:mets>
>
> return
> (xdmp:document-insert("/doc1.xml", $doc1),
> xdmp:document-insert("/doc2.xml", $doc2))
>
> ... and then run the following query against that database...
>
> xquery version "1.0-ml";
> import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
>
> let $options :=
> <options xmlns="http://marklogic.com/appservices/search">
> <searchable-expression xmlns:mets="http://www.loc.gov/METS/">
> /mets:mets/mets:dmdSec
> </searchable-expression>
> <constraint name="date-bucket" xmlns="http://marklogic.com/appservices/search">
> <range type="xs:gYear">
> <element ns="http://purl.org/dc/elements/1.1/" name="date"/>
> <bucket name="any-1800" lt="1801">-1800</bucket>
> <bucket name="1801-1810" ge="1801" lt="1811">1801-1810</bucket>
> <bucket name="1811-1820" ge="1811" lt="1821">1811-1820</bucket>
> <bucket name="1821-1830" ge="1821" lt="1831">1821-1830</bucket>
> <bucket name="1831-1840" ge="1831" lt="1841">1831-1840</bucket>
> <bucket name="1841-1850" ge="1841" lt="1851">1841-1850</bucket>
> <bucket name="1851-1860" ge="1851" lt="1861">1851-1860</bucket>
> <bucket name="1861-1870" ge="1861" lt="1871">1861-1870</bucket>
> <bucket name="1871-1880" ge="1871" lt="1881">1871-1880</bucket>
> <bucket name="1881-1890" ge="1881" lt="1891">1881-1890</bucket>
> <bucket name="1891-1900" ge="1891" lt="1901">1891-1900</bucket>
> <bucket name="1901-1910" ge="1901" lt="1911">1901-1910</bucket>
> <bucket name="1911-1920" ge="1911" lt="1921">1911-1920</bucket>
> <bucket name="1921-1930" ge="1921" lt="1931">1921-1930</bucket>
> <bucket name="1931-1940" ge="1931" lt="1941">1931-1940</bucket>
> <bucket name="1941-1950" ge="1941" lt="1951">1941-1950</bucket>
> <bucket name="1951-1960" ge="1951" lt="1961">1951-1960</bucket>
> <bucket name="1961-1970" ge="1961" lt="1971">1961-1970</bucket>
> <bucket name="1971-1980" ge="1971" lt="1981">1971-1980</bucket>
> <bucket name="1981-1990" ge="1981" lt="1991">1981-1990</bucket>
> <bucket name="1991-2000" ge="1991" lt="2001">1991-2000</bucket>
> <bucket name="2001-any" ge="2001">2001-</bucket>
> </range>
> </constraint>
> <constraint name="type" xmlns="http://marklogic.com/appservices/search">
> <range type="xs:string">
> <element ns="http://purl.org/dc/elements/1.1/" name="type"/>
> <facet-option>frequency-order</facet-option>
> <facet-option>descending</facet-option>
> </range>
> </constraint>
> <constraint name="source" xmlns="http://marklogic.com/appservices/search">
> <range type="xs:string">
> <element ns="http://purl.org/dc/elements/1.1/" name="source"/>
> <facet-option>frequency-order</facet-option>
> <facet-option>descending</facet-option>
> </range>
> </constraint>
> </options>
> return search:search("", $options)
>
> ... I get this response:
>
> <search:response total="2" start="1" page-length="10" xmlns:search="http://marklogic.com/appservices/search">
> <search:result index="1" uri="/doc2.xml" path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[1]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[1]">The Princeton Seminary Bulletin 28:2 2007 Journal The Princeton Seminary Bulletin</search:match>
> </search:snippet>
> </search:result>
> <search:result index="2" uri="/doc2.xml" path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[2]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[2]">The Unexpected Future Iain R. Torrance 2007 Article The Princeton Seminary Bulletin 28:2 (2007) 119-122</search:match>
> </search:snippet>
> </search:result>
> <search:result index="3" uri="/doc2.xml" path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[3]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[3]">The Excellence of Ministry Daniel L. Migliore 2007 Article The Princeton Seminary Bulletin 28:2 (2007) 123-128</search:match>
> </search:snippet>
> </search:result>
> <search:result index="4" uri="/doc1.xml" path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[1]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[1]">The Princeton Seminary Bulletin 28:3 2007 Journal The Princeton Seminary Bulletin</search:match>
> </search:snippet>
> </search:result>
> <search:result index="5" uri="/doc1.xml" path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[2]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[2]">Some Thoughts on Doing Theology in Public John R. Bowlin 2007 Article The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007) 235-243</search:match>
> </search:snippet>
> </search:result>
> <search:result index="6" uri="/doc1.xml" path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[3]" score="0" confidence="0" fitness="0">
> <search:snippet>
> <search:match path="fn:doc("/doc1.xml")/*:mets/*:dmdSec[3]">Death Threat: I Corinthians 11:17-34a Luke A. Powery 2007 Article The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007) 244-250</search:match>
> </search:snippet>
> </search:result>
> <search:facet name="date-bucket">
> <search:facet-value name="2001-any" count="2">2001-</search:facet-value>
> </search:facet>
> <search:facet name="type">
> <search:facet-value name="Article" count="2">Article</search:facet-value>
> <search:facet-value name="Journal" count="2">Journal</search:facet-value>
> </search:facet>
> <search:facet name="source">
> <search:facet-value name="The Princeton Seminary Bulletin" count="2">The Princeton Seminary Bulletin</search:facet-value>
> <search:facet-value name="The Princeton Seminary Bulletin 28:2 (2007)" count="1">The Princeton Seminary Bulletin 28:2 (2007)</search:facet-value>
> <search:facet-value name="The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007)" count="1">The Princeton Seminary Bulletin, v. 28, no. 3 (Nov. 2007)</search:facet-value>
> </search:facet>
> <search:qtext/>
> <search:metrics>
> <search:query-resolution-time>PT0.016S</search:query-resolution-time>
> <search:facet-resolution-time>PT0.016S</search:facet-resolution-time>
> <search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
> <search:total-time>PT0.032S</search:total-time>
> </search:metrics>
> </search:response>
>
> To me, there are two things that are unexpected about this response. (1) The @total is 2, which is the number of documents, not the number of search results, which is 6. (2) The facet counts have the same problem: they correspond to the number of documents, not the search results.
>
> Similarly, if I run the same query except changing the qtext so it's not empty...
>
> return search:search("future", $options)
>
> ... I get this response:
>
> <search:response total="1" start="1" page-length="10" xmlns:search="http://marklogic.com/appservices/search">
> <search:result index="1" uri="/doc2.xml" path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[2]" score="104" confidence="0.669882" fitness="0.669882">
> <search:snippet>
> <search:match path="fn:doc("/doc2.xml")/*:mets/*:dmdSec[2]/*:mdWrap/*:xmlData/*:title">The Unexpected <search:highlight>Future</search:highlight></search:match>
> </search:snippet>
> </search:result>
> <search:facet name="date-bucket">
> <search:facet-value name="2001-any" count="1">2001-</search:facet-value>
> </search:facet>
> <search:facet name="type">
> <search:facet-value name="Article" count="1">Article</search:facet-value>
> <search:facet-value name="Journal" count="1">Journal</search:facet-value>
> </search:facet>
> <search:facet name="source">
> <search:facet-value name="The Princeton Seminary Bulletin" count="1">The Princeton Seminary Bulletin</search:facet-value>
> <search:facet-value name="The Princeton Seminary Bulletin 28:2 (2007)" count="1">The Princeton Seminary Bulletin 28:2 (2007)</search:facet-value>
> </search:facet>
> <search:qtext>future</search:qtext>
> <search:metrics>
> <search:query-resolution-time>PT0.015S</search:query-resolution-time>
> <search:facet-resolution-time>PT0.031S</search:facet-resolution-time>
> <search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
> <search:total-time>PT0.046S</search:total-time>
> </search:metrics>
> </search:response>
>
> There is only one search result, so I would expect each facet to contain only one <search:facet-value>, but again, the facets are actually based on the entire document that the search result came from.
>
> Many thanks,
> Greg
>
>
> On Nov 10, 2011, at 8:53 AM, Damon Feldman wrote:
>
>> Greg,
>>
>> Are the overall document counts correct? The total count comes from cts:remainder() or xdmp:estimate() under the covers which are an index-only operations like facet counts. It might help if you post a small sample of the form
>>
>> xdmp:document-insert(uri1, doc1), Xdmp:document-insert(uri2, doc2)
>> ; (: transaction separator :)
>>
>> let $options := ...
>> return search:search(...)
>>
>> that shows the wrong count so we understand the type of searchable expression and facets you are having trouble with.
>>
>> Yours,
>> Damon
>>
>> -----Original Message-----
>> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Murray, Gregory
>> Sent: Thursday, November 10, 2011 8:45 AM
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>>
>> I should have mentioned that I'm using 4.2-1
>>
>> Any suggestions greatly appreciated.
>>
>> Thanks,
>> Greg
>>
>> On Nov 9, 2011, at 5:21 PM, Murray, Gregory wrote:
>>
>>> I'm having a similar problem with facet counts when using <searchable-expression>. After reading this thread, I'm afraid I still don't understand how to circumvent the problem. When using <searchable-expression>, it appears that the search results are constrained to that expression whereas the facet counts are not. Is there a facet-related option to similarly constrain a facet to an XPath expression? I've seen references to the "fragment-frequency" option, but appears to have no effect in this context.
>>>
>>> Many thanks,
>>> Greg
>>>
>>> Gregory Murray
>>> Digital Library Application Developer
>>> Princeton Theological Seminary
>>>
>>>
>>> On Oct 18, 2011, at 8:30 PM, Michael Blakeley wrote:
>>>
>>>> Will, if I can jump in.... I think your idea of using different QNames is the right way to look at it.
>>>>
>>>> Facets are built from range indexes, and range indexes contain lists of values and fragment ids for a given QName. So if the query matches the fragment, the facet will show all the values in that fragment. In your case the fragment is the entire document, so you will see all the values in the matching documents, whether they occur under /doc or under /doc//cite. Now, you *could* create a fragment root on 'cite', but I think that would be counter-productive. It's better to use different QNames and have different range indexes.
>>>>
>>>> So I think what you'd want to do is simply arrange for a different set of search options for doc vs cite, including both searchable expression and constraints. Testing for that could be as simple as a call to cts:contains($user-search, 'select:cite') before you call search:search(). Or if that might generate false positives, you could search:parse the user query and then look at the cts:query XML to see whether or not the parser found a select:cite term. If it did, then you can switch to the correct options before calling search:resolve.
>>>>
>>>> -- Mike
>>>>
>>>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>>>>
>>>>> Micah,
>>>>>
>>>>> I think I may have explained poorly. This is essentially what I'm doing -- Docs are, generally, like this:
>>>>>
>>>>> <doc>
>>>>> <search-meta/>
>>>>> <p>...<cite><search-meta/></cite>...</p>
>>>>> <section>
>>>>> <p>...<cite><search-meta/></cite>...</p>
>>>>> ...
>>>>> </section>
>>>>> </doc>
>>>>>
>>>>> Searches operate over //doc by default, but if you add the operator/state "select:cite" it changes the searchable expression to //cite. The results are correct, but the problem is that the facet counts appear to be for *both* doc and cite metadata, and thus do not change when toggling searchable-expressions via operator/state.
>>>>>
>>>>> This won't make any sense to our users, who will expect the facet counts to match what they think they're searching for.
>>>>>
>>>>> -W
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Micah Dubinko
>>>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>>>> To: General MarkLogic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>>>>>
>>>>> Hi Will,
>>>>>
>>>>> Everything you want to search exists in document fragments (not properties) right?
>>>>>
>>>>> What would happen if you switched in a different searchable-expression via operator and state? The combined query is taken into account by faceting, but the searchable-expression is not.
>>>>>
>>>>> -m
>>>>>
>>>>>
>>>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>>>>
>>>>>> Our app has typically searched only document-type elements, but I recently added metadata to citation elements (contained within and scattered about document elements) so that they can be optionally searched using an operator. i.e.: "term1 term2 select:citations" The operator changes the searchable-expression and transform-results to search only citation elements and return citation-specific snippets.
>>>>>>
>>>>>> However, I need the facet counts to reflect the search being performed - i.e.: only show estimates for document element direct-child metadata during normal search, and only for citations when that is toggled using the operator.
>>>>>>
>>>>>> My first thought was to use different names or namespace for the citation metadata and have the operator toggle a separate set of constraints associated with those names. But constraints are not supported children of search:state under search:operator.
>>>>>>
>>>>>> Any ideas on how to accomplish this with Search API?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> General at developer.marklogic.com
>>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>>
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> General at developer.marklogic.com
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> General at developer.marklogic.com
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>>
>>>>
>>>> _______________________________________________
>>>> General mailing list
>>>> General at developer.marklogic.com
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general
>>
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
More information about the General
mailing list