[MarkLogic Dev General] Search API results: sorting empty last

Murray, Gregory gregory.murray at ptsem.edu
Fri Mar 23 06:56:58 PDT 2012


Colleen,

You were right. Using the default collation didn't help. The real problem was that some of our documents had empty <name/> elements. A document whose <name> is present but empty sorts first; a document with no <name> element at all sorts last, as desired.
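
For illustration, here is a rough sketch of the two cases as two separate documents. The <id> values are made up; only the namespace and element names come from the example document quoted below.

```xml
<!-- Document 1: <name> is present but empty.
     With sort:name, this one shows up at the top of the results. -->
<doc xmlns="http://digital.library.ptsem.edu/ia">
<metadata>
  <id>hypothetical-empty-name</id>
  <name/>
  <date>1840</date>
</metadata>
</doc>

<!-- Document 2: <name> is absent altogether.
     With sort:name, this one shows up at the end of the results. -->
<doc xmlns="http://digital.library.ptsem.edu/ia">
<metadata>
  <id>hypothetical-no-name</id>
  <date>1840</date>
</metadata>
</doc>
```

So if empty-last is the goal, leaving <name> out entirely, rather than writing an empty <name/>, appears to be what does it.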

Personally, I think the Search API should ideally sort the two situations identically, but at least now I know there's a rationale behind it. I suppose it's analogous to a relational database treating an empty string differently from NULL.

Thanks,
Greg


On Feb 21, 2012, at 11:57 AM, Colleen Whitney wrote:

> I don't think the fact that it's non-default is the issue. My off-the-cuff guess is that it has to do with whitespace significance in this particular collation. You could run some small-scale experiments with collations to see if you can arrive at one that yields the desired result. Meanwhile, I'll file an RFE on controlling where empties fall in the sort order.
> 
> On Feb 21, 2012, at 8:19 AM, "Murray, Gregory" <gregory.murray at ptsem.edu> wrote:
> 
>> Colleen,
>> 
>> Thanks for the info. It looks like the problem is indeed using a non-default collation. Here's a (real but greatly simplified for illustration) example document:
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <doc xmlns="http://digital.library.ptsem.edu/ia">
>> <metadata>
>>   <id>apostlescreedlor00geeg</id>
>>   <name>Geegby, W. B.</name>
>>   <title>The Apostles' Creed and the Lord's prayer in the Kru dialect</title>
>>   <date>1840</date>
>>   <language code="eng">English</language>
>>   <class>Old Testament</class>
>> </metadata>
>> </doc>
>> 
>> I have range element indexes on date (gYear), language (string with default collation), and name (string with collation http://marklogic.com/collation//AS/T0020).
>> 
>> In the <options> to search:search(), my (again, real but simplified for illustration) sort operator looks like this:
>> 
>>     <operator name="sort">
>>       <state name="name">
>>         <sort-order type="xs:string" collation="http://marklogic.com/collation//AS/T0020">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="name"/>
>>         </sort-order>
>>       </state>
>>       <state name="date">
>>         <sort-order type="xs:gYear">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="date"/>
>>         </sort-order>
>>       </state>
>>       <state name="language">
>>         <sort-order type="xs:string">
>>           <element ns="http://digital.library.ptsem.edu/ia" name="language"/>
>>         </sort-order>
>>       </state>
>>     </operator>
>> 
>> If I sort by date (if the qtext passed to search:search() includes "sort:date") then I get empty last (documents with no <date> element occur last in the search results), as I would expect and prefer. Similarly, if I sort by language I get empty last. But if I sort by name, I get empty first.
>> 
>> Any way around this other than switching to the default collation for <name>?
>> 
>> Many thanks,
>> Greg
>> 
>> 
>> On Feb 20, 2012, at 7:47 PM, Colleen Whitney wrote:
>> 
>>> Hi Greg,
>>> 
>>> The Search API doesn't yet support specifying "empty least" or "empty greatest" on sorting.
>>> 
>>> You can specify a *direction* as an attribute on the <sort-order> element (direction="ascending" or direction="descending"). When descending, the server defaults to empty least, and when ascending it defaults to empty greatest, so in either direction I think the empties *should* come out last. But collation matters here, and the collation you're using could be involved. If you have a very small set of test "name" elements you can share, along with how they're sorting, it might help in understanding what you're seeing.
>>> 
>>> --Colleen
>>> 
>>> ________________________________________
>>> From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Murray, Gregory [gregory.murray at ptsem.edu]
>>> Sent: Monday, February 20, 2012 10:37 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: [MarkLogic Dev General] Search API results: sorting empty last
>>> 
>>> When using the Search API and using a sort operator such as this:
>>> 
>>>    <search:operator name="sort">
>>>      <search:state name="name">
>>>        <search:sort-order type="xs:string" collation="http://marklogic.com/collation//AS/T0020">
>>>          <search:element ns="http://example.com/ns" name="name"/>
>>>        </search:sort-order>
>>>      </search:state>
>>>      <!-- ... -->
>>>    </search:operator>
>>> 
>>> is there a way to specify that documents with a missing or empty <name> element should occur *last* in the sorted search results? It appears that empty values occur at the top of the search results by default.
>>> 
>>> Thanks,
>>> Greg
>>> 
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general
