[MarkLogic Dev General] Is there a score threshold?

Peter Hickman peter.hickman at semantico.com
Fri Jun 29 08:31:30 PDT 2007


Thanks for the response.

This seems odder the more I look at it. First, here are the queries 
that are used:

(:
    The "indian subcontinent" query
:)

declare namespace dc = "http://purl.org/dc/elements/1.1/"
declare namespace opp = "http://opp.oup.com/opp"
declare namespace grove = "http://www.grovecms.com/local/articles.dtd"

default element namespace = "http://opp.oup.com/opp"

let $query :=
  cts:and-query
  (
   (
     cts:element-query(xs:QName("opp:body"),"indian"),
     cts:element-query(xs:QName("opp:body"),"subcontinent")
    )
  )
return for $doc at $index in
 (
  (
   cts:search
   (
    /doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
    $query
   )
  ))
return <result at="{$index}">{base-uri($doc)}</result>

This returns 379 results with the target document at #1. However the 
following query (just the query part) returns 27 results without the 
target document:

(:
    The "indian subcontinent bronze" query
:)

let $query :=
  cts:and-query
  (
   (
    cts:element-query(xs:QName("opp:body"),"indian"),
    cts:element-query(xs:QName("opp:body"),"subcontinent"),
    cts:element-query(xs:QName("opp:body"),"bronze")
   )
  )

At this point I started to look into "bronze" itself. In the target 
document the term occurs nearly 100 times, many of them in grove:P 
elements, so I searched for this:

(:
    Bronze in grove:P
:)

let $query := cts:element-query(xs:QName("grove:P"),"bronze")

I get 2,484 unique documents returned, with the target at #304. The 
grove:P element is itself a descendant of the opp:body element, so I 
then searched for "bronze" in opp:body:

(:
    Bronze in opp:body
:)

let $query := cts:element-query(xs:QName("opp:body"),"bronze")

Here I get 2,154 unique documents, which do not include the target. The 
thing to note is that all documents have an opp:body (all those grove:P 
matches were descendants of opp:body elements) and yet we get fewer 
matches! This alone makes no sense. On top of that, the 2,154 documents 
returned by the opp:body query include some that were not in the grove:P 
results list. The total number of unique documents across both result 
lists is 2,836.
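
One way to see exactly which documents differ between the two result 
sets is to compare their URIs. The following is only a sketch: it 
assumes the same searchable expression as the first query above, and 
that the 2,836 figure is the union of both lists. Under that assumption 
2,484 + 2,154 - 2,836 = 1,802 documents appear in both lists, 682 only 
in the grove:P list and 352 only in the opp:body list.

```xquery
declare namespace opp = "http://opp.oup.com/opp"
declare namespace grove = "http://www.grovecms.com/local/articles.dtd"

(: URIs of documents matching "bronze" inside grove:P :)
let $in-p :=
  distinct-values(
    for $doc in cts:search(
      /opp:doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
      cts:element-query(xs:QName("grove:P"), "bronze"))
    return string(base-uri($doc))
  )
(: URIs of documents matching "bronze" inside opp:body :)
let $in-body :=
  distinct-values(
    for $doc in cts:search(
      /opp:doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
      cts:element-query(xs:QName("opp:body"), "bronze"))
    return string(base-uri($doc))
  )
(: Set differences, computed by general comparison on the URI strings :)
let $only-p := $in-p[not(. = $in-body)]
let $only-body := $in-body[not(. = $in-p)]
return (
  concat("only in grove:P results: ", string(count($only-p))),
  concat("only in opp:body results: ", string(count($only-body))),
  concat("in both: ", string(count($in-p) - count($only-p)))
)
```

Returning $only-p itself instead of the counts would list the documents 
(the target should be among them) that match under grove:P but not 
under opp:body.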

I have checked the target document and the grove:P element with "bronze" 
in it is definitely a descendant of opp:body and appears to be no 
different to the documents that MarkLogic did return.
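
A further check would be to test the target document directly, outside 
of cts:search. This is a sketch only: the URI below is hypothetical and 
must be replaced with the real one, and it assumes cts:contains is 
available in the server version in use.

```xquery
declare namespace opp = "http://opp.oup.com/opp"
declare namespace grove = "http://www.grovecms.com/local/articles.dtd"

(: Hypothetical URI: substitute the real URI of the target document. :)
let $target := doc("/hypothetical/target.xml")
return (
  (: Does the target match each element-query when tested on its own? :)
  concat("grove:P query matches: ",
    string(cts:contains($target,
      cts:element-query(xs:QName("grove:P"), "bronze")))),
  concat("opp:body query matches: ",
    string(cts:contains($target,
      cts:element-query(xs:QName("opp:body"), "bronze")))),
  (: Structural sanity checks using plain XPath :)
  concat("opp:body elements: ", string(count($target//opp:body))),
  concat("grove:P with 'bronze' outside opp:body: ",
    string(count($target//grove:P[not(ancestor::opp:body)]
                                 [cts:contains(., cts:word-query("bronze"))])))
)
```

If the opp:body line comes back true here while cts:search still omits 
the document, the difference lies in how the search is resolved rather 
than in the document's structure.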

Unfortunately, it would seem that once again I have come up against a 
problem on a Friday before I go off for a week's holiday :) Have a happy 
July the 4th; I suspect that this will still be here when I get back.

-- 
Peter Hickman.

Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: peter.hickman at semantico.com
w: www.semantico.com
