[MarkLogic Dev General] cts:highlight performance

Ryan Grimm rgrimm at marklogic.com
Wed Jun 13 09:25:44 PDT 2007


Peter Hickman wrote:
> Ryan Grimm wrote:
>> Hi Peter,
>>
>> A couple questions for you.
>>
>> This document that takes 40 seconds to highlight, how large is it?
> 
> It's only 1.5M, however the majority of the other documents are in the 
> 10-100k region so it is large in comparison.

Ok, that's a good size but not out of this world.  However, I am 
surprised that it is taking 40 seconds to highlight a 1.5MB document.

>> Have you configured a fragmentation policy?
> 
> Our fragmentation policy is to keep each article in its own fragment so 
> that we can use the estimate function with some degree of success. Or 
> have I misunderstood the effects of fragmentation on the calculation of 
> the estimate? As I understand it, fragmenting an article will result in 
> that article being counted in the estimate once for each fragment that 
> matches. How would fragmenting the article speed up cts:highlight?

Your view of how fragmentation affects xdmp:estimate() is correct, and 
fragmentation probably won't speed up cts:highlight().  I was curious 
what your fragmentation policy was so I could get a better understanding 
of how you've got things set up.

I would try my previous suggestion of using cts:search() to narrow down 
the content you want to highlight, so that cts:highlight() only runs 
over a small piece of each document rather than the whole thing.  Here's 
a very basic example of how you could do this if your content were XHTML:

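declare namespace xhtml = "http://www.w3.org/1999/xhtml"

let $query := cts:word-query("find me")
(: take the first 10 search results for the query :)
for $result in cts:search(/xhtml:html, $query)[1 to 10]
(: within each result's document, find the first matching paragraph :)
let $para := cts:search(doc(base-uri($result))//xhtml:p, $query)[1]
(: highlight only that paragraph, not the whole document :)
return cts:highlight($para, $query, <strong>{ $cts:text }</strong>)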


Hope that helps.

--Ryan
