[MarkLogic Dev General] cts:highlight performance
Ryan Grimm
rgrimm at marklogic.com
Wed Jun 13 09:25:44 PDT 2007
Peter Hickman wrote:
> Ryan Grimm wrote:
>> Hi Peter,
>>
>> A couple questions for you.
>>
>> This document that takes 40 seconds to highlight, how large is it?
>
> It's only 1.5M, however the majority of the other documents are in the
> 10-100k region so it is large in comparison.

OK, that's a good size but not out of this world. However, I am
surprised that it is taking 40 seconds to highlight a 1.5 MB document.

>> Have you configured a fragmentation policy?
>
> Our fragmentation policy is to keep each article in its own fragment so
> that we can use the estimate function with some degree of success. Or
> have I misunderstood the effects of fragmentation on the calculation of
> the estimate? As I understand it, fragmenting an article will result in
> that article being counted into the estimate once for each fragment
> that matches. How would fragmenting the article speed up cts:highlight?

Your view of how fragmentation affects xdmp:estimate() is correct, and
fragmentation probably won't speed up cts:highlight(). I was curious
what your fragmentation policy was so I could get a better understanding
of how you've got things set up.

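To make the estimate point concrete, here is a minimal sketch. It assumes
XHTML content and a placeholder phrase "find me", neither of which comes
from your setup. xdmp:estimate() counts the fragments that the indexes
select for the query, without filtering them, so with one fragment per
article the number it returns is a count of articles.

```xquery
declare namespace xhtml = "http://www.w3.org/1999/xhtml"

(: "find me" is a placeholder phrase, not taken from real content :)
let $query := cts:word-query("find me")
(: number of fragments the indexes select for this query, unfiltered :)
return xdmp:estimate(cts:search(/xhtml:html, $query))
```

If each article were split into several fragments, an article could add
more than one to this number, which is the effect you describe.
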
I would try my previous suggestion of using cts:search() to narrow down
the content you want to highlight. Here's a very basic example of how
you could do this if your content were XHTML:

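declare namespace xhtml = "http://www.w3.org/1999/xhtml"
(: the phrase to search for and highlight :)
let $query := cts:word-query("find me")
(: take the first 10 documents that match the query :)
for $result in cts:search(/xhtml:html, $query)[1 to 10]
(: within each document, pick the first paragraph that matches :)
let $para := cts:search(doc(base-uri($result))//xhtml:p, $query)[1]
(: highlight only that paragraph, wrapping each match in <strong> :)
return cts:highlight($para, $query, <strong>{ $cts:text }</strong>)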
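
If one paragraph per document is too little context, a variation along
these lines might work. It is only a sketch under the same XHTML
assumption: the <result> wrapper element and the limit of three
paragraphs are made up for illustration, and the prolog follows the same
syntax as the example above (newer XQuery versions expect a semicolon
after the namespace declaration).

```xquery
declare namespace xhtml = "http://www.w3.org/1999/xhtml"

let $query := cts:word-query("find me")
(: first 10 matching documents, as in the basic example :)
for $result in cts:search(/xhtml:html, $query)[1 to 10]
return
  (: <result> is a hypothetical wrapper used to group output per document :)
  <result uri="{ base-uri($result) }">{
    (: highlight up to three matching paragraphs instead of only the first :)
    for $para in cts:search(doc(base-uri($result))//xhtml:p, $query)[1 to 3]
    return cts:highlight($para, $query, <strong>{ $cts:text }</strong>)
  }</result>
```

The idea is the same as before: cts:highlight() only ever sees a few
small paragraphs, never the whole 1.5 MB document.
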
Hope that helps.
--Ryan