[MarkLogic Dev General] search:search() - why only 1 search match per XML doc
d.sinang at gmail.com
Fri Mar 23 10:14:53 PDT 2012
By "hit", do you mean highlights?
If so, how come I see only 1 highlight within the match in the htmlBody?
On Fri, Mar 23, 2012 at 12:58 PM, Colleen Whitney <
Colleen.Whitney at marklogic.com> wrote:
> Yes, from the Search API's perspective, your document is all one big node,
> so there will never be more than one match (but that match should contain
> more than one hit).
> We also limit the number of hits per match (internally, not configurable),
> because snippeting tends to get very expensive on very large text documents
> with repeated terms.
> So I think hooking in a custom snippeting function (there are good
> instructions for how to do that in the Search Developer's Guide) is
> probably your best bet, unless you have the luxury of changing your content.
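Colleen's point about capped hits and custom snippeting can be illustrated outside MarkLogic. The sketch below is plain Python, not the Search API's snippet hook. `all_hit_snippets` is a hypothetical helper that returns a context window around every whole-word, case-insensitive hit, with any cap left to the caller instead of being fixed internally. It does no stemming, so `populations` will not match `population`.

```python
import re

def all_hit_snippets(text, term, context=30, max_hits=None):
    """Return one snippet per whole-word, case-insensitive hit of `term`.

    context  -- characters kept on each side of the hit
    max_hits -- optional cap chosen by the caller (None = return every hit)
    No stemming is done: "populations" will not match "population".
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for m in pattern.finditer(text):
        if max_hits is not None and len(snippets) >= max_hits:
            break
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        # Bracket the hit, loosely analogous to a highlighted match.
        snippets.append(
            text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end]
        )
    return snippets
```

Called as `all_hit_snippets(body_text, "populations", context=20)`, it returns one snippet per occurrence rather than stopping at a fixed number. Passing `max_hits` brings back a cap for very large documents, which is the cost trade-off Colleen describes.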
> From: general-bounces at developer.marklogic.com [
> general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang [
> d.sinang at gmail.com]
> Sent: Friday, March 23, 2012 9:46 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] search:search() - why only 1 search
> match per XML doc
> Hi Will,
> Someone else wrote our search module (that uses search:search) and we
> discovered today that it returns only 1 search match even if the word being
> looked up occurs several times in the 'htmlBody' element of a particular
> XML document.
> We were hoping the search:search would return all matches within the
> 'htmlBody' element.
> But it looks like search:search won't do so because 'htmlBody' contains
> escaped HTML. If it were unescaped, then search:search would return all the
> matches for the word we're looking for.
> I don't know if search:search can be told to treat the contents of
> 'htmlBody' as unescaped.
> So what I'm trying to do is get search:search to return all the matches
> within the escaped string content of 'htmlBody'.
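Danny's observation can be checked with a short script. This is a minimal Python sketch, assuming the document looks like the sample further down the thread: an `htmlBody` element whose text is escaped XHTML. `elements_containing` is a hypothetical name. Once the string is parsed as markup, the `h5` and each `p` become separate element nodes, each of which can hold its own occurrence of the term. While the content stays a single escaped string, it is one text node, which is consistent with Colleen's explanation of why only one match comes back.

```python
import xml.etree.ElementTree as ET

def elements_containing(doc_xml, term):
    """List (tag, count) for each XHTML element whose own text mentions `term`.

    Assumes the stored document has an <htmlBody> element whose text content
    is escaped XHTML, as in the sample in this thread.
    """
    outer = ET.fromstring(doc_xml)
    html_body = outer if outer.tag == "htmlBody" else outer.find(".//htmlBody")
    # ElementTree has already unescaped &lt;...&gt;, so .text is real markup.
    inner = ET.fromstring(html_body.text.strip())
    needle = term.lower()
    results = []
    for el in inner.iter():
        # "Own" text: the element's leading text plus the tails of its children,
        # so text inside nested children is not counted twice.
        own = (el.text or "") + "".join(child.tail or "" for child in el)
        count = own.lower().count(needle)  # substring count, no stemming
        if count:
            local_name = el.tag.split("}")[-1]  # drop the XHTML namespace
            results.append((local_name, count))
    return results
```

Real content like the sample, which contains `&times;`, would make this parse fail, because XHTML named entities are not predefined in plain XML. Such entities would have to be replaced first or handled with an HTML parser. Inside MarkLogic the analogous step would presumably be `xdmp:unquote` on the string, ideally done once at load time so the stored document holds real markup (the "changing your content" option Colleen mentions).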
> On Fri, Mar 23, 2012 at 12:37 PM, Will Thompson <
> wthompson at jonesmcclure.com> wrote:
> Without more details I’m not sure what you’re trying to do exactly, but it
> sounds like you may need to write your own snippet module.
> From: general-bounces at developer.marklogic.com [
> general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> Sent: Friday, March 23, 2012 8:22 AM
> To: general
> Subject: [MarkLogic Dev General] search:search() - why only 1 search match
> per XML doc
> I am trying to search for the word 'populations' in an XML doc which
> mentions that word around 5 times in its htmlBody element.
> search:search() returns only the first occurrence of that word in that
> Is there an option or way to make search:search return matches for the
> other occurrences of 'populations'?
> Note that the content of the htmlBody element (shown below) is a string.
> <htmlBody><body xmlns="http://www.w3.org/1999/xhtml">
> <h5>Control of Bacterial Populations</h5>
> <p class="Indent00" id="xpp-2014582732321794086-1">The diseases
> and many kinds of environmental problems caused by bacteria are actually
> population control problems. Small numbers of bacteria cause little harm.
> However, when the population increases, their negative effects are
> multiplied. Despite large investments of time and money, scientists have
> found it difficult to control bacterial populations. Three factors operate
> in favor of the bacteria: their reproductive rate, their ability to form
> resistant stages, and their ability to mutate and produce strains that
> resist antibiotics and other control agents.</p>
> <p class="Indent01" id="xpp-2014582732321794086-2">Under ideal
> conditions, some bacteria can grow and divide every 20 minutes. If one
> bacterial cell and all its offspring were to reproduce at this ideal rate,
> in 48 hours there would be 2.2 &times; 10<sup>43</sup> cells. In reality,
> bacteria cannot achieve such incredibly large populations, because ...
> General mailing list
> General at developer.marklogic.com