[MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Danny Sinang d.sinang at gmail.com
Fri Mar 23 10:14:53 PDT 2012


Hi Colleen,

By "hit", do you mean highlights?

If so, why do I see only 1 highlight in the match within the htmlBody
element?

Regards,
Danny

On Fri, Mar 23, 2012 at 12:58 PM, Colleen Whitney <
Colleen.Whitney at marklogic.com> wrote:

> Yes, from the Search API's perspective, your document is all one big node,
> so there will never be more than one match (but that match should contain
> more than one hit).
>
> We also limit the number of hits per match (internally, not configurable),
> because snippeting tends to get very expensive on very large text documents
> with repeated terms.
>
> So I think hooking in a custom snippeting function (there are good
> instructions for how to do that in the Search Developer's Guide) is
> probably your best bet, unless you have the luxury of changing your content
> model.
>
> --Colleen
>
> ________________________________________
> From: general-bounces at developer.marklogic.com [
> general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang [
> d.sinang at gmail.com]
> Sent: Friday, March 23, 2012 9:46 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] search:search() - why only 1 search
> match per XML doc
>
> Hi Will,
>
> Someone else wrote our search module (that uses search:search) and we
> discovered today that it returns only 1 search match even if the word being
> looked up occurs several times in the 'htmlBody' element of a particular
> XML document.
>
> We were hoping search:search would return all matches within the
> 'htmlBody' element.
>
> But it looks like search:search won't do so because 'htmlBody' contains
> escaped html. If it were unescaped, then search:search would return all the
> matches for the word we're looking for.
>
> I don't know if search:search can be told to treat the contents of
> 'htmlBody' as unescaped.
>
> So what I'm trying to do is get search:search to return all the matches
> within the escaped string content of 'htmlBody'.
>
> Regards,
> Danny
>
> On Fri, Mar 23, 2012 at 12:37 PM, Will Thompson <
> wthompson at jonesmcclure.com> wrote:
> Danny,
>
> Without more details I’m not sure what you’re trying to do exactly, but it
> sounds like you may need to write your own snippet module.
>
> -Will
>
>
> From: general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
> Sent: Friday, March 23, 2012 8:22 AM
> To: general
> Subject: [MarkLogic Dev General] search:search() - why only 1 search match
> per XML doc
>
> Hello.
>
> I am trying to search for the word 'populations' in an XML doc that
> mentions that word around 5 times in its htmlBody element.
>
> search:search() returns only the first occurrence of that word in that
> element.
>
> Is there an option or way to make search:search return matches for the
> other occurrences of 'populations'?
>
> Note that the content of the htmlBody element (shown below) is a string.
>
> Regards,
> Danny
>
>
> <htmlBody>&lt;body xmlns="http://www.w3.org/1999/xhtml"&gt;
>
>  &lt;div&gt;
>
>  &lt;div&gt;
>
>   &lt;h5&gt;Control of Bacterial Populations&lt;/h5&gt;
>
>   &lt;p class="Indent00" id="xpp-2014582732321794086-1"&gt;The diseases
> and many kinds of environmental problems caused by bacteria are actually
> population control problems. Small numbers of bacteria cause little harm.
> However, when the population increases, their negative effects are
> multiplied. Despite large investments of time and money, scientists have
> found it difficult to control bacterial populations. Three factors operate
> in favor of the bacteria: their reproductive rate, their ability to form
> resistant stages, and their ability to mutate and produce strains that
> resist antibiotics and other control agents.&lt;/p&gt;
>
>   &lt;p class="Indent01" id="xpp-2014582732321794086-2"&gt;Under ideal
> conditions, some bacteria can grow and divide every 20 minutes. If one
> bacterial cell and all its offspring were to reproduce at this ideal rate,
> in 48 hours there would be 2.2 &amp;times; 10 43 cells. In reality,
> bacteria cannot achieve such incredibly large populations, because ...
> </htmlBody>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>
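
As a rough illustration of what a custom snippeting function would have to do for the htmlBody case discussed in this thread, here is a minimal sketch of the logic only. It is written in Python, not XQuery, and it does not use the Search API; a real snippeter would be hooked in as described in the Search Developer's Guide. The function name find_hits, the context parameter, and the shortened sample string are assumptions made for illustration.

```python
import html
import re


def find_hits(escaped_body, term, context=30):
    """Return (offset, snippet) pairs for every whole-word,
    case-insensitive occurrence of term in an escaped HTML string."""
    # Turn &lt;p&gt; back into <p>, as the serialized htmlBody content requires.
    text = html.unescape(escaped_body)
    # Drop the HTML tags and collapse whitespace so only readable text remains.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()

    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        # Keep a little surrounding text so each hit reads as a snippet.
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.start(), text[start:end]))
    return hits


# Shortened stand-in for the htmlBody content quoted above (hypothetical sample).
sample = (
    "&lt;h5&gt;Control of Bacterial Populations&lt;/h5&gt;"
    "&lt;p&gt;Scientists have found it difficult to control "
    "bacterial populations.&lt;/p&gt;"
)

for offset, snippet in find_hits(sample, "populations"):
    print(offset, snippet)
```

Unlike the built-in snippeting, which caps the number of hits per match, this sketch reports every occurrence, so on very large documents with repeated terms it would pay the cost Colleen describes.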