[MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Colleen Whitney Colleen.Whitney at marklogic.com
Fri Mar 23 10:51:43 PDT 2012


That's right.

________________________________________
From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Will Thompson [wthompson at jonesmcclure.com]
Sent: Friday, March 23, 2012 10:49 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Yes, I think that’s what she means. Also, I think the built-in snippeting uses a MinWindow-type algorithm to create snippets, so if your hits/highlights are too far apart, they don’t all make it into the snippet.

-Will
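
To illustrate why widely separated hits can drop out of a snippet, here is a minimal Python sketch of a fixed-width window that keeps only the densest cluster of hits. It is an assumption made for illustration, not MarkLogic's actual snippeting code; the function name densest_window and the max_width parameter are hypothetical.

```python
def densest_window(hit_positions: list[int], max_width: int) -> list[int]:
    """Return the largest group of hits that fits in a span of max_width tokens.

    Illustrative only: this is NOT the Search API's internal algorithm.
    """
    hits = sorted(hit_positions)
    best: list[int] = []
    left = 0
    for right in range(len(hits)):
        # Advance the left edge until the span fits inside the window.
        while hits[right] - hits[left] > max_width:
            left += 1
        # Keep the first window that holds more hits than any seen so far.
        if right - left + 1 > len(best):
            best = hits[left:right + 1]
    return best


# Hits at token offsets 90 and 200 are too far from the first cluster
# to share a 30-token window with it, so they are left out.
print(densest_window([3, 10, 14, 90, 200], max_width=30))
```

Under this kind of scheme a document can contain many hits while the snippet shows only the ones that happen to sit close together.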

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
Sent: Friday, March 23, 2012 10:15 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Hi Colleen,

By "hit", do you mean highlights?

If so, why do I see only one highlight in the match within the htmlBody element?

Regards,
Danny

On Fri, Mar 23, 2012 at 12:58 PM, Colleen Whitney <Colleen.Whitney at marklogic.com> wrote:
Yes, from the Search API's perspective, your document is all one big node, so there will never be more than one match (but that match should contain more than one hit).

We also limit the number of hits per match (internally, not configurable), because snippeting tends to get very expensive on very large text documents with repeated terms.

So I think hooking in a custom snippeting function (there are good instructions for how to do that in the Search Developer's Guide) is probably your best bet, unless you have the luxury of changing your content model.

--Colleen
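
As a rough illustration of what a custom snippeting step would need to do with this content, the following Python sketch unescapes the string, strips the tags, and returns one short snippet per occurrence of the search term. It is not Search API code: the function name find_hits, the context parameter, and the bracket-style highlighting are all assumptions made for the example.

```python
import html
import re


def find_hits(escaped_body: str, term: str, context: int = 40) -> list[str]:
    """Return one short snippet per occurrence of `term` in escaped HTML text."""
    # Turn &lt;p&gt; back into <p>, then replace tags with spaces.
    text = html.unescape(escaped_body)
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace left behind by the removed tags.
    text = re.sub(r"\s+", " ", text).strip()

    snippets = []
    # Case-insensitive match on words that start with the term,
    # so "population" also finds "Populations".
    pattern = r"\b" + re.escape(term) + r"\w*"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        # Mark the hit with brackets and keep some text on either side.
        snippets.append(
            text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end]
        )
    return snippets


sample = (
    "&lt;h5&gt;Control of Bacterial Populations&lt;/h5&gt; "
    "&lt;p&gt;These are population control problems. "
    "It is difficult to control bacterial populations.&lt;/p&gt;"
)
for snippet in find_hits(sample, "population"):
    print(snippet)
```

In MarkLogic the equivalent logic would live in a custom snippet function wired in through the search options, as described in the Search Developer's Guide; the sketch only shows the text handling, not that wiring.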

________________________________________
From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang [d.sinang at gmail.com]
Sent: Friday, March 23, 2012 9:46 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Hi Will,

Someone else wrote our search module (that uses search:search) and we discovered today that it returns only 1 search match even if the word being looked up occurs several times in the 'htmlBody' element of a particular XML document.

We were hoping the search:search would return all matches within the 'htmlBody' element.

But it looks like search:search won't do so because 'htmlBody' contains escaped html. If it were unescaped, then search:search would return all the matches for the word we're looking for.

I don't know if search:search can be told to treat the contents of 'htmlBody' as unescaped.

So what I'm trying to do is get search:search to return all the matches within the escaped string content of 'htmlBody'.

Regards,
Danny

On Fri, Mar 23, 2012 at 12:37 PM, Will Thompson <wthompson at jonesmcclure.com> wrote:
Danny,

Without more details I’m not sure what you’re trying to do exactly, but it sounds like you may need to write your own snippet module.

-Will


From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sinang
Sent: Friday, March 23, 2012 8:22 AM
To: general
Subject: [MarkLogic Dev General] search:search() - why only 1 search match per XML doc

Hello.

I am trying to search for the word 'populations' in an XML doc that mentions that word around 5 times in its htmlBody element.

search:search() returns only the first occurrence of that word in that element.

Is there an option or way to make search:search return matches for the other occurrences of 'populations'?

Note that the content of the htmlBody element (shown below) is a string of escaped HTML.

Regards,
Danny


<htmlBody>&lt;body xmlns="http://www.w3.org/1999/xhtml"&gt;

 &lt;div&gt;

 &lt;div&gt;

  &lt;h5&gt;Control of Bacterial Populations&lt;/h5&gt;

  &lt;p class="Indent00" id="xpp-2014582732321794086-1"&gt;The diseases and many kinds of environmental problems caused by bacteria are actually population control problems. Small numbers of bacteria cause little harm. However, when the population increases, their negative effects are multiplied. Despite large investments of time and money, scientists have found it difficult to control bacterial populations. Three factors operate in favor of the bacteria: their reproductive rate, their ability to form resistant stages, and their ability to mutate and produce strains that resist antibiotics and other control agents.&lt;/p&gt;

  &lt;p class="Indent01" id="xpp-2014582732321794086-2"&gt;Under ideal conditions, some bacteria can grow and divide every 20 minutes. If one bacterial cell and all its offspring were to reproduce at this ideal rate, in 48 hours there would be 2.2 &amp;times; 10 43 cells. In reality, bacteria cannot achieve such incredibly large populations, because ... </htmlBody>

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

