[MarkLogic Dev General] Restricting Search Hits To Immediate Parent Containers
Mike Sokolov
sokolov at ifactory.com
Wed Jul 2 06:27:55 PDT 2008
How about
cts:element-query(xs:QName("p"), "searchTerm")/parent::section
or
cts:element-word-query(xs:QName("p"), "searchTerm")/parent::section ?
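
Both cts:element-query and cts:element-word-query construct a query rather than return nodes, so the parent step has to be applied to the result of a cts:search. A minimal sketch of that reading, assuming the search term is a single word or phrase ($term and $hits are illustrative names, not from the thread):

```xquery
(: Illustrative search term; not from the original thread. :)
let $term := "searchTerm"

(: Search the <p> children of sections directly, so every hit is a
   paragraph that itself contains the term. :)
let $hits := cts:search(fn:doc()//section/p, cts:word-query($term))

(: Step up to the immediate parent. A path step returns distinct nodes,
   so a section with several matching paragraphs is returned once. :)
return $hits/parent::section
```

Because a path step returns its result in document order, the relevance ordering of the <p> hits is not carried over to the sections; that may matter if ranked results are needed.
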
Danny Sokolsky wrote:
> Do you want your searches to always return the top-level "section", but
> return it if the match is in a p tag child of *any* section element?
> Your concern about returning dups implies that. If so, then you can
> rename your top-level section in your xml, and then perform a search
> something like:
>
> let $q := cts:word-query("searchterm")
> return
>   cts:search(/path/to/top-level-section, $q)
>     [cts:contains(./p, $q) or cts:contains(.//section/p, $q)]
>
> Given that you want to search a more complicated set of elements,
> however, another option to consider is creating a field, specifying the
> needed included and excluded elements. Then you could use
> cts:field-word-query to search. I am not positive the field will work
> for your content, but it sounds like it is worth pursuing. To find out
> more about fields, see the "Fields Database Settings" chapter of the
> Administrator's Guide
> (http://developer.marklogic.com/pubs/3.2/books/admin.pdf).
>
> -Danny
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of John Craft
> Sent: Tuesday, July 01, 2008 6:30 PM
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Restricting Search Hits To
> Immediate Parent Containers
>
> Danny-
>
> Thanks for the suggestions. One thing I didn't mention, as I was trying
> to keep the example simple, is that I would eventually like to search
> additional child elements of <section> (like a <title> element and
> possibly <indexterm>) in addition to <p>, weighting them appropriately.
> That rules out your third suggestion and may rule out your first
> suggestion (not quite sure).
>
> The second approach won't work because a <section> that contains a <p>
> can also contain a nested <section> whose own <p> contains the search
> terms. Example:
>
> <section>
>   <p />
>   <section>
>     <p>search terms</p>
>   </section>
> </section>
>
> Using the predicate [fn:exists(./p)], the markup above would return two
> results when I would like for it to return one.
>
> If you think there is an approach that uses cts:query() I would be very
> interested. Our content is pretty simple and I have included an outline
> of the basic structure below. Of course, I could also send you more (or
> a file) if that would be more helpful.
>
> Content structure (nested sections can go eight levels deep):
>
> <chapter>
>   <title />
>   <subchapter>
>     <title />
>     <section>
>       <title />
>       <p />
>       <section>
>         <title />
>         <p />
>         <section>
>           <title />
>           <p />
>         </section>
>       </section>
>     </section>
>   </subchapter>
> </chapter>
>
> I'm willing to add/edit elements and attributes if necessary. I just
> don't know what would make things easiest for MarkLogic.
>
> Thanks again.
>
> John Craft
>
>
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny
> Sokolsky
> Sent: Tuesday, July 01, 2008 4:27 PM
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Restricting Search Hits To
> Immediate Parent Containers
>
> Hi John,
>
> This can be a little tricky, as it sounds like your "section" elements
> can mean different things in different places in the document. One
> approach can be to change your section element names for the ones that
> have p children to something different, and then search over those. It
> would be relatively easy to write a transformation in XQuery to do that.
> Ultimately, this might prove to make your content the most searchable
> for what you want.
>
> Another approach is to filter out the results that do not have a direct
> p child from the search results. This will probably be OK if the number
> of results to filter is small relative to the number of results returned
> from the search. This might look something like:
>
> cts:search(//section, "searchterm")[fn:exists(./p)]
>
> You can also search below the section element (//section/p), but that
> would return p elements. Depending on your content, that might work.
>
> There may be a cts:query solution here, too, but without knowing your
> content very well, it is harder for me to see that.
>
> -Danny
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of John Craft
> Sent: Tuesday, July 01, 2008 12:58 PM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Restricting Search Hits To Immediate
> Parent Containers
>
> I am evaluating MarkLogic and have been playing around with the
> cts:element-query() and cts:element-word-query() expressions. So far, I
> am having difficulty restricting search results to elements that are
> direct parents of the elements that contain the search terms.
>
> Our content is made up of nested <section> elements and most <section>
> elements contain <p> elements, which are our containers for paragraph
> text. The <section> elements contain <title> elements and other
> information as well. When performing a search, I would like to limit
> the results to only the <section> elements whose direct <p> children
> contain search terms. I began by creating the following cts:search()
> string:
>
> cts:search(fn:doc()//section,
>   cts:element-query(xs:QName("section"),
>     cts:element-query(xs:QName("p"), "searchTerm")))
>
> This approach was flawed because the search results included <section>
> elements that were further up the tree and didn't directly contain <p>
> elements (or, rather, <p> elements that contained the search terms).
>
> My next approach was to use cts:element-word-query() and create an
> element-word-query-through for the <p> element:
>
> cts:search(fn:doc()//section,
> cts:element-word-query(xs:QName("section"), "searchTerm") )
>
> Again, the search results contain <section> elements that aren't direct
> parents of <p> elements that contain search terms. The end result is
> that I end up with a lot of <section> elements that are false positives.
>
> I'm beginning to think the path information on the first cts:search()
> argument may be the problem, but I'm not sure. And if it is the
> problem, how else can I get search results returned as <section>
> elements?
>
> I appreciate any help or suggestions you can provide.
>
> Thanks.
>
> John Craft
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general