[MarkLogic Dev General] Restricting Search Hits To Immediate Parent Containers

Danny Sokolsky dsokolsky at marklogic.com
Tue Jul 1 19:18:13 PDT 2008


Do you want your searches to always return the top-level "section", and
to return it whenever the match is in a p child of *any* section element
beneath it?  Your concern about returning duplicates implies that.  If
so, you can rename the top-level section in your XML and then perform a
search something like:

let $q := cts:word-query("searchterm")
return
cts:search(/path/to/top-level-section, $q)
  [cts:contains(./p, $q) or cts:contains(.//section/p, $q)]

Given that you want to search a more complicated set of elements,
however, another option to consider is creating a field, specifying the
needed included and excluded elements.  Then you could use
cts:field-word-query to search.  I am not positive the field will work
for your content, but it sounds like it is worth pursuing.  To find out
more about fields, see the "Fields Database Settings" chapter of the
Administrator's Guide
(http://developer.marklogic.com/pubs/3.2/books/admin.pdf).
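
A rough sketch of what the field search might look like.  The field
name "section-text" is hypothetical; it assumes a field with that name
has already been configured in the database settings with the included
and excluded elements you need.  The path is a placeholder as well.

```xquery
(: Assumes a field named "section-text" (hypothetical) is already
   configured on the database; this query only searches it. :)
let $q := cts:field-word-query("section-text", "searchterm")
return
  cts:search(/path/to/top-level-section, $q)
```

Whether this returns the sections you expect depends on how the field's
includes and excludes are set up, so treat it as a starting point
rather than a tested solution.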

-Danny

-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of John Craft
Sent: Tuesday, July 01, 2008 6:30 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Restricting Search Hits To
Immediate Parent Containers

Danny-

Thanks for the suggestions.  One thing I didn't mention, as I was trying
to keep the example simple, is that I would eventually like to search
additional child elements of <section> (like a <title> element and
possibly <indexterm>) in addition to <p>, weighting them appropriately.
That rules out your third suggestion and may rule out your first
suggestion (not quite sure).

The second approach won't work because there could be a <section> that
contains a <p> and also contains a nested <section> whose <p> contains
the search terms.  Example:

<section>
 <p />
 <section>
  <p>search terms</p>
 </section>
</section>

Using the predicate [fn:exists(./p)], the markup above would return two
results when I would like for it to return one.
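
To illustrate with the markup above: a predicate that tests the direct
<p> children against the query itself, rather than only testing that a
<p> child exists, should keep just the inner <section>.  This is a
minimal, untested sketch; it still filters after the search, so the
same cost caveat applies as for any predicate filter.

```xquery
let $q := cts:word-query("search terms")
return
  (: fn:exists(./p) is true for both sections in the example, because
     the outer section also has a (non-matching) p child.
     cts:contains(./p, $q) is true only when a direct p child itself
     matches the query. :)
  cts:search(//section, $q)[cts:contains(./p, $q)]
```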

If you think there is an approach that uses cts:query() I would be very
interested.  Our content is pretty simple and I have included an outline
of the basic structure below.  Of course, I could also send you more (or
a file) if that would be more helpful.

Content structure (nested sections can go eight levels deep):

<chapter>
 <title />
 <subchapter>
  <title />
  <section>
   <title />
   <p />
   <section>
    <title />
    <p />
    <section>
     <title />
     <p />
    </section>
   </section>
  </section>
 </subchapter>
</chapter>

I'm willing to add/edit elements and attributes if necessary.  I just
don't know what would make things easiest for MarkLogic.

Thanks again.

John Craft



-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny
Sokolsky
Sent: Tuesday, July 01, 2008 4:27 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Restricting Search Hits To
Immediate Parent Containers

Hi John,

This can be a little tricky, as it sounds like your "section" elements
can mean different things in different places in the document.  One
approach can be to change your section element names for the ones that
have p children to something different, and then search over those.  It
would be relatively easy to write a transformation in XQuery to do that.
Ultimately, this might prove to make your content the most searchable
for what you want.  
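
A sketch of such a transformation, under some assumptions: the new
element name "content-section", the function name local:rename, and the
document URI are all made up for illustration, and the elements are
assumed to be in no namespace.  It copies the tree recursively and
renames only those section elements that have a direct p child.

```xquery
(: Hypothetical helper: renames every section that has a direct p child
   to "content-section" (an assumed name) and copies everything else
   unchanged. :)
declare function local:rename($node as node()) as node()
{
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:rename($c) }
    case element() return
      element {
        if ($node/self::section and fn:exists($node/p))
        then xs:QName("content-section")
        else fn:node-name($node)
      } {
        $node/@*,
        for $c in $node/node() return local:rename($c)
      }
    default return $node
};

(: The URI below is a placeholder. :)
local:rename(fn:doc("/example/chapter.xml"))
```

This only returns the renamed copy; to keep it you would still need to
write the result back to the database, for example with
xdmp:document-insert.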

Another approach is to filter out the results that do not have a direct
p child from the search results.  This will probably be OK if the number
of results to filter is small relative to the number of results returned
from the search.  This might look something like:

cts:search(//section, "searchterm")[fn:exists(./p)]

You can also search below the section element (//section/p), but that
would return p elements.  Depending on your content, that might work.

There may be a cts:query solution here, too, but without knowing your
content very well, it is harder for me to see that.  

-Danny

-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of John Craft
Sent: Tuesday, July 01, 2008 12:58 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Restricting Search Hits To Immediate
Parent Containers

I am evaluating MarkLogic and have been playing around with the
cts:element-query() and cts:element-word-query() expressions.  So far, I
am having difficulty restricting search results to elements that are
direct parents of the elements that contain the search terms.

Our content is made up of nested <section> elements and most <section>
elements contain <p> elements, which are our containers for paragraph
text.  The <section> elements contain <title> elements and other
information as well.  When performing a search, I would like to limit
the results to only the <section> elements whose direct <p> children
contain search terms.  I began by creating the following cts:search()
string:

cts:search(fn:doc()//section,
  cts:element-query(xs:QName("section"),
    cts:element-query(xs:QName("p"), "searchTerm")))

This approach was flawed because the search results included <section>
elements that were further up the tree and didn't directly contain <p>
elements (or, rather, <p> elements that contained the search terms).

My next approach was to use cts:element-word-query() and create an
element-word-query-through for the <p> element:

cts:search(fn:doc()//section,
cts:element-word-query(xs:QName("section"), "searchTerm") )

Again, the search results contain <section> elements that aren't direct
parents of <p> elements that contain search terms.  The end result is
that I end up with a lot of <section> elements that are false positives.

I'm beginning to think the path information in the first cts:search()
argument may be the problem, but I'm not sure.  And if it is the
problem, how else can I get search results returned as <section>
elements?

I appreciate any help or suggestions you can provide.

Thanks.

John Craft
_______________________________________________
General mailing list
General at developer.marklogic.com
http://xqzone.com/mailman/listinfo/general