The Search API

The Search API provides conveniences on top of the lower-level cts:query underpinnings. It can parse a user's typed query, execute the corresponding search, return highlighted results, and return facet values. It has numerous extension points where you can still drop down to the cts:query layer.

The Search API lets you control the syntax of user query strings, but out of the box it supports simple words or quoted phrases, like this:

semi-conductor "moore's law"

It supports constraints on where in a document to match:

list:apache from:ibm.com subject:release "proud to announce"

Along with negation:

list:apache -list:tomcat

As well as things like boolean grouping:

(cat OR dog) AND horse

As a programmer, you can easily construct queries like these using cts:query objects, and you've seen it done above. Parsing the user's string and building the matching cts:query objects takes more work, though. Doing that work for you is a core feature of the Search API.
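For comparison, here is a minimal sketch of what the hand-built version of (cat OR dog) AND horse might look like. It uses only the built-in cts:and-query, cts:or-query, cts:word-query, and cts:search functions. Searching all documents via fn:doc() and keeping ten results are assumptions made for illustration.

```xquery
xquery version "1.0-ml";

(: Hand-built equivalent of the user query: (cat OR dog) AND horse :)
let $query :=
  cts:and-query((
    cts:or-query((
      cts:word-query("cat"),
      cts:word-query("dog")
    )),
    cts:word-query("horse")
  ))
(: Assumption: search every document and keep the ten most relevant :)
return cts:search(fn:doc(), $query)[1 to 10]
```

Building the query is the simple part. Turning arbitrary typed text into this structure is the part the Search API handles.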

The Search API ships with MarkLogic Server, but because it's written in XQuery, you need to import it like any other module. Try this query:

import module namespace search =
	"http://marklogic.com/appservices/search" at
	"/MarkLogic/appservices/search/search.xqy";
search:search('"proud to announce"')

What you see is an XML summary of the results. It tells you which documents matched, and with what score. It includes by default a short snippet showing some text around the match with hits highlighted. Handy!
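Because the summary is plain XML, you can pick it apart with ordinary XPath. The sketch below assumes the default response shape: a search:response element with a total attribute, containing search:result elements that carry uri and score attributes and a search:snippet child. Run the query above first and adjust the paths if your output looks different.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $response := search:search('"proud to announce"')
return (
  (: Estimated number of matching documents :)
  fn:concat("Total matches: ", fn:string($response/@total)),
  (: One line per result on the first page: URI, score, first snippet :)
  for $result in $response/search:result
  return fn:concat(
    fn:string($result/@uri),
    " (score ", fn:string($result/@score), "): ",
    fn:normalize-space(fn:string($result/search:snippet/search:match[1]))
  )
)
```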

You can control the search:search() behavior to give it extra capabilities or specify preferences by passing in an options node (or configuring an options node in your configuration). The example below defines three new constraints: "list", "from", and "subject".

import module namespace search =
	"http://marklogic.com/appservices/search" at
	"/MarkLogic/appservices/search/search.xqy";

search:search("list:apache from:ibm.com subject:release patches",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="list">
      <word>
        <element ns="" name="message"/>
        <attribute ns="" name="list"/>
      </word>
    </constraint>
    <constraint name="from">
      <word>
        <element ns="" name="from"/>
        <attribute ns="" name="address"/>
      </word>
    </constraint>
    <constraint name="subject">
      <word>
        <element ns="" name="subject"/>
        <attribute ns="" name="normal"/>
      </word>
    </constraint>
  </options>)

If you change search:search() to search:parse(), you'll see the underlying cts:query that's being executed. Give it a try. You'll notice the result is XML. Every cts:query construct has an XML serialization, and that's what you're seeing here (with a few extra annotations from the Search API).
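Here is a small sketch of that experiment, using only the "list" constraint to keep it short. The second half is an assumption about how you might use the result: it passes the serialized XML to the cts:query() constructor to get a live query object and hands that to cts:search directly.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="list">
      <word>
        <element ns="" name="message"/>
        <attribute ns="" name="list"/>
      </word>
    </constraint>
  </options>
(: The parsed query, serialized as XML :)
let $parsed := search:parse("list:apache -list:tomcat", $options)
return (
  $parsed,
  (: Assumption: rebuild a cts:query object and run it yourself :)
  cts:search(fn:doc(), cts:query($parsed))[1 to 3]
)
```

This is one way to drop down to the cts:query layer: let the Search API parse the user's text, then combine or run the resulting query on your own terms.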

Results are sorted by relevance score. Often that's appropriate, but sometimes a user might want to sort by date. We can offer that option via the options node. The following lets the user sort by "relevance", "date-forward", or "date-backward". If sorting by date, relevance score breaks any ties.

import module namespace search =
	"http://marklogic.com/appservices/search" at
	"/MarkLogic/appservices/search/search.xqy";

search:search("list:hadoop bug sort:date-forward",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="list">
      <word>
        <element ns="" name="message"/>
        <attribute ns="" name="list"/>
      </word>
    </constraint>
    <constraint name="from">
      <word>
        <element ns="" name="from"/>
        <attribute ns="" name="address"/>
      </word>
    </constraint>
    <constraint name="subject">
      <word>
        <element ns="" name="subject"/>
        <attribute ns="" name="normal"/>
      </word>
    </constraint>

    <operator name="sort">
      <state name="relevance">
        <sort-order>
          <score/>
        </sort-order>
      </state>
      <state name="date-forward">
        <sort-order direction="ascending" type="xs:dateTime">
          <element ns="" name="message"/>
          <attribute ns="" name="date"/>
        </sort-order>
        <sort-order>
          <score/>
        </sort-order>
      </state>
      <state name="date-backward">
        <sort-order direction="descending" type="xs:dateTime">
          <element ns="" name="message"/>
          <attribute ns="" name="date"/>
        </sort-order>
        <sort-order>
          <score/>
        </sort-order>
      </state>
    </operator>

  </options>)

You'll need to specify a range index for the QName being sorted, in order to make the sort efficient. We have pre-configured such an index on message/@date so the above should work just fine for you.
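On your own database you would have to create that index yourself, either in the Admin Interface or with a script. The following is a sketch using the Admin API. The database name "Documents" is a placeholder, and the argument order reflects our understanding of admin:database-range-element-attribute-index (scalar type, parent namespace, parent local name, attribute namespace, attribute local name, collation, range value positions), so check it against the Admin API reference for your server version before running it.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: "Documents" is a placeholder; substitute your own database name :)
let $db := xdmp:database("Documents")
(: dateTime range index on message/@date; no collation, no positions :)
let $index := admin:database-range-element-attribute-index(
  "dateTime", "", "message", "", "date", "", fn:false())
let $config :=
  admin:database-add-range-element-attribute-index($config, $db, $index)
return admin:save-configuration($config)
```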

Using our message/@date range index, we can also let the user query against specific date ranges. Here's an example that demonstrates constraining by pre-defined but dynamically calculated named date ranges: today, yesterday, 30-days, 60-days, year, and decade. Because mail on our site here isn't constantly updating, we'll search for date:decade to match all mails in the last 10 years that include the phrase "web server".

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search('"web server" date:decade',
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="date">
    <range type="xs:dateTime">
      <element ns="" name="message"/>
      <attribute ns="" name="date"/>
      <computed-bucket name="today" ge="P0D" lt="P1D"
           anchor="start-of-day">Today</computed-bucket>
      <computed-bucket name="yesterday" ge="-P1D" lt="P0D"
           anchor="start-of-day">yesterday</computed-bucket>
      <computed-bucket name="30-days" ge="-P30D" lt="P0D"
           anchor="start-of-day">Last 30 days</computed-bucket>
      <computed-bucket name="60-days" ge="-P60D" lt="P0D"
           anchor="start-of-day">Last 60 Days</computed-bucket>
      <computed-bucket name="year" ge="-P1Y" lt="P0D"
           anchor="now">Last Year</computed-bucket>
      <computed-bucket name="decade" ge="-P10Y" lt="P0D"
           anchor="now">Last Decade</computed-bucket>
    </range>
  </constraint>
</options>)

In the resulting XML, look at what comes after the document results. There's a <search:facet name="date"> section. That's automatically added when you have a range constraint. It lists how many matches the query has in each range bucket. You can use it to present a sidebar where the user clicks a facet value to narrow the search to just the documents matching that value, much like MarkMail lets you slide over the date histogram.
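To build such a sidebar you only need the bucket names and counts. The sketch below trims the options down to two buckets for brevity and assumes each bucket appears in the response as a search:facet-value element with name and count attributes and the display label as its text.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $response := search:search('"web server"',
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="date">
      <range type="xs:dateTime">
        <element ns="" name="message"/>
        <attribute ns="" name="date"/>
        <computed-bucket name="year" ge="-P1Y" lt="P0D"
             anchor="now">Last Year</computed-bucket>
        <computed-bucket name="decade" ge="-P10Y" lt="P0D"
             anchor="now">Last Decade</computed-bucket>
      </range>
    </constraint>
  </options>)
(: One line per bucket: label, constraint value, and match count :)
for $value in $response/search:facet[@name eq "date"]/search:facet-value
return fn:concat(
  fn:string($value), " (date:", fn:string($value/@name), "): ",
  fn:string($value/@count))
```

Appending date: plus the bucket name to the user's query text is one simple way to apply the facet the user clicked.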

Just be mindful of performance. The <search:metrics> element shows you how much time was taken doing the facet work. If you're not interested in the facets, you can add <return-facets>false</return-facets>. If you only want facets and not results, you can use <return-results>false</return-results>.
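As a rough illustration, the following sketch turns facets off and prints the timing section. It assumes the metrics element contains children such as search:facet-resolution-time and search:total-time. Inspect the raw <search:metrics> output on your server to confirm the exact names.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $response := search:search('"web server"',
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Skip facet computation entirely -->
    <return-facets>false</return-facets>
  </options>)
return (
  (: The full timing breakdown :)
  $response/search:metrics,
  (: Assumed child element holding the overall elapsed time :)
  fn:concat("Total time: ",
    fn:string($response/search:metrics/search:total-time))
)
```

Running the same query with and without the facet options gives you a quick sense of what the facets cost.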

You don't always need buckets for facets. Here's how to extract the top 100 senders (without results):

import module namespace search =
	"http://marklogic.com/appservices/search" at
	"/MarkLogic/appservices/search/search.xqy";

search:search("list:hadoop bug",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="list">
      <word>
        <element ns="" name="message"/>
        <attribute ns="" name="list"/>
      </word>
    </constraint>
    <constraint name="from">
      <word>
        <element ns="" name="from"/>
        <attribute ns="" name="address"/>
      </word>
    </constraint>
    <constraint name="subject">
      <word>
        <element ns="" name="subject"/>
        <attribute ns="" name="normal"/>
      </word>
    </constraint>

    <constraint name="sender">
      <range type="xs:string" facet="true">
        <element ns="" name="from"/>
        <attribute ns="" name="personal"/>
        <facet-option>frequency-order</facet-option>
        <facet-option>limit=100</facet-option>
      </range>
    </constraint>

    <return-results>false</return-results>
  </options>)
