The Search API
The Search API provides conveniences on top of the lower-level cts:query
underpinnings. It can parse a user's typed query, execute the corresponding
search, return highlighted results, and return facet values. It has numerous
extension points where you can still drop down to the cts:query layer.
The Search API lets you control the syntax of user query strings, but out of the box it supports simple words or quoted phrases, like this:
semi-conductor "moore's law"
It supports constraints on where in a document to match:
list:apache from:ibm.com subject:release "proud to announce"
Along with negation:
list:apache -list:tomcat
As well as things like boolean grouping:
(cat OR dog) AND horse
As a programmer, it's easy to construct queries like these using cts:query
objects, and you've seen it done above. But it can take some work to parse
the user's string and construct the cts:query objects. Doing that parsing
for you is a core feature of the Search API.
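For comparison, here is a minimal sketch of what building the boolean example by hand might look like. It uses only the built-in cts:and-query, cts:or-query, cts:word-query, and cts:search functions. Searching all of fn:doc() and keeping ten results are illustrative choices, not something the tutorial prescribes.

```xquery
xquery version "1.0-ml";

(: Hand-built equivalent of the user query: (cat OR dog) AND horse :)
let $query :=
  cts:and-query((
    cts:or-query((
      cts:word-query("cat"),
      cts:word-query("dog")
    )),
    cts:word-query("horse")
  ))
(: Run it against every document and keep the ten most relevant hits :)
return cts:search(fn:doc(), $query)[1 to 10]
```

The query is easy to write when you know its shape in advance. The hard part is deriving that shape from arbitrary text typed by a user.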
The Search API is provided along with MarkLogic Server, but because it's written in XQuery, you need to import it like any other module. Try this query:
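A minimal query along these lines, assuming the standard Search API module namespace and path. The search string "web server" is only an illustrative choice.

```xquery
xquery version "1.0-ml";

(: Import the Search API library module that ships with MarkLogic Server :)
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Parse the user-style query string and run the search with default options :)
search:search("web server")
```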
What you see is an XML summary of the results. It tells you which documents matched, and with what score. It includes by default a short snippet showing some text around the match with hits highlighted. Handy!
You can control the search:search() behavior to give it extra capabilities or
specify preferences by passing in an options node (or configuring an options
node in your configuration). The example below specifies three new
constraints: "list", "from", and "subject".
If you change search:search() to search:parse(), you'll see the underlying
cts:query that's being executed. Give it a try. You'll notice the result is
XML. Every cts:query construct has an XML serialization, and that's what
you're seeing here (with a few extra annotations from the Search API).
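As a sketch, the swap could look like the following. The single "list" constraint and its message/@list location are assumptions carried over for illustration. search:parse() takes the same query text and options node as search:search().

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Assumed location of the mailing list name: message/@list -->
    <constraint name="list">
      <word>
        <element ns="" name="message"/>
        <attribute ns="" name="list"/>
      </word>
    </constraint>
  </options>
(: Returns the serialized cts:query instead of running the search :)
return search:parse("list:apache -list:tomcat", $options)
```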
Results are sorted by relevance score. Often that's appropriate, but sometimes a user might want to sort by date. We can offer that option via the options node. The following lets the user sort by "relevance", "date-forward", or "date-backward". If sorting by date, relevance score breaks any ties.
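One way to express this is with an operator in the options node, sketched below. The state names come from the text. The xs:dateTime type and the message/@date location are assumptions that must match the configured range index.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <operator name="sort">
      <state name="relevance">
        <sort-order>
          <score/>
        </sort-order>
      </state>
      <state name="date-forward">
        <!-- Oldest first; assumes an xs:dateTime range index on message/@date -->
        <sort-order direction="ascending" type="xs:dateTime">
          <element ns="" name="message"/>
          <attribute ns="" name="date"/>
        </sort-order>
        <!-- Relevance score breaks ties -->
        <sort-order>
          <score/>
        </sort-order>
      </state>
      <state name="date-backward">
        <!-- Newest first -->
        <sort-order direction="descending" type="xs:dateTime">
          <element ns="" name="message"/>
          <attribute ns="" name="date"/>
        </sort-order>
        <sort-order>
          <score/>
        </sort-order>
      </state>
    </operator>
  </options>
(: The user picks a sort order inside the query string itself :)
return search:search("web server sort:date-backward", $options)
```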
You'll need to configure a range index for the QName being sorted in order to
make the sort efficient. We have pre-configured such an index on
message/@date, so the above should work just fine for you.
Using our message/@date range index, we can also let the user query against
specific date ranges. Here's an example that demonstrates constraining by
pre-defined but dynamically calculated named date ranges: today, yesterday,
30-days, 60-days, year, and decade. Because mail on our site here isn't
constantly updating, we'll search for date:decade to match all mails in the
last 10 years that include the phrase "web server".
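A sketch of such a constraint using computed buckets. The bucket names come from the text. The exact boundaries, anchors, and labels are illustrative assumptions, as is the xs:dateTime type of the message/@date index.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="date">
      <range type="xs:dateTime">
        <element ns="" name="message"/>
        <attribute ns="" name="date"/>
        <!-- Bounds are durations relative to the anchor, evaluated at query time -->
        <computed-bucket name="today" ge="P0D" lt="P1D"
          anchor="start-of-day">Today</computed-bucket>
        <computed-bucket name="yesterday" ge="-P1D" lt="P0D"
          anchor="start-of-day">Yesterday</computed-bucket>
        <computed-bucket name="30-days" ge="-P30D" lt="P1D"
          anchor="start-of-day">Last 30 days</computed-bucket>
        <computed-bucket name="60-days" ge="-P60D" lt="P1D"
          anchor="start-of-day">Last 60 days</computed-bucket>
        <computed-bucket name="year" ge="-P1Y" lt="P1D"
          anchor="now">Last year</computed-bucket>
        <computed-bucket name="decade" ge="-P10Y" lt="P1D"
          anchor="now">Last decade</computed-bucket>
      </range>
    </constraint>
  </options>
return search:search('date:decade "web server"', $options)
```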
In the resulting XML, look at what comes after the document results. There's a
<search:facet name="date"> section. That's automatically added when you have
a range constraint. It lists how many matches there are to the query in each
range bucket. You can use it to present a sidebar offering the user the
ability to click on a facet to isolate the search to just documents matching
that facet value, like how MarkMail lets you slide over the date histogram.
Just be mindful of performance. The <search:metrics> element shows you how
much time was taken doing the facet work. If you're not interested in the
facets, you can add <return-facets>false</return-facets>. If you only want
facets and not results, you can use <return-results>false</return-results>.
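As a small sketch, the following turns facets off and returns only the timing information, so you can compare it against a run with facets enabled. The search string is illustrative, and no constraints are defined here, to keep the example short.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Skip facet computation entirely -->
    <return-facets>false</return-facets>
  </options>
(: Select just the metrics element from the search response :)
return search:search("web server", $options)/search:metrics
```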
You don't always need buckets for facets. Here's how to extract the top 100 senders (without results):
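A sketch of an unbucketed facet. It assumes senders live in a from element with a string range index configured on it. Both are assumptions about the data and the database setup, and the collation of that index must match the one in effect for the query.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="from">
      <!-- Assumes an xs:string range index on the from element -->
      <range type="xs:string" facet="true">
        <element ns="" name="from"/>
        <!-- Most frequent senders first, capped at 100 values -->
        <facet-option>frequency-order</facet-option>
        <facet-option>descending</facet-option>
        <facet-option>limit=100</facet-option>
      </range>
    </constraint>
    <!-- Facets only, no document results -->
    <return-results>false</return-results>
  </options>
(: An empty query string matches every document :)
return search:search("", $options)
```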
Query-Limited Facets
Extending Search API