Drilling in with XPath

Sometimes you don't want to fetch whole messages, just parts of them. In those cases you can use XPath to specify which part of a message you want. The following query gets the first email and returns its subject element:

(/message)[1]/headers/subject

This does the same for the first ten mails:

(/message)[1 to 10]/headers/subject
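The [1 to 10] predicate relies on MarkLogic accepting a sequence of numbers as a positional filter. Standard XPath 2.0 evaluates a predicate that yields several numbers by its effective boolean value, which raises a type error, so if you try these queries in another XQuery processor you may need a portable spelling. The sketch below shows two, run against a small inline stand-in for the database; the <mails> wrapper and the subject texts are made up for illustration.

```xquery
xquery version "1.0";
(: Twelve hypothetical messages stand in for the mail database. :)
(: Only the element names mirror the queries above.              :)
let $mails :=
  <mails>{
    for $i in 1 to 12
    return <message><headers><subject>Mail {$i}</subject></headers></message>
  }</mails>
return (
  (: Portable form 1: keep the items whose position() is in 1..10 :)
  ($mails/message)[position() = 1 to 10]/headers/subject,
  (: Portable form 2: take items 1 through 10 with fn:subsequence :)
  subsequence($mails/message, 1, 10)/headers/subject
)
```

Both forms return the subjects of messages 1 through 10. fn:subsequence states the intent most directly, while the position() form can be combined with other conditions in the same predicate.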

This returns the subjects as strings instead of XML elements by calling the string() function on each subject:

(/message)[1 to 10]/headers/subject/string()
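To see the two result types side by side without touching the database, you can run a query like the following against an inline message. The subject text is invented; only the element names come from the queries above.

```xquery
xquery version "1.0";
(: A hypothetical message, standing in for one document in the database :)
let $msg :=
  <message>
    <headers><subject>Re: build failure</subject></headers>
  </message>
return (
  (: An element node, serialized as <subject>Re: build failure</subject> :)
  $msg/headers/subject,
  (: The same path plus string(): the xs:string "Re: build failure" :)
  $msg/headers/subject/string()
)
```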

This returns the first ten paragraphs that contain URLs (effectively a random sample):

(//para[url])[1 to 10]

The double slash matches <para> elements at any depth below the document root, not only direct children. The [url] predicate says the <para> element must have a <url> child.
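The following sketch shows both parts of that query on a small, made-up message body. Because the message is built inline rather than stored in the database, the path starts at the variable $msg instead of at the root.

```xquery
xquery version "1.0";
(: A hypothetical message body with paragraphs at different depths :)
let $msg :=
  <message>
    <body>
      <para>No link here.</para>
      <quote>
        <para>See <url>http://example.com/a</url> for details.</para>
      </quote>
      <para>Mentions http://example.com/b only as text.</para>
    </body>
  </message>
(: // reaches the <para> nested inside <quote>; [url] then keeps only
   paragraphs that have a <url> child element, so text that merely
   looks like a URL does not count. :)
return $msg//para[url]
```

Only the paragraph inside <quote> is returned: the first paragraph has no link at all, and the last one mentions a URL in its text but has no <url> child.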

Why are we using parentheses so often? They're good practice whenever you extract a subset of items from a sequence. In XPath, the following query doesn't ask for one paragraph; it asks for every paragraph that is the first <para> child of its parent. It returns about 5,000,000 paragraphs, essentially the first paragraph of every email, and takes a very long time to execute (and yes, smiley faces are how you surround comments in XQuery):

(: Don't do it this way :)
//para[1]

That's powerful, but when you want just one paragraph, use parentheses. The following query treats all paragraphs as a single sequence and returns the first item in it. It executes almost instantly:

(//para)[1]
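A small inline example makes the difference visible. The sketch below uses two invented messages with two paragraphs each: the first expression picks the first <para> child of every parent, and the second picks the first paragraph of the combined sequence.

```xquery
xquery version "1.0";
(: Two hypothetical messages with two paragraphs each :)
let $mails :=
  <mails>
    <message><body><para>m1 p1</para><para>m1 p2</para></body></message>
    <message><body><para>m2 p1</para><para>m2 p2</para></body></message>
  </mails>
return (
  (: //para[1] expands to /descendant-or-self::node()/child::para[1],
     the first <para> child of every parent: "m1 p1" and "m2 p1" :)
  $mails//para[1]/string(),
  (: (//para)[1] filters the combined sequence: just "m1 p1" :)
  ($mails//para)[1]/string()
)
```

The query returns "m1 p1", "m2 p1", "m1 p1". Standard XPath also offers /descendant::para[1], which selects the same single paragraph as (//para)[1].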
