Punctuation in XPath, part 4: predicates ("[…]")

by Evan Lenz

We've already seen some examples of predicates, using square brackets ("[…]"), in action. In this article, we'll look at exactly how they work, using the following sample document:

declare variable $doc := document {
  <people>
    <group>
      <person>Peter</person>
      <person>Paul</person>
      <person>Mary</person>
    </group>
    <group>
      <person>June</person>
      <person>Ward</person>
      <person>Beaver</person>
    </group>
  </people>
};

Predicates are used to filter a sequence based on some test. Consider the following expression:

$doc/people/group/person[. eq 'June']

This expression selects all <person> elements and then filters out those elements whose string-value is not equal to "June". The test expression . eq 'June' must return true for the node to be included in the final result.
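As a minimal sketch of this filtering behavior, the query below repeats the sample document so that it can run on its own, then applies two boolean predicates. The starts-with() test is an extra illustration that does not appear above, and the trailing /string() steps are only there to make the results easy to read.

```xquery
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: Each predicate is evaluated once per <person>, with "." bound to that element. :)
(
  $doc/people/group/person[. eq 'June']/string(),         (: "June" :)
  $doc/people/group/person[starts-with(., 'P')]/string()  (: "Peter", "Paul" :)
)
```

Only the elements for which the test is true survive; the others are dropped from the result.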

Positional predicates

Predicates can also be used to select nodes at a particular position within the sequence. For example, this expression selects each first <person> child of its parent:

$doc/people/group/person[1]

In this case, since there are two <group> elements, we end up with two people in the result: Peter and June. As you can see, a number value in a predicate is treated differently than a boolean. If the test expression returns a number (as in the above case), then the predicate is interpreted like this:

$doc/people/group/person[position() eq 1]

However, you shouldn't think of "[1]" merely as syntactic sugar for "[position() eq 1]". Any expression that returns a number is evaluated this way. For example, the number could be returned by a function call or stored in a variable, as in this case:

$doc/people/group/person[$var]

If the value of $var is a single number, then it is treated as a positional predicate. However, if it's anything else, then it's treated like a normal test expression, using the normal rules for converting values to a boolean (the "effective boolean value"). For example, an empty string or an empty sequence is converted to false.
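The difference is easiest to see by putting different kinds of values in the predicate. The following is a sketch only; the variable names $group, $pos, $str, and $none are hypothetical and chosen for this illustration.

```xquery
declare variable $group :=
  <group>
    <person>June</person>
    <person>Ward</person>
    <person>Beaver</person>
  </group>;

declare variable $pos  := 2;    (: a number: used as a position :)
declare variable $str  := '2';  (: a string, not a number: converted to a boolean :)
declare variable $none := ();   (: an empty sequence: converted to false :)

(
  $group/person[$pos]/string(),  (: "Ward" :)
  count($group/person[$str]),    (: 3, a non-empty string is true for every node :)
  count($group/person['']),      (: 0, an empty string is false :)
  count($group/person[$none])    (: 0 :)
)
```

Note that the string '2' does not select the second person. Only a numeric value triggers the positional interpretation.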

What if you only want the first <person> among all the <person> elements in the document, rather than every first child? In that case, you'd have to apply the predicate to the whole expression to its left ($doc/people/group/person), rather than just the last step (person). This can be done by using parentheses:

($doc/people/group/person)[1]

In this case, the predicate is no longer a part of the "person" axis step. Instead, it filters the entire expression to its left, returning only Peter.
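To compare the two forms side by side, here is a small sketch that repeats the sample document and counts what each expression returns. The use of position 4 is my own addition, picked because no <group> has a fourth <person>.

```xquery
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(
  count($doc/people/group/person[1]),      (: 2, one first child per <group> :)
  count(($doc/people/group/person)[1]),    (: 1, first of all six :)
  count($doc/people/group/person[4]),      (: 0, no group has a fourth child :)
  ($doc/people/group/person)[4]/string()   (: "June", fourth of all six :)
)
```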

Forward and reverse axes

Whenever a predicate is part of an axis step, it is treated specially depending on which axis is being used. In particular, what position() returns inside a predicate is dependent on whether a forward or reverse axis is being used. For forward axes, positions are assigned using document order. For reverse axes, positions are assigned using reverse document order. As you may recall from the last article, $doc/people/group/person is actually short for:

$doc/child::people/child::group/child::person

Since the "child::" axis is one of the forward axes, that means that position() is assigned in document order. Putting it into the context of the document above, that means the context positions for elements returned by the last step (person) are assigned as follows:

| Node | Context position |
|---|---|
| <person>Peter</person> | 1 |
| <person>Paul</person> | 2 |
| <person>Mary</person> | 3 |
| <person>June</person> | 1 |
| <person>Ward</person> | 2 |
| <person>Beaver</person> | 3 |

Hence $doc/people/group/person[1] returns both Peter and June, as we saw above. The "person[1]" step is evaluated twice (once for each <group>), which is why the numbering restarts for June in the above table.
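One way to check this numbering is to loop over the groups yourself, which roughly emulates the step being evaluated once per <group>. This is a sketch under that assumption; the names $group and $i are hypothetical, and the document is repeated so that the query stands alone.

```xquery
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};

(: For each <group>, ask for person[1], person[2], ... relative to that group. :)
for $group in $doc/people/group,
    $i in 1 to count($group/person)
return concat($i, ': ', $group/person[$i])
(: "1: Peter", "2: Paul", "3: Mary", "1: June", "2: Ward", "3: Beaver" :)
```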

Things are different if we use one of the five reverse axes (the other eight axes are all forward axes):

  • parent::
  • ancestor::
  • ancestor-or-self::
  • preceding::
  • preceding-sibling::

In axis steps that use one of these axes, the context positions are assigned in reverse document order. Let's start with a node deep within the document:

declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

Starting from <person>Beaver</person>, we can select some node sequences that come before it, using the reverse axes:

| Expression | What/"who" it selects |
|---|---|
| $beaver/preceding::person | Peter, Paul, Mary, June, and Ward |
| $beaver/preceding-sibling::person | June and Ward |
| $beaver/ancestor::* | <people> and <group> |

If you then add a positional predicate to the step, it selects the first node in reverse document order. In other words, "[1]" selects the node that comes last in document order among those the step would otherwise return.

| Expression | What/"who" it selects |
|---|---|
| $beaver/preceding::person[1] | Ward |
| $beaver/preceding-sibling::person[1] | Ward |
| $beaver/ancestor::*[1] | <group> |

Taking the first example, using the "preceding" axis, here are the context positions as they're assigned, working backwards from <person>Beaver</person>:

| Node | Context position |
|---|---|
| <person>Ward</person> | 1 |
| <person>June</person> | 2 |
| <person>Mary</person> | 3 |
| <person>Paul</person> | 4 |
| <person>Peter</person> | 5 |

It's easy to see from this that $beaver/preceding::person[1] returns Ward, $beaver/preceding::person[2] returns June, etc.
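The same numbering can be reproduced with a short query. This is only a sketch: the loop variable $i is hypothetical, and the document and $beaver are declared again so that the query runs on its own.

```xquery
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};
declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(: Ask for each position in turn along the reverse axis. :)
for $i in 1 to count($beaver/preceding::person)
return concat($i, ': ', $beaver/preceding::person[$i])
(: "1: Ward", "2: June", "3: Mary", "4: Paul", "5: Peter" :)
```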

Now, here's the surprising part: axis steps always return nodes in document order. What? Didn't we just see an example of them being returned in reverse document order? Well, no. What we saw was the context positions being assigned in reverse document order. The node sequence that is actually returned will still always be in document order. To prove this, we can take the predicate outside the step (again, by adding parentheses):

| Expression | What/"who" it selects |
|---|---|
| ($beaver/preceding::person)[1] | Peter |
| ($beaver/preceding-sibling::person)[1] | June |
| ($beaver/ancestor::*)[1] | <people> |

In the above cases, the predicate is not part of an axis step, so it doesn't matter what kind of expression is to its left. The sequence that the expression returns is simply filtered in sequence order. In each case, the parenthesized expression returns a sequence of nodes in document order (because path expressions returning nodes always return nodes in document order).
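A predicate that keeps more than one node makes the distinction between "how positions are assigned" and "what order is returned" visible. The sketch below is my own illustration rather than something from the article; it uses position() le 2 in both places and, as an aside, the standard reverse() function for the case where you really do want the nodes in reverse document order.

```xquery
declare variable $doc := document {
  <people>
    <group><person>Peter</person><person>Paul</person><person>Mary</person></group>
    <group><person>June</person><person>Ward</person><person>Beaver</person></group>
  </people>
};
declare variable $beaver := $doc/people/group/person[. eq 'Beaver'];

(
  (: Inside the step: positions 1 and 2 are Ward and June,
     but the result comes back in document order. :)
  string-join($beaver/preceding::person[position() le 2]/string(), ', '),
  (: "June, Ward" :)

  (: Outside the step: the first two nodes in document order. :)
  string-join(($beaver/preceding::person)[position() le 2]/string(), ', '),
  (: "Peter, Paul" :)

  (: Explicitly reversing the sequence of names. :)
  string-join(reverse($beaver/preceding::person/string()), ', ')
  (: "Ward, June, Mary, Paul, Peter" :)
)
```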

This is true for axis steps in general, even if "/" isn't used. If a context node is defined (as it normally is in XSLT), then (ancestor::*)[1] is a legal expression and always returns the outermost element ancestor of the context node (first in document order), whereas ancestor::*[1] always returns the parent element of the context node (first in reverse document order).

Summary

To understand positional predicates, you need to be clear about what sequence of nodes is being filtered and how the context positions are assigned. In the general case, context positions are assigned according to the order of the sequence being filtered. The exception to this is when the predicate is part of an axis step that uses a reverse axis.

I'll leave you with a teaser for the next (and final) article in this series (about what "//" means): What nodes does the following expression select?

$doc//person[1]

And why?

Comments

  • Evan, excellent article series. Regarding filtering a sequence of nodes, I find people grok a bit more when given a common scenario, e.g. the 'default value' scenario. For example, let $a := ($somevar, 'default value')[1]: if fn:exists($somevar) is false, then the default value is chosen. Jim Fuller
    • Thanks, Jim. That's a great example. Quicker than writing: if (exists($somevar)) then $somevar else 'default value'. And it's especially useful when you've got a bunch of choices you want to prioritize: (firstChoice, secondChoice, thirdChoice, fourthChoice, 'default')[1]