Punctuation in XPath, part 5: "//"

by Evan Lenz

In this final article in the series on XPath punctuation, we'll learn about the "//" operator, a convenient shorthand for selecting nodes at any level in a document. Let's start with a sample document similar to the one we used in the last article:

declare variable $doc := document {
  <people id="all">
    <group id="group1">
      <person id="peter">Peter</person>
      <person id="paul">Paul</person>
      <person id="mary">Mary</person>
    </group>
    <group id="group2">
      <person id="june">June</person>
      <person id="ward">Ward</person>
      <person id="beaver">Beaver</person>
    </group>
  </people>
};

Most people intuit that "//" is short for "/descendant::" (hint: that's wrong), and indeed that's how it behaves in many cases. For example, this expression:

$doc//person

selects the exact same sequence as:

$doc/descendant::person

But I ended the last article with a trick question. How many nodes does the following expression select?

$doc//person[1]

If "//" were short for "/descendant::", then it should select just one <person> (the first one in the document). But in fact, it selects two: <person id="peter">Peter</person> and <person id="june">June</person>. That's because "//" is not short for "/descendant::". The truth is that "//" is short for "/descendant-or-self::node()/". So the above expression is actually short for:

$doc/descendant-or-self::node()/child::person[1]

From this, we see that there are actually three steps in the expression (not two). The second (middle) step (descendant-or-self::node()) selects the "self" node, i.e. the context node ($doc), and all its descendants. The third and final step (child::person[1]) then selects the first <person> child of each of those nodes. Only the two <group> elements have <person> children, so the result is two nodes, one per group.
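If you'd like to check this for yourself, here is a minimal sketch that counts what each form selects. It assumes an XQuery 1.0 processor and repeats the sample document so that the query stands on its own; the <results> element and its attribute names are made up for illustration.

```xquery
declare variable $doc := document {
  <people id="all">
    <group id="group1">
      <person id="peter">Peter</person>
      <person id="paul">Paul</person>
      <person id="mary">Mary</person>
    </group>
    <group id="group2">
      <person id="june">June</person>
      <person id="ward">Ward</person>
      <person id="beaver">Beaver</person>
    </group>
  </people>
};

(: Each attribute reports how many nodes one form of the expression selects. :)
<results
  abbreviated="{ count($doc//person[1]) }"
  expanded="{ count($doc/descendant-or-self::node()/child::person[1]) }"
  naive-guess="{ count($doc/descendant::person[1]) }"/>
```

The abbreviated and expanded forms should both report 2, while the "/descendant::" form reports 1.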

How to avoid the gotcha

The upshot is that using positional predicates (such as "[1]" or "[last()]") in combination with the "//" operator is a major gotcha in XPath. If you ever see them together, be suspicious. It might be evidence that the programmer doesn't know what they're doing. Quite often, what is actually intended is this:

$doc/descendant::person[1]

But have no fear, you can still use the convenient "//" shorthand, provided that you separate the predicate from the person step, using parentheses:

($doc//person)[1]

The above expression selects all <person> elements in the document, and, of those, returns only the first. This is usually what's intended.
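To compare the bare and parenthesized forms side by side, here is a small sketch, again assuming an XQuery 1.0 processor. The sample document is repeated so the query is self-contained, and the result element names are invented for this example.

```xquery
declare variable $doc := document {
  <people id="all">
    <group id="group1">
      <person id="peter">Peter</person>
      <person id="paul">Paul</person>
      <person id="mary">Mary</person>
    </group>
    <group id="group2">
      <person id="june">June</person>
      <person id="ward">Ward</person>
      <person id="beaver">Beaver</person>
    </group>
  </people>
};

(: List the id of every <person> that each expression selects. :)
<results same-node="{ ($doc//person)[1] is $doc/descendant::person[1] }">
  <bare-first>{ for $p in $doc//person[1] return string($p/@id) }</bare-first>
  <paren-first>{ for $p in ($doc//person)[1] return string($p/@id) }</paren-first>
  <bare-last>{ for $p in $doc//person[last()] return string($p/@id) }</bare-last>
  <paren-last>{ for $p in ($doc//person)[last()] return string($p/@id) }</paren-last>
</results>
```

The bare forms should list one person per group ("peter june" and "mary beaver"), while the parenthesized forms list exactly one person each ("peter" and "beaver"). The same-node attribute should come out as "true", confirming that ($doc//person)[1] and $doc/descendant::person[1] pick the very same node.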

Why, why, why?

Okay, so that's the gotcha, and now we know why the parentheses are necessary. But you may be asking, why in the world is this the case? Why was it defined this way? To trip us all up? Well, no, of course not. It turns out that you can also write expressions like this:

$doc//@id

In other words, select all (nine) @id attributes in the document, regardless of where they occur. We can also be more specific:

$doc/people/group//@id

In this case, we're selecting all @id attributes from <group> on down. Expanding this out, we can see exactly what it means. First of all, we see that the fully-expanded expression has five steps:

$doc/child::people/child::group/descendant-or-self::node()/attribute::id

In the fourth step, the "or-self" part is important. It ensures that the context node (<group>) is included in the intermediate result, so that the fifth (and final) step returns not just the "id" attributes of each group's descendants but also each group's own "id" attribute (id="group1" and id="group2").

So now you know why it was defined this way: to make it easy to select not only elements but also attributes from any level.
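Counting the attributes makes the role of "or-self" concrete. The sketch below assumes an XQuery 1.0 processor and repeats the sample document; the <results> element and its attribute names are hypothetical, chosen only for this illustration.

```xquery
declare variable $doc := document {
  <people id="all">
    <group id="group1">
      <person id="peter">Peter</person>
      <person id="paul">Paul</person>
      <person id="mary">Mary</person>
    </group>
    <group id="group2">
      <person id="june">June</person>
      <person id="ward">Ward</person>
      <person id="beaver">Beaver</person>
    </group>
  </people>
};

(: Count the @id attributes reached by each path. :)
<results
  everywhere="{ count($doc//@id) }"
  from-group-down="{ count($doc/people/group//@id) }"
  descendants-only="{ count($doc/people/group/descendant::*/@id) }"/>
```

The counts should be 9, 8, and 6. The difference between the last two is exactly the two <group> elements' own "id" attributes, which only the "or-self" part picks up.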

When should I use "//"?

The "//" operator is convenient and easy to type, but it's also dangerous (and not just because of the gotcha outlined above). It can make your code run very slowly if you're not careful. When you use "//", you're telling the XPath/XQuery/XSLT processor that it has to look everywhere beneath your current context to find the nodes you're looking for. If you already know where the <person> elements are, for example, then you should be more explicit:

$doc/people/group/person

Only in cases where elements truly can occur at any or many different levels should you use "//", e.g., the <section> elements in a DocBook document, or the <div> elements in an XHTML document. If you need to process all of them, then //section or //div is quite reasonable. The task may be inherently expensive, but you're not making it unnecessarily so.

In other words, don't be lazy. Use "//" only when it's necessary (when you mean it), and not just to save typing a few characters. Not only can it make your code slower, but it also makes your code harder to read, since it doesn't accurately reflect your intentions. Someone who sees "//" will naturally assume that the elements you're looking for (e.g., <person> elements) may occur at many different levels. If that isn't true, you're forcing them to check the schema or look at the source docs to find out. Save them the trouble by typing out the full path, rather than using "//".
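For our sample document, the explicit path gives up nothing. The following sketch (assuming an XQuery 1.0 processor, with the sample document repeated and illustrative result names) checks that both expressions select the same nodes. It says nothing about speed, which depends on your processor and your data.

```xquery
declare variable $doc := document {
  <people id="all">
    <group id="group1">
      <person id="peter">Peter</person>
      <person id="paul">Paul</person>
      <person id="mary">Mary</person>
    </group>
    <group id="group2">
      <person id="june">June</person>
      <person id="ward">Ward</person>
      <person id="beaver">Beaver</person>
    </group>
  </people>
};

let $anywhere := $doc//person               (: search every level :)
let $explicit := $doc/people/group/person   (: spell out the known path :)
return
  <results
    anywhere="{ count($anywhere) }"
    explicit="{ count($explicit) }"
    same-nodes="{ empty($anywhere except $explicit)
                  and empty($explicit except $anywhere) }"/>
```

Both counts should be 6 and same-nodes should be "true", so here the explicit path states the intent more precisely at no cost in results.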

Summary

Use "//" sparingly and intentionally. And when you combine it with a positional predicate, watch out! (You may need to add some parentheses.)

To review, here once again are all the axis-related syntax abbreviations in XPath 2.0:

This:            is short for this:             [notes or exceptions]

@                attribute::
..               parent::node()
foo              child::foo                     where "foo" is any node test except attribute(...) or schema-attribute(...)
attribute(foo)   attribute::attribute(foo)      where "attribute(foo)" is any attribute(...) or schema-attribute(...) node test
//               /descendant-or-self::node()/
