name() is a code smell

by Evan Lenz

I hadn't intended to do a series on XQuery code smells, but this post is in the same vein as the "text() is a code smell" post from last month. If text() slightly smells, then name() really reeks. There are exceptions of course, and I'll enumerate the ones I can think of, but you'll see that they're pretty particular. Your best policy may be to avoid fn:name() altogether and use only the other XPath functions for getting the name of a node:

  • fn:node-name()
  • fn:namespace-uri()
  • fn:local-name()

This came up recently on the developer list. One of our users had a subtle, hard-to-track-down bug in their code, and the root of the problem was the use of the name() function.

Before I explain what's dangerous about the name() function, it may help to step back and think about the different parts of a node's name. Let's focus on elements, since, along with attributes, those are the node names we deal with most often in XML. Consider the following simple XML document (I encourage you to follow along in Query Console):

xdmp:document-insert("/test.xml",
<doc>
  <title>Hello</title>
</doc>
)

Sometimes it's useful to get a node's name, because you don't already know what it is, or because you want to query for a node by a substring in its name, such as "get me the element whose name ends with a particular value". Otherwise, we can access, for example, the <title> element above using a simple XPath expression:

doc("/test.xml")/doc/title

This query is equivalent (though not necessarily equal in performance) to the following:

doc("/test.xml")/*[node-name(.) eq xs:QName("doc")]
                /*[node-name(.) eq xs:QName("title")]

The two queries are equivalent whether or not you have declared a default element namespace in the query: in either case, both select the same nodes. Of course, given our sample document, they will only select anything if no default element namespace is declared in the query, because the sample document doesn't use any namespaces.
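To see the default-namespace point in isolation, here is a minimal sketch that doesn't depend on any stored document. It assumes an inline document and the namespace URI http://example.com, both chosen purely for illustration. With a default element namespace declared, the unprefixed path steps and the unprefixed xs:QName() literals resolve to that same namespace, so the two forms should still pick out the same node:

```xquery
(: Assumption: an inline document stands in for a stored one, and the
   namespace URI is made up for illustration. :)
declare default element namespace "http://example.com";

(: The unprefixed constructors below also land in the default namespace. :)
let $d := document { <doc><title>Hello</title></doc> }
let $by-path := $d/doc/title
let $by-name := $d/*[node-name(.) eq xs:QName("doc")]
                  /*[node-name(.) eq xs:QName("title")]
return $by-path is $by-name
```

The "is" operator compares node identity, so this should evaluate to true: both expressions arrive at the very same <title> element.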

The above query illustrates what the node-name() function does: it returns the name of a node, which is represented using an xs:QName value. Now here's the surprising part. An xs:QName value is actually a tuple of three (not two) strings. It's just that the first two are deemed to be more significant than the third:

  1. local name
  2. namespace URI
  3. namespace prefix

The local name (#1) is always a non-empty string, whereas #2 and #3 can both be empty. In the case of our sample doc, the <title> element's QName consists of these parts:

  Name part          Value
  local name         "title"
  namespace URI      ""
  namespace prefix   ""

A hint as to the importance of the local name and namespace URI (as opposed to the lowly prefix) is that XPath provides you direct ways of accessing the first two, but not the third. So:

local-name(doc("/test.xml")/doc/title)

yields "title", and

namespace-uri(doc("/test.xml")/doc/title)

yields "" (the empty string). But there is no "namespace-prefix()" function.
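Although no function returns a node's prefix directly, the prefix is still reachable through the xs:QName value itself. As a rough sketch, the standard QName accessor functions can take apart the result of node-name(). The inline element and its namespace URI are made up for illustration:

```xquery
(: Hypothetical inline element, used only to have a prefixed name to inspect. :)
let $e := <my:title xmlns:my="http://example.com">Hello</my:title>
let $q := node-name($e)
return (
  local-name-from-QName($q),     (: the local name :)
  namespace-uri-from-QName($q),  (: the namespace URI :)
  prefix-from-QName($q)          (: the prefix; empty sequence if there is none :)
)
```

For this element the three parts come back as "title", "http://example.com", and "my". That you have to go through the QName to reach the prefix fits its second-class status.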

Now let's add a new sample document, different from the first in that it adds a namespace:

xdmp:document-insert("/test2.xml",
<doc xmlns="http://example.com">
  <title>Hello</title>
</doc>
)

As far as application code is normally concerned, this document uses entirely different names than the first one. In other words, the namespace URI is an essential part of an element's name. If you change an existing document's namespace (or add one, as we did here), then you should fully expect your application code to break unless you update it accordingly. The <title> element's QName now looks like this:

  Name part          Value
  local name         "title"
  namespace URI      "http://example.com"
  namespace prefix   ""

Now let's add a third document that's almost identical to the second one we added. This time, we'll use the exact same local names and namespace URIs, but add a prefix:

xdmp:document-insert("/test3.xml",
<my:doc xmlns:my="http://example.com">
  <my:title>Hello</my:title>
</my:doc>
)

This may look like a big change, but as far as application code should be concerned, the above document is completely equivalent to the previous one. Even though the prefix differs, the essential name of the <title> element hasn't changed:

  Name part          Value
  local name         "title"
  namespace URI      "http://example.com"
  namespace prefix   "my"

With a change like this (adding a prefix), you should not expect your application code to break. Similarly, your code should be able to handle, for example, an XHTML document whether or not it uses namespace prefixes. All of this is to say that there is a widely followed convention that prefixes (or the lack thereof) are not significant.

This is demonstrated most clearly by the XPath language itself. Two xs:QName values that differ only in their prefixes compare as equal. Thus the following query returns true:

node-name(<my:doc xmlns:my="foo"/>) eq
node-name(<doc xmlns="foo"/>)

Similarly, the deep-equal() function considers prefixes to be insignificant. The following query returns true:

deep-equal(doc("/test2.xml"), doc("/test3.xml"))
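For comparison, here is a small sketch that puts the string returned by name() next to the xs:QName comparison. It uses two inline elements (made up for illustration) that share a local name and namespace URI and differ only in prefix:

```xquery
(: Two hypothetical elements with the same expanded name, different prefixes. :)
let $prefixed := <my:title xmlns:my="http://example.com">Hello</my:title>
let $default  := <title xmlns="http://example.com">Hello</title>
return (
  name($prefixed),                               (: "my:title" :)
  name($default),                                (: "title" :)
  name($prefixed) eq name($default),             (: false :)
  node-name($prefixed) eq node-name($default)    (: true :)
)
```

The string comparison sees a difference where the QName comparison sees none.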

So if your code does break just from changing the prefixes, then you've got a problem. And that brings me back to the name() function. It is the most common cause of this problem. That's because, unlike node-name(), it doesn't return the full xs:QName value. Instead, it returns a string that is lexically a QName (local name preceded by an optional prefix). In other words, it doesn't include the namespace URI, which is an essential part of the name. But it does include the prefix, a non-essential part of the name. A recipe for disaster. For example, given an ad hoc collection of our last two docs:

declare variable $docs := (doc("/test2.xml"),doc("/test3.xml"));

the following query is unequivocally bad practice:

$docs/*/*[starts-with(name(),'my:title')]

That's because it depends on the use of a particular prefix ("my") in your source document (and will only return the <title> from test3.xml). The same is true if you don't use a prefix:

$docs/*/*[starts-with(name(),'title')]

This will only select <title> elements if they don't use a prefix. (So it will only return the <title> from test2.xml.) Bad, bad, bad. The upshot is that the name() function is almost always a code smell. Do yourself a favor and use local-name() or node-name() instead. In this case, to select the <title> from both documents, you should instead use local-name():

$docs/*/*[starts-with(local-name(),'title')]
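Note that local-name() on its own ignores the namespace entirely, so it would also match a title element in some unrelated namespace. If that matters, one possible tightening is to combine a namespace wildcard with the local-name() test. This is only a sketch: it assumes the two sample documents have been inserted as shown earlier, and the prefix "ex" is declared in the query just for this purpose.

```xquery
(: "ex" is a query-side prefix chosen for illustration; it need not
   match whatever prefix (if any) the stored documents use. :)
declare namespace ex = "http://example.com";
declare variable $docs := (doc("/test2.xml"), doc("/test3.xml"));

(: ex:* matches any element in the http://example.com namespace,
   and local-name() then does the substring test on the local part. :)
$docs/*/ex:*[starts-with(local-name(), "title")]
```

Because name tests compare the namespace URI and not the prefix, this selects the <title> from both documents.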

Okay, I promised to give you some exceptions. Here are some cases where the name() function is perfectly safe (if not particularly essential):

  • When you’re logging diagnostic or debugging information, e.g., xdmp:log(name($element))
  • When the argument node is a node other than an element or attribute, in which case the result of name() will always be the same as the result of local-name().
  • When you’re testing for an attribute in the XML namespace, such as xml:space or xml:lang, as in @*[name() != 'xml:space']. This is safe because the "xml" prefix is fixed 1:1 to that namespace.
  • When you’re testing for an attribute that is not in a namespace, as in @*[name() != 'id']. This is safe because an unprefixed attribute name always means “not in a namespace”. That’s because the default namespace is not in effect for attribute nodes.
  • When, in XSLT, you’re creating a node using <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">. This is safe (as is an analogous use of <xsl:attribute>), because a prefix that might be present in the name attribute won’t be used to look up the namespace URI as long as you provide it explicitly using the namespace attribute. The reason this usage of name() is handy is that the XSLT processor can then use that prefix in the element name that it creates in the result tree.
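The two attribute-related exceptions can be checked with a short sketch. The inline element below is made up for illustration; it declares a default namespace and carries one unprefixed attribute and one xml:-prefixed attribute:

```xquery
(: Hypothetical element with a default namespace, an unprefixed
   attribute, and an attribute in the reserved "xml" namespace. :)
let $e := <doc xmlns="http://example.com" id="a1" xml:space="preserve"/>
for $a in $e/@*
return concat(name($a), " is in namespace '", namespace-uri($a), "'")
```

The id attribute reports an empty namespace URI even though its parent element is in http://example.com, and xml:space reports http://www.w3.org/XML/1998/namespace. The order in which the two attributes come back is up to the implementation.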

As I said, these are pretty particular exceptions. You may decide that it's easier to just avoid name() altogether. As always, if you have a different perspective (or want to add to my list of exceptions), feel free to comment below!

Comments

  • I would just add that you left out the smelliness of the test [name()='abc'], which should preferably be written [self::abc]; and in XSLT, the xsl:choose with a long sequence of tests like xsl:when test="name()='abc'", which reveals a developer who isn't yet comfortable with template rules and apply-templates.
    • Good point. I originally had *[name() eq 'my:title'], but then I decided to change it to something that would actually require getting the name, hence starts-with(). But that's definitely the smelliest use, you're right. On the second point, perhaps I need to write a version of this article (http://developer.marklogic.com/blog/tired-of-typeswitch) geared toward budding XSLT users too. :-)