A peek inside RunDMC, Part 1

by Evan Lenz

RunDMC is the name affectionately given to the MarkLogic application that runs the Developer Community website, i.e., the site you're viewing right now. As far as I know, it's the first XSLT-based MarkLogic application in production. Since it's all open source, you can go and steal its ideas and techniques from the code repository. But I thought one way to make things more accessible would be to write about some of the basic architectural approaches used in the application. That way, should you decide to glean some ideas from the code, you'll be one step ahead, knowing what to look for when you dive in.

I'll be sharing two main techniques we used in the design of RunDMC. The first was to use a single XML file to globally configure the site's navigational structure (grouping of pages into sections and sub-sections); in other words, we used a sitemap to drive the site structure. The second was to use an XML-based tag library to insert dynamic content into the XML pages. This post is about the first technique. I'll cover the second in part 2.

Sitemap-driven site

Using an XML-based sitemap allowed us to keep URI structure independent of the site's navigational structure, which was a principal design goal. You can view the sitemap XML, called navigation.xml, in the online code repository.

The external URL of each page is determined by the location of its corresponding XML file in the database's directory structure. For example, the page you're reading right now has the URL "/blog/a-peek-inside-rundmc", which tells you that its content is stored in an XML document with this URI: /blog/a-peek-inside-rundmc.xml. In other words, the path is exactly the same, except that the ".xml" extension is dropped from the external URL. (This is achieved using a simple URL rewriter XQuery script.)
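To make the mapping concrete, here is a minimal sketch of it in Python. The site's actual rewriter is an XQuery script, which this does not reproduce; the function names are hypothetical, and ignoring the query string is an assumption on my part.

```python
def document_uri(url_path: str) -> str:
    """Map an external URL path to the URI of the XML document behind it."""
    # Assumption: any query string plays no part in locating the document.
    path = url_path.split("?", 1)[0]
    # The document URI is the same path with ".xml" appended.
    return path + ".xml"


def external_url(doc_uri: str) -> str:
    """Inverse mapping: drop the ".xml" extension to get the public URL path."""
    suffix = ".xml"
    return doc_uri[: -len(suffix)] if doc_uri.endswith(suffix) else doc_uri
```

Called with "/blog/a-peek-inside-rundmc", document_uri returns "/blog/a-peek-inside-rundmc.xml", and external_url maps that URI back to the original path.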

If you later decide you want to change a page's URL, you have to change the document's URI, e.g., move it to a different directory. Isn't that kind of a pain though? Well, yes. But that's by design. Cool URIs, you see, don't change. Besides, you'd also have to go and update all your links to that page, or at least create a redirect from the old URL to the new one. So it should be hard to change a page's URL.

But don't websites get redesigned, or at least re-organized, all the time? Yes, they do. However, that doesn't mean the URLs need to change, unless you have the misfortune of using a Web framework that forces you to map URL structure to navigational structure. URI design and information architecture, though related, should not be dependent on each other. With RunDMC, if you want to move a page to a different section of the site, all you'd have to do is move the corresponding <page> element in navigation.xml. No need to change the URL. This, by the way, is not the first time I've used this technique (or preached about it).

How does it work? Well, if you think about it, the only thing that makes a page part of one section versus another is what menus are displayed on the page when you view it in the browser. For example, the DMC page devoted to RunDMC appears in the "Code" section of the site not because its URL starts with "/code/" (that's maintained only by convention, and the application doesn't enforce it), but because the page is configured in navigation.xml to appear with the menus you see on that page:

The above configuration excerpt is what causes the following menu to be displayed on the page, with RunDMC appearing highlighted in the "Applications" sub-menu of the "Open Source Projects" menu:

[Image: The sub-navigation menu for 'Open Source Projects' containing 'Applications' containing 'RunDMC']

And since this menu configuration <group> appears as a child of the <page> element for the top-level "Code" page, a glance at the top-level site menu shows that we're in the "Code" section of the site (highlighted):

[Image: Top-level site menu with 'Code' highlighted]
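The idea that a page's section comes from its position in the sitemap, not from its URL, can be sketched in a few lines of Python. This is only an illustration, not how RunDMC does it (the site uses XSLT). The <page> and <group> element names come from navigation.xml as described above, but the sample sitemap, the <navigation> root, the "href" and "display" attributes, the "/code/rundmc" URL, and the menu_trail function are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap; the real navigation.xml may be structured differently.
SITEMAP = """
<navigation>
  <page href="/" display="Home"/>
  <page href="/code" display="Code">
    <group display="Open Source Projects">
      <group display="Applications">
        <page href="/code/rundmc" display="RunDMC"/>
      </group>
    </group>
  </page>
</navigation>
"""


def menu_trail(sitemap_xml, url):
    """Return the menu labels from the top-level section down to the page
    whose href equals `url`, or None if the sitemap doesn't list it."""
    root = ET.fromstring(sitemap_xml.strip())

    def walk(node, trail):
        for child in node:
            here = trail + [child.get("display")]
            if child.tag == "page" and child.get("href") == url:
                return here
            found = walk(child, here)
            if found:
                return found
        return None

    return walk(root, [])


print(menu_trail(SITEMAP, "/code/rundmc"))
# ['Code', 'Open Source Projects', 'Applications', 'RunDMC']
```

Every label in the returned trail is a menu item to highlight. Moving the RunDMC <page> element under a different parent changes the trail while the href, and therefore the URL, stays exactly the same.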

But how do these menus actually get generated? The answer to that lies in our use of an XML-based tag library, which I'll be covering in part 2. In the meantime, for a sneak preview, look at the XHTML template we're using for every page of the site, and search for <ml:top-nav/> and <ml:sub-nav/>. Those are the magical incantations that tell RunDMC where to put the menus on the page.

Comments

  • Ryan, thanks for the kind words. I made an architectural commitment to using URIs referentially in the content, because that's what URIs were meant for, at least "cool" ones (descriptive and unchanging). Can you give a reason why an extra level of indirection would be helpful? Obviously, it means you could then change the URIs later on, if you want. But the whole point was never to change them. (Besides, redirects can always be added if necessary.) Or perhaps you're suggesting that the folder structure in the database should be decoupled from the external URI structure. Again, I didn't see a reason for the indirection, provided everyone understands the contract to never move the documents around in the database. You can still of course use collections (orthogonal to directory structure), and in fact, you can add this layer of indirection later on, if it becomes necessary for some reason.
  • I like the sitemap-driven approach. It keeps presentation information out of your content, which is especially good if you need to deliver your content across multiple channels. As for the URI concern, why not just create a permalink element in the content and use that value in the href of the sitemap? Then you can reorder your files any way you want in the database and the links will remain the same. In fact, I try never to store a URI (or a derivation thereof) anywhere in the content. Instead, any pointers to documents are always done by value (usually unique), and the URI is just the system handle on a particular doc, never used referentially in any content. This sitemap approach is also really good because you can easily transform it into a sitemap for search engines. Really good idea. Thanks