The new docs.marklogic.com (under the hood)

by Evan Lenz

In part 1, we highlighted some of the basic features of the new docs.marklogic.com. Now let's take a quick look under the hood. Powering this application is—you'll never guess it—MarkLogic! In almost all cases, each page corresponds to one document in the database. This is a good guideline in general for search applications. Although the Search API can return results at sub-document levels, the most common and natural approach is for there to be a 1:1 correspondence between documents (fragments) and search results.

This is all well and good, but what if your content isn't already in the format and structure you want? You have a couple of choices (touched on in "Navigating a jungle of data"):

  1. write the brute-force XQuery & XSLT to adapt the content to the output you want, at run time
  2. write pre-processing code so that, as much as possible, the content loaded into the database suits the needs of the application.

I took the latter approach for a couple of reasons:

  • The application code was much easier to write, since the data was in a natural structure for building a search application.
  • The application code ran much faster since all the heavy lifting had already been completed during the "build" phase.

If you're curious, you can take a look at the master script that kicks off the complete build. This includes a lot of heavy-duty XSLT for converting content from the format it was originally authored in.

Once the content is in place (build completed), the application code does its work at run time. When you request a page on docs.marklogic.com (or on developer.marklogic.com), the URL is mapped to the underlying document in the database. That document is then transformed by an XSLT stylesheet (page.xsl, which imports many other XSLT scripts used by the developer site). In fact, although the look and feel has changed, the basic architecture of the site hasn't changed in two years. (See A peek inside RunDMC, part 1 and part 2.) One advantage of using XSLT here is that we can reuse all the existing template rules for rendering the rest of the developer site, overriding only specific ones to achieve a different effect on docs.marklogic.com.
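To make the two ideas in that paragraph concrete, here is a minimal sketch in JavaScript. It is not the site's actual code (which is XQuery and XSLT). The ".xml" suffix, the "/index.xml" default, and the rule names are all assumptions made up for illustration. The first function maps a request path to a document URI; the rule tables mimic how an importing stylesheet inherits shared template rules and overrides only the ones it needs.

```javascript
// Hypothetical mapping from a request path to a database document URI.
// The ".xml" suffix and the "/index.xml" default are assumptions.
function documentUriForPath(requestPath) {
  // Drop any query string or fragment; only the path selects the document.
  const pathOnly = requestPath.split(/[?#]/)[0];
  // Collapse trailing slashes so "/guide/" and "/guide" map to the same document.
  const trimmed = pathOnly.replace(/\/+$/, "");
  return trimmed === "" ? "/index.xml" : trimmed + ".xml";
}

// Shared rendering rules, standing in for the developer site's template rules.
// Titles and bodies are assumed to be already-escaped HTML.
const sharedRules = {
  title: (doc) => `<h1>${doc.title}</h1>`,
  body: (doc) => `<div class="body">${doc.body}</div>`,
};

// Docs-specific rules: inherit everything, override only "title".
const docsRules = {
  ...sharedRules,
  title: (doc) => `<h1 class="docs">${doc.title}</h1>`,
};

// Render a page with whichever rule set applies to the current site.
function renderPage(rules, doc) {
  return rules.title(doc) + rules.body(doc);
}
```

Calling renderPage(docsRules, doc) picks up the overridden title rule while still using the shared body rule, which is the reuse-plus-override effect described above.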

So far we've talked about:

  1. getting the content in place (build process)
  2. transforming it via XSLT (at run time)

Now let's briefly touch on what the build- and run-time code outputs. For the table of contents on the left, we're using the jQuery Treeview plugin:

[Screenshot: the table of contents tree, showing "Getting Started Guides" and "Developer's Guides" with their entries]

The HTML of the TOC itself is pre-generated at build time, since it doesn't need to change at run time. An additional optimization is that parts of the TOC are lazily loaded as pre-rendered static HTML files (also generated at build time).
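As a rough illustration of that build-time step, the sketch below renders a TOC tree into static HTML files: one main file with the top-level sections, plus one file per section holding its children, so a client script could fetch a section's children only when it is expanded. The file names and the data-fetch-src attribute are hypothetical, and the real build is done with XSLT, not JavaScript.

```javascript
// Build-time sketch: turn a TOC tree into a map of { fileName: htmlString }.
// Titles and hrefs are assumed to be already-escaped.
function buildTocFiles(sections) {
  const files = {};
  const items = sections.map((section, i) => {
    const childFile = `toc/section-${i}.html`; // hypothetical naming scheme
    // Each section's children live in their own pre-rendered file.
    files[childFile] =
      "<ul>" +
      section.children
        .map((c) => `<li><a href="${c.href}">${c.title}</a></li>`)
        .join("") +
      "</ul>";
    // data-fetch-src is a made-up attribute a client script could read
    // to know which file to load when the section is opened.
    return `<li class="closed" data-fetch-src="${childFile}">${section.title}</li>`;
  });
  files["toc/index.html"] = `<ul id="toc">${items.join("")}</ul>`;
  return files;
}
```

Because every file is generated once at build time, the server only ever hands out static HTML for the TOC, which is also what makes it easy for the browser to cache.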

The tabs for switching between TOCs are implemented using the jQuery UI Tabs plugin:

[Screenshot: the row of tabs for switching between TOCs (XQuery/XSLT API, REST API, Java API, Guides, Other Docs)]

An early (and still common) approach to developing web applications was to load one container page, including the header, footer, TOC, navigation, etc., and then use Ajax calls to grab the content each time the user clicks a link. The advantage of this approach is that each page loads very quickly, since the browser doesn't have to download the whole template for each page. In this scenario, the base URL in the browser doesn't change; only the fragment identifier does, using what's called the "hash-bang" technique, e.g., #!mypage. The disadvantage is that, to put it bluntly, this approach breaks the Web.
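A small sketch may help show what the hash-bang technique looks like in code. One commonly cited problem with it (my gloss, not something spelled out above) is that browsers never send the fragment identifier to the server, so every "page" looks like the same URL from the server's point of view. The function names and example.com URLs are made up for illustration.

```javascript
// Extract the page name from a hash-bang URL such as "http://example.com/#!mypage".
// Returns null when the URL has no "#!" fragment.
function pageFromHashBang(url) {
  const hashIndex = url.indexOf("#!");
  if (hashIndex === -1) return null;
  return url.slice(hashIndex + 2);
}

// The part of a URL that is actually sent in an HTTP request:
// everything except the fragment identifier.
function serverVisibleUrl(url) {
  const parsed = new URL(url);
  return parsed.origin + parsed.pathname + parsed.search;
}
```

With these, "http://example.com/#!install" and "http://example.com/#!release-notes" yield different page names on the client but the same server-visible URL, so only client-side script can tell them apart.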

So we took the opposite approach: rather than load the page content via Ajax, we load the TOC via Ajax. Since we use static HTML for the TOC, the browser caches it, so when you reload the page the TOC doesn't have to be fetched again. No hash-bangs necessary! But now we're back to having to download the whole template (including the TOC, even if it's cached) for every page load. Have no fear: PJAX to the rescue! PJAX is an excellent, evolving JavaScript library that uses the HTML5 History API in newer browsers. It lets you have the best of both worlds in new browsers (while degrading gracefully for older ones):

  • clean URLs that change when you go from page to page (no hash-bangs)
  • fast page loads via Ajax
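The general technique can be sketched as follows. This is not the PJAX library's actual API, just a minimal illustration of the idea: if the browser supports history.pushState, fetch only the page content via Ajax and update the URL; otherwise fall back to an ordinary full page load. The env object and its fetchHtml, setContent, and fullLoad members are hypothetical helpers, passed in so the logic can be exercised without a real browser.

```javascript
// PJAX-style navigation sketch (assumed helper names, not the library's API).
//   env.history    - window.history (older browsers lack pushState)
//   env.fetchHtml  - (url, callback) => fetches the content HTML via Ajax
//   env.setContent - (html) => swaps the HTML into the page's content area
//   env.fullLoad   - (url) => ordinary navigation, e.g. location.assign(url)
function navigate(env, url) {
  const supportsHistory =
    Boolean(env.history) && typeof env.history.pushState === "function";
  if (!supportsHistory) {
    // Graceful degradation: older browsers just load the whole page.
    env.fullLoad(url);
    return "full";
  }
  env.fetchHtml(url, (html) => {
    env.setContent(html);
    // Update the address bar to the real, clean URL (no hash-bang).
    env.history.pushState({ url: url }, "", url);
  });
  return "ajax";
}
```

A complete version would also listen for the popstate event so that the back and forward buttons restore the right content; that part is omitted here to keep the sketch short.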

I hope you enjoyed this look under the hood! Let me know if you have any questions.

Note: not only is the application code open-source, but our issues list is too. If you notice anything wrong with the behavior of our online docs, or want to suggest an improvement, we'd love it if you could report it by submitting a new issue on GitHub.
