Developing XQuery Applications: Part 2

This version of the tutorial applies to MarkLogic 8 and later. For MarkLogic 6 and MarkLogic 7, see the earlier version.

Before we get into building an application, we have to talk about architecture for a minute. MarkLogic is a database, but it is also a search engine and an application server. This means you have some choices about how you structure an application based on MarkLogic.

One option is the three-tier approach you’re likely familiar with — a browser (or other client) talks to an application-tier server, which gets its data from a database. When using MarkLogic this way, you can either build custom endpoints to support your database queries or use the out-of-the-box functionality of the MarkLogic REST API. In either case, you can think of XQuery (and its colleague, Server-side JavaScript) as a stored procedure language. The best way to explore this option is the REST API tutorial. The information below will still be useful to you, however.

You can also use MarkLogic in a simplified architecture, where MarkLogic’s application server hosts your entire web application. This part of the tutorial will show you this approach.

In this section, we will walk through creating a simple web-based MarkLogic application. This tutorial builds upon the foundation laid in Part 1. If you haven’t completed that tutorial yet, now would be a good time, because we are going to pick up where it left off and build an application on the same setup. This tutorial is not designed to turn you into an AJAX-wielding web ninja, nor is it designed to make you an XQuery guru. What it will do is show you how to create a simple web-based application that uses the power of MarkLogic, and along the way we’ll pick up some best practices for building our applications. Enough already, on to the actual tutorial!

Creating an HTTP Server

In addition to being the industry’s only operational database for Big Data, MarkLogic is also an HTTP server. Surprise! It is this feature that allows us to build web applications directly on the server using XQuery and to expose functionality to other services via an XML-RPC style interface. Query Console is really nothing more than a web application running directly on MarkLogic that provides a programming interface to the server.

To create an HTTP application server, we’ll use the Management API again. In your config directory, create a file called http-server.xml and paste in the following:

<http-server-properties xmlns="http://marklogic.com/manage">
  <server-name>Shakespeare</server-name>
  <root>/Users/dcassel/git/shakespeare/src</root>
  <port>8010</port>
  <content-database>shakespeare-content</content-database>
</http-server-properties>

Notice that the root is an absolute path to somewhere on your filesystem, pointing to a directory where you will store the source code for the application. Adjust the root to match the path you are using. We don’t have any source code yet, but I’ve created a src directory at the same level as the config directory.

Tell MarkLogic to apply this configuration by sending http-server.xml:

curl --digest --user admin:mypassword -H "Content-type: application/xml" \
  -d @config/http-server.xml \
  'http://localhost:8002/manage/v2/servers?group-id=Default&server-type=http'

Hello World!

You knew this was coming, didn’t you? We’re going to start with a very simple application.

At this point, we have not provided any code for our application. Let’s try something: go to http://localhost:8010 and see what happens. You should be prompted for your login credentials and then receive a 404 Not Found error. That makes sense, since we haven’t written any application code yet. Let’s fix that.

In the src directory you created above, create a file called default.xqy and paste the following code into that file:

xdmp:set-response-content-type("text/html"),
('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>My first MarkLogic web page</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>)

Now, let’s try our application again. Reload http://localhost:8010. Aha! Now we see our Hello World web page. Now that things are working, we should take a moment to talk about what exactly is happening. First off, when the HTTP server is presented with a request that does not specify a page (as in the request we just made), it automatically looks for a file named default.xqy in the HTTP server’s root directory (this all assumes that no URL rewriter has been set up, which is a topic for another time). Since we have now created just such a page, the server processed it and returned the results to our browser. We would have seen exactly the same results if we had instead gone to http://localhost:8010/default.xqy.

Creating Web Content

OK, that’s all well and good, but what was all that weird code we put into the file? Well, I’m glad you asked. The MarkLogic HTTP server provides a way to dynamically create web pages much as you can with JSP or PHP pages. However, because MarkLogic supports XQuery and because we gave our file the .xqy extension, the server expects this file to contain valid XQuery code. More specifically, the server expects the file to contain a main module. A main module is simply some code that can be directly executed as an XQuery program. It must include, at a minimum, a query body consisting of an XQuery expression (which in turn can contain other XQuery expressions, and so on). Our main module contains an XQuery sequence expression whose first part is a call to xdmp:set-response-content-type. This function sets the content type (the Content-Type header) of the HTTP response. We used it to set a content type of text/html so that the browser knows to interpret the results as HTML, because most browsers do not intrinsically know what to do with content ending in .xqy.

However, as you will note from the documentation, the call to xdmp:set-response-content-type returns an empty sequence. Clearly, an empty sequence is not what we want in order to create a valid web page. To get our HTML returned to the browser, we have to include it in the sequence that the query returns. The comma (,) that we placed after xdmp:set-response-content-type("text/html") is XQuery’s sequence-construction operator: it concatenates the empty sequence on its left with the parenthesized sequence on its right, which holds the DOCTYPE declaration (as a string) followed by the html element that we want sent to the browser. Because sequences never nest and an empty sequence contributes no items, the query returns just those two items. I realize that all of this returning of sequences appended to sequences sounds a bit daunting at first, but I assure you that with just a little practice it becomes second nature in no time at all. Additional information on sequences as return types from XQuery expressions can be found in the “Expressions return items” section in the XQuery and XSLT Reference Guide.
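To see these sequence rules in isolation, the following can be pasted into Query Console and run against any database. It is a small illustration rather than part of the application: an empty sequence stands in for the result of xdmp:set-response-content-type, and the short DOCTYPE string is only a placeholder.

```xquery
xquery version "1.0-ml";

(: Sequences are flat: the comma operator concatenates its operands,
   and the empty sequence () contributes no items at all. :)
let $page := ((), ("<!DOCTYPE html>", <html/>))
return (
  fn:count($page),                 (: 2: the string and the element :)
  fn:count(((1, 2), (), (3))),     (: 3: nested parentheses do not nest sequences :)
  $page[1] instance of xs:string   (: true :)
)
```

Query Console should list three results: 2, 3, and true.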

Dynamic Content

That was a good start, but returning static HTML really isn’t very useful for actually building applications. To do something useful and interesting, we need to return dynamic content. As I alluded to earlier, we can include code that will be evaluated dynamically, much as you can with JSP or PHP pages. The main difference is that instead of embedding Java or PHP in our pages, we’re going to embed XQuery to provide our dynamic functionality. Adding that functionality couldn’t be simpler. All we need to do is take the XQuery code that we want evaluated and enclose it in curly braces ({ }), which XQuery calls an enclosed expression. So, let’s try that out by adding a very simple XQuery expression to our default.xqy page.

xdmp:set-response-content-type("text/html"),
('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>My first MarkLogic web page</title>
  </head>
  <body>
    <p>Hello World! This application is running on MarkLogic Server version {xdmp:version()}</p>
  </body>
</html>)

Now when we view this page in our browser we see the version of the MarkLogic server dynamically displayed as part of the HTML. This is due to the server evaluating the XQuery expression xdmp:version() that it encountered within the {} and returning the result as part of the HTML response. This ability to embed XQuery directly within our HTML will serve as the foundation for building up much more complex web applications. Let’s continue our exploration of this capability by creating a small application that actually leverages the Shakespeare content that we went through so much effort (well…at least a little effort) to load into the server. Rather than having you go through all of the effort of copying and pasting some code, you can download the simple application that I wrote so that we can discuss it in some more depth.
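Here is a small sketch of the same enclosed-expression technique applied to the Shakespeare data, as a page you might save as src/plays.xqy (a hypothetical file name, not part of the download). It assumes the Part 1 content is stored one play per document, with a PLAY root element that has a TITLE child, as in the widely used Shakespeare XML files; adjust the path expression if your data differs.

```xquery
xquery version "1.0-ml";

(: plays.xqy (hypothetical name), saved in the src directory.
   Assumes one document per play, each with a PLAY root element
   that has a TITLE child. :)

(: Query the data before entering the XHTML constructor. Inside an element
   written with xmlns="http://www.w3.org/1999/xhtml", unprefixed names in
   path expressions resolve to the XHTML namespace, so //PLAY/TITLE
   would match nothing there. :)
let $titles :=
  for $title in //PLAY/TITLE
  order by fn:string($title)
  return fn:string($title)
return (
  xdmp:set-response-content-type("text/html"),
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
      <title>Plays in the database</title>
    </head>
    <body>
      <p>{fn:count($titles)} plays are loaded:</p>
      <ul>{for $t in $titles return <li>{$t}</li>}</ul>
    </body>
  </html>
)
```

Browsing to http://localhost:8010/plays.xqy should list the titles. Running the query in a let clause before the html constructor is a deliberate choice: a default namespace declared with xmlns also applies to name tests inside the constructor’s enclosed expressions, which is an easy trap once you start mixing XHTML with un-namespaced content.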

Let’s get modular with it

Go ahead and expand the zip file you just downloaded in the “src” directory, where you placed the default.xqy file that we worked with earlier. What I want to do now is take a little time to talk about how this very simple application is structured. Before we do, though, a disclaimer: there is no single correct way to structure your XQuery application. However, there are some good fundamental practices and concepts that will help you create a good structure for your projects, and those are what we will be looking at now. After unzipping the application, you should notice two new files, search.xqy and results.xqy, as well as a new directory named modules. Let’s ignore the modules directory for a moment and focus on those two new XQuery files. search.xqy is a very simple bit of code that creates a form allowing users to enter some text for the speaker they are searching for and then submits that form to the results.xqy page.
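The downloaded search.xqy is the reference, but a page of that kind can be sketched as follows. The form field name speaker is a hypothetical choice for illustration; the real file may use a different name, and whatever name it uses must match the one results.xqy reads.

```xquery
xquery version "1.0-ml";

(: search.xqy (sketch): renders a form that submits a speaker name to results.xqy.
   The field name "speaker" is a hypothetical choice for illustration. :)
xdmp:set-response-content-type("text/html"),
('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Find speeches by speaker</title>
  </head>
  <body>
    <form action="results.xqy" method="get">
      <label for="speaker">Speaker (wildcards such as ham* are allowed):</label>
      <input type="text" id="speaker" name="speaker"/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html>)
```

Using method="get" keeps the search term in the URL, which makes individual result pages easy to bookmark and to test by hand.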

OK, clearly the results.xqy page is where a lot of the work must be happening. Let’s dive in and see what’s going on. A quick peek at this page shows that starting on line 10 we are looping through some sequence of SPEECH elements and displaying the LINE elements contained in each speech. Where did we get these search results from? Let’s look more closely at line 10. Here we’re calling some function called find-speech in the search-lib namespace. That sounds promising but what is that function and where did it come from? Well, if we look at the code on line 1 we see that we are importing a module in the search-lib namespace and that we expect to find the file containing that module at the relative path modules/search-lib.xqy. Hmmmmmm…that’s interesting. Do you remember how we talked about main modules earlier? Well, there is another type of module called a library module and that is what we are importing. Library modules, unlike main modules, are not directly executable by the server. Instead they house reusable bits of code, typically functions, that we can access from elsewhere in our application as we did here. Think of library modules like JAR files in Java or DLLs in .NET. It’s not exactly the same thing but the idea is close enough. So, according to that import statement we just looked at on line 1 we should be able to find this library module in a file called search-lib.xqy within the modules directory. Let’s pop that file open and see what we find!

xquery version "1.0-ml";
module namespace search-lib = "https://www.marklogic.com/tutorial2/search-lib";

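(: Return every SPEECH whose SPEAKER value matches $query-term, ignoring case and allowing * and ? wildcards :)
declare function search-lib:find-speech($query-term as xs:string) as element(SPEECH)* {
  cts:search(//SPEECH, cts:element-value-query(xs:QName("SPEAKER"), $query-term, ("wildcarded", "case-insensitive")))
};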

The first interesting thing in this rather short and simple file appears on line 2, where we declare the namespace associated with this module. Note that this is the same namespace we used when we imported the module into our results.xqy file. After that little bit of module housekeeping is taken care of, we jump right into declaring the functions defined in this module. In this case there is only one function, search-lib:find-speech. This is the function that we called from our results.xqy page in order to find lines spoken by a particular speaker. As you can see, this function takes a single string as the search parameter and returns a sequence of zero or more SPEECH elements. The query does a case-insensitive search (also allowing for wildcards) for all SPEECH elements with a child element, SPEAKER, whose value matches our search term. Because cts:element-value-query matches the element’s entire value, a term such as “ham” matches only a speaker whose whole name is Ham, while “ham*” matches every speaker whose name begins with “ham”. While powerful, this query is simple enough that we are able to easily accomplish it in a single line of code. Why then did we go through all of the hassle to put this very simple query into a module in a completely separate file that we then had to import in order to use? Surely it wasn’t just a completely arbitrary example to demonstrate the use of modules, was it?

Of course, the answer is no. There is a much more important reason for separating out that search function, and it has to do with the fundamental concepts of code modularity and reuse. Simply put, we are separating the implementation of our search (contained in the search-lib module) from the use of its results, which in this case is to display some very simple XHTML in our results.xqy page. This simple technique allows us to reuse our search code from other parts of the application. Additionally, if we need to modify the way search works, perhaps by making the search case-sensitive, we have a single place to make that modification instead of having to track down every place we pasted the search code in order to maintain consistent search behavior. The concept behind this technique is probably not new to you if you have been programming for any length of time. The really important point here was to demonstrate how to implement that technique in XQuery.
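To make the consumer side of that split concrete, here is a rough sketch of a page shaped like results.xqy. It is not the downloaded file: it assumes a form field named speaker (a hypothetical name), reads it with xdmp:get-request-field, and leaves all searching to search-lib:find-speech.

```xquery
xquery version "1.0-ml";

(: results.xqy (sketch): displays the speeches found by search-lib:find-speech.
   The form field name "speaker" is hypothetical. :)
import module namespace search-lib = "https://www.marklogic.com/tutorial2/search-lib"
  at "modules/search-lib.xqy";

let $term := xdmp:get-request-field("speaker", "")
(: All search logic stays in the library module; this page only renders. :)
let $speeches := search-lib:find-speech($term)
return (
  xdmp:set-response-content-type("text/html"),
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
      <title>Speeches by {$term}</title>
    </head>
    <body>
      <h1>{fn:count($speeches)} speeches by {$term}</h1>
      {
        (: Inside this xmlns scope an unprefixed name such as LINE would mean
           an XHTML element, so *:SPEAKER and *:LINE are used to reach the
           Shakespeare elements, which are in no namespace. :)
        for $speech in $speeches
        return
          <div>
            <h2>{fn:string($speech/*:SPEAKER[1])}</h2>
            {for $line in $speech/*:LINE return <p>{fn:string($line)}</p>}
          </div>
      }
    </body>
  </html>
)
```

Because the form uses GET in this sketch, a URL such as http://localhost:8010/results.xqy?speaker=hamlet exercises the page directly, without going through the form.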

Modules Database

So far, our modules are in a directory on the file system. That works fine for exploring, but for real applications, the common practice is to set up a modules database and deploy our code there. A modules database is just like our content database, except that we’ll put different kinds of files into it. When we’re running an application in a MarkLogic cluster, using a modules database that is available to all servers in the cluster is a good way to ensure all servers are running the same code. This also simplifies configuration, as an HTTP server’s root directory can be “/” rather than an absolute path on someone’s laptop.
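As a sketch of that configuration change, the Admin API can point the Shakespeare server at a modules database and set its root to “/”. This is a swapped-in technique: the tutorial itself configures servers through the Management API, where a PUT to the server’s properties endpoint does the same job. The sketch assumes a database named shakespeare-modules (a hypothetical name) has already been created with a forest attached, and it should be run in Query Console as admin.

```xquery
xquery version "1.0-ml";

(: Admin API sketch: serve the Shakespeare HTTP server's code from a modules
   database. Assumes "shakespeare-modules" (a hypothetical name) already
   exists with a forest attached. Run in Query Console as admin. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config    := admin:get-configuration()
let $group-id  := admin:group-get-id($config, "Default")
let $server-id := admin:appserver-get-id($config, $group-id, "Shakespeare")
(: Look modules up in the modules database instead of on the file system... :)
let $config    := admin:appserver-set-modules-database(
                    $config, $server-id, xdmp:database("shakespeare-modules"))
(: ...starting from the database root rather than a machine-specific path :)
let $config    := admin:appserver-set-root($config, $server-id, "/")
return admin:save-configuration($config)
```

Each admin:* setter returns a new configuration rather than changing the server immediately; nothing takes effect until admin:save-configuration is called.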

Once you start working with modules databases, you’ll need to deploy your source code to that database. To make this an easy part of your process, take a look at community-built tools like the Roxy Deployer or ml-gradle (for Java environments).
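Until you adopt one of those tools, a bare-bones manual deployment might look like the sketch below. It assumes a modules database named shakespeare-modules (a hypothetical name) is selected as the Content Source in Query Console, and that the source files live under the same root path used in http-server.xml earlier; adjust both to your setup.

```xquery
xquery version "1.0-ml";

(: Bare-bones deployment sketch. Run it in Query Console as admin with the
   modules database (hypothetical name "shakespeare-modules") selected as the
   Content Source, so the documents are inserted there.
   $src mirrors the root used in http-server.xml; adjust it to your machine. :)
let $src := "/Users/dcassel/git/shakespeare/src/"
for $path in ("default.xqy", "search.xqy", "results.xqy", "modules/search-lib.xqy")
(: Each file becomes a text document whose URI is its path under the "/"
   root, so relative imports such as "modules/search-lib.xqy" keep resolving. :)
return xdmp:document-load(
  fn:concat($src, $path),
  <options xmlns="xdmp:document-load">
    <uri>{fn:concat("/", $path)}</uri>
    <format>text</format>
  </options>)
```

Permissions are left at their defaults here, which is workable when you connect as admin, as this tutorial does. Other users need read and execute permissions on the module documents, one of the details that deployment tools typically handle for you.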
