
Node.js and Express.js sessions using MarkLogic 8

by Matt Pileggi

Overview

Most web applications benefit from the use of sessions. Apps built with Node.js are no different, and the Express server makes it very easy to apply middleware that helps you manage your users' sessions. In this quick start guide we will be using connect-marklogic, a session store implementation for express-session. Express-session provides the plumbing for creating and managing sessions between browser cookies and the database, and connect-marklogic allows us to use MarkLogic 8 as the persistence layer. It's very simple!

Requirements

Before you get started with the steps below, you'll need to be familiar with and have already installed the following:

Express

In its own words, Express is a fast, unopinionated, minimalist web framework for Node.js. It is a very popular choice when building Node applications. I will take the minimalist approach here, as well, and show you just how easy it is to get started.

The first thing we will need to do is install express itself.

npm install express

Then we can create our server.js file.

// very basic express server with MarkLogic 8
var express = require('express');
var app = express();

app.get('/', function(request, response) {
  response.send('Hello');
});

var server = app.listen(3000, function () {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Example app listening at http://%s:%s', host, port);
});

That's it for now. Save the file and run 'node server.js' ('npm start' will do the same thing if you have a package.json file, since npm falls back to running node server.js when no start script is defined). You'll see output telling you the server is listening on port 3000. If you visit http://localhost:3000 you should be greeted with a "Hello". That's the app.get('/') route we set up in our server. Unfortunately, it's the only route our server supports, and it always serves the same static message. Not exactly a capable application, so let's add some session magic!
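If you'd rather declare the start script explicitly, a minimal package.json along these lines will do (the name and version here are just placeholders):

{
  "name": "express-marklogic-sessions",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js"
  }
}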

Express-Session

We'll need to add another dependency in order to work with sessions.

npm install --save express-session

Express-session is the glue between Express, the browser, and the session store for managing a client's session. This kind of glue is called "middleware," a common pattern when working with Express. To use the express-session middleware we must configure it and tell our Express server how to use it. Add the following to your server.js file before the app.get('/') line from earlier.

var session = require('express-session');

app.use(session({ secret: 'enterprise nosql'}));

What we've done is import the express-session module and create the middleware with a default configuration (plus a secret). Calling app.use without a path tells Express to apply the session middleware to ALL requests, regardless of path or method, so the session data is evaluated and updated on every request. The middleware also exposes the session directly on the request object, so it's easy to use. Update server.js and replace the previous route with the following:

app.get('/', function (req, res) {
  if (req.session.username) {
    res.send('<p>Hello, ' + req.session.username + '!</p><p><a href="/logout">logout</a></p>');
  } else {
    res.send('<form action="/login"><label for="username">Username</label> <input type="text" name="username" id="username"> <p><input type="submit" value="Submit"></p></form>');
  }
});

This inspects the req.session object from the middleware for a property called 'username'. If username is in the session, we greet the user by name; otherwise we display a form for them to log in. For that to work, we need to add two more routes. Add the following below the previous route in server.js:

app.get('/login', function (req, res) {
  if (req.query.username) {
    req.session.username = req.query.username;
  }
  res.send('<p>You have been logged in, ' + req.session.username + '</p><p><a href="/">Home</a></p>');
});
app.get('/logout', function (req, res) {
  delete req.session.username;
  res.send('<p>Sorry to see you go!</p><p><a href="/">Home</a></p>');
});

We now have three routes: /, /login, and /logout. /login expects a query parameter named 'username', which is saved onto the session. Start up server.js and give it a shot. You should be able to log in and log out, and the app will remember who you are between requests!
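If you prefer testing from the command line, you can simulate the browser's cookie handling with curl (the cookies.txt file name is arbitrary; -c saves the session cookie and -b sends it back):

curl -c cookies.txt "http://localhost:3000/login?username=Alice"
curl -b cookies.txt http://localhost:3000/
curl -b cookies.txt http://localhost:3000/logout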

MarkLogic

The application we've built so far makes easy work of managing a user's session. However, the middleware's default SessionStore implementation is in-memory storage, which means all of your session data disappears when the server restarts. Our app is not exactly production-ready just yet! To fix this, we need to configure the express-session middleware to use a more permanent session store. Enter MarkLogic.

npm install marklogic connect-marklogic

We will need the marklogic module to create our connection to a MarkLogic database, and the connect-marklogic module provides the session store implementation that express-session requires. Sound complicated? Don't worry, it's quite easy. Add the following lines to your server.js file right below the declaration of var session:

var marklogic = require('marklogic');
var MarkLogicStore = require('connect-marklogic')(session);
var mlClient = marklogic.createDatabaseClient({
  host: 'localhost',
  port: '8000',
  user: 'admin',
  password: 'admin'
});

This imports both of our new modules. We initialize connect-marklogic with the session middleware, and create a new database client that we will use to persist the sessions. The only step left is to modify the configuration of our session middleware so that it can use our session store. Modify the app.use line so that it looks like this:

app.use(session({
  store: new MarkLogicStore({ client: mlClient }),
  secret: 'enterprise nosql'
}));

The only thing we needed to add was the 'store' option when creating our middleware. Express-session will now read and write all session data to our MarkLogic database! Start up server.js and visit the same endpoints as before. This time, though, you can stop and restart the server and the session data will remain!

All Together Now

Here's the final version of server.js in case you want to compare notes.

// very basic express server with MarkLogic 8
var express = require('express');
var app = express();
var session = require('express-session');
var marklogic = require('marklogic');
var MarkLogicStore = require('connect-marklogic')(session);
var mlClient = marklogic.createDatabaseClient({
  host: 'localhost',
  port: '8000',
  user: 'admin',
  password: 'admin'
});

app.use(session({
  store: new MarkLogicStore({ client: mlClient }),
  secret: 'enterprise nosql'
}));

app.get('/', function (req, res) {
  if (req.session.username) {
    res.send('<p>Hello, ' + req.session.username + '!</p><p><a href="/logout">logout</a></p>');
  } else {
    res.send('<form action="/login"><label for="username">Username</label> <input type="text" name="username" id="username"> <p><input type="submit" value="Submit"></p></form>');
  }
});

app.get('/login', function (req, res) {
  if (req.query.username) {
    req.session.username = req.query.username;
  }
  res.send('<p>You have been logged in, ' + req.session.username + '</p><p><a href="/">Home</a></p>');
});

app.get('/logout', function (req, res) {
  delete req.session.username;
  res.send('<p>Sorry to see you go!</p><p><a href="/">Home</a></p>');
});

var server = app.listen(3000, function () {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Example app listening at http://%s:%s', host, port);
});

More Options

There are more options available when creating your connect-marklogic store. For more details, visit the repository at https://github.com/withjam/connect-marklogic.
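Purely for illustration, extra options go in the same object as the client when constructing the store. The option names below are hypothetical placeholders, so check the repository's README for the ones connect-marklogic actually supports:

app.use(session({
  store: new MarkLogicStore({
    client: mlClient,
    // hypothetical option names -- confirm the real ones in the connect-marklogic README
    collection: 'sessions', // where session documents would be stored
    ttl: 3600               // session lifetime in seconds
  }),
  secret: 'enterprise nosql'
}));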

Be sure to keep up with MarkLogic 8 and MarkLogic's other tools for Node.js development!

An Exploration into Android Gaming, Powered by MarkLogic

by Brent Perry

Last year before I joined MarkLogic, I took it on myself to start the development of an Android-based geo-location game that we’ll call “Contagion”. I had ideas for how it would work, the game design, and all kinds of areas to expand if it were successful. There are many technical challenges to overcome with this kind of project, though. In a matter of days a successful title on Google Play can go from a few users to a few hundred thousand users. Additionally, based on user feedback I would need to modify the software frequently to add features and possibly modify game design elements. All of this added up to some traditionally nasty software requirements: a technical stack with high scalability and flexibility.

Coming from a software background in enterprise Java development, my instincts were to go for a classic 3-tiered architecture. The game's business logic would live in a DropWizard layer (Jetty, Jersey, and Jackson for HTTP, REST, and JSON handling, respectively) running as a stand-alone service in front of MarkLogic. Writing pages of stored procedures to hold the game's business logic at the database level sounded like an anti-pattern, to say the least. Java, I thought, was a more appropriate language than a niche functional scripting language like XQuery. As the scope of the project grew, however, managing the hand-off between the MarkLogic platform, the service code, and the Android client code became more and more ungainly. State management issues plagued the system, introduced latency, and raised serious concerns about scalability.

Once I joined MarkLogic, I discovered many use cases with full enterprise applications running directly against the MarkLogic platform. In some cases tens or hundreds of thousands of concurrent users were hitting scalable MarkLogic clusters with great performance. Complex business logic was implemented directly on the platform, further challenging my previously held 3-tier-or-bust philosophy. I do maintain there are many cases where decoupling the database platform and the business logic provides better separation of concerns and cohesion, but in the case of Contagion I decided to go with the simpler architecture, given the limited resources.

I invited Adam Lewis to assist in the project and together we reviewed the existing code and possibilities. This new design would have the Android client contacting REST end-points supplied by the MarkLogic platform. We set off converting the existing Java business logic into XQuery and eventually JavaScript modules as MarkLogic 8 neared release and incrementally added support for JavaScript REST API extensions. These end-points would handle all the basic client interactions and access additional JavaScript modules (hooray for ML8) executing the business logic of the game. This architecture dramatically simplified interactions between the client and server and reduced latency to boot! As most gamers know, latency kills.

When I was using a middle tier, I wrote dense mathematical Java code to calculate great-circle distances between actors in the game, along with bearings and distances traveled. Using the geospatial indexes and tools available in MarkLogic, much of this code could now be shelved. Less code to maintain is a major risk reducer for the project, so this was a big win.
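Just to illustrate what got shelved (this is not the actual game code), here is the sort of great-circle calculation the middle tier used to carry:

// Rough sketch of a haversine great-circle distance -- the kind of math
// MarkLogic's geospatial indexes now handle for us.
function greatCircleDistanceKm(lat1, lon1, lat2, lon2) {
  var R = 6371; // mean Earth radius in kilometers
  var toRad = function (deg) { return deg * Math.PI / 180; };
  var dLat = toRad(lat2 - lat1);
  var dLon = toRad(lon2 - lon1);
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return R * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}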

The game’s progress continues at a measured pace as we both have full time jobs with MarkLogic and families, but it’s a fun side-activity and we are both learning the nuances of the technologies involved. “Contagion” has forced us to tackle many complex details not only at the architectural level, but also in the weeds of the MarkLogic platform itself. As the project scales up, we’ll be examining classic sizing issues like when documents are created, how frequently we are updating them, and what kind of query performance we are getting with hundreds (and hopefully tens of thousands) of concurrent requests.

In addition to using this project to learn more about our product firsthand, my hope is to further explore the ways to exploit MarkLogic to the benefit of mobile platform users. In future posts, I intend to delve into our usage of groovy/grails for package management, our extensions of the REST interface on MarkLogic 8, and how we’re exercising geospatial search.

A UDF for Ranged Buckets

by Dave Cassel

Last week I wrote a blog post about working with Ranged Buckets. To summarize the problem, we have data that look like this:

<doc>
  <lo>2</lo>
  <hi>9</hi>
  <id>1154</id>
</doc>

We want to build a facet with buckets like 0-4, 5-8, 9-12, 13-16, and 17-20. The "lo" and "hi" values in the sample document represent a range, so the document should be counted in the 0-4, 5-8, and 9-12 buckets, even though no value from 5 to 8 appears in the document.
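The counting rule is a simple interval-overlap test. A quick JavaScript sketch (bucket boundaries are inclusive here) makes it concrete:

// A document's [lo, hi] range counts toward every bucket it overlaps.
var buckets = [[0, 4], [5, 8], [9, 12], [13, 16], [17, 20]];
function bucketsFor(lo, hi) {
  return buckets.filter(function (b) {
    return lo <= b[1] && hi >= b[0]; // the ranges overlap
  });
}
bucketsFor(2, 9); // => [[0,4], [5,8], [9,12]] for the sample document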

In my earlier post, I showed how to solve this problem using a normal custom constraint. Today, I took a crack at it with a more involved technique -- a User Defined Function. Also referred to as "Aggregate User Defined Functions", UDFs let MarkLogic application developers write C++ code to implement map/reduce jobs. For me, this took some effort as I haven't written much meaningful C++ since I came to MarkLogic about 5 years ago (the notable exception being the other UDF that I wrote). I got through it, though, and I found some interesting results. (Feel free to suggest improvements to the code.)

Implementation

I'll refer you to the documentation for the general background on UDFs, but essentially, you need to think about four functions.

start

The start function handles any arguments used to customize this run of the UDF. In my case, I needed to pass in the buckets that I wanted to use. I dynamically allocate an array of buckets that I'll use throughout the job. 

map

Two range indexes get passed in -- one for the "lo" element and one for the "hi" element. The map function gets called for each forest stand in the database, examining the values in the input range indexes. When two indexes are passed in, the map function sees the values as tuples. For instance, the values in the sample document above show up as the tuple (2, 9). Always check the frequency of that tuple, in case the same pair occurs in multiple documents. Once this function has been called for a stand, we know the counts for each bucket for the values in that particular stand. 
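The real UDF is C++, but sketched in JavaScript the per-stand logic looks roughly like this (each tuple is treated as [lo, hi, frequency], and buckets as [lo, hi] pairs):

// Sketch of the map step for one stand: add each tuple's frequency
// to the count of every bucket its range overlaps.
function mapStand(tuples, buckets) {
  var counts = buckets.map(function () { return 0; });
  tuples.forEach(function (t) {
    var lo = t[0], hi = t[1], freq = t[2];
    buckets.forEach(function (b, i) {
      if (lo <= b[1] && hi >= b[0]) { counts[i] += freq; }
    });
  });
  return counts;
}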

reduce

The reduce function combines the per-stand counts, aggregating them until a set of values for the entire database is known. My implementation just needed to add the counts for each bucket. 

finish

The last step is to organize the results in a way that they can be sent back to XQuery. The finish function builds a map, using "0-4" as the key for the first bucket and the count as the value. 
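Again sketched in JavaScript rather than the actual C++, the reduce and finish steps amount to little more than this:

// Reduce: combine two per-stand count arrays by adding them element-wise.
function reduceCounts(a, b) {
  return a.map(function (count, i) { return count + b[i]; });
}

// Finish: label each count with its bucket, e.g. { "0-4": 123, "5-8": 456, ... }
function finishCounts(counts, buckets) {
  var result = {};
  buckets.forEach(function (b, i) { result[b[0] + '-' + b[1]] = counts[i]; });
  return result;
}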

Encoding and Decoding

When working in a cluster, the encode and decode functions are important too. For my simple tests, I implemented them but used the UDF on a single MarkLogic instance, so these functions weren't called. 

Deploying

Building the UDF is pretty simple using the Makefile provided by MarkLogic [1]. I customized the two places where the name needed to match my filename, but otherwise left it alone.

After compiling, I uploaded the UDF to MarkLogic using Query Console. I exported the workspace, and it's available on GitHub.

You can call a UDF using the /v1/values endpoint, but I decided to wrap it in a custom constraint to provide a straightforward comparison with the custom constraint built in the previous post. After all, the goal is to provide a facet. A custom constraint requires some XML for the search options and some XQuery.

The Results

I figured UDFs would be more interesting with multiple forests, as mapping a job to a single forest with just one stand doesn't gain any parallelism. With that in mind, I bumped my database up to four forests, then to six, and compared my UDF implementation with the two-function approach described in the previous post. I tested with the same 100,000 documents used in that post.

Median seconds    4 forests    6 forests
UDF               0.002898     0.002858
two-function      0.003909     0.004261

The numbers are the median seconds reported in the facet-resolution-time part of the response to /v1/search?options=udf or /v1/search?options=startfinish. A couple of things jump out at me. First, the UDF out-performed the two-function XQuery custom facet. Second, the UDF showed a very slight improvement moving from four forests to six -- slight enough to call it even. The two-function approach, however, slowed down by a noticeable amount.

Thoughts on UDFs

When should you reach for a UDF? When you can't get the values you need directly from your data, it might be worthwhile. For instance, with ranged buckets we can't simply facet on "lo" or "hi", because that wouldn't represent the values in between. Writing a UDF is more complicated and more dangerous than other approaches, but it appears to have some performance benefits.

There is usually an alternative. For instance, in this case I could have supplemented my data such that the sample document would have all values from two through nine inclusive, allowing me to use a standard facet. That leads to the tradeoff -- do I want to spend a little more time at ingest and take up a little more space, or do I want to dynamically compute the values I need? The answer to that question is certainly application specific, but UDFs provide a handy (and sharp!) tool to work with. 
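For what it's worth, the ingest-time expansion itself is trivial; the cost is the extra values you store:

// Sketch of the ingest-time alternative: materialize every value in the range
// so a standard range-index facet can bucket the document directly.
function expandRange(lo, hi) {
  var values = [];
  for (var v = lo; v <= hi; v++) { values.push(v); }
  return values; // for the sample document: [2, 3, 4, 5, 6, 7, 8, 9]
}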

 


[1] You'll find the MarkLogic Makefile for UDFs in /opt/MarkLogic/Samples/NativePlugins/ (Linux), ~/Library/MarkLogic/Samples/NativePlugins/ (Mac), or C:\Program Files\MarkLogic\Samples\NativePlugins\ (Windows). 
