
Keeping Reputation Consistent

by Kasey Alderete

In designing Samplestack, a sample MarkLogic application that provides search and updates across Question & Answer content, our team wanted to demonstrate how the database’s built-in capabilities enhance the application developer’s experience. One of the differentiating features of MarkLogic’s enterprise NoSQL database is its support for ACID transactions, and more specifically for multi-document, multi-statement transactions.

It was a no-brainer that we would look for ways to meet requirements and keep the data consistent through the use of transactions where appropriate.

Once we defined the application requirements, we ended up with a scenario that required the database to successfully execute multi-statement transactions.

  • When an answer is selected as the ‘accepted’ answer, parallel updates are required for the content and the user(s):
    • Update the answer to indicate its status as ‘accepted’
    • Increase the reputation of the user with the ‘accepted’ answer
    • Decrease the reputation of the user with the previously ‘accepted’ answer (if applicable)

But how does this apply to YOUR application? What considerations did we take into account to determine this was the best course of action, and how is it implemented? The documentation gives a great overview of the mechanics of transactions in MarkLogic, but I’m going to provide a little more context. I'll walk through how we implemented the scenario above while ensuring that user reputation stayed consistent with the state of the Q&A data. 

Note: Whenever I say “we” while discussing the implementation, I mean my talented engineering colleague Charles Greer, who provided both the brains and the muscle behind the operation.

Document Data Model

Before diving into transactions, I need to explain how the data is modeled in Samplestack since that played a large role in determining where the reputation and related updates needed to occur upon answer acceptance.

When setting up our data model we thought about types of information we’d capture:

  • Questions
  • User votes
  • Answers
  • Votes on questions and answers
  • Comments
  • Answer acceptances
  • User name
  • User reputation
  • User location
  • Question metadata (dates, tags)

We also thought about the most common types of updates users would be making:

  • Asking questions
  • Answering questions
  • Voting
  • Accepting answers

And the range of queries (searches) we needed to support for end users:

  • Using keywords/full text search
  • By user
  • By tag
  • Whether questions were resolved (had accepted answers)
  • By date

We wanted to denormalize the data where sensible to enhance searchability, but to keep frequent updates scalable and bounded.

Much of the data could be logically grouped into either “Question & Answer” (QnA) content tracking the thread of a conversation and associated metadata (tags, votes on content) or “User” data with specifics on the user’s activity and profile. Users participate in QnA threads, so the user name appeared in both groupings. Including it in the QnA document provided a way of searching for their content updates. User records allowed us to keep fields that might be more frequently changed (user location, user votes) in a separate document so we wouldn’t have to update every QnA thread where the user participated in the case of a vote or a physical move.

One key decision was to leave user reputation out of the QnA document. Reputation could change constantly (when users had their answers accepted and their content voted on), meaning every document containing a user’s reputation would have to be touched during an update. This could translate into thousands of documents for an active user participating in many QnA threads. We did not have an explicit requirement to search or sort documents by reputation, so we chose to normalize reputation and keep it in the user record only. We still wanted to show reputation alongside user names, but we accomplished that with a transform that joins search results with user reputations. Joining user reputation with QnA documents to display one page of search results cost less than performing a join for sort or search across all results.
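To make the join concrete, here is a minimal JavaScript sketch of attaching reputations to a single page of results. It is not Samplestack's actual transform: the function name joinReputation and the field names owner, userName, and reputation are assumptions made purely for illustration.

```javascript
// Hypothetical helper: attach each contributor's current reputation to one
// page of search results. Field names are assumed, not Samplestack's schema.
function joinReputation(resultsPage, userRecords) {
  // Index the user records by name so each lookup is a single Map access.
  const reputationByUser = new Map(
    userRecords.map((user) => [user.userName, user.reputation])
  );

  // Return copies so the stored QnA documents are never modified.
  return resultsPage.map((doc) => ({
    ...doc,
    owner: {
      ...doc.owner,
      // Unknown users fall back to a reputation of 0 in this sketch.
      reputation: reputationByUser.get(doc.owner.userName) ?? 0,
    },
  }));
}

const page = [
  { id: 'q1', title: 'How do I solve this problem?', owner: { userName: 'mary' } },
  { id: 'q2', title: 'Where are the docs?', owner: { userName: 'joe' } },
];
const users = [
  { userName: 'mary', reputation: 12 },
  { userName: 'joe', reputation: 1 },
];

const joined = joinReputation(page, users);
console.log(joined.map((d) => `${d.owner.userName}: ${d.owner.reputation}`));
```

The work is proportional to the size of the page being displayed, which is why this kind of join stays cheap compared with joining across every matching document.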

Here’s a look at where we landed with our 2 record types modeled as JSON documents:

[Figure: User Record. Key fields used for the “Contributor” role in the application (simplified for this walk-through)]

[Figure: Question and Answer document. Basic structure of a QnA thread (simplified for this walk-through)]

This meant that for our anticipated user updates, there were never more than 3 or 4 documents requiring simultaneous database updates. We chose this limit as it made sense based on our project requirements. The key outcome was that it was a known, constrained set of document updates as a basis for future scale and performance.
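As a rough illustration of the model described above, the two record types might look something like the following. The field names are assumptions (only accepted and acceptedAnswerId appear elsewhere in this post), the values are made up, and the URI layout and the helper documentsTouchedByAccept are hypothetical.

```javascript
// Hypothetical user record: reputation lives here and nowhere else.
const userRecord = {
  userName: 'joe',
  location: 'Unknown',
  reputation: 0,
  votes: [],
};

// Hypothetical QnA thread: user names are denormalized into the thread so
// it can be searched by user, but reputation is deliberately left out.
const thread = {
  id: 'q1',
  text: 'How do I solve this problem?',
  owner: { userName: 'mary' },
  tags: ['transactions'],
  accepted: false,
  acceptedAnswerId: null,
  answers: [
    { id: 'a1', text: 'Look it up in the documentation.', owner: { userName: 'joe' } },
    { id: 'a2', text: 'Ask a colleague.', owner: { userName: 'ann' } },
  ],
};

// List the documents an "accept answer" update would have to touch:
// the thread, the newly accepted user, and any previously accepted user.
function documentsTouchedByAccept(qna, answerId) {
  const userUri = (name) => `/users/${name}.json`;
  const uris = new Set([`/questions/${qna.id}.json`]);
  const chosen = qna.answers.find((a) => a.id === answerId);
  if (!chosen) throw new Error(`Unknown answer: ${answerId}`);
  uris.add(userUri(chosen.owner.userName));
  const previous = qna.answers.find((a) => a.id === qna.acceptedAnswerId);
  if (previous) uris.add(userUri(previous.owner.userName));
  return [...uris];
}

console.log(userRecord.userName, documentsTouchedByAccept(thread, 'a1'));
```

Because reputation is stored only in the user record, the set of documents stays small and bounded no matter how many threads a user has participated in.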

Considering Transactions

Given our data model, we knew the updates required as a part of accepting an answer would span multiple documents. But what if there was a system failure? Or another user searched the database while an update was in progress? Without transactions there would exist the potential for a user reputation to be inconsistent with the QnA document denoting the accepted answer.

[Figure: an inconsistent state. Mary asks “How do I solve this problem?” and Joe’s answer, “Look it up in the documentation.”, is marked as accepted, yet Joe’s user record still shows Reputation: 0.]

We wanted to be production-ready for an enterprise environment and knew that having eventually consistent data would not be good enough. If a failure or another query happened mid-update, we did not want to present an ‘unstable’ state where an answer had been accepted but no one received credit. We wanted either to roll back all updates or to complete them all at once.

In the User Interface, when the Question Asker selects ‘accept’…

Q: How do I solve this problem? -Mary
A1: Look it up in the documentation. –Joe

Upon click, simultaneous updates to both the QnA and User documents must be made:

QnA Document
“accepted”: true
“acceptedAnswerId”: “A1”
Joe User Record
Reputation: 1

We concluded that database transactions would let us avoid the risks of a system failure or of mid-update access to the same dataset by another application. With MarkLogic, we could update multiple documents in a single transaction, keeping the reputation consistent with the QnA data.

The most common example illustrating the need for transactions is debits and credits. As Samplestack demonstrates, data integrity is not only relevant for financial applications. Any situation that demands that data meet all validation rules at every point in time requires consistency. Also keep in mind when designing your data model that a normalized value is stored in only one place, so it has no copies to drift out of sync. For denormalized data, you may need transactions to keep redundant or related data synchronized.

Implementing Multi-Statement Transactions

Samplestack is a three-tiered application based on the Reference Architecture. The Java version of the application primarily uses the Java Client API for managing interactions between the application middle tier and the database, including in the case of updating reputation using multi-statement transactions.

Let’s walk through a selection of the application code to highlight the key components to successfully executing a transaction upon answer acceptance. Keep in mind the following code is specific to the Samplestack application and includes references to private functions defined elsewhere in the codebase (not necessarily cut-and-paste for your application).

1. Open a transaction

2. Perform the required updates

This application uses DocumentPatchBuilder to make the document changes.

3. Either rollback or commit

One tricky part is making sure to account for error scenarios and to include the rollback. Remember too that because this is a multi-statement transaction, updates will not be visible to others until you commit. The updates will, however, be visible to you in real time during the transaction, for example in search. Part of the benefit of performing the update via MarkLogic is that search and other indexes are updated in real time during a transaction. You’ve made the latest information available while keeping reputation consistent.
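The three steps can be sketched in JavaScript against an in-memory stand-in. This is not the Java Client API code used in Samplestack, and it is synchronous where real database calls would go over the network; InMemoryStore, acceptAnswer, and the document URIs are all hypothetical names chosen for illustration. The point is only the shape: open, update, then commit or roll back.

```javascript
// In-memory stand-in for a database with multi-statement transactions.
// NOT the MarkLogic Java Client API; every name here is hypothetical.
class InMemoryStore {
  constructor(docs) {
    this.committed = new Map(Object.entries(docs));
  }
  openTransaction() {
    return { staged: new Map() };
  }
  // Reads inside a transaction see that transaction's own staged writes.
  read(uri, tx) {
    if (tx && tx.staged.has(uri)) return tx.staged.get(uri);
    return this.committed.get(uri);
  }
  // Stage a shallow patch; other readers cannot see it yet.
  patch(uri, changes, tx) {
    const current = this.read(uri, tx);
    if (!current) throw new Error(`No such document: ${uri}`);
    tx.staged.set(uri, { ...current, ...changes });
  }
  commit(tx) {
    for (const [uri, doc] of tx.staged) this.committed.set(uri, doc);
    tx.staged.clear();
  }
  rollback(tx) {
    tx.staged.clear();
  }
}

function acceptAnswer(store, qnaUri, answerId, newUserUri, previousUserUri) {
  const tx = store.openTransaction(); // 1. open a transaction
  try {
    // 2. perform the required updates
    store.patch(qnaUri, { accepted: true, acceptedAnswerId: answerId }, tx);
    const winner = store.read(newUserUri, tx);
    store.patch(newUserUri, { reputation: winner.reputation + 1 }, tx);
    if (previousUserUri) {
      const previous = store.read(previousUserUri, tx);
      store.patch(previousUserUri, { reputation: previous.reputation - 1 }, tx);
    }
    store.commit(tx); // 3a. everything succeeded: commit
  } catch (err) {
    store.rollback(tx); // 3b. anything failed: roll back all staged updates
    throw err;
  }
}

const store = new InMemoryStore({
  '/questions/q1.json': { accepted: false, acceptedAnswerId: null },
  '/users/joe.json': { reputation: 0 },
});
acceptAnswer(store, '/questions/q1.json', 'a1', '/users/joe.json');
console.log(store.read('/questions/q1.json'), store.read('/users/joe.json'));
```

If the second update fails, for instance because the user record is missing, the already staged change to the QnA document is discarded as well, so no reader ever sees an accepted answer without the matching reputation change.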

Armed with this overview of the design and implementation considerations for multi-document, multi-statement transactions, you should be well on your way to maintaining data consistency in your own applications!


By the way, MLCP just got better

by Dave Cassel

MarkLogic 8 came out recently and it has an amazing set of new features. With all the big new things, MarkLogic Content Pump (MLCP) got some improvements that you might have missed.

Alternate Database

With MLCP 1.3-1, you can now specify the database that you want to interact with. In the past, if you didn't already have an XDBC app server pointing to your target database, you'd have to set one up so that MLCP could run import, export, or copy operations there. Not any more -- now with the "-database" option, you can specify which database you want to work with.

Even better, with the Enhanced HTTP Server on port 8000, you don't even need to set up an XDBC app server. Out of the box, I can do this:

$ mlcp.sh export -host localhost -port 8000 -username admin -password admin -database Documents -output_file_path docs
$ mlcp.sh export -host localhost -port 8000 -username admin -password admin -database Modules -output_file_path modules

More Data Types

The updated MLCP also helps you with new data types.

MLCP just got that much more helpful!

Node.js and Express.js sessions using MarkLogic 8

by Matt Pileggi

Overview

Most web applications benefit from the use of sessions. Apps built with Node.js are no different, and the Express server makes it very easy to apply middleware to help you manage your users' sessions. In this quick start guide we will be using connect-marklogic, which is a session store implementation for express-session. Express-session provides the plumbing for creating and managing sessions between browser cookies and the database, and connect-marklogic allows us to use MarkLogic 8 as the persistence layer. It's very simple!

Requirements

Before you get started with the steps below, you'll need to be familiar with and have already installed the following:

Express

In its own words, Express is a fast, unopinionated, minimalist web framework for Node.js. It is a very popular choice when building Node applications. I will take the minimalist approach here, as well, and show you just how easy it is to get started.

The first thing we will need to do is install express itself.

npm install express

Then we can create our server.js file.

// very basic express server
var express = require('express');
var app = express();

app.get('/', function(request, response) {
  response.send('Hello');
});

var server = app.listen(3000, function () {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Example app listening at http://%s:%s', host, port);
});

That's it for now. Save this file and run 'node server.js' ('npm start' will do the same thing if you have a package.json file). You'll notice the output telling you the server is listening on port 3000. If you visit http://localhost:3000 you should be greeted with a "Hello". This is the app.get('/') path that we've set up in our server. Unfortunately, it is the only path that our server supports, and it will always serve the same static message. Not exactly a capable application, so let's add some session magic!

Express-Session

We'll need to add another dependency in order to work with sessions.

npm install --save express-session

Express-session is the glue between Express, the browser, and the session store for managing a client's session. This glue is called "middleware" and it is a common pattern when dealing with Express. In order to use the express-session middleware we must configure it and tell our Express server how to use it. Add the following to your server.js file before the app.get('/') line from earlier.

var session = require('express-session');

app.use(session({ secret: 'enterprise nosql'}));

What we've done now is import the express-session module and create a new instance of it with default options. Calling app.use without a path tells Express to use the session middleware for ALL requests, regardless of path or method. This way the session data will be evaluated and updated on each request. The middleware also exposes the session directly on the request object so it's easy to use! Update server.js and replace the previous route with the following:

app.get('/', function (req, res) {  

  if(req.session.username) { 
    res.send('<p>Hello, ' + req.session.username+ '!</p><p><a href="/logout">logout</a></p>');  
  } else {
    res.send('<form action="/login"><label for="username">Username</label> <input type="text" name="username" id="username"> <p><input type="submit" value="Submit"></p></form>');  
  }

});

This will inspect the req.session object from the middleware for a property called 'username'. If username is in the session then we will greet the user by name; otherwise we will display a form for them to log in. In order for that to work properly, we need to add two more routes. Add the following below the previous route in server.js:

app.get('/login', function(req, res) { 
  if (req.query.username) {  
     req.session.username = req.query.username;
  } 
  res.send('<p>You have been logged in, ' + req.session.username+'</p><p><a href="/">Home</a></p>');
});
app.get('/logout', function(req,res) {
  delete req.session.username;
  res.send('<p>Sorry to see you go!</p><p><a href="/">Home</a></p>');
});

We now have three paths: /, /login, and /logout. /login expects a query param named 'username', which will be set onto the session. Start up server.js and give it a shot. You should have a series of login and logout steps that are able to remember who you are!
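If it helps to picture what the session middleware is doing on each of those requests, here is a deliberately simplified model in plain JavaScript. It is not how Express or express-session is actually implemented: runChain, fakeSession, the sessionId property, and the res.body field are hypothetical stand-ins, and real sessions are keyed by a signed cookie rather than a plain property.

```javascript
// Simplified model of a middleware chain; not Express's implementation.
function runChain(middlewares, req, res) {
  let index = 0;
  function next() {
    const middleware = middlewares[index++];
    if (middleware) middleware(req, res, next);
  }
  next();
}

// Stand-in for the session middleware: find (or create) the session that
// belongs to this request and expose it as req.session.
const sessions = {};
function fakeSession(req, res, next) {
  sessions[req.sessionId] = sessions[req.sessionId] || {};
  req.session = sessions[req.sessionId];
  next();
}

// Route handlers run after the middleware, so req.session is already set.
function login(req, res) {
  req.session.username = req.query.username;
  res.body = 'Logged in';
}
function home(req, res) {
  res.body = req.session.username
    ? `Hello, ${req.session.username}!`
    : 'Please log in';
}

// First request logs in; second request from the same client is remembered.
const first = {};
runChain([fakeSession, login], { sessionId: 'abc', query: { username: 'mary' } }, first);
const second = {};
runChain([fakeSession, home], { sessionId: 'abc' }, second);
console.log(first.body, '|', second.body);
```

Each handler only reads or writes req.session; where that data is kept between requests is entirely the middleware's concern.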

MarkLogic

The application we've built so far makes easy work of managing a user's session. However, the default SessionStore implementation of the middleware is in-memory storage. This means that all of your session data will disappear when the server restarts. Our app is not exactly production-ready just yet! In order to fix this, we need to configure the express-session middleware so that it uses a more permanent session store. Enter MarkLogic.

npm install marklogic connect-marklogic

We will need the marklogic module to create our connection to a MarkLogic database, and the connect-marklogic module provides the session store implementation that express-session requires. Sound complicated? Don't worry, it's quite easy. Add the following lines to your server.js file right below the declaration of var session:

var marklogic = require('marklogic');
var MarkLogicStore = require('connect-marklogic')(session);
var mlClient = marklogic.createDatabaseClient({
  host: 'localhost',
  port: '8000',
  user: 'admin',
  password: 'admin'
});

This imports both of our new modules. We initialize connect-marklogic with the session middleware, and create a new database client that we will use to persist the sessions. The only step left is to modify the configuration of our session middleware so that it can use our session store. Modify the app.use line so that it looks like this:

app.use(session({
  store: new MarkLogicStore({ client: mlClient }),
  secret: 'enterprise nosql'
}));

The only thing we needed to add was the 'store' option when instantiating our middleware. Express-session will now read and write all session data to our MarkLogic database! Start up your server.js and you can visit the same endpoints from earlier. However, you can now stop and restart the server and the session data will remain!
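For a feel of what a session store has to provide, the sketch below implements the three methods express-session requires of a store: get, set, and destroy, each taking a callback. It is only an illustration and not connect-marklogic's code. SketchStore is a hypothetical name, a Map stands in for the database, and a real store would normally extend the Store class exported by express-session and make asynchronous calls.

```javascript
// Minimal sketch of the store contract: get(sid, cb), set(sid, session, cb)
// and destroy(sid, cb). The Map is a stand-in for a real database.
class SketchStore {
  constructor(backing = new Map()) {
    this.backing = backing;
  }
  // Look up a session by id; pass null when nothing is stored.
  get(sid, callback) {
    const raw = this.backing.get(sid);
    callback(null, raw ? JSON.parse(raw) : null);
  }
  // Persist the session as JSON under its id.
  set(sid, session, callback) {
    this.backing.set(sid, JSON.stringify(session));
    callback(null);
  }
  // Remove the session, for example on logout or expiry.
  destroy(sid, callback) {
    this.backing.delete(sid);
    callback(null);
  }
}

// The backing storage outlives any single store instance, which is what
// lets sessions survive a restart of the web server.
const database = new Map();
new SketchStore(database).set('abc', { username: 'mary' }, () => {});

// A fresh instance (think: restarted server) still finds the session.
new SketchStore(database).get('abc', (err, session) => {
  console.log(session.username);
});
```

The default in-memory store keeps its data inside the Node process itself, so nothing plays the role of the separate database and the sessions vanish with the process.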

All Together Now

Here's the final version of server.js in case you want to compare notes.

// very basic express server with MarkLogic 8
var express = require('express');
var app = express();
var session = require('express-session');
var marklogic = require('marklogic');
var MarkLogicStore = require('connect-marklogic')(session);
var mlClient = marklogic.createDatabaseClient({
  host: 'localhost',
  port: '8000',
  user: 'admin',
  password: 'admin'
});

app.use(session({
  store: new MarkLogicStore({ client: mlClient }),
  secret: 'enterprise nosql'
}));

app.get('/', function (req, res) {
  if (req.session.username) {
    res.send('<p>Hello, ' + req.session.username + '!</p><p><a href="/logout">logout</a></p>');
  } else {
    res.send('<form action="/login"><label for="username">Username</label> <input type="text" name="username" id="username"> <p><input type="submit" value="Submit"></p></form>');
  }
});

app.get('/login', function (req, res) {
  if (req.query.username) {
    req.session.username = req.query.username;
  }
  res.send('<p>You have been logged in, ' + req.session.username + '</p><p><a href="/">Home</a></p>');
});

app.get('/logout', function(req,res) {
  delete req.session.username;
  res.send('<p>Sorry to see you go!</p><p><a href="/">Home</a></p>');
});

var server = app.listen(3000, function () {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Example app listening at http://%s:%s', host, port);
});

More Options

There are more options available when creating your connect-marklogic instance. For more details visit the repository https://github.com/withjam/connect-marklogic.

Be sure to keep up with MarkLogic 8 and MarkLogic's other tools for Node.js development!
