Why 3.1 Should Make You Smile

by Ian Small

Categories: Features; 3.1
Altitude: 10,000 feet

There's a whole lot about 3.1 that makes me smile. Whether you're a product manager, an architect/developer or a DBA, I'm guessing that our new release will put a smile on your face too. With this posting, I'll start to explain why. And over the course of the coming weeks, this column will explain more and more about why MarkLogic Server 3.1 can make your job easier and your applications more compelling.

Not only is 3.1 the fastest, most scalable product we've ever released, it's also chock-full of improvements and enhancements. Most of those are driven by your feedback - improvements based on real-world experience with real-world deployments. But as usual, we've gone beyond the call to add some new game-changing capabilities. This mix of performance enhancements, operational improvements, extensions to existing functionality, and completely new stuff makes it pretty likely that there's something in there that looks like it was built just for you.

If you'd like to know more about the new features in 3.1, start by taking a look at the feature overview on developer.marklogic.com. If you're curious as to why you might think MarkLogic Server 3.1 was built just for you, read on.

Product Managers

I spend part of my day as a product manager. That means figuring out what should go into the product - making sure you get what you need for your business. Some of that is pretty straightforward, and some of it lets us take a walk on the wild side. I get excited when our engineers take a narrowly focussed problem and turn it into something mind-expanding - a feature that solves the initial problem, but also opens up entire vistas of possibilities we hadn't originally contemplated. Seen from this viewpoint, there's lots in 3.1 to be excited about.

With 3.1, we've introduced the industry's first XML classification engine, a new dictionary construct we call XML lexicons, support for diacritics and more kinds of scoring and ranking. And, of course, we made it faster, faster, faster. Let's look at those one at a time.

Once you wrap your head around the idea of XML classification, you'll wonder why anyone ever did it any other way. You already know we're all about XML, so for us, XML classification was a natural idea. XML classification lets you classify at any level of granularity - from the individual element to the entire document. It also makes classifications more precise by taking account of content, structure and the combination of content and structure. When you stop to think about it, it's not rocket science to realize that "heart attack" in a title should be treated differently from "heart attack" in a footnote when you're comparing two documents. And yet, most classifiers can't tell the difference. Our XML classification engine delivers big improvements over the state of the art, and that has big ramifications for your content applications.

XML lexicons are a building block that your developers can use to enable a wide range of functionality. Need to do drill-downs on your query results? XML lexicons can help. Want to understand what words are in use in your database? XML lexicons to the rescue! Want to prompt your end-user to help them complete their query? XML lexicons again. "Yes, we can do that" is nice to hear when you're a product manager, and XML lexicons mean you should hear that more often from your developers.
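
To make the query-completion idea concrete, here is a minimal Python sketch. It is not MarkLogic code and says nothing about how the server implements lexicons; it simply assumes a lexicon behaves like a sorted list of the distinct words in a database. The names build_lexicon and complete are hypothetical, invented for this illustration.

```python
import bisect
import re


def build_lexicon(documents):
    """Return the sorted list of distinct lowercase words across all documents."""
    words = set()
    for text in documents:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def complete(lexicon, prefix, limit=5):
    """Return up to `limit` lexicon words that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search finds the first word that sorts at or after the prefix.
    start = bisect.bisect_left(lexicon, prefix)
    matches = []
    for word in lexicon[start:start + limit]:
        if not word.startswith(prefix):
            break  # sorted order means no later word can match
        matches.append(word)
    return matches


# Hypothetical sample content, for illustration only.
docs = ["Heart attack risk factors", "Healthy heart habits", "Hearing loss"]
lexicon = build_lexicon(docs)
print(complete(lexicon, "hea"))
```

Because the word list is kept sorted, finding the completions for a prefix is a binary search plus a short scan, which is why a prebuilt list of this kind suits type-ahead prompts and "what words are in use" questions.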

Diacritics, scoring and ranking. These may not be the sexiest subjects in the universe, but in the real world, they're all important. So 3.1 improves recall by supporting diacritic-insensitive search. It also gives you the choice of multiple relevance algorithms, so you can align relevance ranking with your application and content. Finally, everyone likes to know just how good a search result is, so our new confidence measures give you the metrics you need to choose the right graphical widget to place beside your five-star results.
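
For readers who have not met diacritic-insensitive matching before, the following Python sketch shows the general idea and why it improves recall. It is an illustration built on Python's standard unicodedata module, not a description of how MarkLogic Server does it; the function names strip_diacritics and matches are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term, text, diacritic_sensitive=False):
    """Case-insensitive whole-word match, optionally ignoring diacritics."""
    if not diacritic_sensitive:
        term, text = strip_diacritics(term), strip_diacritics(text)
    return term.lower() in text.lower().split()


sentence = "Send your résumé today"
print(matches("resume", sentence))                            # insensitive
print(matches("resume", sentence, diacritic_sensitive=True))  # sensitive
```

With diacritics ignored, a user who types "resume" still finds content that spells the word with accents; with sensitivity turned on, the two spellings are treated as different words.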

And finally, the server is just plain faster, in lots and lots of ways, as you'll see below. Faster means you can scale better, supporting more queries per second. Faster can also make it possible to run more complex and powerful queries in the same amount of time as before. And both of those options probably make you smile.

Architects and Developers

All the toys that I outlined above will be a blast to put into production. But if you're an architect or developer, there are other parts of the 3.1 release calling you by name.

Start by checking out one of the most outside-the-box new features in MarkLogic Server 3.1. Point-in-time queries allow you to specify the time at which a specific query is to be evaluated - effectively letting you travel back into the past for the duration of that query. (And no, Cosmo, you can't travel into the future...) Most database systems that support similar capabilities do so for disaster-recovery purposes, because rolling back the journal so that you can "travel back in time" is such an expensive operation. Our implementation has pragmatically zero run-time overhead - making time travel pretty much free at query time. So go ahead, the sky's the limit on what you can do with this in both production deployments and end-user applications, whether it's content promotion, time-based comparative queries, or archival and regulatory applications.
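
One way to picture why reading the past can be cheap is a store that keeps older versions of a document alongside the timestamp at which each was committed. The Python sketch below is a toy model under that assumption only; it does not describe MarkLogic Server's internals, and the VersionedStore class and its put and get methods are hypothetical names made up for the example.

```python
import bisect


class VersionedStore:
    """Toy store that keeps every version of a document with its commit time."""

    def __init__(self):
        self._clock = 0      # logical timestamp, bumped on every commit
        self._versions = {}  # uri -> ([timestamps], [contents]) in commit order

    def put(self, uri, content):
        """Commit a new version of `uri` and return its timestamp."""
        self._clock += 1
        stamps, contents = self._versions.setdefault(uri, ([], []))
        stamps.append(self._clock)
        contents.append(content)
        return self._clock

    def get(self, uri, at=None):
        """Return the content visible at timestamp `at` (default: latest)."""
        stamps, contents = self._versions.get(uri, ([], []))
        if at is None:
            at = self._clock
        # Last version committed at or before `at`; nothing is rolled back.
        idx = bisect.bisect_right(stamps, at)
        return contents[idx - 1] if idx else None


store = VersionedStore()
t1 = store.put("/site/home.xml", "<page>draft</page>")
t2 = store.put("/site/home.xml", "<page>published</page>")
print(store.get("/site/home.xml", at=t1))  # the page as it stood at t1
print(store.get("/site/home.xml"))         # the page as it stands now
```

In this model a query "at time t1" is just an ordinary lookup with a different timestamp, which is the kind of behaviour that makes use cases such as content promotion or before-and-after comparisons practical.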

In 3.1, there's a new client-side connector technology for Java and .Net called XCC (XML Contentbase Connector). XCC is a direct result of what we've all learned from writing content applications using XDBC over the last three years. XCC delivers four big wins: cleaner abstractions and APIs, a pluggable infrastructure, improved resiliency, and higher performance. We've had rave reports from developers who have moved from XDBC to early versions of XCC, so we hope you feel the same way. Looking into the future, we're also using XCC as a vehicle for opening up a little more of the system. We're already seeing the benefits, as the developer community is putting the finishing touches on a C-language XCC implementation - with ports for Perl, Python and Ruby.

And finally, I like it when the server runs faster, and I'll bet you do too. 3.1 has a whole bunch of performance improvements buried deep in its internals, the impact of which will be felt most with large contentbases and/or large result sets. Index resolution is faster. Sorting is faster. Traversing complex XML documents is faster. Trailing wildcard searches are faster (assuming you enable trailing wildcard indexes). And cached modules make repeated evaluation of complex XQuery modules faster. But what we've done inside the server is only the first part of our performance work.

With MarkLogic Server 3.1, we've started opening up the internals of the server like never before. We've introduced registered cts:queries(), which can yield enormous speedups in scenarios involving complex user subscriptions or progressive query refinement. Unfiltered searches let you jump deep into result sets at very low cost. And extending cts:query constructors to support parameter sequences (for both QNames and text strings) greatly accelerates searches that involve repeated terms or elements.

The bottom line is that 3.1 gives you even more functionality for building breakthrough applications - along with a bunch of tools for making those applications run screamingly fast. I certainly hope that makes you smile.

DBAs and Operations Managers

I've reviewed some of the features that will enable powerful new applications. I've discussed a bunch of features that make it easier to make applications go fast. The last piece of good news is that we've also made it easier to deploy and operate MarkLogic Server 3.1 in enterprise-class mission-critical situations.

With 3.1, we've expanded the operating system and chip platforms that we support to include Sun Solaris 10 on both Sparc and x64 (e.g. Opteron) architectures. We're also supporting Microsoft Windows 2003 Server 64-bit Edition on x64 architectures. Our motivation for expanding platform support is to provide operations groups with access to the most commonly deployed operating systems running on state-of-the-art chip architectures that offer market-leading price-performance. 3.1 delivers on two fronts. First, we now support the Opteron chip architecture (which the industry is starting to call x64) across the three major OSes on which we ship: Linux, Solaris and Windows. Second, the convergence of 64-bit Windows support from Microsoft and us makes the Windows operating system a first-class citizen for highly scalable high-performance deployments of MarkLogic Server.

In addition to platforms, we've taken steps to improve support for operational tasks in 3.1. As well as the administrative interface just being plain snappier to navigate through (a tip of the hat to cached modules is deserved here), we've added a few new capabilities. First of all, we've improved the breadth of control you have over merge policy, allowing you to manage maximum merge sizes, merge blackout periods and even to disable merges entirely. Now a quick warning: by giving you this new level of control, we've made it quite possible for you to get yourself into a lot of trouble. So remember, with great power (over merge) comes great responsibility (to use that power in a measured way). Second, we've made it possible to cancel both queries and merges mid-flight, so if something is bogging down your server you probably now have the power to do something about it without having to resort to bouncing the offending server.

With every major and minor release, we make strides in improving the manageability and operability of the server. Based on the conversations I've had with operations staff from across our customer base, I'm betting that these last two capabilities alone should put a big smile on the face of any DBA or operations professional who works with MarkLogic Server on a daily basis.


As you can tell, I'm pretty pumped about MarkLogic Server 3.1. Rocking new features along with focussed improvements for existing deployments is a pretty compelling combination. And while "something for everyone" is not how we plan product releases, I think it's fair to say that this release has goodies for everyone. And that's something to smile about.

There's a lot to say about MarkLogic Server, and a lot to say about our new release. So this article is just the first in a continuing series that I will be posting on developer.marklogic.com. Expect a new posting most weeks, with topics ranging from new features and clever ideas for application widgets (and how they might be implemented) to XQuery programming tricks and the inside scoop on server optimization techniques. From time to time, we might even have a guest "columnist".

I've got the first ten or so postings charted out, so it's just a question of writing them. If you've got ideas for something you'd like to hear about, let me know.