MarkLogic: Beyond NoSQL

by Eric Bloch

NoSQL Matters

Even though the term "NoSQL" has its issues, it has become important.

Recently, leaders from several NoSQL projects (Riak, HBase, CouchDB, Neo4j) came together for a session at Gluecon.  And while they came from divergent perspectives, they all basically agreed that the term had been very helpful to developers and architects in identifying their systems as new database and/or database-alternative technologies. 

There have been numerous NoSQL taxonomies, discussions about them, and calls to move beyond them.  And while it's clear to us, as well as our friends and customers, that MarkLogic Server sits among these technologies, we haven't yet fully described why NoSQL folks should pay attention.  To that end, this post is a first step at explaining why and how we're more than "yet another NoSQL system".  And I'll start with some context for NoSQL folks.

MarkLogic Server is a NoSQL Document Store

MarkLogic shares several key architectural components with well-known NoSQL Document Stores.  While this alone doesn't make us interesting to NoSQL folks, it should help facilitate some understanding of our architecture.  So here are a few high-level details that will sound like motherhood and apple pie to NoSQL folks. 

MarkLogic Server has a:

  • Document-oriented data model, no schemas required
  • Simple key-value interface (we use URIs for keys)
  • Highly configurable partitioning, including support for consistent hashing
  • Shared-nothing cluster architecture
  • Multi-version concurrency control (MVCC)
  • REST/HTTP interface (actually full WebDAV support)
  • Use of memory-mapped files for performance gains
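
To make the consistent-hashing item a bit more concrete, here is a minimal Python sketch of the general technique, with document URIs as keys. It is not MarkLogic's implementation: the class name `HashRing`, the node names, the number of virtual points per node, and the choice of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps document URIs to nodes (illustrative only)."""

    def __init__(self, nodes, replicas=64):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._ring = []            # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Any stable hash works; MD5 is used here only because it is in the stdlib.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node to spread keys more evenly.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, uri):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(uri),)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node names and URI, for illustration only.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("/articles/2010/nosql.xml")
```

The point of the ring is that removing a node only reassigns the keys that node owned; every other URI keeps its current owner, which keeps rebalancing cheap.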

If you haven't already, you can read about how these choices facilitate a number of NoSQL solutions at places like the CouchDB book or the MongoDB site.  These choices get us the same benefits in MarkLogic Server - high-scale, high-performance, no bottlenecks, and simple programming interfaces. But to us, that's just what gets us invited to the party.  What's interesting is what separates us.

What Distinguishes MarkLogic Server from Other NoSQL Alternatives?

While MarkLogic is like these systems in some ways, it is quite distinct in others.  At a high level, MarkLogic is distinguished by:

  • Focus on intra-cluster consistency and ACID transactions
  • Use of the XML data model
    Fundamentally, MarkLogic is designed and optimized to store, index, update, and search XML data.  It can also store plain text, JSON, and binary data, but it is optimized for XML.  And as much as JSON is simpler and ideal for interaction with web browsers, XML still reigns supreme when dealing with structured documents that contain lots of text.

  • The MarkLogic Universal Index
    Because MarkLogic is based on a real-time search-engine core, we index the structure (elements, attributes, hierarchy) of documents as well as the full text.  In MarkLogic, you can compose meaningful queries for semi-structured (or unstructured) data - the kind of data that is most prevalent in the real world.  And you can do this without having to bolt on separate processes or separate pieces of technology (e.g. Lucene, SOLR).  And because MarkLogic is also focused on ACID, every document insert or update results in a fully-transactional update of the index as well.  The query results are always based exactly on what's currently in the database.
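
For readers who think better in code, here is a toy Python sketch of the general idea of indexing document structure alongside full text, with the index updated in the same call that stores the document. It says nothing about how MarkLogic actually builds or stores its index; the class name `ToyUniversalIndex`, the term shapes, and the sample documents are all assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

WORD = re.compile(r"\w+")


class ToyUniversalIndex:
    """Illustrative in-memory index over XML structure and words (not MarkLogic's)."""

    def __init__(self):
        self.docs = {}                    # uri -> XML text
        self.postings = defaultdict(set)  # term -> set of URIs

    def _terms(self, xml_text):
        root = ET.fromstring(xml_text)
        terms = set()
        stack = [(root, None)]
        while stack:
            elem, parent = stack.pop()
            terms.add(("element", elem.tag))
            if parent is not None:
                terms.add(("child", parent.tag, elem.tag))  # hierarchy
            for name in elem.attrib:
                terms.add(("attribute", elem.tag, name))
            # Words anywhere inside this element, descendants included.
            for word in WORD.findall("".join(elem.itertext()).lower()):
                terms.add(("word", word))
                terms.add(("element-word", elem.tag, word))
            stack.extend((child, elem) for child in elem)
        return terms

    def delete(self, uri):
        old = self.docs.pop(uri, None)
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(uri)

    def insert(self, uri, xml_text):
        # Parse first, so malformed XML raises before anything changes.
        new_terms = self._terms(xml_text)
        self.delete(uri)
        self.docs[uri] = xml_text
        for term in new_terms:
            self.postings[term].add(uri)

    def search(self, *terms):
        # URIs of documents that contain every given term.
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


# Hypothetical documents, for illustration only.
index = ToyUniversalIndex()
index.insert("/a.xml", "<article lang='en'><title>NoSQL matters</title></article>")
hits = index.search(("element-word", "title", "nosql"))
```

Because `insert` replaces a document's old terms and adds the new ones in a single call, a search issued afterwards reflects only what is currently stored. A real system would also need to make that update atomic and durable, which this sketch does not attempt.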

So that's a quick summary of the highlights.  

And for those who don't know us well, I thought I'd include one more high-level distinction in terms of how we're organized as a business.  We have what appears, to many developers today, to be a crazy business model.  We have this notion that you can craft kick-ass software and ask people to pay you for it.  Yeah, we're proud of our stuff, we know it kicks ass, and we're not afraid to be upfront about it.

A Bit More Detail From Chris Biow and Wayne Feick

Ok, that all sounded pretty cool.  Can I get the details? 

Chris Biow, MarkLogic Federal CTO, recently gave a great talk at our 2010 MarkLogic User's Conference (#mluc10) on the MarkLogic Universal Index.  And while it was not originally intended as a multi-part talk, we've broken it up into a few pieces that we are releasing for those who want to learn a bit more.  We know we need to get more details out there; this is just a first step.

This first clip provides a great introduction to MarkLogic by examining it from each of the three potential starting places: a database management system (DBMS), a search engine, and an application server.  To see the video in full, click here or use the embedded player below.  I've got a few more posts planned that launch into the meat of the talk, but if you're interested in a particular set of details, drop me a line or leave a comment below.

Shout-outs to Kelly Stirman and MarkLogic University for helping produce these clips. 

Comments

  • Beyond the brief points above in the post and rather than trying to stuff a lot of content into this comment, I'll do my best to write up a couple posts that describe (a) MarkLogic compared to other Document Databases (CouchDB, MongoDB) and (b) MarkLogic compared to Lucene/SOLR.
  • Sounds great - I'm watching the video you linked now! But I have to ask the obvious Q that was not addressed: How does the ML-based solution with everything you described above compare with MongoDB, CouchDb, Solr, etc.? Thanks.