How does MarkLogic fit into your world?
If you are new to MarkLogic, you may be wondering how an Enterprise NoSQL database fits into your development stack and processes.
Even though MarkLogic is not a relational database, the core architectural features of a MarkLogic-based application will seem familiar to anyone with an RDBMS background.
Many RDBMS-based architectures, at least initially, look something like the one below on the left:
At the center, there is an application server, where logic is written in Java, C#, Python, PHP, Ruby, or some other language. The application server provides an HTTP stack and a mechanism for developers to map incoming HTTP requests to their code. There are typically additional frameworks, libraries, or tools included in this layer that help with tasks like user management, user interface, and a variety of other commonly needed and commoditized functions.
Below these layers there is often (but not always) a set of data-access objects (DAOs), and often an object-relational mapper (ORM), that serve as interfaces to a low-level database driver. This driver provides access, typically via a database API like JDBC or ODBC, to a persistent relational database management system (RDBMS). Sometimes these layers are integrally tied to the layers above (as in Ruby on Rails).
In some instances, there are caching (or other processing) layers between the application server and the clients. And sometimes, such caching layers may exist between the application logic and the lower layers of data access.
And then there is search. Search functionality is typically added on in the application server, with a library like Solr or Lucene. This library manages on-disk indexes. And the search functionality is typically kept somewhat separate from DAOs.
MarkLogic-based applications often have a similar architecture, like the one on the right above.
Because there are no tables, there is no ORM. The mapping between native programming language objects and documents (be they XML or JSON) is often quite straightforward. And because MarkLogic provides search, there is no need to "bolt-on" another indexing mechanism.
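To illustrate how direct that mapping can be, here is a minimal sketch in Python. The `Customer` class and the helper names are hypothetical and exist only for this example; the point is that a native object serializes to a JSON document, and back, without an intermediate mapping layer.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Customer:
    # Hypothetical application object, used only for illustration.
    name: str
    email: str
    tags: list = field(default_factory=list)


def to_document(customer: Customer) -> str:
    """Serialize the object to the JSON text that would be stored as a document."""
    return json.dumps(asdict(customer), sort_keys=True)


def from_document(text: str) -> Customer:
    """Rebuild the object from the stored JSON text."""
    return Customer(**json.loads(text))


ada = Customer("Ada", "ada@example.com", ["vip"])
doc = to_document(ada)       # JSON text, ready to be stored under a URI
restored = from_document(doc)  # the same object, read back from the document
```

The shape of the document is simply the shape of the object, so there is no table schema to keep in sync with the class definition.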
MarkLogic Architecture Basics
MarkLogic runs as a single process on each host in a cluster.*
On each host, you will find the same MarkLogic stack running. Starting at the bottom, there is the Data Layer. On top of it, the Evaluation Layer. And, at the top, a set of programming Interfaces. Again, all of this is housed inside that same operating-system level process.
The REST API is an optimized set of end-points that provide access to MarkLogic database, search, and analytic functionality via well-known REST idioms. If you are new to MarkLogic, you may find this layer the easiest to get started with.
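As a rough sketch of what those REST idioms look like, the Python helpers below build URLs for the `/v1/documents` and `/v1/search` endpoints. The host, port, and helper names are assumptions for illustration; nothing is sent over the network, and authentication is left out entirely.

```python
from urllib.parse import urlencode

# Assumption: a REST API instance is reachable at this host and port.
BASE_URL = "http://localhost:8000"


def document_url(uri: str, base_url: str = BASE_URL) -> str:
    """URL for reading, writing, or deleting a single document by its URI.

    The HTTP verb chooses the operation (GET, PUT, DELETE).
    """
    return f"{base_url}/v1/documents?{urlencode({'uri': uri})}"


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """URL for a string search across the database, asking for JSON results."""
    return f"{base_url}/v1/search?{urlencode({'q': query, 'format': 'json'})}"


doc_url = document_url("/customers/ada.json")
find_url = search_url("enterprise nosql")
```

Any HTTP client can then issue the requests, for example a PUT to `doc_url` with the document as the request body, or a GET to `find_url` to run the search.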
If you are a Java expert, you may prefer to work with the MarkLogic Java Client that sits on top of the REST API.
And when you need additional functionality (custom database joins, custom analytics, and the like), you can extend the REST (or Java) API. The extension mechanism provides several hooks that give you access to the underlying XQuery layer.
As all diagrams are, this one is a simplification. Each of the boxes is a rich functional area.
[Diagram: the MarkLogic stack, from top to bottom]
Interfaces: CRUD, Search, Hadoop, Transforms, ... and CRUD, Search, Transforms, ...
Evaluation Layer: XSLT | XPath | XQuery
Data Layer: Multiversion Concurrency Controller
Indexes: Value | Structure | Text | Scalar | Metadata | Security | Geospatial | Reverse
Document formats: XML | JSON | Binary | Text
* MarkLogic actually starts one other small process whose only purpose is to serve as a watchdog over the main MarkLogic process. The watchdog has no significant effect on system resources.
Common Technical Use Cases
As it turns out, most of the world's data, some say as much as 80% to 90% of it, is not well-structured. The practice of storing all your data inside the rows and columns of an RDBMS is no longer gospel. MarkLogic is suited to a number of the NoSQL use cases, but because it is an Enterprise NoSQL database, it covers a few additional ones. Below is a subset of common use cases:
Key Value store
MarkLogic can be used as an opaque Key Value store, where all access is done via document keys (URIs in MarkLogic). MarkLogic scales out on commodity hardware, and its lock-free reads provide high performance across up to petabytes of storage.
Plain and simple, MarkLogic is a Document-oriented database. Its schema-less data model enables it to support JSON and XML, as well as plain-text and binary document formats. You can write queries based not only on the key (URI) but also on any of the document's structural features (JSON slots, XML elements/attributes, and so on).
But, because it is an Enterprise NoSQL database, MarkLogic provides some features typically missing from NoSQL document databases: full-text search queries, ACID transactional updates, and rich role-based security.
MarkLogic has ACID transactions and is fully suited to traditional transaction-processing use cases. MarkLogic is used at major banks for storing operational data and is also used to store online virtual currency in some very popular mobile games. Because MarkLogic is an Enterprise NoSQL database, you can trust it to handle transactional workloads.
At its core, MarkLogic uses search-style indexing. If you are looking to solve a search problem, MarkLogic has all the bells and whistles you'd expect from a search engine, including word and phrase search, boolean search, proximity, wildcarding, stemming, tokenization, decompounding, case-sensitivity options, punctuation-sensitivity options, diacritic-sensitivity options, document quality settings, numerous relevance algorithms, individual term weighting, topic clustering, faceted navigation, custom-indexed fields, support for hundreds of languages, and more. You get all that, plus the added benefits of a real-time core. Updates to your documents do not require a full re-index. Indexes are updated in the same transaction that updates (or inserts or removes) the document.
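The idea of keeping the index in step with every write can be sketched with a toy in-memory model in Python. This is not how MarkLogic is implemented; `TinyStore` and its methods are hypothetical, and the sketch only shows why a search never sees a stale index when the document and its index entries change in the same operation.

```python
from collections import defaultdict


class TinyStore:
    """Toy document store whose word index is updated with every write."""

    def __init__(self):
        self.docs = {}                 # URI -> document text
        self.index = defaultdict(set)  # word -> set of URIs containing it

    @staticmethod
    def _terms(text):
        # Naive tokenization: lowercase and split on whitespace.
        return set(text.lower().split())

    def put(self, uri, text):
        # Drop the old index entries, then store and index the new text,
        # all within the same operation.
        self.delete(uri)
        self.docs[uri] = text
        for term in self._terms(text):
            self.index[term].add(uri)

    def delete(self, uri):
        old = self.docs.pop(uri, None)
        if old is None:
            return
        for term in self._terms(old):
            self.index[term].discard(uri)
            if not self.index[term]:
                del self.index[term]

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


store = TinyStore()
store.put("/notes/a.txt", "hello world")
store.put("/notes/a.txt", "goodbye world")  # update: no full re-index needed
```

After the second `put`, a search for "hello" finds nothing and a search for "world" still finds the document, because only the entries for the changed document were touched.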
Show me more
If you'd like to know more about MarkLogic architecture, check out: