(with apologies to Cookie Monster)
Recently I was involved in a customer inquiry about transaction isolation. The customer had observed an unexpected interaction between two concurrent transactions and wanted an explanation of what they were seeing. This customer is fairly sophisticated, with many folks familiar with isolation principles and even a PhD in distributed computing, which made me think that a lot of people probably aren’t very familiar with the nuances of transaction isolation, especially as it relates to MarkLogic.
The TL/DR is that there are multiple levels of isolation (the I in ACID). Like most other DBMSes (including Oracle), MarkLogic by default provides a level of isolation for update transactions that ANSI refers to as REPEATABLE READ isolation (actually we’re a bit better than that as you’ll see if you read on). With this level of isolation, transactions can experience a read phenomenon called phantom reads. Most types of phantom reads are not problematic for most types of transactions, however developers writing applications for MarkLogic should understand what a phantom read is and how to avoid them if necessary. A meta-takeaway is that people looking to write their own transaction controllers in application code to work around other technologies’ lack of ACID transactions are signing up for some pretty sophisticated work. Read on for (much) more detail.
Understanding Isolation in MarkLogic
MarkLogic’s ability to perform ACID transactions is a critical capability for many of our customers, and an important differentiator for us in the NoSQL space. We talk about ACID a lot, and isolation is an important but not broadly-understood part of the ACID guarantee. I wanted to take a few minutes to explain what isolation means, what level of isolation you can expect from MarkLogic, how that compares with other ACID database management systems, and what the implications are for developers writing applications using MarkLogic. I’ll take a side-trip into how we provide the level of isolation that we provide, and the resulting implications on performance and concurrency.
What Is Isolation?
Isolation is the ACID property that, in a perfect world, allows developers to write code as if their code were the only code executing against the database. With perfect isolation, developers don’t have to worry about what concurrently-running transactions will do to the database. This makes coding their applications much easier. If the DBMS they’re using does not provide any level of isolation, developers have to constantly worry if data they’re working on is changing out from under them as some concurrent transaction executes. In some cases, this can make it impossible to write reliable code that keeps your database consistent.
Imagine for example that you’re writing code that processes a payment. You want to first check whether there’s enough money in the account, then you want to process the payment, and then deduct that amount from the account balance. If the account balance can change between the time you check it and the time you process the payment, then you may end up processing a payment and then not having the funds to cover it. This is a great way to steal money from bitcoin exchanges that use non-ACID DBMSes.
Perfectly-isolated transactions would be serializable, meaning running them concurrently would in all cases have the same effect as running them serially. In reality, there is a high performance cost to providing this level of isolation, so most DBMSes (including both MarkLogic and Oracle) do not provide this level of isolation by default.
What? Level of isolation? Isn’t something either isolated or not isolated?
Actually, there are many possible levels of isolation, so it’s important for application developers to know what they can expect from their DBMS so that they don’t accidentally make assumptions that will result in buggy application code. There are many ways to categorize isolation levels as well. Here I’ll use the isolation levels defined by ANSI SQL. It’s not the best taxonomy, but it will serve, and it’s one that people familiar with the RDBMS world may be familiar with.
ANSI SQL defines four isolation levels, defined based on read phenomena that are possible at each level. A read phenomenon happens when information leaks from concurrently-running transactions. In other words, it’s when a transaction can read something that, in a perfectly-isolated environment, it would not be able to read. As you’ll see in a minute, not all read phenomena result in errors, but all of them have the potential to result in errors, some more egregious than others. Also, some read phenomena are impossible to protect against in your application, which is why a DBMS that provides poor isolation is a huge problem if you’re an enterprise (or if you’re running a bitcoin exchange).
ANSI has a definition for each phenomenon, but these definitions are open to interpretation. Generally, ANSI defines the read phenomena based on whether they produce an error or an anomaly. Here I use the broader interpretation, which defines them based on whether they have the potential to produce an error or anomaly, regardless of whether they actually do or not. This results in the strictest definitions for each isolation level. In the definitions below I use a notation that specifies an operation (w, r, c, or a for write, read, commit or abort, respectively), the transaction performing the operation, and the item or domain on which the operation is performed. Example: w1[x] means transaction1 writes to item x.
Now the phenomena:
A dirty read happens when a transaction T2 reads an item that is being written by concurrently running transaction T1. In other words: w1[x]…r2[x]…((c1 or a1) and (c2 or a2) in any order). This phenomenon could lead to an anomaly in the case where T1 later aborts, and T2 has then read a value that never existed in the database. Note that there is such thing as a dirty write as well: w1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). ANSI doesn’t talk about this, but it’s obviously important because dirty writes can make rollback impossible.
A non-repeatable read happens when a transaction T2 writes an item that was read by a transaction T1 prior to T1 completing. In other words: r1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). Non-repeatable reads don’t produce the same anomalies as dirty reads, but can produce errors in cases where T1 relies on the value of x not changing between statements in a multi-statement transaction (e.g. reading and then updating a bitcoin account balance).
A phantom read happens when a transaction T1 retrieves a set of data items matching some search condition and concurrently running transaction T2 makes a change that modifies the set of items that match that condition. In other words: (r1[P] and w2[x in P] in any order)…((c1 or a1) and (c2 or a2) in any order), where P is a set of results. Phantom reads are usually less serious than dirty or non-repeatable reads because it generally doesn’t matter if item x in P is written before or after T1 finishes unless T1 is itself explicitly reading x. And in this case the phenomenon would no longer be a phantom read, but would instead be a dirty or non-repeatable read per the definitions above. That said, there are some cases where phantom reads are problematic, in particular where T1 writes [x in P] and later aborts (a dirty phantom read?). As you’ll see below, we prevent these.
The isolation levels ANSI defines are based on which of these three phenomena are possible at that isolation level. They are:
- READ UNCOMMITTED – All three phenomena are possible at this isolation level. This is basically no isolation at all.
- READ COMMITTED – Dirty reads are not possible, but non-repeatable and phantom reads are.
- REPEATABLE READ – Dirty and non-repeatable reads are not possible, but phantom reads are.
- SERIALIZABLE – None of the three phenomena are possible at this isolation level.
For read-only transactions, MarkLogic provides fully serializable isolation. For update transactions, MarkLogic is somewhere between ANSI REPEATABLE READ and ANSI SERIALIZABLE in that we prevent dirty and non-repeatable reads, dirty writes, and some but not all types of phantom reads.
I don’t know of any DBMS that provides fully-serializable update transactions as a default. The cost in terms of transaction throughput so far exceeds the benefit that it just doesn’t make sense. There are DBMSes (like Oracle, for e.g.) that provide settings that can increase the isolation level at the expense of throughput. MarkLogic customers can manage the level of isolation on a granular basis to choose the balance between concurrency and isolation that makes sense for their applications. Read on to learn how.
Although there are other ways of providing isolation (e.g. Optimistic Concurrency), many DBMSes (including MarkLogic) do it by coordinating locks on records in the database (called item locks) across transactions. Locking helps ensure a level of isolation, but it does this at the expense of concurrency, since a transaction that requires a lock on an item must wait until that lock is available. Locks are either shared locks (which can be held by more than one transaction) or exclusive locks (which can be held by only one transaction at a time). In most DBMSes (including MarkLogic), locks taken when reading an item are shared and locks taken when writing an item are exclusive.
To avoid what’s called a wormhole transaction, no lock should be taken in a transaction after any lock is released within that transaction. A system that observes this rule is said to use two-phase locking (not to be confused with two-phase commit), because there is a phase in the transaction when locks are taken, and a second phase when they are released. MarkLogic releases locks only after a commit or rollback is completed, so we conform to this rule.
Preventing Dirty and Non-repeatable Reads
So MarkLogic prevents dirty and non-repeatable reads (and dirty writes) by taking item locks on items that are being read or written during a transaction and releasing those locks only on completion of the transaction (post-commit or post-abort). When a transaction needs a shared lock on an item on which another transaction has an exclusive lock, or when a transaction needs an exclusive lock on an item on which another transaction has a shared or exclusive lock, that transaction waits until the conflicting lock is released. Deadlock detection helps prevent cases where two transactions are waiting on each other for exclusive locks. If MarkLogic detects a deadlock condition, it will attempt to resolve it by aborting and retrying one of the transactions.
Preventing Phantom Reads
In addition, MarkLogic prevents some types of phantom reads by taking item locks on the set of items in a search result (this is why Support will tell you that searching for millions of documents in an update transaction is a bad idea). This prevents phantom reads involving T2 removing an item in a set that T1 previously searched, or T2 finding an item in a set that T1 wrote but hasn’t yet committed. It does not prevent phantom reads involving T2 inserting an item in a set that T1 previously searched, or those involving T2 searching for items and not seeing an item because it was deleted by concurrently-running transaction T1.
To avoid all phantom reads via locking, it is necessary to take locks not just on items that currently match the search criteria, but also on all items that could match the search criteria, whether they currently exist in the database or not. Such locks are called predicate locks. Because you can search for pretty-much anything in MarkLogic, guaranteeing a predicate lock for arbitrary searches would require locking the entire database. From a concurrency and throughput perspective, this is obviously not desirable. MarkLogic therefore leaves the decision to take predicate locks and the scope of those locks in the hands of application developers. Because the predicate domain can frequently be narrowed down with some application-specific knowledge, this provides the best balance between isolation and concurrency. To take a predicate lock, you lock a synthetic URI representing the predicate domain in every transaction that reads from or writes to that domain. You can take shared locks on a synthetic URI via fn:doc(URI). Exclusive locks are taken via xdmp:lock-for-update(URI).
Note that predicate locks should only be taken in situations where the types of phantom reads we don’t already prevent are intolerable. Generally this is rare. If your application can get by with the level of isolation we provide out of the box, then you should generally not take predicate locks, because any additional locking results in additional serialization and will impact performance.
It may sound strange to talk about isolation for read-only transactions, but read-only transactions run concurrently with update transactions, so the question of how isolated they are from these concurrent updates is valid. Because of MVCC, in MarkLogic all read-only transactions are fully isolated from all concurrently running update transactions. Not only that, but all read-only transactions run completely lock-free which contributes to MarkLogic’s high query performance. We do this by running read-only queries at a specific timestamp that is guaranteed to be lower than the timestamp of any currently-running update. There are three options for what timestamp we run them at:
- Contemporaneous – if you choose this option (via admin:appserver-set-multi-version-concurrency-control), MarkLogic will execute your read-only transaction at the latest timestamp for which any transaction has committed. If other updates are in flight at that timestamp, MarkLogic will hold your transaction until they commit as well to ensure that your query is isolated from them. The intent of this setting is to get the most up-to-date result possible. If you need to be able to query for something right after committing it and guarantee that you’ll see it, use this setting. (Note that if you need to do this, you may also want to consider a ‘strict’ value for admin:appserver-set-distribute-timestamps). Contemporaneous is the default value for new application servers.
- Nonblocking – if you choose this option (also via admin:appserver-set-multi-version-concurrency-control), MarkLogic will execute your read-only query at the latest timestamp for which it knows there are no in-flight transactions. This will be faster since your transaction doesn’t have to wait for any update transactions to settle, but may not be the absolute latest timestamp. So if you run a non-blocking read-only transaction right after running an update transaction, your read-only transaction will have a consistent view of the database, but may not see the most recent update because it may be running at an earlier timestamp.
- Specific timestamp – you can also choose to execute a point-in-time query at a specific timestamp. If your app server is set to contemporaneous and you want to run a specific query at a nonblocking timestamp, you can use this in conjunction with xdmp:database-nonblocking-timestamp or xdmp:database-global-nonblocking-timestamp.
So to summarize, MarkLogic automatically provides a level above ANSI REPEATABLE READ isolation for update transactions and true serializable isolation for read-only (query) transactions. MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.
For more on this, please refer to our excellent documentation, in particular Understanding Transactions in MarkLogic Server.