Consistency is Consistently Undervalued

Micro-services and NoSQL have been trendy over the last few years due mainly to their success at large companies like Google. I’m not going to tell you not to use them, but I am going to try to explain one of the things you should understand well before you do.

One of the most important things you lose in both cases is transactions. Transactions have been around so long and are so mundane that I think many developers have forgotten why they are so useful. They allow you to maintain invariants on your data. For example, you can say that if A is present, so are B and C. I hope we can first agree that an application that can safely assume the presence of A, B and C together will be simpler and less likely to have bugs than an application that has to handle any combination of A, B and C.

User Profiles

Here’s a more concrete example of a constraint: a user must have a profile. With transactions you can create both together, as in this pseudo-Python:

# Either a user and profile are created together, or nothing is created
with atomic_transaction():
    user = User.create(...)
    Profile.create(user, ...)

No problem!
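
For something you can actually run, here is a minimal sketch of the same idea using Python's built-in sqlite3 module. The table layout, the create_user_with_profile helper and its fail_profile switch are all made up for illustration; the only real behaviour relied on is that a sqlite3 connection used as a context manager commits on success and rolls back when an exception escapes.

```python
import sqlite3


def create_user_with_profile(conn, name, bio, fail_profile=False):
    """Create a user and its profile in one transaction (hypothetical schema)."""
    # `with conn` commits if the block finishes, rolls back if it raises.
    with conn:
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        if fail_profile:
            # Simulated crash between the two writes.
            raise RuntimeError("failed before the profile was written")
        conn.execute(
            "INSERT INTO profiles (user_id, bio) VALUES (?, ?)",
            (cur.lastrowid, bio),
        )
    return cur.lastrowid


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE profiles ("
    "user_id INTEGER PRIMARY KEY REFERENCES users(id), bio TEXT)"
)
conn.commit()

create_user_with_profile(conn, "alice", "hello")
try:
    create_user_with_profile(conn, "bob", "hi", fail_profile=True)
except RuntimeError:
    pass

# bob's half-finished insert was rolled back, so only alice exists.
print(count(conn, "users"), count(conn, "profiles"))  # 1 1
```

The failed call leaves no trace: there is never a row in users without a matching row in profiles.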

It’s not uncommon to have a separate service for user login - let’s say you make the decision to split users and profiles into two services:

user = user_service.create(...)
profile_service.create(user, ...)

This is obviously not atomic. If the profile service fails, you will have a user with no profile. If you don’t handle this possibility, that user is going to log in to your web site and probably get a 500 error. Can we fix this?

user = user_service.create(...)
try:
    profile_service.create(user, ...)
except:
    # Roll back user creation.
    user_service.delete(user)
    raise

Seems reasonable, but there are at least two failure cases that can lead to inconsistency:

user = user_service.create(...)

# <--- Failure at this point means a user with no profile

try:
    profile_service.create(user, ...)
except:
    # Roll back user creation. If the user_service is no longer
    # available here we have a user with no profile.
    user_service.delete(user)
    raise

First, if your program completely fails after the user_service call (somebody pulled the power cord), your data will be in an inconsistent state. Secondly, if your user_service fails on the delete, your data will also be in an inconsistent state.
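
The second failure case is easy to reproduce with in-memory stand-ins. This is only a sketch: FakeService, its fail_on parameter and the helper functions are hypothetical names invented for the demonstration, not part of any real service client.

```python
class FakeService:
    """In-memory stand-in for a remote service (hypothetical)."""

    def __init__(self, fail_on=()):
        self.rows = {}
        self.fail_on = set(fail_on)  # operation names that should fail

    def _maybe_fail(self, op):
        if op in self.fail_on:
            raise ConnectionError(f"{op}: service unavailable")

    def create(self, key, value=None):
        self._maybe_fail("create")
        self.rows[key] = value

    def delete(self, key):
        self._maybe_fail("delete")
        self.rows.pop(key, None)


def sign_up(user_service, profile_service, name):
    user_service.create(name)
    try:
        profile_service.create(name, {"bio": ""})
    except Exception:
        user_service.delete(name)  # compensating action; it can fail too
        raise


def users_without_profile(user_service, profile_service):
    return sorted(set(user_service.rows) - set(profile_service.rows))


def run(user_service, profile_service):
    try:
        sign_up(user_service, profile_service, "bob")
    except ConnectionError:
        pass
    return users_without_profile(user_service, profile_service)


# Profile creation fails, rollback succeeds: the data stays consistent.
clean = run(FakeService(), FakeService(fail_on={"create"}))
# Profile creation fails and so does the rollback: bob is stranded.
stuck = run(FakeService(fail_on={"delete"}), FakeService(fail_on={"create"}))
print(clean, stuck)  # [] ['bob']
```

The try/except only helps when the compensating delete itself succeeds; when it doesn't, nothing in this code ever cleans up the stranded user.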

Can we just create a new profile when we try to access it and it’s not there? No, because even if your services are 100% reliable this is still not atomic. In a concurrent system other threads and processes can see the system in an inconsistent state:

user = user_service.create(...)

# <--- A process or thread executing at this point will
#      see the user but not the profile

try:
    profile_service.create(user, ...)
except:
    user_service.delete(user)
    raise

Distributed transactions are hard!
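
To make the visibility problem concrete, here is a small sketch with two threads and two plain dicts standing in for the services. The threading.Event objects force the unlucky interleaving on purpose so that the result is repeatable; in a real system it would happen by chance. All names here are illustrative.

```python
import threading

users, profiles = {}, {}
user_created = threading.Event()
reader_done = threading.Event()
observed = []


def writer():
    users["alice"] = {"id": 1}
    user_created.set()            # the gap between the two writes opens here
    reader_done.wait(timeout=5)   # hold the gap open until the reader has looked
    profiles["alice"] = {"bio": ""}


def reader():
    user_created.wait(timeout=5)
    # Record what another process would see in the middle of sign-up.
    observed.append(("alice" in users, "alice" in profiles))
    reader_done.set()


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(observed)  # [(True, False)]: a user with no profile was visible
```

Both writes succeeded and nothing crashed, yet the reader still saw a state that breaks the "a user must have a profile" rule.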

Bank Accounts

User accounts and profiles might not seem like such a big deal, but consider using a NoSQL store that doesn’t support transactions to hold financial data:

def transfer(bank_account1, bank_account2, amount):
    bank_account1.deposit(amount)
    # What if another thread or process deducts from 'bank_account1' here?
    # What if it's left with no more money?
    # What if your data store or your application crashes here?
    bank_account2.withdraw(amount)

Or the other way around:

def transfer(bank_account1, bank_account2, amount):
    bank_account2.withdraw(amount)
    # What is the total amount of money in the system here?
    # What if your data store or your application crashes here?
    bank_account1.deposit(amount)

With an atomic transfer the amount of money in the system (the sum of the bank accounts) will always stay the same. You see a consistent snapshot. With a non-atomic transfer the total amount of money in the system depends on whether there are any ongoing transfers.
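
As a rough sketch of what the atomic version can look like, here is a transfer written against Python's built-in sqlite3 module. The accounts table, its CHECK constraint and the helper names are assumptions made for this example; it is not meant as a way to build a bank.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst in one transaction (hypothetical schema)."""
    with conn:  # commit on success, roll back if anything below raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # Fails with IntegrityError if src would go negative (CHECK constraint).
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def total(conn):
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()

transfer(conn, "a", "b", 30)        # succeeds
try:
    transfer(conn, "b", "a", 500)   # deposit happens, withdrawal is rejected
except sqlite3.IntegrityError:
    pass                            # the deposit was rolled back with it

print(balances(conn), total(conn))  # {'a': 70, 'b': 80} 150
```

The deposit in the failed transfer never becomes visible, so the total is 150 before, during and after both calls.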

Hopefully you can see how serious this can be, and I haven’t even touched on referential integrity or other kinds of constraints!

In conclusion, if you use micro-services or NoSQL databases, do your homework! Choose appropriate boundaries for your systems.

Errata and Feedback

On Hacker News vidarh holds the opinion that consistency is overvalued. I’d like to emphasise that I’m not against using distributed systems as long as the resulting problems are well understood. I’d also like to say that if you have the option, ACID transactions are usually simpler than eventual consistency or distributed transactions.

A few people dislike the bank example because real world banks use eventual consistency.

First of all, I agree (from experience) with 4ad that this is not the greatest counter-argument against ACID.

Secondly, the real-world banking system is (hopefully) well thought through and understood. Don’t think of this article as advocating ACID versus eventual consistency; think of it as advocating a proper understanding of your tools. That said, ACID is usually the simpler solution.

Thirdly, the example was to show potential financial consequences for ignoring consistency issues. It was not intended to be a tutorial on how to implement a bank.

There was some discussion on the term ‘NoSQL’. I would like to point out that when I use the term ‘NoSQL’ I am specifically talking about non-ACID databases and not the absence of the SQL language. ‘NoSQL’ has become associated with ‘non-ACID’ through an accident of history and imprecise marketing.

Thorsten Möller mentions (via email) that it is important to understand isolation, the ‘I’ of ACID. Even ACID databases do not always fully enforce isolation in their default configuration. Check out the PostgreSQL documentation for more information on isolation levels.