A Few Django ORM Mistakes

See if you can figure out what's wrong with the code snippets below! Ask yourself what the problem is, what effect will it have, and how can you fix it?

These examples are for Django, but probably apply to many other ORMs.

Bug 1

def create():
    with transaction.atomic():
        thing = Thing.objects.create(foo=1, bar=1)
        set_foo(thing.id)
        thing.bar = 2
        thing.save()

def set_foo(id):
    thing = Thing.objects.get(id=id)
    thing.foo = 2
    thing.save()

Bug 2

class Thing(Model):
    foo = ...
    bar = ...

def thing_set_foo(id, value):
    thing = Thing.objects.get(id=id)
    thing.foo = value
    thing.save()

def thing_set_bar(id, value):
    thing = Thing.objects.get(id=id)
    thing.bar = value
    thing.save()

Bug 3

def increment(id)
    counter = Counter.objects.get(id=id)
    counter.count = counter.count + 1
    counter.save()

Isolation Levels

READ COMMITTED Isolation Level

This is the default for PostgreSQL. Transactions can read updates from other transactions after they have been committed.

REPEATABLE READ Isolation Level

This is the default for MySQL. A snapshot is established on the first read in the transaction, and all subsequent reads are from the snapshot.

Going forward, assume we are using MySQL in its default configuration.

Bug 4

def state_transition(id):
    with transaction.atomic():
        stateful = Stateful.objects.get(id=id)
        if stateful.state == DONE:
            raise AlreadyDone
        do_state_transition()
        stateful.state = DONE
        stateful.save()

Bug 5

def create_payment(collection_id, amount):
    with transaction.atomic():
        payment_collection = PaymentCollection.objects.get(id=collection_id)
        Payment.objects.create(
            amount=amount, payment_collection=payment_collection)
        payment_collection.total = (
            payment_collection.payment_set.all()
              .aggregate(total=Sum('amount'))['total'])
        payment_collection.save()

Bug 6

def foo(id):
    with transaction.atomic():
        foo = Foo.objects.get(id=id)
        bar = Bar.objects.create(...)

        with lock():
            foo.refresh_from_db()

            # If foo.bar has already been set in another thread,
            # raise an exception and rollback the transaction
            if foo.bar:
                raise Exception

            foo.bar = bar
            foo.save()

How Common Are These Bugs?

These bugs are simplified versions of bugs found in production code, and written by experienced Django developers. Remember, these bugs won't be as easy to spot as they are here. They can be buried deep in between hundreds of lines of code.

After spending a very short time looking at a few open source Django projects, I can tell that these bugs are common. In fact, at the time of writing the Django example app contains two of them! There are probably a few reasons for this:

  • Thinking about concurrency is just hard.
  • Developers are not taking enough time to understand the underlying database technology.
  • In most cases the lost writes will happen very rarely.
  • When they do happen they are often relatively harmless.
  • It's a silent failure. How would you tell if your app was losing an occasional write?

So it's your call if you want to ignore these problems. But if you decide you don't care, please don't write financial or medical software. ;)

Tips

  • Remember the ORM is an in-memory cache.
  • ORMs can obscure bugs. Look at the SQL!
  • Avoid read-modify-write
    • If you don't you'll probably need a lock.
  • If it's not locked, it's not up to date.
  • Lock before reads to avoid weird MySQL behaviour.
    • Locking reads don't use the snapshot in MySQL.
  • Prefer immutable database design if practical.
  • Consider using a serializable isolation level.
    • You don't have to worry about locking if you use serializable transactions.
    • Has other drawbacks.
    • PostgreSQL implementation is nicer than the MySQL one IMO.