17 September 2020

My Principles for Building Software

These are my personal principles for building software. I hope to frequently update them as my views change. There can be valid reasons for breaking them (they are principles, not laws), but in general I believe following them works out well.

Most of them revolve around making the system simpler in some way. It’s my belief that simpler systems are more reliable, easier and quicker to modify, and generally easier to work with.

Make Invalid States Unrepresentable
Data Consistency Makes Systems Simpler
Design “Data First”
Measure Before You Cut
Avoid Trading Local Simplicity for Global Complexity
Recognise Intrinsic Complexity
Fewer Technologies Result in Simpler Systems
Focus Your Learning on Concepts, not Technologies
Code Consistency is Important
Shared Principles are Important

Make Invalid States Unrepresentable

I have put this first because I think it is one of the most important and most powerful principles.

You may have heard this phrase in relation to designing your program’s types, but the principle applies everywhere you represent data – for example database design.

Not only does this reduce the number of states your system can be in (and thus make it simpler), but it reduces the number of invalid states, which is even better! Your system does not have to handle these states because they literally cannot be represented in your program.

This is not just a minor convenience; it can drastically simplify your system and prevent entire classes of bugs from occurring.

I have put together some examples.
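Separately from those examples, here is a minimal sketch of the idea in Python. The order types and the `describe` function are hypothetical names invented for illustration. Python only checks type annotations with an external type checker, so treat this as a sketch of the shape of the data rather than a guarantee.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


# Loose representation: two independent fields allow four combinations,
# two of which are nonsense (paid with no timestamp, unpaid with one).
@dataclass(frozen=True)
class LooseOrder:
    paid: bool
    paid_at: Optional[datetime]


# Tighter representation: one class per valid state.
@dataclass(frozen=True)
class Unpaid:
    pass


@dataclass(frozen=True)
class Paid:
    paid_at: datetime  # required field: Paid() with no timestamp raises TypeError


OrderStatus = Union[Unpaid, Paid]


def describe(status: OrderStatus) -> str:
    # Only the two valid cases exist, so there is nothing else to handle.
    if isinstance(status, Paid):
        return f"paid at {status.paid_at.isoformat()}"
    return "unpaid"
```

With the tighter representation there is no field combination that means "unpaid, but with a payment time", so no code has to decide what that would mean.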

Data Consistency Makes Systems Simpler

Consistency enforces rules on your data, and so reduces the number of states your system needs to handle. This follows on from the “make invalid states unrepresentable” principle.

Definition

I am using consistency here in a general sense: that your data adheres to certain rules, and that it always obeys those rules at every point in time. This definition is related to the consistency in ACID; do not confuse it with the consistency in CAP.

The rules can be pretty much anything; for example, that your credit should never be able to go negative, or that private posts should not be visible to others. It is not restricted to foreign keys or unique indexes, although they are also valid examples.

As well as by your database, consistency may be enforced by your application using ACID transactions. It is preferable to enforce the rules at the database level but, for practical reasons, this is not common practice for anything more complex than simple checks.
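As a sketch of enforcing a rule at the database level, the example below uses Python's standard `sqlite3` module with an in-memory database. The `account` table, the `spend` function and the `credit_of` helper are hypothetical names chosen for illustration. The rule "credit can never go negative" is a CHECK constraint, so the database itself rejects any update that would break it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id     INTEGER PRIMARY KEY,
        credit INTEGER NOT NULL CHECK (credit >= 0)  -- the rule lives with the data
    )
    """
)
conn.execute("INSERT INTO account (id, credit) VALUES (1, 50)")
conn.commit()


def spend(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Deduct `amount`; return False if the database rejects the change."""
    try:
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE account SET credit = credit - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: credit would have gone negative.
        return False


def credit_of(conn: sqlite3.Connection, account_id: int) -> int:
    row = conn.execute(
        "SELECT credit FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

Because the constraint is part of the table definition, every piece of code that writes to `account` is covered by it, including code that has not been written yet.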

Practical Advice

Anything which restricts or compromises consistency results in complexity. This leads to the following practical advice:

It is simpler to have:

Fewer databases (ideally one)
Normalised, less redundant data
A ‘good’ database design (big topic)
ACID transactions
More data constraints

It is more complex to have:

Multiple databases
Redundant or denormalised data
A poor database design
Fewer (or no) data constraints

Of course, there are valid reasons to make your system more complex, and I don’t intend complexity to be a dirty word, but see “measure before you cut”.

I consider this principle to be one of the most undervalued in software engineering today. Consistency issues often go unrecognised. Many problems, I daresay most problems, are consistency issues at an essential level – data that does not conform to some expectation.

Design “Data First”

What is more likely to be around in 10 years: your code or your data?

Code can be thrown away and re-written, but this is rarely the case with data.

Data is more important than code. The purpose of code is to transform data.

When designing a new system, it’s best to start with your database and your data structures and build your code on top of that. Consider the constraints you can place on your data and enforce them, ideally by the way you represent your data.

Code design flows naturally from data design. The simpler and more consistent your data model is, the simpler your code will be.

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

Fred Brooks

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Measure Before You Cut

Failing to do this is the most common mistake made by software developers. It’s responsible for many self-inflicted problems.

The principle is to ensure that empirical evidence backs the need for a trade-off.

Common mistakes:

Trying to build a complex “scalable” system that scales to a size you’ll never need.
Making services as small as possible without considering need or cost.
Adding inconsistency or complexity for performance in a part of the system that is not a performance bottleneck.

Advice:

Start with the simplest, most correct system possible.
Measure performance.
Do not pay complexity costs or violate the other principles until it solves an actual problem, not an imaginary one.
Some optimisations can be made without measurement, because they have little or zero cost. For example, using the correct data structures that support favourable performance for the operations you want to perform.
It’s true that sometimes experience alone can tell you if you’re making the correct trade-off. It’s still better if you can prove it.
When you have to choose, prefer correctness and simplicity over performance.
In some cases correct and simple code is the best performing code!

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Donald Knuth
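As a small illustration of measuring rather than guessing, the sketch below uses Python's standard `timeit` module to compare membership checks on a list and on a set. The container size, the value searched for and the helper name `time_membership` are assumptions chosen for illustration. Real measurements should come from your own workload.

```python
import timeit


def time_membership(container, needle, repeats=200):
    """Return the total seconds taken by `repeats` membership checks."""
    return timeit.timeit(lambda: needle in container, number=repeats)


n = 100_000
as_list = list(range(n))
as_set = set(as_list)
needle = n - 1  # worst case for the list: the last element

list_seconds = time_membership(as_list, needle)
set_seconds = time_membership(as_set, needle)
print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")
```

This is also an example of the low-cost kind of optimisation mentioned in the advice above: choosing a set for membership checks costs nothing in complexity, and the measurement confirms whether it matters for the sizes you actually have.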

Avoid Trading Local Simplicity for Global Complexity

That is, avoid making a part of the system simpler in exchange for making the system as a whole more complex.

This trade is usually not an even one; chasing after local simplicity can cause an order of magnitude increase in global complexity.

For example, smaller services can make those services simpler, but the reduction in consistency and the need for more inter-process communication makes the system as a whole much more complicated.

Recognise Intrinsic Complexity

Sometimes things are just complicated. You cannot make problems simpler than they are.

Any attempt to do so will ironically make your system more complex.

Fewer Technologies Result in Simpler Systems

It is better to understand fewer technologies deeply than many technologies at a surface level. Fewer technologies mean fewer things to learn, and less operational complexity.

Focus Your Learning on Concepts, not Technologies

Do not concern yourself too much with intricate details of the software you use – you can always look them up. Learn the underlying fundamental concepts.

Technologies change, concepts are eternal. The concepts you learn will help with newer technologies, and you will be able to learn them much quicker.

For example, do not concern yourself so much with the surface-level details of React, Kubernetes, Haskell, Rust, etc.

Focus on learning:

Pure functional programming
The relational model
Formal methods
Logic programming
Algebraic data types
Typeclasses (in general and specific ones)
The borrow checker (affine/linear types)
Dependent types
The Curry-Howard Isomorphism
Macros
Homoiconicity
Virtual DOM
Linear regression
etc.

Code Consistency is Important

Sometimes writing the consistent thing is more important than writing the “correct” thing. If you want to change the way something works in your codebase, change all instances of it. Otherwise, try to stick with it.

The readability of your code has more to do with consistency than it does with any notion of simplicity. People understand code by pattern recognition, so repeat (and document) patterns!

Shared Principles are Important

The more principles you have in common with your teammates, the better you will work together, and the more you will enjoy working together.

Appendix A: Inconsistency Results in Complexity

This is the simplest example I can think of to illustrate this principle. I hope it doesn’t require too much imagination to relate to realistic problems.

Consider a database with two Boolean variables x and y. Your application has a rule that x = y, and it can enforce this rule by using a transaction to atomically change both variables.

If this rule is correctly enforced, your data can only be in two states: (x = True, y = True) or (x = False, y = False).

Writing the function ‘toggle’ with this rule in place is straightforward. You atomically read one of the values and set both values to the negation.
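A minimal sketch of this single-database case, using Python's standard `sqlite3` module. The `flags` table and the function names are hypothetical, and a table-level CHECK constraint stands in for the rule x = y. One UPDATE statement reads x and writes both columns atomically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flags (
        id INTEGER PRIMARY KEY,
        x  INTEGER NOT NULL CHECK (x IN (0, 1)),
        y  INTEGER NOT NULL CHECK (y IN (0, 1)),
        CHECK (x = y)  -- the rule from the text
    )
    """
)
conn.execute("INSERT INTO flags (id, x, y) VALUES (1, 0, 0)")
conn.commit()


def toggle(conn: sqlite3.Connection) -> None:
    # Both assignments see the row's old values, so x and y
    # receive the same new value in a single atomic statement.
    with conn:
        conn.execute("UPDATE flags SET x = NOT x, y = NOT x WHERE id = 1")


def read_xy(conn: sqlite3.Connection):
    x, y = conn.execute("SELECT x, y FROM flags WHERE id = 1").fetchone()
    return bool(x), bool(y)
```

Calling `toggle(conn)` moves the row between the two valid states, and any write that would make x differ from y is rejected by the constraint.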

Now consider what happens if you split those variables into their own databases and they can no longer be atomically changed together.

Because you can no longer consistently ensure that x = y, your data can be in two more states: (x = True, y = False) or (x = False, y = True).
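The sketch below shows one way the extra states can appear. Two independent in-memory SQLite databases stand in for the two separate databases, and the `fail_between` flag is a hypothetical device that simulates a crash after the first write. All names are assumptions made for illustration.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate systems.
db_x = sqlite3.connect(":memory:")
db_y = sqlite3.connect(":memory:")
for db in (db_x, db_y):
    db.execute("CREATE TABLE flag (value INTEGER NOT NULL)")
    db.execute("INSERT INTO flag (value) VALUES (0)")
    db.commit()


def write_flag(db: sqlite3.Connection, value: bool) -> None:
    # Each write is atomic on its own database, but not across both.
    with db:
        db.execute("UPDATE flag SET value = ?", (int(value),))


def read_flag(db: sqlite3.Connection) -> bool:
    return bool(db.execute("SELECT value FROM flag").fetchone()[0])


def set_both(value: bool, fail_between: bool = False) -> None:
    """Write to both databases; `fail_between` simulates a crash mid-way."""
    write_flag(db_x, value)
    if fail_between:
        raise RuntimeError("simulated crash between the two writes")
    write_flag(db_y, value)
```

If `set_both(True, fail_between=True)` is called from the initial state, x has been written and y has not. Nothing in either database can reject that state, because neither one knows about the other.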

Which value should you use if your system is in one of these states?
What should your ‘toggle’ function do in one of these states?
How do you ensure that both writes are successful when writing a new value?

There are no correct answers to these questions.

Of course, if we’d followed the “make invalid states unrepresentable” principle in the first place, there would only be one variable! :)