13 January 2016

Tests vs. Types

When should you use tests and when should you use types? What information and guarantees do we get for our efforts?

I’ll walk through a simple, slightly contrived, example with Python, C, Haskell, and Idris. I’ll look at what you can tell about the implementation without knowing or examining it.

I’m going to ignore ‘escape hatches’ which can explicitly violate the guarantees of the language (For example: C extensions, Haskell’s unsafePerformIO, unsafe type coercion). To do otherwise would mean you couldn’t say much at all and this blog post would be very short.

Contents:

The Specification
Python
Python With Tests
Haskell
Haskell With Signature
Haskell With Tests
C
Idris
Idris With Tests
Idris With Proofs
Comparison
Opinion

The Specification

Given a list of values and a value, return the index of the value in the list or signify that it is not present in the list.

In this case, the implementation is very simple. You may wonder what value tests or types have for this problem. The properties I’m going to talk about here also apply in general to much more complicated code. You can imagine the implementation is ten thousand lines of incomprehensible spaghetti code if that helps you see the utility.

Python

def x(y, z):
    # Implementation elided

Note that I’m not interested in exploring unchecked properties of the program like variable names and textual documentation, so I have deliberately been unhelpful. I’m interested in information that, given your tests pass and your type checker succeeds, cannot be wrong.

There’s pretty much no useful information in this first example other than we have a function that takes two arguments. For all we know, this could find the index of a value in a list, or it could send an offensive email to your grandmother.

Analysis

Without tests or types, not only are you writing brittle code, you’re almost entirely dependent on unchecked documentation. Because this kind of documention is checked by people and not by machines it can easily become out of date and wrong.

Documentation
- ✗ We know the intended behaviour
  
  You have no indication of the behaviour of this function. You hate your grandmother. You’re a monster.
Guarantees
- ✓ Memory safe
  
  Python is a garbage collected language and handles this for us.

Python With Tests

def test_happy_path():
    assert x([1, 2, 3], 2) == 1

def test_missing():
    assert x([1, 2, 3], 4) is None

So, now we know our function works and that when the value is missing the result will be None?

Well… no. This is only one example. Unfortunately, we’re working with an infinite domain, and no number of examples can constitute a proof that our function works as intended. More testing can increase our confidence but never sate our doubt.

It might seem absurd that it would return None for 4 but not 5, and for this example you’re probably right. We may be happy with our level of confidence and stop here. Again, this would make for a very short blog post, so let’s imagine the implementation is not so obvious.

If tests cannot show us something in general, only particular cases, this implies that tests cannot show an absence of errors. For example, there is no test we could write that would show that our function never throws an exception or never goes in to an infinite loop, or contains no invalid references. Only static analysis can do this.

If examples are not good for proof, they are at least good for documentation. From only these examples we can infer the complete specification.

Analysis

While tests can show how to use our function and they can give us some confidence that our code works for some select examples, they cannot really prove anything about our code in general. Annoyingly, this means common errors can only be partially guarded against.

Documentation
- ✓ We know the intended behaviour
- ✓ We have an example of usage
- ✓ We know some types of values that will be handled correctly
- ✗ We know all types of values that will be handled correctly
  
  We have no restriction on the argument types, so while we can see some examples of what the function can accept, we cannot know about types that haven’t been tested.
Specification
- ✓ Works in at least one case
- ✗ The returned index is always a valid index
- ✗ The returned index always refers to a matching value
- ✗ A missing element always returns None/Nothing
Common Errors
- ✗ No typos or incorrect names
  
  Static analysis can help here, but because Python is a dynamic language and things can be redefined at run-time (and often are), we can never prove the absence of mistakes.
  
  In particular, it can be extremely difficult or impossible to check if a method name is correct, as the validity of a method call depends on the type of object it is called on.
- ✗ No unexpected null
  
  A scourge on many typed languages, dynamically typed languages do nothing to help here.
- ✗ The failure case is always handled
  
  In my book this is one of the most common sources of error: in this example, our function returns None to indicate the element is missing, but the caller could assume it will be returning an integer. In general, this can also surface as an unhandled exception.
Guarantees
- ✓ Memory safe
- ✗ Cannot be called with an unhandled type
- ✗ No side effects
- ✗ No exceptions
- ✗ No errors
- ✗ No infinite loops

Haskell

x y z = -- implementation elided

If you’re not familiar with Haskell’s syntax, this is the definition of a function x with parameters y and z. You do not need to specify types in Haskell. The types are inferred from the implementation.

It doesn’t look like there is any more here than in the first Python example, but by virtue of the fact we have written our function in Haskell and it type checks, we already know some interesting properties. Types can help us even when they’re invisible.

Analysis

Obviously, we have very little information for documentation purposes here, but there are some things to note:

Documentation
- ✗ We know the intended behaviour
Common Errors
- ✓ No typos or incorrect names
  
  Because Haskell is a static, compiled language, all references must be resolved at compile time. The program cannot compile if you have made this mistake.
- ✓ No unexpected null
  
  Haskell has no null value. Problem solved!
Guarantees
- ✓ Cannot be called with an unhandled type
  
  If the program type checks, we know we haven’t called this function with incorrect types.

Haskell With Signature

x :: Eq a => [a] -> a -> Maybe Int
x y z = -- implementation elided

Earlier, we were discussing how you had a ‘laissez-faire’ attitude to the safety of your grandmother. We could tell from tests that the function didn’t intend any harm, but is grandma really safe? Are you sure this function can’t send out any expletive-laden emails?

Haskell is famously a pure functional language. This doesn’t mean it can’t do side effects, but that any side effects it does are present in the type signature. We know the type of this function and can see it is pure, so we know it doesn’t modify any external state.

This is a very interesting property for another reason: because we know there are no side effects, we can actually figure out what this function does based only on its type signature! Search for this signature on Hoogle and note the first result. This is not the only possible implementation for this type signature, but we can have enough confidence for documentation purposes.

Analysis

Documentation
- ✓ We know the intended behaviour
- ✗ We have an example of usage
- ✓ We know some types of values that will be handled correctly
- ✓ We know all types of values that will be handled correctly
We have lost our examples. However, in exchange, we now know every type of value the function should correctly handle.
Specification
- ✗ Works in at least one case
  
  In the absence of tests or proofs, we don’t know if our function actually works as intended!
- ✗ The returned index is always a valid index
- ✗ The returned index always refers to a matching value
- ✗ A missing element always returns None/Nothing
Common Errors
- ✓ No typos or incorrect names
- ✓ No unexpected null
- ✓ The failure case is always handled
  
  If our function returns Nothing, the type system ensures this is handled appropriately by the caller. It is possible to ignore the failure case, but only explicitly.
Guarantees
- ✓ Memory safe
- ✓ Cannot be called with an unhandled type
- ✓ No side effects
- ✗ No exceptions
  
  For the these properties I make a distinction between exception and errors, where exceptions are recoverable and errors are not (partial functions etc.).
  
  For the most part, exceptions are described by the type system (e.g. in the IO monad). We should know our function won’t throw an exception from its type signature. However Haskell breaks this by allowing exceptions to be thrown from pure functions.
- ✗ No errors
  
  We can still use partial functions. For example, dividing by zero. This will result in an unrecoverable error.
- ✗ No infinite loops

Haskell With Tests

Remember when I said that tests couldn’t prove the absence of errors? I lied. When the stars align and the tests are combined with types, they can! The first catch is your domain has to be a finite set. The second catch is this will have to be quite a small set or it will be too computationally expensive to be practical.

Example:

not :: Bool -> Bool
not x = ...

The input can either be true or false. Once you have tested the result of these two possibilities, you have the holy grail. No exceptions, no infinite loops, no incorrect results, no errors.

There is a hitch: this is not quite true in Haskell. Haskell values can also be a bottom type (such as undefined or error) so we don’t have ‘complete’ coverage.

Further reading: There are Only Four Billion Floats–So Test Them All!

Regardless, our domain in this example is infinite, so tests can only show that our code works for a finite set of examples.

Analysis

Tests complement our types and fill some of the holes that Haskell’s type system left behind. We can have much more confidence in our code than if we used types or tests alone.

C

/* C doesn't do polymorphism well, so we're using ints */
int x(int *y, size_t n, int z) {
  /* implementation elided */
}

I’ve included C to make a point about the more antiquated type systems. In C, the types are not really there for the programmer, they’re there for the compiler. Their primary purpose is to make your code fast.

In this example, we don’t know what our function will return if the element is missing. We have to rely on convention or documentation (in this case it could be a sentinel value of -1).

We could also use ‘out values’ so we can return an error and write out a value. This is slightly more descriptive but you also need to rely on documentation to know which parameters are written to and which are read from. In both cases it is difficult to infer the behaviour from the types.

/* The same function with an 'out' parameter */
error_t x(int *y, size_t n, int z, size_t* w) {
  /* implementation elided */
}

Analysis

A type system alone does not always give us many guarantees. We get some information from these types, but compare C to Haskell.

Idris

x : Eq x => List x -> x -> Maybe Int
x y z = ...

This is the function with the same type as our Haskell example. With a more expressive type system, we can do better. Your choice of types can tell you more about the implementation.

%default total

x : Eq x => Vect n x -> x -> Maybe (Fin n)
x y z = ...

This signature can be read as “given a list of size n and an element, return either a number less than n or nothing.” This guarantees that our function can only return a valid index in to the list.

The function is also total, which means it is checked to see if it terminates. This rules out infinite loops and errors.

Analysis

Specification
- ✗ Works in at least one case
- ✓ The returned index is always a valid index
- ✗ The returned index always refers to a matching value
- ✗ A missing element always returns None/Nothing
Guarantees
- ✓ Memory safe
- ✓ Cannot be called with an unhandled type
- ✓ No side effects
- ✓ No exceptions
- ✓ No errors
- ✓ No infinite loops

Idris With Tests

Since the Idris type language is just as powerful as the expression language, the distinction between test and type actually becomes blurred.

ex : x [1, 2, 3] 2 = Just 1
ex = Refl

This is a function with the strange type x [1, 2, 3] 2 = Just 1. This type means that in order to type check, the compiler must prove that x [1, 2, 3] 2 is equal to Just 1 (the proof is reflexive in this case). We have effectively written a test in the type system.

Idris With Proofs

To complete our analysis, we can use the full power of a dependent type system and prove our implementation. We can do this because a dependent type system is equivalent to constructive logic (see Curry-Howard correspondence).

We can prove the remaining properties which have eluded us so far:

-- We need to modify our implementation to use DecEq
x : DecEq a => Vect n a -> (y : a) -> Maybe (Fin n)
x y z = ...

-- This is not executable code, but a proof of function 'x'
findIndexOk : DecEq a => (y : Vect n a) -> (z : a) -> 
              case (x y z) of
                Just i => Elem z y
                Nothing => Not (Elem z y)
findIndexOk y z = ...

Read the type signature of findIndexOk as “for all elements, and all vectors of that elements type, if an index is found the element must be in the vector, and if an index is not found the element must not be in the vector”

So everything is covered! What’s the downside? Well, it can be tricky to write these proofs. At least for me. If you’re curious enough to see the proof in full, read the Idris mailing list thread where I plead for help.

Comparison

	Python	Python + Tests	C	Haskell	Haskell + Sig	Haskell + Sig + Tests	Idris	Idris + Tests	Idris + Proofs
Documentation
We know the intended behaviour	✗	✓	✗	✗	✓	✓	✓	✓	✓
We have an example of usage	✗	✓	✗	✗	✗	✓	✗	✓	✓
We know some types of values that will be handled correctly	✗	✓	✓	✗	✓	✓	✓	✓	✓
We know all types of values that will be handled correctly	✗	✗	✗	✗	✓	✓	✓	✓	✓
Specification
Works in at least one case	✗	✓	✗	✗	✗	✓	✗	✓	✓
The returned index is always a valid index	✗	✗	✗	✗	✗	✗	✓	✓	✓
The returned index always refers to a matching value	✗	✗	✗	✗	✗	✗	✗	✗	✓
A missing element always returns `None` or `Nothing`	✗	✗	✗	✗	✗	✗	✗	✗	✓
Common Errors
No typos or incorrect names	✗	✗	✓	✓	✓	✓	✓	✓	✓
No unexpected null	✗	✗	✗	✓	✓	✓	✓	✓	✓
The failure case is always handled	✗	✗	✗	✓	✓	✓	✓	✓	✓
Guarantees
Memory safe	✓	✓	✗	✓	✓	✓	✓	✓	✓
Cannot be called with an unhandled type	✗	✗	✗	✓	✓	✓	✓	✓	✓
No side effects	✗	✗	✗	✗	✓	✓	✓	✓	✓
No exceptions	✗	✗	✓	✗	✗	✗	✓	✓	✓
No errors	✗	✗	✗	✗	✗	✗	✓	✓	✓
No infinite loops	✗	✗	✗	✗	✗	✗	✓	✓	✓

Opinion

In my opinion, simply using a modern type system will give you the most information and guarantees for the least amount of effort. If you want to write solid code, write tests in combination with types. Ideally with a QuickCheck style library.

With dependent types, the line between tests and types becomes blurrier. If you’re writing software for Boeing or for an iron lung, you may want to consider writing proofs.

Errata and Feedback

pron98 on Reddit points out that dependent types are quite a new thing, and not really used in industry.

dllthomas points out an inaccuracy in the ‘complete’ test coverage example, and casts doubt on Haskell’s ability to keep exceptions in the type system.

A few people had some misgivings about the C example and explanation, which I have clarified. angersock points out another way of writing it – although I will point out sentinel values are not rare in C!

Alexey Shmalko noted that C is a compiled language and so you can’t have typos.