## High-Assurance Base64

**Author: David Lazar**

Galois' mission is improving the trustworthiness of critical systems. Trustworthiness is an inherent property of a system, but we need to produce evidence of it in order for people to make informed decisions. The evidence, and its presentation, are a key part of what is often called an *assurance case*.

The kinds of evidence that make up an assurance case can differ, depending on how critical the system is: the more critical the system, the more thorough and convincing the assurance case should be. Formal methods — mathematical proofs of software correctness — are often called for when evaluating the most critical systems. A formal-methods approach to assurance produces a *strong argument* about a model of the system. In contrast, testing produces a *weak argument* about the system itself. The argument is weak because, for any non-trivial component, testing exercises only a minuscule fraction of the possible input space. A formal proof, by contrast, says that for a model of the system, some property holds for *all inputs*. Phrased this way, it is clear that testing and formal proof provide complementary evidence of a system's correctness.

In this post we describe a fast, formally verified C implementation of Base64 encoding. The 10,000-foot description of the proof approach is:

- choose the system,
- create a formal specification of the system,
- create a model of the implementation, and
- prove that the implementation meets the specification.

The code and proof are available here:

https://github.com/davidlazar/base64

The following sections describe the different parts of this repository.

### The C Code

The C code in `base64encode.c` is a fast implementation of Base64 encoding. It is based on libb64, which uses coroutines to achieve speed. The nontrivial control flow of this code makes proving it correct more challenging. This is the artifact we want to develop assurance evidence for.

Typing `make` at the top of our repository builds the `b64enc` tool, a simple frontend to the C code. Here is a quick timing comparison between our code and the `base64` utility that is part of GNU Coreutils:

```
$ time b64enc 500MB.random > x.b64
real: 1.197s user: 0.61s cpu: 0.48s
$ time base64 -w0 500MB.random > y.b64
real: 2.615s user: 1.81s cpu: 0.62s
$ diff -w x.b64 y.b64
$
```

### The Cryptol Specification

The Cryptol specification in `base64.cry` is based on RFC 4648. Ignoring padding, and assuming Cryptol is in big-endian mode, the encoding routine is just two lines:

```
encode : {a c} (fin a, c == (8 * a + 5) / 6, 6 * c >= 8 * a) =>
[a][8] -> [c][8];
encode xs = [| alphabet @ x || x <- groupBy(6, join xs # zero) |];
```

The corresponding `decode` function is similarly short. The `b64encode` and `b64decode` functions are wrappers around `encode` and `decode` that handle padding.

Our specification provides the following correctness theorem, polymorphic over all lengths of `x`:

```
theorem b64encodeLeftInv: {x}. b64decode (b64encode x) == x;
```

Cryptol can't prove polymorphic theorems on its own, so we must monomorphize the theorem to prove it:

```
base64> :prove (b64encodeLeftInv : [16][8] -> Bit)
Q.E.D.
base64> :prove (b64encodeLeftInv : [200][8] -> Bit)
Q.E.D.
```

### The Proof

The `proof/` subdirectory contains the infrastructure for proving the C code correct. The Makefile in this directory orchestrates the proof. Typing `make n=16` generates and proves a theorem stating that the `base64_encode` function from the C code (as compiled by Clang) is equivalent to the `b64encode` function in the Cryptol specification for the given input length `n`:

```
$ make n=16
Proving function equivalent to reference:
encode_aig : [16][8] -> [24][8]
Q.E.D.
```

Let's see what the Makefile is doing behind the scenes.

The C code in `sym_encode.c` is a wrapper around our C code that passes a symbolic value to the `base64_encode` function. This code is compiled by Clang to LLVM bitcode.

The LLVM Symbolic Simulator (a prototype tool currently under development by Galois) is used to extract a formal model, `encode_aig`, of the LLVM bitcode. The model can be loaded into Cryptol and used like any other function:

```
proof> :t encode_aig
encode_aig : [16][8] -> [24][8]
proof> encode_aig "16 characters..."
"MTYgY2hhcmFjdGVycy4uLg=="
```

In particular, we can write a theorem about this function:

```
theorem MatchesRef : {x}. encode_aig x == b64encode x;
```

...and then prove it:

```
proof> :prove MatchesRef
Q.E.D.
```

Success! Amazingly, this proof system scales to large values of `n` where exhaustive checking is not feasible:

```
$ time make n=1000
Proving function equivalent to reference:
encode_aig : [1000][8] -> [1336][8]
Q.E.D.
real: 17.882s user: 16.31s cpu: 1.50s
```

### Summary

The success of the proof gives us high confidence that our C code is correct.

To reiterate our steps:

- choose the system — we chose Base64 encode,
- create a formal specification of the system — we did this with a few lines of Cryptol,
- create a model of the implementation — we used our LLVM Symbolic Simulator to *generate the model from the C code*, and
- prove that the implementation meets the specification — to do this, the Cryptol tool calls out to a SAT solver to prove the AIGs are equivalent.

The weak link in many applications of formal methods to software correctness is the *model-to-code* correspondence. The approach we took above addresses that weakness by automatically generating the model via symbolic simulation of a low-level representation of the program — in this case, the LLVM bitcode. This approach would miss bugs or malware introduced on the path from LLVM bitcode to executable, but it remains a compelling argument for the correctness of the C code. In critical applications, it makes sense to include the compiler in the scope of the overall assurance case.

## Reader Comments (1)

Nice.. One note regarding the comment:

"Cryptol can't prove polymorphic theorems on its own, so we must monomorphize the theorem to prove it"

Keep in mind that this is not a limitation, but rather follows from the fact that there is no notion of polymorphic validity in Cryptol's size-based type system. That is, you can write a polymorphic theorem that holds at certain monomorphic instances but not at others. (Essentially by a cardinality argument, anything that mentions "size" is suspect.) It then follows that there is no decision procedure for determining polymorphic validity in Cryptol.

Of course, for some polymorphic properties the theorem either holds at all monomorphic instances or at none. It's an interesting research question to ask whether one can design an algorithm to tell whether a polymorphic property falls into this category, without actually enumerating all the monomorphic instances (which could be infinite). I'm afraid the answer is "no," but I have no proof of this. Of course, this very question calls for precisely nailing down Cryptol's type system itself, which is a whole other can of worms.

Nonetheless, very nice work and very nice write up..