## MD6 in Cryptol

NIST is currently running a competition to come up with the next generation message hashing function that it intends to standardize and FIPS recommend upon completion (assuming one good candidate is left standing and well at the conclusion of the evaluation process):

http://csrc.nist.gov/groups/ST/hash/sha-3/index.html

Apart from the need to come up with better alternatives to its current recommendation, the SHA-2 family of hashing functions, this competition draws inspiration from the success that the AES competition had a couple of years ago in engaging the community in coming up with a replacement for the DES block cipher. As then, a lot of new innovation has resulted.As with block ciphers, many common types of hashing functions lend themselves well to expression in Cryptol. To demonstrate some of the features of Cryptol and how it could be used to express SHA-3 candidates, here's one of the submissions, MD6 from the CSAIL group at MIT, headed by Ronald L. Rivest:

http://groups.csail.mit.edu/cis/md6/

The goal of this writeup is twofold:

- Introduce you to the MD6 hashing algorithm and its construction.
- Expose you to the Cryptol language, and how it lends itself to expressing MD6.

Ideally, you'll come away with enthusiasm on both accounts!

### Setting the context

We won't try to re-count or re-express the design and implementation details of MD6 here, but assume the reader has either a basic knowledge of its structure or is able to separately consult a description of MD6 (such as its specification, a mighty fine and interesting document.) Instead, we'll limit ourselves to presenting some code snippets representing key portions of MD6 and highlight the Cryptol features that it employs. A pointer to the complete code of MD6 in Cryptol is provided at the end, for your own use and experimentation.

### The MD6 hashing function mode

At the toplevel, MD6 accommodates running in a number of different modes, tailored to the resources of the execution environment and external parameters like the digest output size and the number of rounds to apply each step. The implementation presented here is limited to the sequential/iterative version.The Cryptol function that implements the MD6 sequential mode of operation is `md6`, having the following type:

md6 : {a b c d} ( ...type constraints... ) => ( [8*a][b] // input message , [32] // bit width of original message input (before byte-padding) , [8] // number of rounds (104 for 256.) , [c][8] // key (zero for hashing applications) , [32] // output digest length (in bits.) ) -> [16][64];

To explain the Cryptol type signature, let's start at the end -- the result. It is made up of 16 64-bit words. In Cryptol, `[N]` is the type of an N-bit wide value. You can concatenate the `[]` sequence type operators, so `[N]Ty` is the type for a sequence of N` Ty` values; in fact, the `[N]` syntax is a shorthand for `[N]Bit`, i.e., an N-bit wide sequence of bits. So for the above result type, `[16][64]` is the type of a sequence of 16 64-bit values.The inputs to the `md6` function mirror those of the parameters to MD6 SEQ (see specification; Fig 2.6). The argument types make use of Cryptol's size polymorphism via type variables; for example, the input message's type is given as `[8*a][b]`. This means: a sequence with widths that are a multiple of 8 (i.e., if needs be, the input message has already been padded out to make it so.) of `[b]` values, i.e., b-bit wide words (for any `b).`Notice that we've left out the type constraints from the above signature. It is a somewhat long and involved for this function, but this portion of the Cryptol type signature let's you state constraints that must hold for type variables, like:

select5 : {a b} ( a >= 5 ) => [a]b -> b; select5 xs = xs@4;

The indexing operator `@` better be restricted to sequence values that have at least 5 elements, which is what the type constraint expresses here. If you try to evaluate `select5` with the Cryptol interpreter for narrower sequence, you'll see something like this:

Cryptol> :l "select.cry" ... Select> select5 [1 2 3] ["command line", line 1, col 1 "select.cry", line 1, col 13]: Inferred size of 3 is too small - expected to be >= 5 Select> select5 [1..10] 0x5 Select>

That's just a flavor of Cryptol's type system; it is capable of expressing the shape and sizes of the data you manipulate quite precisely and with some sophistication. Like size polymorphism for the `md6` mode function signature, letting you use type variables instead of concrete sizes, but in a constrained manner. The size preconditions that must hold for an argument to a function like `md6` are separately specificed using type constraints.Notice that Cryptol is able to infer the most general type for most definitions and values, so writing out the type signature is optional. But due to its compact and precise nature, it provides valuable documentation that's worth including alongside its definition. One popular way of using the type inference mechanism is to have Cryptol come up with the initial type for a definition, which you then copy in and use. Indeed, that was how the `md6` type signature ended up being derived!Enough on types for now; time to get to grips with the bit and word values that `md6` work over.

### The MD6 mode of operation

MD6 defines its toplevel, mode-of-operation as follows:

md6 (msg,wi,rnds,ks,digLen) = compress(last_chunk, rnds) where { // The final 89-word chunk to compress: last_chunk = qs # key # [uu vv] # r # last padMsg; r = cs @ (width(padMsg) - 1); // sequential mode of operation, each compression round // is fed into the next (cf. the 'c <- cs' knot-tying below.) cs = [c0] # [| compress((qs # key # [u v] # c # b), rnds) || c <- cs || u <- us || v <- vs || b <- padMsg |]; ... };

MD6 iteratively operates on chunks of 89 64-bit values, each chunk made up out of internal state, the chained output from the previous "round" (this being the sequential, Merkle-Damgaard rendition of the algorithm.) The result is one final application of the diffusion function `compress` on the last of these chunks (`last_chunk`.) We won't drill down here into detail about how that final chunk is constructed, but the syntax of `last_chunk`'s Cryptol definition merits some unscrambling:

last_chunk = qs # key # [uu vv] # r # last padMsg;

i.e., it is a sequence constructed by the concatenation of number of sub-sequences (`#` is the Cryptol concatentation operator); the constant Q sequence (see spec), an (optional) key, the control words U and V, and the final portion of the suitably padded input message. Preceding it is `r`, which is the output of the `(last-1)` chained compression steps. That value (`r`) is computed by taking the last element of the `cs` sequence:

// sequential mode of operation, each compression round // is fed into the next (cf. the 'c <- cs' knot-tying below.) cs = [c0] # [| compress((qs # key # [u v] # c # b), rnds) || c <- cs || u <- us || v <- vs || b <- padMsg |];

It is defined using Cryptol's parallel sequence comprehensions, prefixed by the initial value/chunk (IV) `c0`. The function `compress` is computed for each element, with the desired chaining of output from one `compress` invocation being fed into the next through the use of the recursive use of `cs` in one of the "arms" of the comprehension.Recursive sequences allow us to capture stateful computations in Cryptol *without* the explicit mentioning of registers or an internal state being iteratively updated. It is a common "functional" / declarative idiom used in Cryptol code, using such sequences to express chaining/feedback.As `c0`, which can be likened to the initial state of the mode function, is initially prefixed to `cs`, the recursive definition will produce values for any `N`, i.e., element `N` can be computed only by knowing the value of the sequence element `(N-1)` -- not `N` itself nor successive elements. The number of elements in this comprehension is equal to the length of the shortest of its "arms". In the above, that would be the padded-out sequence of message chunks.The net effect being that the `md6` mode function will iterate over all the chunks of the input message, applying at each step the compression function to the internal state and the chained output from the previous step. To finalize and generate output, the `compress` function is called again on the final element.The construction used by this MD6 mode (i.e., it's not the only one, alternative versions are supported by the spec.) is standard. Indeed, we could easily re-factor the `md6` function to have it be parameterized over the compression function to call at each step. The resulting chained-mode function could then be instantiated by any algorithm that uses the same mode-of-operation, but furnishing their own compression/diffusion function.Back to MD6 in Cryptol, expressing the compression function is next.

### The `compress` function

At the heart of MD6 is the function that compresses/diffuses the incoming message, called repeatedly by the chained mode in the previous section. Like with `md6`, let's first consider its Cryptol type:

compress : ([89][64],[8]) -> [16][64];

A function of two arguments, the first being the message block/chunk to process, which consists of 89 64-bit words. The second argument is the number of rounds to iterate over that chunk. For 512-bit digest outputs, the suggested number of rounds is 168.The result is 16 words wide. The function is expressed as follows in Cryptol:

compress (chunk,r) = as @@ outs where { as = chunk # [| cf (a,b,c,d) || a <- ss || b <- a_ts || c <- rs || d <- ls |]; cf (s_i,a_i,r_i,l_i) = x3 where { x1 = s_i ^ a_i; x2 = x1 ^ (x1 >> r_i); x3 = x2 ^ (x2 << l_i); }; a_ts = [| a ^ b ^ (c & d) ^ (e & f) || a <- as || b <- drop(n-t_0,as) || c <- drop(n-t_1,as) || d <- drop(n-t_2,as) || e <- drop(n-t_3,as) || f <- drop(n-t_4,as) |]; outs : [16][32]; outs = [| (t + 89 - 16 + x) || x <- [0 .. 15] |]; t : [32]; t = (r # zero) * (c # zero); c : [8]; c = 16; n : [8]; n = 89; };

The final 16 word result is produced by applying the `@@` operator, which indexes its first argument sequence value by the indices from the second. The MD6 compression function slides a *window* across its input chunk, so in this case the result is the last 16 words in that window -- see the MD6 spec for details about the portion that `outs` selects.The `as` sequence is defined using a common Cryptol idiom, the parallel sequence comprehension:

as = chunk # [| cf (a,b,c,d) || a <- ss || b <- a_ts || c <- rs || d <- ls |];

Its first element is the input chunk, followed by sequence values of equal chunk width, each computed by applying the function `cf` to successive elements of four generator sequences, `ss`, `a_ts`, `rs`, and `ls`. Apart from `a_ts`, these are constant-valued sequences that the MD6 specification defines. `a_ts` is rather more interesting, another parallel sequence comprehension:

a_ts = [| a ^ b ^ (c & d) ^ (e & f) || a <- as || b <- drop(n-t_0,as) || c <- drop(n-t_1,as) || d <- drop(n-t_2,as) || e <- drop(n-t_3,as) || f <- drop(n-t_4,as) |];

It is recursively defined in terms of the above `as` sequence value, i.e., to compute element I it reaches 'back' at previous values at 5 different positions of the 'as' sequence, as determined by the tap positions `t_X ( 0 =< X <= 5)` that MD6 specifies. In the words of the specification, `a_ts` implements a *non-linear feedback shift register*. The `drop(Num,Seq)` Cryptol built-in operator returns the sequence `Seq`, but with its first `Num` elements chopped off. In the above, `n` is the constant chunk size, 89. The values from these different positions are then combined together by simple (and hardware efficient) composition of word-sized bit operators.The result of `compress` is then the final 16 words of the `as` sequence. The length of the `as` sequence is determined by the number of iterations or rounds to perform, `r`.That's it - a tidy and simple compression function at the heart of MD6. Its expression in Cryptol is, we hope, reasonably clear and close to the specification.What's really gained from using Cryptol here? From a programming perspective, having the backing of its sized type system lends great help wrt. correctness and consistency of the word-level operations. The resulting declarative Cryptol program has a number of follow-on benefits though:

- to generate and derive an efficient hardware implementation (for mapping to FPGAs, perhaps). The types aid the generation of this hardware mapping. But, more importantly, use of the recursive sequence idiom allows the Cryptol implementation to infer how to effectively 'render' the function in hardware, mapping the feedback loop to a register-based design.
- as input to Cryptol's C code generator to output C code for embedding into other codebases.
- to check correctness of the generated implementations by using Cryptol's symbolic verification support.
- for prototyping and 'playing around' with ideas and variations of MD6. Cryptol's equivalence checking support can help you determine if the transformations you do to one maintain equivalence with previous versions.
- to check equivalence between this Cryptol 'specification' and other, non-Cryptol implementations in either C or VHDL. Cryptol's toolchain is able to symbolically execute code in either language, followed by the equivalence checking them via SAT solvers.

Being nice and high-level isn't the goal (but we don't mind! :-) ), but using Cryptol to help the hard work of generating algorithms that are correct and possibly high-performing.

### Cryptol sample implementation

The Cryptol files making up the MD6 implementation is available for download via the Galois Cryptol pages as a .zip archive. To use the code, simply load it in:

Cryptol> :load "MD6.cry" ... MD6> t1 // runs the t1 test, see MD6/Tests.cry. True MD6>

Please fire up an editor to have a look at the various ".cry" files.

### Next up: Skein

If that left you wanting more, the next Cryptol blog entry will present a reference implementation for another SHA-3 candidate, Schneier et al.'s Skein. Stay tuned.

#### Author

("Sigbjorn Finne" # "Galois, Inc" # "2009-01-23") : [35][8]

## Reader Comments (2)

There seems to be a typo: x3 = x2 ^ (x2 <should be: x3 = x2 ^ (x2 << l_i);

Thanks for the heads-up Andy; fixed.