## ZUC in Cryptol

ZUC is a stream cipher that is proposed for inclusion in the "4G" mobile standard named LTE (Long Term Evolution), the future of secure GSM. The proposal is actually comprised several different algorithms:

- A stream cipher named ZUC,
- LTEencryption algorithm (128-EEA3), based on ZUC,
- LTEintegrity algorithm (128-EIA3), which is a hash function using ZUC as its core.

In this post, we will develop a Cryptol implementation of ZUC, providing executable reference code. In addition, we will also discuss some of the security concerns raised for version 1.4 of the algorithm: We will use Cryptol to automatically identify an *IV collision* vulnerability. This weakness is addressed in version 1.5 of the algorithm: We will also prove that the proposed fix is indeed valid, again using Cryptol's formal verification tools.

While we do assume some familiarity with ZUC and Cryptol, this article should shed some light into how Cryptol can be used in the cryptographic design and evaluation processes even if you skip all the ZUC specific details.

Our implementation follows the latest specification of ZUC, which is at version 1.5 at the time of writing.

### Addition in GF(231-1)

One of the core operations in ZUC is addition of two 31-bit numbers, say a and b. The specification states that this operation can be done as follows:

- Compute v = a+b
- If the carry bit is 1, set v=v+1

We transliterate this algorithm to Cryptol straightforwardly:

plus : ([31], [31]) -> [31]; plus (a, b) = if sab ! 0 then sab'+1 else sab' where { sab : [32]; sab = (a # zero) + (b # zero); sab' : [31]; sab' = take(31, sab); };

Note how we detect overflow by doing the 32-bit addition and checking the final bit. We can generalize plus to , which will add any sequence of numbers in GF(231-1) using plus repetitively:

add : {a} (fin a) => [a][31] -> [31]; add xs = sums ! 0 where sums = [0] # [| plus (s, x) || s <- sums || x <- xs |];

We simply generate the partial sums, and return the final sum by selecting the last element.

### A digression on plus

As an interesting aside, the ZUCspecification calls the plus operation we have defined above addition modulo 231-1, which does not seem to be the traditional modular addition. In particular, you can prove that will never evaluate to 0 unless its arguments are both 0, which is not quite how modular addition typically behaves. We can prove this claim using Cryptol's satisfiability checker:

ZUC> :sat (\(a, b) -> (a != 0) & (b != 0) & (plus(a, b) == 0)) No variable assignment satisfies this function

It appears that the algorithm has been designed with modular addition in mind, with tweaks to avoid having it generate values on purpose.

### The LFSR

At the core of ZUClies an LFSR (linear feedback shift register), which comprises of 16 cells, each of which is precisely 31 bits wide. Cryptol's type-system has been designed to accurately capture such specifications:

type LFSR = [16][31];

Note that we do not have to resort to using 32-bit machine integers or some other machine mandated bit-size, freeing us to give a fully faithful implementation of the specification.

ZUC uses the LFSR in two different modes. In the *initialization *mode, it takes a 31-bit input u, and transforms the LFSR by performing the following computation, where s i refers to the i-th cell of the LFSR:

2^{15}s_{15} + 2^{17}s_{13} + 2^{21}s_{10} + 2^{20}s_{4} + (1+2^{8}) s_{0} mod (2^{31} - 1)

ZUC adds this value to the input u, and then "tweaks" the sum to avoid a 0 result. The Cryptol code below implements the specification quite closely:

LFSRWithInitializationMode : ([31], LFSR) -> LFSR; LFSRWithInitializationMode (u, ss) = (ss @@ [1 .. 15]) # [s16] where { v = add [| s <<< c || s <- ss @@ [15 13 10 4 0 0] || c <- [15 17 21 20 8 0] |]; vu = add [v u] s16 = if vu == 0 then 0x7fffffff else vu; };

Note how we pick elements of the LFSR and the coefficients by a simple sequence comprehension.

In the work mode, there is no initializer value u, but otherwise the operation is very similar:

LFSRWithWorkMode : LFSR -> LFSR; LFSRWithWorkMode ss = (ss @@ [1 .. 15]) # [s16] where { v = add [| s <<< c || s <- ss @@ [15 13 10 4 0 0] || c <- [15 17 21 20 8 0] |]; s16 = if v == 0 then 0x7fffffff else v; };

We could have defined LFSRWithWorkMode in terms of LFSRWithInitializationMode by passing 0 for u, but the above definition follows the specification much more closely, a desirable thing to do for a reference implementation. (Also, this version of is a bit faster for obvious reasons, saving us some cycles during execution.)

### Bit Reorganization

The middle layer of ZUC takes the LFSR and shuffles its contents as follows:

BitReorganization : LFSR -> [4][32]; BitReorganization ss = [| y # x || (x, y) <- [(hi s15, lo s14) (lo s11, hi s9) (lo s7, hi s5) (lo s2, hi s0)] |] where { lo, hi : [31] -> [16]; lo x = x @@ [0 .. 15]; hi x = x @@ [15 .. 30]; [s0 s2 s5 s7 s9 s11 s14 s15] = ss @@ [0 2 5 7 9 11 14 15]; };

There isn't much to say about bit-reorganization, except to note that selecting high and low bytes of a 31-bit word comes out quite clean in Cryptol, thanks to bit-level addressability and compact selection operation @@. Note how ZUC defines the higher 16 bits of a 31 bit number by picking bits 15 through 30; which is just as natural in Cryptol to express as any other slice of a given word.

### The nonlinear function F

Cryptol implementation of ZUC's F function follows the specification almost literally:

F : ([3][32], [2][32]) -> ([32], [2][32]); F ([X0 X1 X2], [R1 R2]) = (W, [R1' R2']) where { W = (X0 ^ R1) + R2; W1 = R1 + X1; W2 = R2 ^ X2; [W1L W1H] = split W1; [W2L W2H] = split W2; R1' = S(L1(W2H # W1L)); R2' = S(L2(W1H # W2L)); };

Note that we keep track of the parameters R1 and R2 explicitly. Being a purely functional language, Cryptol does not have any notion of state, and hence does not support in-place updates. However, this does not mean inefficient execution: The purely functional approach makes sure the specifications remain easy to develop and reason about, while backend tools (compilers, synthesizers, etc.) can transform the code to run efficiently on various targets appropriately, such as on FPGA's or in software.

We will skip the details of ZUC's S-boxes and the functions L1 and L2, but you can see their implementation in the attached full implementation.

### Loading the Key

ZUC receives a 128 bit key and a 128-bit IV (initialization vector), to construct the initial starting

configuration of the LFSR. The following definition follows the specification (Section 3.5) literally:

LoadKey : ([128], [128]) -> LFSR; LoadKey (key, iv) = [| i # d # k || k <- ks || i <- is || d <- ds |] where { ks : [16][8]; ks = split key; is : [16][8]; is = split iv; ds : [16][15]; ds = [0b100010011010111 0b010011010111100 0b110001001101011 0b001001101011110 0b101011110001001 0b011010111100010 0b111000100110101 0b000100110101111 0b100110101111000 0b010111100010011 0b110101111000100 0b001101011110001 0b101111000100110 0b011110001001101 0b111100010011010 0b100011110101100]; };

### Initializing ZUC

During the initialization stage, ZUC loads the key and the IV, and then repeatedly performs bit-reorganization, a run of F, and a run of LFSR in initialization mode. This process is repeated 32 times. For purposes

that will become clear later, we represent this operation as a Cryptol stream function that returns an infinite sequence of ZUC configurations:

type ZUC = (LFSR, [32], [32]); InitializeZUC : ([128], [128]) -> [inf]ZUC; InitializeZUC (key, iv) = outs where { initLFSR = LoadKey (key, iv); outs = [(initLFSR, 0, 0)] # [| step out || out <- outs |]; step (lfsr, R1, R2) = (LFSRWithInitializationMode(take(31, w >> 1), lfsr), R1', R2') where { [X0 X1 X2 _] = BitReorganization(lfsr); (w, [R1' R2']) = F ([X0 X1 X2], [R1 R2]); }; };

### Executing ZUC

We need two more pieces of functionality. In the so called working stage, ZUC runs bit-reorganization, F, and LFSR in work mode, discarding the result of the call to F:

WorkingStage : ZUC -> ZUC; WorkingStage (lfsr, R1, R2) = (lfsr', R1', R2') where { [X0 X1 X2 _] = BitReorganization(lfsr); (_, [R1' R2']) = F ([X0 X1 X2], [R1 R2]); lfsr' = LFSRWithWorkMode(lfsr); };

Cryptol's pattern-matching based definitions come out quite nicely in picking (and ignoring) results of operations.

In the production stage, ZUC transforms works just like in the working stage, except the result of the call to F is returned as the next 32-bit key, after XOR'ing with the last word of what bit-reorganization returns. Again, the Cryptol code is straightforward:

ProductionStage : ZUC -> ([32], ZUC); ProductionStage (lfsr, R1, R2) = (w ^ X3, (lfsr', R1', R2')) where { [X0 X1 X2 X3] = BitReorganization(lfsr); (w, [R1' R2']) = F ([X0 X1 X2], [R1 R2]); lfsr' = LFSRWithWorkMode(lfsr); };

### The ZUC API

We can finally give the ZUC API. Given a key and an IV, ZUC initializes itself, and then keeps calling ProductionStage to generate successive sequences of 32-bit words as key expansion. The result is most naturally captured in Cryptol as an infinite sequence of 32-bit words:

ZUC : ([128], [128]) -> [inf][32]; ZUC (key, iv) = tail [| w || (w, _) <- zucs |] where { initZuc = WorkingStage(InitializeZUC @ 32); zucs = [(zero, initZuc)] # [| ProductionStage zuc || (_, zuc) <- zucs |]; };

Encryption can now be done by taking the plaintext and XOR'ing with the successive words that come out of the above function: Clearly, decryption is the same as encryption, and the fact that they are inverses follows trivially from the fact that it's a mere XOR operation.

And this completes our development of ZUC in Cryptol. You can see the entire code here.

### Testing

One thing about crypto-algorithm development is that it is hard to convince oneself that the algorithm is coded correctly. ZUC is no exception. Hopefully, Cryptol makes that task easier by abstracting away from many of the machine-specific details, providing a language that allows one to express idioms that appear in cryptography quite concisely, thus removing a whole class of bugs that has nothing to do with the algorithm itself but rather with how it has to be implemented.

The other aspect of Cryptol is that the specification, as high level as it is, remains executable. So, we can use our implementation and test it against the published test-vectors for ZUC. Here's the first example from the test document

(Section 3.3):

ZUC> take(2, ZUC(0, 0)) [0x27bede74 0x018082da]

(The full implementation has further test vectors from the specification, see the theorem ZUC_TestVectors.) Note that since our implementation of ZUC returns an infinite sequence, we have used the "take" function to just look at the first two outputs. Naturally, we can pull out as many outputs as we would like from that infinite stream.

### Security of ZUC

One essential activity in crypto-algorithm design is to provide rigorous security arguments. While the current state-of-the-art in developing such arguments relies largely

on human intelligence, we can use tools to attest our findings. In this section we will see how to use Cryptol to mechanically demonstrate an IV collision vulnerability found in version 1.4 of the ZUC specification, and how the modifications in version 1.5 addressed the problem.

An IV collision occurs if two different IV's cause the algorithm to initialize itself to precisely the same state, thus losing entropy. Cryptographers seriouslyworryabout

such losses of entropy as they can lead to efficient attacks by cutting down the search space significantly. In a recent conference, Wu et al., demonstrated one such vulnerability in ZUC 1.4. In that version of the specification, ZUC had a slightly different initialization sequence. First, instead of addition in GF(2^{31}-1), it performed a simple XOR when LFSR is used in the initialization mode. Second, version 1.4 XOR'd the last byte from the bit-reorganization during initialization with the result of the call to the nonlinear function F. (You can see the precise differences between versions 1.4 and 1.5 in the attached Cryptol code, search for the occurrences of the variable version1_5, which distinguishes between the two.)

As demonstrated by Wu et al., the 1.4 version of the algorithm suffers from IV collision: That is, two different IV's can result in the precise same ZUC state, causing loss of entropy. It is easy to express this property as a Cryptol theorem:

theorem ZUC_isResistantToCollisionAttack: {k iv1 iv2}. if (iv1 != iv2) then InitializeZUC(k, iv1) @ 1 != InitializeZUC(k, iv2) @ 1 else True;

Let's spend a moment on what the above theorem is stating. It says that for all values of k, iv1, and iv2, the initial state of ZUC will be different so long as iv1 and iv2 are not the same. If this theorem holds of our algorithm, it would mean that there is no entropy loss due to IV collision. (We also now see why we chose InitializeZUC to return an infinite stream: This way we can simply look at the result of the first step, which will create a simpler verification problem for the backend SAT/SMT solver used by Cryptol.)

Here is Cryptol's response when we tell it to prove the above theorem, using version 1.4 of the specification:

```
ZUC> :set sbv
ZUC> :prove ZUC_isResistantToCollisionAttack
Falsifiable.
ZUC_isResistantToCollisionAttack(0x6bff61ffff8fcdffffffc996ffffff1a,
0xff08fc0085e000000a0000f5008f000a,
0xff08fc0085e000000a0000f5008f008a) = False
```

The first command tells Cryptol to switch to the SBV mode, which allows for formal proofs. In the second command, we asked Cryptol to prove that the theorem holds of our implementation. Not only Cryptol told us that the theorem we have stated is false, it also provided a concrete counterexample! (Note that those two IV values are different, check the second to last digit!) Let's verify that the vulnerability indeed does exist with the iv values Cryptol gave us:

ZUC> take(2,ZUC(0x6bff61ffff8fcdffffffc996ffffff1a,0xff08fc0085e000000a0000f5008f000a)) [0xa415abbe 0x673f1eb9] ZUC> take(2,ZUC(0x6bff61ffff8fcdffffffc996ffffff1a,0xff08fc0085e000000a0000f5008f008a)) [0xa415abbe 0x673f1eb9]

Voila! We have two different IV's, yet we get precisely the same key-sequence. Cryptol can attest to what Wu et al. showed, providing a concrete counterexample to wit.

In response to this vulnerability, the ZUC algorithm was slightly tweaked to remove the possibility of collision. (Again, you can look at the attached Cryptol code to see what those changes were, search for the word version1_5.) Is the vulnerability really removed? While mathematicians will have their own tools to claim as such, we can use Cryptol to verify that the fix indeed does work (and is correctly implemented) in our version as well. With version 1.5 of the spec, we have:

ZUC> :prove ZUC_isResistantToCollisionAttack Q.E.D.

(The above proof takes a couple of seconds to complete on my laptop.)

It is important to note that the above theorem does **not** prove that there are no IV collisions in ZUC 1.5. This is because we've only proved the theorem after the first run of the InitializeZUC routine. Recall that the actual implementation actually runs that operation 32 times. While we can express the full theorem in Cryptol as well, it generates quite a large verification problem, and the SAT solver running on my laptop is not quite powerful enough to tackle it. (The proof might indeed be feasible on a more decent machine with enough RAM. One can also construct an argument that the initialization step is injective for all possible LFSR configurations that LoadKey will produce, thus completing the proof in two steps. We leave that as an exercise for the interested reader!) In any case, our proof above shows that ZUC version 1.5 is at least free of "easy to find" IV collision attacks.

### Summary

Designing cryptographic algorithms requires a deep understanding of the underlying science of cryptography, and a fair amount of the mathematics thus involved. Implementing such algorithms need not! We believe that the Cryptol toolset provides the right idioms and the tools to simplify cryptographic algorithm implementations and evaluations, abstracting away from machine specific details and platform specific concerns. Specifications remain pure, and hence easier to reason about and communicate. The executable nature of Cryptol also makes it easy to just play around with your implementations, without worrying about myriads of implementation specific concerns. (Compare how you would do a similar study of ZUC if you had to use C or Java; how much of your time would be spent on the actual algorithm, and how much on "everything else.") Once the algorithm is developed, the compilation and synthesis tools of Cryptol can help in creating artifacts that can be deployed in software or in an FPGA. By separating the concerns of specification from implementation, Cryptol provides new means of simplifying crypto-algorithm development and evaluation.

The full ZUC implementation in Cryptol can be downloaded here. Free licenses for Cryptol are available at www.cryptol.net.

## Reader Comments