Tuesday, April 5, 2011

RC4 in Haskell

I've been meaning to try out Levent Erkok's sbv library, but for a while I couldn't think of anything to write. Now I have, and it was quite fun to implement: RC4 in pure Haskell. I took the original RC4 code from cypherpunks, and rewrote it using sbv.

It took a lot of tweaking to get my implementation right, but I think it works correctly now.

Everything, including implementation, haddock documententation, a few tests and a few theorems all weighs in at ~275 lines. Most of it I somewhat directly translated from the cypherpunks posting. I also included a ByteString interface that matches the RC4 code in the cryptocipher package, and an instance of crypto-api's StreamCipher class. So you can actually use it for encrypting things. I make no means of guaranteeing any sort of efficient implementation; indeed, it's probably somewhat slow. If I can think of a good benchmark, I'll add some. But I enjoyed writing it, and sbv is a very nice DSL that was pretty easy to get used to.

The nice thing? This:

*RC4> proveRc4IsCorrect
** Elapsed problem construction time: 16.360s
** Elapsed translation time: 0.000s
** Elapsed Yices time: 34.084s

Which showed my proposition decrypt (encrypt x) == x was correct for 40 bit keys and 40 bit plaintexts. I was quite glad that yices could actually verify this. But despite the fact my proposition holds according to the SMT solver, I also did some testing of my implementation by QuickCheck testing it against cryptocipher's RC4 module. Sure enough - my QuickCheck tests quickly turned up problems!

Moral: don't trust code that's proven and has not been tested, as Wouter said (PDF.) Of course in this instance, proving encryption followed by decryption is the identity function isn't a huge breakthrough, nor is the fact my tests showed errors surprising. Due to RC4's symmetric, pseudo-random nature, the generation of the keystream can be wrong and this proposition will still hold. This was where the bugs lied: keystream generation. Nonetheless, just getting a proposition and blasting it with the solver to be told it's correct is quite satisfying and it did save me once or twice - I really like this SMT solver approach.

I haven't done the C code generation yet. It mainly has to do with the fact the interface would be horrendous because the generated C code cannot use arrays. Levent is working on a new version of sbv that will let you generate code using arrays though (we need this for the S state of the RC4 context, which is a 256 byte array.)

Anyway, here's the code. It also includes my version of the original C implementation from cypherpunks - mainly just a simple driver program with a test inside of it. Caveat emptor: I am not a cryptographer, issues may exist, etc.

I imagine with more compiler backends, it could be possible to have libraries of code written in Haskell using sbv that can then be verified, and extracted to other languages for use - it will also hopefully be extended to use other SMT solvers directly besides Yices. Maybe in future posts I'll write some tutorials for it or something.