(A really quick post today). Following a recent post, Run length encoding in Python, I thought it would be nice to look at this simple encoding in Haskell. The RLE idea is simple, given some input:

    "aaaabbaaa"

compress it by taking the length of each run of characters:

    "4a2b3a"

Running this in GHCi:

    > encode "aaaabbaaa"
    [(4,'a'),(2,'b'),(3,'a')]

    > decode (encode "aaaabbaaa")
    "aaaabbaaa"

So there’s a simple QuickCheck property we’ll want to establish. First, let’s look at the Python implementations:

    def encode(l):
        return [(len(list(group)),name) for name, group in groupby(l)]

    def decode(l):
        return sum([length * [item] for length,item in l],[])

Ok, so group each run into a list, then pair up the length, with the list item, and to decode, replicate the character by its length, and concatenate. Easy in Haskell:

    encode = map (length &&& head) . group

    decode = concatMap (uncurry replicate)

Who says static typing implies verbosity?

We also get a generic implementation for free, given that Haskell infers that the code never inspects the elements in the list directly, our `encode’ function will just as happily encode Strings as lists of numbers or any other list. That is, ‘encode’ will work on any type `a’ that looks like it can do (==).

    encode :: Eq a => [a] -> [(Int, a)]

So:

    > encode [1,1,1,1,2,2,2,7,7]
    [(4,1),(3,2),(2,7)]

    > encode ["foo", "foo", "bar", "foo", "foo", "foo"]
    [(2,"foo"),(1,"bar"),(3,"foo")]

Polymorphism rocks.

The original poster used a couple of unit tests, but in Haskell we’d usually specify generic properties the function must satisfy:

    prop_id :: String -> Bool
    prop_id xs = decode (encode xs) == xs

QuickCheck then generates input for us:

    > quickCheck prop_id
    OK, passed 100 tests.

So go out and sketch your little algorithms in (shorter, faster, safer) pseudo-Python :-)

Thanks to psykotic for the idea!

About these ads