It’s deeply satisfying when the stream fusion optimization kicks in, in GHC Haskell. It is a powerful illustration of how the absence of side effects makes the compiler’s job so much easier, and lets it be far more aggressive.

Here’s a little example to show the real expressive power that we get when we have representation-changing optimizations like fusion to rely on. A simple example showing how good GHC is doing at this kind of stuff, these days: add a sequence of numbers. I’ve kept it super minimal so you can mentally compute each transformation to see GHC at work. (Of course there are lots of other ways to write trivial computations like this, but anything more substantial will make the assembly much harder to follow. So for teaching purposes, we’ll keep the source problem simple. Now, you can also use any of the 100+ other functions that fuse to construct all sorts of interesting programs with no closed form solution. Get at it!).

We begin with a pipeline of multiple recursive functions, brought together with function composition. As high level as it gets, and any junior Haskell program should be able to write stuff like this:

import qualified Data.Vector as U

    main = print . U.sum $ U.enumFromTo 1 (100000000 :: Int)

Now, semantically, there is an intermediate 100 million element array being created by enumFromTo. Try doing that in your favorite strict language. However…

… fusion kicks in, and GHC turns our program, via the source-to-source optimization phase, into a single tail recursive function, using unboxed values only .. with no intermediate array! No heap allocaiton, no stack, only one loop. A different algorithm!

    loop :: Int# -> Int# -> Int#
    loop x y = case y <= 100000000 of
          False -> x
          True  -> loop (x + y) (y + 1)

The compiler reordered our computations pretty wildly, and it can do this because there are no side effects by default — making large scale transformations like this far more feasible. I used ghc 6.13 with the llvm patches, and ghc -Odph -fllvm –make A.hs

The programmer didn’t have to think about recursion at all in the original program, let alone manual unboxing. Thanks GHC!

And now this tail recursive function gets compiled into an imperative loop via the codeGen phase. First, with the native code backend to GHC:

loop:
    .Lc216:
       cmpq $10000000,%rsi
       jle .Lc219
       movq %r14,%rbx
       movq (%rbp),%rax
       jmp *(%rax)
    .Lc219:
       addq %rsi,%r14
       incq %rsi
       jmp loop

And again, with the LLVM backend to GHC:

loop:
    leaq    1(%rsi), %rax
    addq    %rsi, %r14
    cmpq    $10000001, %rax
    jge     .LBB1_5
    addq    $2, %rsi
    addq    %rax, %r14
test:                                # %tailrecurse
    cmpq    $10000001, %rsi
    jl      loop

And that’s it. Our kernel code is as low level and un-functional as you get, but our input program was the most classic of high level approaches: a pipeline of functions held together with composition, with intermediate data structures all over the place.

And it performs!

With -fasm: mean: 92.9 ms:

With -fllvm: mean: 79.3ms (15% improvement with -fllvm):

Maybe our compiler is — sometimes — sufficiently smart.

To me, the fusion optimization is the epitome of what functional programming is about. I can write code at a very high level, throwing functions and composition around like candy, and the compiler will make stunning, complexity-changing transformations on my code, so it just runs damn fast. Can your language — functional or otherwise — do this?

About these ads