Data.Binary: performance improvments for Haskell binary parsing

I’ve just uploaded to Hackage Data.Binary 0.5.0.2 which contains some good performance improvements for binary parser users.

Data.Binary is a popular serialization library for Haskell, which uses Get and Put monads to efficiently translate Haskell structures to and from lazy bytestrings (basically, byte streams) in a pure style. It is used by 85 other packages on Hackage, for everything from network packet parsing, trie container types, network traffic analysis, to web server state persistance, and high speed cryptography.

As a refresher, recall that Binary provides both a Get and Put monadic environment. The Put monad gives you a locally scoped “buffer filling” mode, where calls to ‘put’ implicitly append values to a buffer. A sequence of ‘puts’ is a pure computation that returns a bytestring at the end. Like so:

    runPut :: Put -> ByteString
        -- runPut takes a serializer code fragment,
        -- and uses it to fill a bytestring with bytes.

    Data.Binary> runPut (do put 2; put (Just 7); put "hello")
    "\NUL\NUL\NUL\NUL\STX\SOH\NUL\NUL\NUL\NUL\a
       \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ENQhello"

You can also stream these into the zlib binding to gain on the fly compression (via bzlib or zlib). Conversely, you can parse Haskell data from binary formats, using the ‘Get’ monad. Given a bytestring stream, parse it into some Haskell structure:

    runGet :: Get a -> ByteString -> a
       -- runGet takes a binary parser, a bytestring,
       -- and returns some 'a' parsed from the bytesting.

    Data.Binary.Get> let s = pack "\NUL\NUL\NUL\NUL\STX\SOH
           \NUL\NUL\NUL\NUL\a\NUL\NUL\NUL\NUL\NUL\NUL\NUL\ENQhello"
    Data.Binary.Get> runGet (do a <- get
                                b <- get
                                c <- get
                                return (a :: Integer,b :: Maybe Integer ,c :: String)) s
    (2,Just 7,"hello")

There are also default Get and Put parsers and serializers in the Binary type classes, which provides ‘encode’ and ‘decode’ methods:

    encode :: Binary a => a -> ByteString
    decode :: Binary a => ByteString -> a

Binary is used for heavy lifting binary wire protocol parsing and writing. As such it needs to be very fast. It uses aggressive inlining, careful data representations, specialization, and some interesting monadic representation transformations, to ensure GHC can optimize user-supplied parsers well. The changes, thanks to Simon Marlow, are two-fold, and improve the performance of reading into Haskell data structures by about 40%.

Firstly, the underlying Get monad representation is changed from:

    newtype Get a = Get { unGet :: S -> (a, S) }

to

    newtype Get a = Get { unGet :: S -> (# a, S #) }

That is, we use an explict stack/register allocated unboxed tuple to return the parsed value from the parser. This allows GHC to more aggressively optimize code containing repeated reads from the parser state. (As an aside, we also looked at a continuation-based encoding of the Get monad, but there is no performance win due to the very simple tuple return type, which GHC is already able to decompose nicely. Continuation encodings are more useful if we returned, say, an Either).

Secondly, the monadic bind for the Get monad is now strict. Moving from an explicitly lazy:

    m >>= k   = Get (\s -> let (a, s') = unGet m s in unGet (k a) s')

to:

    m >>= k   = Get $ \s -> case unGet m s of
                             (# a, s' #) -> unGet (k a) s'

This seems to also improve the kind of code GHC generates, which coalescing of repeated reads into straight line code without tests.

Overall, there is no user-visible change to Data.Binary, as only the monadic plumbing has been modified. Your code should just get faster. Parsing streams of Word64s on my laptop goes from ~100M/s to 140M/s.

Haskell for Everyone: Hackage and the Haskell Platform : Haskell Implementors Workshop 2009

“FP Super Fun Week”, aka ICFP + Haskell Symposium + CUFP + Haskell Implementors Workshop + DEFUN has come and gone.

Malcolm Wallace did an amazing job recording most of the sessions, so as a result we have the talk and discussion about Hackage and the Haskell Platform.

You can follow along with the slides here.

Haskell for Everyone: Hackage and the Haskell Platform

.

Improving Data Structures with Associated Types

Original .PDF

This talk describes how to view live heap structures in Haskell, including sharing and unboxing information, and then to use insights on the representation of data to improve the performance and space, for the first time, of uniform polymorphic data types by changing their representation. We’ll use associated data types to guide regular optimizations of polymorphic structures.

The library described in this talk is available on hackage, along with the visualization tool used to build it. There’s also a screencast of the visualization tool in use.

This talk was originally presented at WG2.8 in Frauenchiemsee, Germany in June 2009.

Stream Fusion for Haskell Arrays

PDF Version

Arrays have traditionally been an awkward data structure for Haskell programmers. Despite the large number of array libraries available, they have remained relatively awkward to use in comparison to the rich suite of purely functional data structures, such as fingertrees or finite maps. Arrays have simply not been first class citizens in the language.

In this talk I’ll begin with a survey of the more than a dozen array types available, including some new matrix libraries developed in the past year. I’ll then describe a new efficient, pure, and flexible array library for Haskell with a list like interface, based on work in the Data Parallel Haskell project, that employs stream fusion to dramatically reduce the cost of pure arrays. The implementation will be presented from the ground up, along with a discussion of the entire compilation process of the library, from source to assembly.

The library described in this talk is available on hackage, and is now used by a few projects, including haskell-monte-carlo haskell-pqueue-mtl haskell-statistics-fusion haskell-uvector-algorithms and Bryan O’Sullivan’s new uber benchmark suite.


This talk was originally presented at Galois on August 28th, 2008.

The Haskell Platform: Status Report: Haskell Symposium 2009

At the future of Haskell discussion, at the Haskell Symposium 2009, Duncan Coutts and I gave a status update on the Haskell Platform project: the project to build a single, shared distribution of Haskell for every platform.

DEFUN 2009: Multicore Programming in Haskell Now!

Here are the

for today’s DEFUN tutorial: Multicore Programming in Haskell Now!

Multicore computers are here: is your programming language ready for it?

Haskell is: you can take an off-the-shelf copy of GHC and write high performance parallel programs right now. This tutorial will teach you how to exploit parallelism through Haskell on your commodity multicore machine, to make your code faster. We will introduce key parallel programming models, as implemented in Haskell, including:

  • semi-explicit parallelism via sparks
  • explicit parallelism via threads and shared memory
  • software transactional memory
  • data parallelism

and look at how to build faster programs using these abstractions. We will also look at the engineering considerations when writing  parallel programs, and the tools Haskell provides for debugging and reasoning about parallel programs.

This half-day tutorial will teach intermediate functional programmers with no previous parallel programming experience how to write, reason about and run parallel Haskell programs, using a range of parallel programming models. Each model will be introduced with motivating examples, and exercises to develop familarity with the model in question. By the end you should have an understanding of which parallelism abstraction to use in which circumstance, and be able to go to work writing multicore capable programs in Haskell.

Parallel Programming in Haskell: A Reading List

I’m busy preparing a tutorial for DEFUN on Saturday, and was putting together the “further reading” slides for the attendees, so they have material to go to for deeper reading.threadscope

Here’s my basic “How to learn about parallel programming in Haskell” reading list.

Learn Parallel Haskell

Does anyone else have favourite learning materials for parallel and concurrent programming in GHC Haskell?

Follow

Get every new post delivered to your Inbox.

Join 51 other followers