Haskell development job at Standard Chartered

The Strats team at Standard Chartered is hiring expert typed FP developers for Haskell dev roles in London.

This is a “front office” finance role – meaning you will work directly with traders building software to automate and improve their efficiency. The role is very development focused, and you will use Haskell for almost all tasks: data analysis, DSLs, market data publishing, databases, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. There may be a small amount of C++ or F# on occasion. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

You will join an expert team in London, and significant, demonstrable experience in typed FP (Haskell, OCaml, F# etc) is strongly preferred. We have more than 2 million lines of Haskell, and our own compiler. In this context we look for skill and taste in typed API design to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability.

The role requires physical presence on the trading floor in London. Remote work isn’t an option. Ideally you have some project and client management skills — you will talk to users, understand their problems, and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send CVs to me - dons00 at gmail.com

Painless NP-complete problems: an embedded DSL for SMT solving

SMT solvers are automated tools to solve the “Satisfiability Modulo Theories” problem — that is, determining whether a given logical formula can be satisfied. However, unlike SAT solvers, SMT solvers generalize to solving such NP-complete problems that contain not just boolean variables, but more useful types, such as lists, tuples, arrays, integers and reals. And they do this while retaining very good performance (with yearly shootouts for the best solvers). As a result, SMT solvers provide a big hammer for solving many difficult problems automatically.

They are so interesting they’ve been described as a disruptive technology in formal methods. They are more automated than interactive theorem provers, while still providing high-end expressive capability.

And they’re growing in use in industrial applications. In the Cryptol toolset, for example, SMT solvers are used to determine if two implementations of a cryptographic function are functionally identical;  Microsoft uses SMT solvers for device driver verification; and they’ve been used for large scale test generation, planning and routing problems.

The challenge now is to make such power easily available to programmers without a formal methods background.

To do this, my approach is to embed the SMT solver into a good programming language, as an embedded DSL. Embedded languages let us hide significant complexity, using the host language’s combining forms to build up complex structures from simple primitives — instead of starting with a complex API in the first place.

If you already know the host language, you can program in the EDSL. If you know Haskell, you can program an SMT solver.

The code is available:

Scripting a Solver

SMT solvers are typically scripted in an imperative fashion: variables are declared, and then facts about those variables are asserted, step by step. The resulting set of declarations is passed to the solver to determine satisifiability (and to find any satisfying values to assign to the variables, if possible). From a programming language perspective, these declarations and assertions look like (typed) (pure) functions with free variables.

The standard SMT-LIB format (accepted by most SMT solvers) looks something like:

(define p::bool)
(define q::bool)
(define r::bool)
(assert (=> (and (=> p q) (=> q r)) (=> p r)))

Which we’ll write in our EDSL in Haskell as:

\p q r -> (p → q && q → r) → (p → r)

Haskell’s type inference can fill in the types in the EDSL, and infix notation also helps clean things up.

Such a proposition may be passed to a solver, like so:

> solve $ \p q r -> (p → q && q → r) → (p → r)

Satisfiable!
r => False
q => True
p => True

We start with something that looks like a Haskell function (but is actually a function in the EDSL), which can be interpreted by a solver, with Haskell’s language machinery filling in types, and checking safety.

In this post I’ll talk about how I designed yices-painless, an EDSL that embeds SMT scripting support deeply into Haskell (deeply, in that the Haskell type system will check the types of the SMT propositions). The work is based on the language embedding approach taken by Chakravarty et al’s, accelerate arrays language (for GPU programming), using higher-order abstract syntax to support elegant variable binding, with a type-preserving translation to a form that can be executed by the underlying solver.

Designing the EDSL

For a client problem at work, I was interested in trying an SMT solver.  The Yices SMT solver from SRI is a high-performance SMT solver with a well-documented C interface, which is a good place to start experimenting. The design challenge was to get from the imperative C interface we have to use, to the typed, pure Haskell embedding that we want to write.

To do this, we build up several layers of abstraction, each one removing redundancy and hiding unnecessary complexity:

  1. The base layer, Yices itself, a big blob of compiled C or C++ code, exporting 150+ functions.
  2. Yices.Painless.Base.C: Stitching Yices to Haskell via a full (and straightforward) FFI bindings (150+ Haskell functions, and a few types)
  3. Yices.Painless.Base: The FFI bindings are just as unsafe as the C layer, so in the next abstraction layer we automate the resource managment, use native Haskell types, and native Haskell abstractions (e.g. lazy lists instead of iteratees). The number of functions exposed is reduced as a result, making the API simpler. This layer is still imperative (all functions are in IO), but relatively useful.
  4. Yices.Painless.Language: The AST layers are the big step up in expressive power: we design an abstract syntax tree to represent the embedded language, along with a set of primitive functions and types. This pure data structure will be interpreted (or compiled to SMT-LIB format), generating calls to the imperative layer. We thus have very strong separation between the messy, error-prone imperative model, and the correct-by-construction pure AST.  We have just simple constants, variables, conditions and a handful of (polymorphic) primitive functions, relying on the language to build up complex expressions from a few simple forms.
  5. Surface Syntax: Finally, we expose the EDSL language via smart surface syntax, overloaded via Haskell’s type classes: Bits, Num, String. And now programmers need only use Haskell’s native math and bit functions, syntax and types, to build propositions. Type inference and checking works, as well as proper scoping of variables in the propositions. This user-facing layer looks and feels exactly like Haskell, yet it is translated to a form that drives an imperative C library.

The surface layer reveals the power of a good EDSL: hiding unnecessary complexity, increasing expressivity (compare the few functions (and combinators) in the EDSL layer at the top, with the 150+ functions in the C API).

The EDSL reduces the cognitive load on the programmer by an order of magnitude.

Implementation

The Base Layers

The base layers are  the most straightforward part of the system. First, we write  a natural FFI binding to every function exposed by the Yices C API. Each C function gets an FFI import; value types are represented by types defined in the Haskell FFI report (e.g. CInt, CString), while Yices types allocated and controlled by Yices are typed in Haskell with empty data  types (e.g. YExpr), giving us increased type safety (compared to using void pointers).

The FFI layer is then wrapped in resource handling code, hiding a set of functions from an C++ iterator behind a single function returning a lazy list). The FFI bindings are about 350 lines of code; the wrapper layer another 500. Most of this can be generated.

The Typed AST

Following Chakravarty’s approach, we represent the embedded language with a surface syntax (supporting numerical and scalar polymorphic operations), which builds a nameless AST  (variables in the EDSL are represented as Haskell variables). The typed embedding gives us a type checker for the DSL for free. The AST is decorated with type information, so we can resolve overloading (in the EDSL), and do other type-based operations. The EDSL itself is just over 1000 lines of Haskell.

The language is relatively simple: variables (of any scalar type), literals (of any scalar type), conditionals, and a range of built-in functions. Variables are bound/quantified at the top level of the EDSL.

data Exp t where
    Tag         :: (IsScalar t) => Int        -> Exp t
    Const       :: (IsScalar t) => t          -> Exp t
    Cond        :: Exp Bool -> Exp t -> Exp t -> Exp t
    PrimApp     :: PrimFun (a -> r) -> Exp a  -> Exp r

data PrimFun t where
    PrimEq   :: ScalarType a -> PrimFun ((a, a) -> Bool)
    PrimNEq  :: ScalarType a -> PrimFun ((a, a) -> Bool)

    PrimAdd  :: NumType a -> PrimFun ((a, a) -> a)
    PrimSub  :: NumType a -> PrimFun ((a, a) -> a)
    PrimMul  :: NumType a -> PrimFun ((a, a) -> a)
    ...

We can expose these via a type class interface for Exp things, yielding nice surface syntax: (+), (*), (==*), and (/=*). It isn’t possible to write sensible instance of Eq, sadly, as those Haskell type classes wire in return types of Bool (instead of a polymorphic Boolean class). We have to redefine (==) instead.

instance (IsNum t) => Num (Exp t) where
  (+)         = mkAdd
  (-)         = mkSub
  (*)         = mkMul

infix 4 ==*, /=*, <*, <=*, >*, >=*

-- | Equality lifted into Yices expressions.
(==*) :: (IsScalar t) => Exp t -> Exp t -> Exp Bool
(==*) = mkEq

-- |Inequality lifted into Yices expressions.
(/=*) :: (IsScalar t) => Exp t -> Exp t -> Exp Bool
(/=*) = mkNEq

The smart constructors build AST nodes, with type information added:

mkAdd :: (IsNum t) => Exp t -> Exp t -> Exp t
mkAdd x y = PrimAdd numType `PrimApp` tup2 (x, y)

mkMul :: (IsNum t) => Exp t -> Exp t -> Exp t
mkMul x y = PrimMul numType `PrimApp` tup2 (x, y)

mkEq   :: (IsScalar t) => Exp t -> Exp t -> Exp Bool
mkEq   x y = PrimEq  scalarType `PrimApp` tup2 (x, y)

mkNEq  :: (IsScalar t) => Exp t -> Exp t -> Exp Bool
mkNEq  x y = PrimNEq scalarType `PrimApp` tup2 (x, y)

Converting to de Bruijn form

This nameless form (via surface syntax) is convenient to write, and solves all the issues with binders for us, but harder to optimize (as propositions are reprsented as opaque functions). So this form is then converted to a form with explicit variables (represented as de Bruijn indices into a variable environment). The approach is identical to the one taken in the accelerate EDSL, with the change that at the top level we support n-ary functions (to represent propositions with n free variables).

Thanks to Trevor Elliott, and for historical reference, we can convert n-ary functions to their de Bruijn form via a type class, yielding a common type for propositions with arbitrary variables: :: Yices f r => f -> Prop r

Running the Solver

Once we’ve built the AST for the proposition, and replaced all variables with their de Bruijn tags, we can now execute the proposition, recursively calling Yices operations to reconstruct the AST on the Yices side, before asking to solve, finding a satisfying assignment of variables (or not), and returning a model of the bindings.

solve :: (Yices f r) => f -> IO Result
solve q = do
    c <- Yices.mkContext
    Yices.setTypeChecker True
    let t = convertYices q -- convert to de Bruijn form
    (g,e) <- execY c t     -- build a Yices expression representing the AST
    Yices.assert c e       -- register it with Yices
    Yices.check c          -- check for satisfiability, returning the result

Related Work

There’s quite a few bindings to Yices (in particular) in Haskell, but none on Hackage (at the time of writing) done as a deep EDSL (ensuring type safety and lexical scoping). The original inspiration for  the work was yices-easy, which exposes a monadic, shallow DSL — and has a good tutorial.

Significant related work is by my colleague at Galois, Levent Erkok, who simultaneously developed “SBV”, a deep EDSL that does support sharing and SMT-LIB access, via heavy type class use. We’ll be sharing code.

The approach to embedding taken is based on the one found in accelerate, which shares similarities with Atkey, Lindley and Yallop‘s approach. The general idea of de Bruijn/HOAS translations was introduced by McBride and McKinna.

Future Work

At the moment, this is basically a proof of concept. Simple propositions can be solved, but its not ready for large scale problems. A few things need to be done:

  • Support for sharing in the AST. The EDSL currently loses sharing information, duplicating terms. This isn’t feasible for large problems.
  • Operations that require certain sizes of bit vector should check those sizes (e.g. via Iavor Diatchki’s type level natural support for Haskell).
  • Support deriving encodings from new Haskell types into Yices types.
  • Support for lists, tuples and arrays of variables (some problems have very large numbers of variables)
  • Supporting function types.
  • Compile the AST to SMT-LIB statements (instead of Yices FFI calls), making it independent of a particular SMT solver (or solver’s license).
  • Use it on some big problems.

I’ll be continuing work on this in 2011, and would welcome patches, feedback and use cases.

Control.Monad.Inception

I saw Inception a few months ago. At the time I was struck by how coherent the computational environment describe in the movie was. I suspect “Inception” forms a monadic environment for computation (though not necessarily a useful one), and have been meaning to try to implement it. However, time is short, so here are my notes from the time (August 23rd) on the semantics of Inception, in the hope that someone else will implement Control.Monad.Inception.

Semantics of the Inception computational environment

  • “Dream” == thread of control == forkIO
  • A thread must stay at each level (thread group) for it not to collapse (dreamer)
  • Messages can be sent down (music) – e.g. async message passing
  • A kick exits a thread up one level (“die”) – throwTo KillThread
  • Time degrades quickly 10, 3 mins , 10 mins , 10 years – so adjust scheduler time slices based on levels
  • Groups of threads brought together by the dreamer can enter a new environment together (collaborative dreaming) – shared heap
  • Arbitrary effects are allowed, (yep, it has IO at the bottom)
  • Under sedation, you can disassociate a thread from its control stack, and it is in limbo (some “init” thread group), until it remembers.
  • Tokens are (unique) depth counters (0, many).

Your challenge:

  • Improve Claessen’s “poor man’s concurrency monad” to support the Inception environment.
  • What notion of `bind` and `return` are used?
  • Show that the monad laws are satisfied.
  • import Control.Monad.Inception and win!

Refer to the published analysis on the use of the inception monad you need to be able to support.

Hackage, Cabal and the Haskell Platform: The Second Year

The Haskell Implementors Workshop was held in Baltimore, Oct 1, 2010. Duncan Coutts from Well-Typed and I presented a status report on the Haskell distribution infrastructure:

You can read the slides as PDF here, or online:

The amount of freely available Haskell code has grown exponentially in the past
two years as Hackage and Cabal have come online. Managing millions of Haskell
code, partitioned thousands of interdependent packages is a serious engineering
challenge that has received little attention from the language research community.
Meanwhile, new adopters of Haskell struggle to deal with the sheer number of
libraries and tools now available.

One pragmatic approach to managing this web is the Haskell Platform (HP), a
project to build a blessed, comprehensive set of libraries meeting objective
quality control criteria, and in doing so make expert recommendations on which
packages to use. In its first six weeks of operation the HP had over forty
thousand downloads.

The challenge with such a project is to manage the many conflict constraints
for diversity, coverage, and quality when assembling the package set. This talk
will outline the state of the Haskell Platform, the technical approaches taken
to build it, and the roadmap ahead.

Practical Haskell: scripting with types

I had the pleasure to give a new talk today, on design in functional programming — types, abstractions and monads — using the motivating example of scripting. The slides are below and a PDF version is available.

Shell scripts are often a quick, dirty way to get the job done. You glue
together external tools, maybe do a little error checking and process all data
as strings. This is great for some very simple problems but as requirements
change and more is demanded from the code shell scripts become unwieldy and
fragile. When they get large, they become slow and difficult to maintain. If
you need to write robust code then shell is not the way to go.

In this talk at an alternative: how to use Haskell as a type checked and
natively compiled language for scripting tasks. By refining the semantics of
the problem domain, employing abstraction, we produce shorter and more robust
code, that is more maintainable and scalable.

Haskell Platform 2010.2.0.0 is live!

We’re pleased to announce the fifth release of the Haskell Platform: a single, standard Haskell distribution for everyone.

The specification, along with installers (including Windows, Apple and
Unix installers for a full Haskell environment) are available.

The Haskell Platform is a single, standard Haskell distribution for every system, in the form of a blessed library and tool suite for Haskell distilled from the thousands of libraries on Hackage, along with installers for a wide variety of systems. It saves developers work picking and choosing the best Haskell libraries and tools to use for a task.

When you install the Haskell Platform, you get the latest stable compiler, an expanded set of core libraries, additional development tools, and cabal-install – so you can download anything else you need from Hackage.

What you get is specified here.

– The Platform Infrastructure Team

Engineering Large Projects in a Functional Language

I had the opportunity to speak at DevNation on July 10th in Portland, and gave the following talk, an updated version of Galois’ collective experiences developing Haskell projects over the past decade. You can download the .pdf.

Abstract

Galois has been building software systems in Haskell for the past decade. This talk describes some of what we’ve learned about in-the-large, commercial Haskell programming in that time. I’ll look at when and where we use Haskell. At correctness, productivity, scalabilty, maintainability, and what language features we like: types, purity, types, abstractions, types, concurrency, types!

We’ll also look at the Haskell toolchain: FFI, HPC, Cabal, compiler, libraries, build systems, etc, and being a commercial entity in a largely open source community.

Follow

Get every new post delivered to your Inbox.

Join 53 other followers