Have you tried examples where the C compiler (e.g. Intel's icc) manages to vectorize the code? For example:

float f(float *fs, int n) {
    float r = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        r += (fs[i] * 3.14f) - (fs[i] + fs[1]);
    return r;
}

where we'd expect the use of 'packed' SSE instructions? On recent machines this should be quite a bit faster.

Are there some from-the-trenches tricks we can use to get close to the same performance in Haskell using GHC?

We can make mean have type [Double] -> Double and still have the list enumeration fused by writing:

{-# RULES
"toU/enumFromTo" forall (n::Double) (m::Double).
    toU (enumFromTo n m) = enumFromToFracU n m;
  #-}

mean :: [Double] -> Double
mean xs = s / fromIntegral l
  where
    P s l = foldlU k (P 0 0) (toU xs)
    k (P s l) val = P (s+val) (l+1)

This is great!

However, I have to admit that I find those GHC optimization flags quite obscure (the use of -funbox-strict-fields, for example). Details play a great role here!
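For what it's worth, here is a minimal sketch of what -funbox-strict-fields affects. It assumes P is a strict pair of a Double and an Int, and it uses foldl' from Data.List in place of foldlU so that it compiles with base alone (so no array fusion happens here). The UNPACK pragmas request per field what the flag requests for every strict field in a module.

```haskell
module Main where

import Data.List (foldl')

-- Assumed definition of the accumulator: a strict pair of running sum
-- and running length. With -funbox-strict-fields, GHC treats every
-- strict (!) field as if it carried an UNPACK pragma, so the Double and
-- the Int are stored directly in the constructor rather than behind
-- pointers to separate heap objects.
data P = P {-# UNPACK #-} !Double {-# UNPACK #-} !Int

-- Single-pass mean: the sum and the length are accumulated together.
-- foldl' stands in for foldlU here; it is strict in the accumulator
-- but does not fuse with the list producer.
mean :: [Double] -> Double
mean xs = s / fromIntegral l
  where
    P s l = foldl' k (P 0 0) xs
    k (P a n) x = P (a + x) (n + 1)

main :: IO ()
main = print (mean [1 .. 10])  -- prints 5.5
```

Without the strictness annotations, the fields of P would hold unevaluated thunks that grow with the length of the list, which is one reason this detail matters so much for the loop's performance.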

I'm very new to Haskell, so I quickly got stuck.

I suppose [1 .. d] is syntactic sugar for (enumFromTo 1 d), so I put this rule in my file:

{-# RULES
"RealFrac.toU/enumFromTo" (RealFrac n, RealFrac m) => forall n m.
    toU (enumFromTo n m) = enumFromToFracU n m
  #-}

Unfortunately, it doesn't work. Do you have any guidance? Am I missing something, or am I completely misled?
