<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Write Haskell as fast as C: exploiting strictness, laziness and recursion</title>
	<atom:link href="http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/feed/" rel="self" type="application/rss+xml" />
	<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/</link>
	<description>A Journal of Haskell Programming</description>
	<lastBuildDate>Mon, 03 Oct 2011 02:09:43 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Don Stewart</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-74</link>
		<dc:creator><![CDATA[Don Stewart]]></dc:creator>
		<pubDate>Tue, 10 Mar 2009 16:42:03 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-74</guid>
		<description><![CDATA[@George

Yes, it looks like on 32-bit, GCC is being smarter with the floating-point code generated from C than with the Haskell output, while on x86_64 it does the same (better) thing for both.]]></description>
		<content:encoded><![CDATA[<p>@George</p>
<p>Yes, it looks like on 32-bit, GCC is being smarter with the floating-point code generated from C than with the Haskell output, while on x86_64 it does the same (better) thing for both.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: George Giorgidze</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-73</link>
		<dc:creator><![CDATA[George Giorgidze]]></dc:creator>
		<pubDate>Tue, 10 Mar 2009 16:28:42 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-73</guid>
		<description><![CDATA[With -fexcess-precision, some of the fstp/fld instructions (floating-point stores and loads) are removed from the loop, but not all of them:

  .type     s1qL_info, @function
# 169 &quot;/tmp/ghc5427_0/ghc5427_0.hc&quot; 1
# 0 &quot;&quot; 2
  fldl        16(%ebp)
  fldl        1(%esi)
  fxch        %st(1)
  fucom       %st(1)
  fnstsw      %ax
  fstp        %st(1)
  sahf
  ja  .L21
  fld1
  fldl        8(%ebp)
  fadd        %st(1), %st
  fldl        (%ebp)
  fadd        %st(3), %st
  fxch        %st(3)
  faddp       %st, %st(2)
  fxch        %st(1)
  fstpl       16(%ebp)
  fstpl       8(%ebp)
  fstpl       (%ebp)
  jmp s1qL_info
  .p2align 4,,7
  .p2align 3
.L21:
  fstp        %st(0)
  fldl        (%ebp)
  fdivl       8(%ebp)
  addl        $24, %ebp
  fstpl       56(%ebx)
  movl        (%ebp), %eax
  jmp *%eax


This makes the Haskell program a lot faster:

&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs
&gt; time ./h_mean 1e9
&gt; real	0m16.634s

Inlining the mean function into the main function makes things faster still:

{-# INLINE mean #-}
mean :: Double -&gt; Double -&gt; Double
...

&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs
&gt; time ./h_mean 1e9
&gt; real	0m9.625s

Changing the type of the length counter from Int to Double makes things faster again:

        go :: Double -&gt; Double -&gt; Double -&gt; Double
        go s l x &#124; x &gt; m      = s / l
                 &#124; otherwise  = go (s + x) (l + 1) (x + 1)

&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs
&gt; time ./h_mean 1e9
&gt; real	0m4.842s

To be honest, I do not really understand (yet) why the last two transformations sped up the Haskell version. For the C version, the last transformation had no effect on performance.
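
A minimal sketch of the fully transformed program, assembled from the fragments above (the INLINE pragma plus the all-Double loop). The argument handling in main, including the usage message, is an illustrative addition and not part of the benchmarked code:

```haskell
module Main where

import System.Environment (getArgs)

-- Mean of the values n, n+1, ..., m, accumulated in a single loop.
-- s is the running sum, l the running count, x the current value.
-- All three are Double, as in the last transformation above.
{-# INLINE mean #-}
mean :: Double -> Double -> Double
mean n m = go 0 0 n
  where
    go :: Double -> Double -> Double -> Double
    go s l x
      | x > m     = s / l
      | otherwise = go (s + x) (l + 1) (x + 1)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [a] -> print (mean 1 (read a))
    _   -> putStrLn "usage: h_mean N"  -- hypothetical fallback, not in the original
```

It is meant to be built with the same command line as above, so that the timings stay comparable.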

To summarise, here are the final results (for now) on my system:

&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs
&gt; time ./h_mean 1e9
&gt; real	0m4.842s

&gt; gcc -O2 -o c_mean src/mean.c
&gt; time ./c_mean 1e9
&gt; real	0m2.770s

So, on my system C remains about twice as fast. For now, I am not able to make the Haskell version any faster. Any pointers in this direction would be very much appreciated. In particular, how can I get rid of the remaining fstp/fld instructions in the Haskell version?

I should also note that Don uses the x86_64 architecture and I am using i686 (though my CPU, see above, has 64-bit support). I am not sure, but this might be the main reason for the different results. I do not have access to an x86_64 machine to confirm this suspicion.

Maybe it is time to upgrade my Arch Linux to the x86_64 version.

Anyway, thanks for your post and helpful comments on my struggles. I learned a lot from this example.

Cheers, George]]></description>
		<content:encoded><![CDATA[<p>With -fexcess-precision, some of the fstp/fld instructions (floating-point stores and loads) are removed from the loop, but not all of them:</p>
<p>  .type     s1qL_info, @function<br />
# 169 &quot;/tmp/ghc5427_0/ghc5427_0.hc&quot; 1<br />
# 0 &quot;&quot; 2<br />
  fldl        16(%ebp)<br />
  fldl        1(%esi)<br />
  fxch        %st(1)<br />
  fucom       %st(1)<br />
  fnstsw      %ax<br />
  fstp        %st(1)<br />
  sahf<br />
  ja  .L21<br />
  fld1<br />
  fldl        8(%ebp)<br />
  fadd        %st(1), %st<br />
  fldl        (%ebp)<br />
  fadd        %st(3), %st<br />
  fxch        %st(3)<br />
  faddp       %st, %st(2)<br />
  fxch        %st(1)<br />
  fstpl       16(%ebp)<br />
  fstpl       8(%ebp)<br />
  fstpl       (%ebp)<br />
  jmp s1qL_info<br />
  .p2align 4,,7<br />
  .p2align 3<br />
.L21:<br />
  fstp        %st(0)<br />
  fldl        (%ebp)<br />
  fdivl       8(%ebp)<br />
  addl        $24, %ebp<br />
  fstpl       56(%ebx)<br />
  movl        (%ebp), %eax<br />
  jmp *%eax</p>
<p>This makes the Haskell program a lot faster:</p>
<p>&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs<br />
&gt; time ./h_mean 1e9<br />
&gt; real	0m16.634s</p>
<p>Inlining the mean function into the main function makes things faster still:</p>
<p>{-# INLINE mean #-}<br />
mean :: Double -&gt; Double -&gt; Double<br />
...</p>
<p>&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs<br />
&gt; time ./h_mean 1e9<br />
&gt; real	0m9.625s</p>
<p>Changing the type of the length counter from Int to Double makes things faster again:</p>
<p>        go :: Double -&gt; Double -&gt; Double -&gt; Double<br />
        go s l x | x &gt; m      = s / l<br />
                 | otherwise  = go (s + x) (l + 1) (x + 1)</p>
<p>&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs<br />
&gt; time ./h_mean 1e9<br />
&gt; real	0m4.842s</p>
<p>To be honest, I do not really understand (yet) why the last two transformations sped up the Haskell version. For the C version, the last transformation had no effect on performance.</p>
<p>To summarise, here are the final results (for now) on my system:</p>
<p>&gt; ghc -O2 -fexcess-precision -fvia-C -optc-O2 -o h_mean src/Mean.hs<br />
&gt; time ./h_mean 1e9<br />
&gt; real	0m4.842s</p>
<p>&gt; gcc -O2 -o c_mean src/mean.c<br />
&gt; time ./c_mean 1e9<br />
&gt; real	0m2.770s</p>
<p>So, on my system C remains about twice as fast. For now, I am not able to make the Haskell version any faster. Any pointers in this direction would be very much appreciated. In particular, how can I get rid of the remaining fstp/fld instructions in the Haskell version?</p>
<p>I should also note that Don uses the x86_64 architecture and I am using i686 (though my CPU, see above, has 64-bit support). I am not sure, but this might be the main reason for the different results. I do not have access to an x86_64 machine to confirm this suspicion.</p>
<p>Maybe it is time to upgrade my Arch Linux to the x86_64 version.</p>
<p>Anyway, thanks for your post and helpful comments on my struggles. I learned a lot from this example.</p>
<p>Cheers, George</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: George Giorgidze</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-65</link>
		<dc:creator><![CDATA[George Giorgidze]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 02:43:51 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-65</guid>
		<description><![CDATA[Using ghc-core, I had a look at the inner loop instructions (I am not an expert in this stuff; I hope I got it right). I was looking for a division instruction and for the loop preceding it.

C version:

.L9:
        fxch    %st(2)
        fxch    %st(1)
.L4:
        fadd    %st(1), %st
        fxch    %st(1)
        addl    $1, %edx
        fadds   .LC0
        fxch    %st(2)
        fucom   %st(2)
        fnstsw  %ax
        sahf
        jae     .L9
        fstp    %st(0)
        fstp    %st(1)
        pushl   %edx
        fildl   (%esp)
        addl    $4, %esp
.L3:
        fdivrp  %st, %st(1)
        movl    $.LC2, (%esp)
        fstpl   4(%esp)
        call    printf

Haskell:

 .type     s1rK_info, @function
# 97 &quot;/tmp/ghc10380_0/ghc10380_0.hc&quot; 1
# 0 &quot;&quot; 2
  fldl        12(%ebp)
  fstpl       32(%esp)
  fldl        32(%ebp)
  fstpl       24(%esp)
  fldl        32(%esp)
  fldl        24(%esp)
  fxch        %st(1)
  fucom       %st(1)
  fnstsw      %ax
  fstp        %st(1)
  sahf
  ja  .L12
  fld %st(0)
  movl        8(%ebp), %eax
  fadds       .LC0
  addl        $1, %eax
  movl        %eax, 8(%ebp)
  fstpl       48(%esp)
  fldl        (%ebp)
  fstpl       16(%esp)
  faddl       16(%esp)
  fstpl       56(%esp)
  fldl        48(%esp)
  fstpl       12(%ebp)
  fldl        56(%esp)
  fstpl       (%ebp)
  jmp s1rK_info
  .p2align 4,,7
  .p2align 3
.L12:
  fstp        %st(0)
  fildl       8(%ebp)
  fstpl       40(%esp)
  fldl        (%ebp)
  addl        $40, %ebp
  fstpl       8(%esp)
  fldl        8(%esp)
  fdivl       40(%esp)
  fstpl       64(%esp)
  fldl        64(%esp)
  fstpl       56(%ebx)
  movl        (%ebp), %eax
  jmp *%eax
  .size        s1rK_info, .-s1rK_info

It seems to me that the Haskell loop contains lots of unnecessary fstp/fld instructions (floating-point stores and loads). However, in the Core, the worker function for mean uses only unboxed types.
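
A minimal sketch of the same loop with its strictness made explicit, assuming GHC's BangPatterns extension. At -O2, GHC's strictness analysis normally infers this on its own, which is consistent with the Core worker using unboxed types; the bangs only state that assumption in the source. The fixed argument in main is illustrative:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Mean of n, n+1, ..., m with a strict sum (s), count (l) and index (x).
mean :: Double -> Double -> Double
mean n m = go 0 0 n
  where
    go :: Double -> Int -> Double -> Double
    go !s !l !x
      | x > m     = s / fromIntegral l
      | otherwise = go (s + x) (l + 1) (x + 1)

main :: IO ()
main = print (mean 1 1e6)  -- illustrative input, smaller than the 1e9 used in the timings
```

If the Core already shows an unboxed worker, the bangs should not change the generated loop, which would point at the C back end rather than at boxing as the source of the extra stores and loads.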

Cheers, George

P.S. I liked this ghc-core tool.]]></description>
		<content:encoded><![CDATA[<p>Using ghc-core, I had a look at the inner loop instructions (I am not an expert in this stuff; I hope I got it right). I was looking for a division instruction and for the loop preceding it.</p>
<p>C version:</p>
<p>.L9:<br />
        fxch    %st(2)<br />
        fxch    %st(1)<br />
.L4:<br />
        fadd    %st(1), %st<br />
        fxch    %st(1)<br />
        addl    $1, %edx<br />
        fadds   .LC0<br />
        fxch    %st(2)<br />
        fucom   %st(2)<br />
        fnstsw  %ax<br />
        sahf<br />
        jae     .L9<br />
        fstp    %st(0)<br />
        fstp    %st(1)<br />
        pushl   %edx<br />
        fildl   (%esp)<br />
        addl    $4, %esp<br />
.L3:<br />
        fdivrp  %st, %st(1)<br />
        movl    $.LC2, (%esp)<br />
        fstpl   4(%esp)<br />
        call    printf</p>
<p>Haskell:</p>
<p> .type     s1rK_info, @function<br />
# 97 &quot;/tmp/ghc10380_0/ghc10380_0.hc&quot; 1<br />
# 0 &quot;&quot; 2<br />
  fldl        12(%ebp)<br />
  fstpl       32(%esp)<br />
  fldl        32(%ebp)<br />
  fstpl       24(%esp)<br />
  fldl        32(%esp)<br />
  fldl        24(%esp)<br />
  fxch        %st(1)<br />
  fucom       %st(1)<br />
  fnstsw      %ax<br />
  fstp        %st(1)<br />
  sahf<br />
  ja  .L12<br />
  fld %st(0)<br />
  movl        8(%ebp), %eax<br />
  fadds       .LC0<br />
  addl        $1, %eax<br />
  movl        %eax, 8(%ebp)<br />
  fstpl       48(%esp)<br />
  fldl        (%ebp)<br />
  fstpl       16(%esp)<br />
  faddl       16(%esp)<br />
  fstpl       56(%esp)<br />
  fldl        48(%esp)<br />
  fstpl       12(%ebp)<br />
  fldl        56(%esp)<br />
  fstpl       (%ebp)<br />
  jmp s1rK_info<br />
  .p2align 4,,7<br />
  .p2align 3<br />
.L12:<br />
  fstp        %st(0)<br />
  fildl       8(%ebp)<br />
  fstpl       40(%esp)<br />
  fldl        (%ebp)<br />
  addl        $40, %ebp<br />
  fstpl       8(%esp)<br />
  fldl        8(%esp)<br />
  fdivl       40(%esp)<br />
  fstpl       64(%esp)<br />
  fldl        64(%esp)<br />
  fstpl       56(%ebx)<br />
  movl        (%ebp), %eax<br />
  jmp *%eax<br />
  .size        s1rK_info, .-s1rK_info</p>
<p>It seems to me that the Haskell loop contains lots of unnecessary fstp/fld instructions (floating-point stores and loads). However, in the Core, the worker function for mean uses only unboxed types.</p>
<p>Cheers, George</p>
<p>P.S. I liked this ghc-core tool.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dons00</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-64</link>
		<dc:creator><![CDATA[dons00]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 01:36:13 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-64</guid>
		<description><![CDATA[Usual benchmarking rules apply: if you&#039;re out by a factor of 20x, then you&#039;re doing something wrong.

Check carefully that it is being compiled properly and that the flags you think it is using are the ones actually in use, and if all else fails, look at the generated code (the ghc-core tool is helpful here).]]></description>
		<content:encoded><![CDATA[<p>Usual benchmarking rules apply: if you&#8217;re out by a factor of 20x, then you&#8217;re doing something wrong.</p>
<p>Check carefully that it is being compiled properly and that the flags you think it is using are the ones actually in use, and if all else fails, look at the generated code (the ghc-core tool is helpful here).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: George Giorgidze</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-63</link>
		<dc:creator><![CDATA[George Giorgidze]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 01:29:01 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-63</guid>
		<description><![CDATA[I am sure. Here is the code:

module Main where

import System.Environment

mean :: Double -&gt; Double -&gt; Double
mean n m = go 0 0 n
    where
        go :: Double -&gt; Int -&gt; Double -&gt; Double
        go s l x &#124; x &gt; m      = s / fromIntegral l
                 &#124; otherwise  = go (s+x) (l+1) (x+1)

main :: IO ()
main = do
    [d] &lt;- map read `fmap` getArgs
    print (mean 1 d)]]></description>
		<content:encoded><![CDATA[<p>I am sure. Here is the code:</p>
<p>module Main where</p>
<p>import System.Environment</p>
<p>mean :: Double -&gt; Double -&gt; Double<br />
mean n m = go 0 0 n<br />
    where<br />
        go :: Double -&gt; Int -&gt; Double -&gt; Double<br />
        go s l x | x &gt; m      = s / fromIntegral l<br />
                 | otherwise  = go (s+x) (l+1) (x+1)</p>
<p>main :: IO ()<br />
main = do<br />
    [d] &lt;- map read `fmap` getArgs<br />
    print (mean 1 d)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dons00</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-62</link>
		<dc:creator><![CDATA[dons00]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 01:14:34 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-62</guid>
		<description><![CDATA[@George

Yes, the assembly above was obtained today (2009-03-08) with GHC 6.10, and GCC 4.3.3 on Linux x86_64. Runtimes are still much the same:

 GCC -O2 1.966s
 GHC -O2 -fvia-C -optc-O3 2.080s

Are you sure you&#039;re using the non-list based version?]]></description>
		<content:encoded><![CDATA[<p>@George</p>
<p>Yes, the assembly above was obtained today (2009-03-08) with GHC 6.10, and GCC 4.3.3 on Linux x86_64. Runtimes are still much the same:</p>
<p> GCC -O2 1.966s<br />
 GHC -O2 -fvia-C -optc-O3 2.080s</p>
<p>Are you sure you&#8217;re using the non-list based version?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: George Giorgidze</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-61</link>
		<dc:creator><![CDATA[George Giorgidze]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 01:07:16 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-61</guid>
		<description><![CDATA[Hi Don,

Thanks for your timely reply.

I am using exactly the same version of GCC (4.3.3, the one that comes with Arch Linux) and tried exactly the same flags. Unfortunately, the results for GHC got even worse.

&gt; gcc -O2 -o c_mean src/mean.c
&gt; ghc -O2 -fvia-C -optc-O3 -o h_mean src/Mean.hs

&gt; time ./c_mean 1e9
500000000.500000
real	0m2.572s

&gt; time ./h_mean 1e9
5.00000000067109e8
real	0m41.289s

You are right, I should dig into the assembly code, and I am doing so now.

Meanwhile, can you confirm that you get results similar to those in your original post with newer versions of GHC? The results in your post were obtained with GHC 6.8.

Once again thanks for your excellent post and your help.

Cheers, George]]></description>
		<content:encoded><![CDATA[<p>Hi Don,</p>
<p>Thanks for your timely reply.</p>
<p>I am using exactly the same version of GCC (4.3.3, the one that comes with Arch Linux) and tried exactly the same flags. Unfortunately, the results for GHC got even worse.</p>
<p>&gt; gcc -O2 -o c_mean src/mean.c<br />
&gt; ghc -O2 -fvia-C -optc-O3 -o h_mean src/Mean.hs</p>
<p>&gt; time ./c_mean 1e9<br />
500000000.500000<br />
real	0m2.572s</p>
<p>&gt; time ./h_mean 1e9<br />
5.00000000067109e8<br />
real	0m41.289s</p>
<p>You are right, I should dig into the assembly code, and I am doing so now.</p>
<p>Meanwhile, can you confirm that you get results similar to those in your original post with newer versions of GHC? The results in your post were obtained with GHC 6.8.</p>
<p>Once again thanks for your excellent post and your help.</p>
<p>Cheers, George</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dons00</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-60</link>
		<dc:creator><![CDATA[dons00]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 00:14:54 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-60</guid>
		<description><![CDATA[@George

The only way to see what is going on is to look at the assembly produced. This is highly dependent on which GCC version you use. In this case,

  gcc (GCC) 4.3.3

The other issue I see is that you&#039;re using different flags:

  gcc -O2

and

  ghc -O2 -fvia-C -optc-O3

Then check the resulting assembly (e.g. with the ghc-core tool):

    s1u7_info:
      ucomisd     5(%rbx), %xmm6
      ja  .L19
      addsd       %xmm6, %xmm5
      addq        $1, %rsi
      addsd       .LC0(%rip), %xmm6
      jmp s1u7_info

    .L19:
      cvtsi2sdq   %rsi, %xmm0
      divsd       %xmm0, %xmm5]]></description>
		<content:encoded><![CDATA[<p>@George</p>
<p>The only way to see what is going on is to look at the assembly produced. This is highly dependent on which GCC version you use. In this case,</p>
<p>  gcc (GCC) 4.3.3</p>
<p>The other issue I see is that you&#8217;re using different flags:</p>
<p>  gcc -O2</p>
<p>and</p>
<p>  ghc -O2 -fvia-C -optc-O3</p>
<p>Then check the resulting assembly (e.g. with the ghc-core tool):</p>
<p>    s1u7_info:<br />
      ucomisd     5(%rbx), %xmm6<br />
      ja  .L19<br />
      addsd       %xmm6, %xmm5<br />
      addq        $1, %rsi<br />
      addsd       .LC0(%rip), %xmm6<br />
      jmp s1u7_info</p>
<p>    .L19:<br />
      cvtsi2sdq   %rsi, %xmm0<br />
      divsd       %xmm0, %xmm5</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: George Giorgidze</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-59</link>
		<dc:creator><![CDATA[George Giorgidze]]></dc:creator>
		<pubDate>Mon, 09 Mar 2009 00:08:12 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-59</guid>
		<description><![CDATA[For some reason I can&#039;t explain, I am not able to reproduce Don&#039;s results on my system. Unfortunately, in my experiments GCC wins by a huge margin:

&gt; gcc -Wall -O3 -o c_mean src/mean.c
&gt; ghc -Wall -O3 -o h_mean src/Mean.hs

&gt; time ./c_mean 1e9
500000000.500000
real	0m2.611s

&gt; time ./h_mean 1e9
5.00000000067109e8
real	0m17.968s

&gt; ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.1

&gt; uname -a
Linux 2.6.28-ARCH #1 SMP PREEMPT i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux

I played with various GHC options (including -fvia-C) but without any success.

Any comments and suggestions would be very much appreciated. Can anyone else confirm similar behaviour?

Understanding these results will help me improve some of my own numerical Haskell code.

Cheers, George]]></description>
		<content:encoded><![CDATA[<p>For some reason I can&#8217;t explain, I am not able to reproduce Don&#8217;s results on my system. Unfortunately, in my experiments GCC wins by a huge margin:</p>
<p>&gt; gcc -Wall -O3 -o c_mean src/mean.c<br />
&gt; ghc -Wall -O3 -o h_mean src/Mean.hs</p>
<p>&gt; time ./c_mean 1e9<br />
500000000.500000<br />
real	0m2.611s</p>
<p>&gt; time ./h_mean 1e9<br />
5.00000000067109e8<br />
real	0m17.968s</p>
<p>&gt; ghc --version<br />
The Glorious Glasgow Haskell Compilation System, version 6.10.1</p>
<p>&gt; uname -a<br />
Linux 2.6.28-ARCH #1 SMP PREEMPT i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux</p>
<p>I played with various GHC options (including -fvia-C) but without any success.</p>
<p>Any comments and suggestions would be very much appreciated. Can anyone else confirm similar behaviour?</p>
<p>Understanding these results will help me improve some of my own numerical Haskell code.</p>
<p>Cheers, George</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Using Mutable Arrays for Faster Sorting &#171; Haskellville</title>
		<link>http://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-38</link>
		<dc:creator><![CDATA[Using Mutable Arrays for Faster Sorting &#171; Haskellville]]></dc:creator>
		<pubDate>Thu, 19 Feb 2009 22:58:18 +0000</pubDate>
		<guid isPermaLink="false">http://donsbot.wordpress.com/?p=99#comment-38</guid>
		<description><![CDATA[[...] as Don Stewart explains in his blogpost &#8220;A Journal of Haskell Programming Write Haskell as fast as C: exploiting strictness, laziness ... we may use GHC core to gain better performance.     Posted by Mads Lindstrøm Filed in [...]]]></description>
		<content:encoded><![CDATA[<p>[...] as Don Stewart explains in his blogpost &#8220;A Journal of Haskell Programming Write Haskell as fast as C: exploiting strictness, laziness &#8230; we may use GHC core to gain better performance.     Posted by Mads Lindstrøm Filed in [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
