Why is GHC so large/big?


Solution 1

It's a bit silly really. Every library that comes with GHC is provided in no fewer than 4 flavours:

  • static
  • dynamic
  • profiled
  • GHCi

The GHCi version is just the static version linked together in a single .o file. The other three versions all have their own set of interface files (.hi files) too. The profiled versions seem to be about twice the size of the unprofiled versions (which is a bit suspicious, I should look into why that is).
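To see which flavour a given library file belongs to, a rough sketch like the one below can help. The `_p.a` suffix for profiled archives matches the file names shown in Solution 2; the `.so`/`.dylib`/`.dll` and `.o` patterns for the dynamic and GHCi flavours are assumptions about typical naming, and the function name `flavour` is made up for illustration.

```shell
# Guess the flavour of a GHC library file from its name.
# Assumption: profiled archives end in _p.a, GHCi objects in .o,
# dynamic libraries in .so/.dylib/.dll; anything else is "other".
flavour() {
  case "$1" in
    *_p.a)               echo profiled ;;  # must come before the plain *.a case
    *.a)                 echo static ;;
    *.so|*.dylib|*.dll)  echo dynamic ;;
    *.o)                 echo ghci ;;
    *)                   echo other ;;
  esac
}

flavour libHSbase-4.2.0.0.a      # static
flavour libHSbase-4.2.0.0_p.a    # profiled
flavour HSbase-4.2.0.0.o         # ghci
```

Piping the output of a file listing through this and counting per flavour would show how the total splits up, though the exact patterns may need adjusting per platform and GHC version.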

Remember that GHC itself is a library, so you're getting 4 copies of GHC. Not only that, but the GHC binary itself is statically linked, so that's 5 copies of GHC.

We recently made it so that GHCi could use the static .a files. That will allow us to get rid of one of these flavours. Longer term, we should dynamically link GHC, but that's a bigger change because that would entail making dynamic linking the default - unlike in C, with GHC you have to decide up front whether you're going to link dynamically or not. And we need more changes (e.g. to Cabal and the package system, amongst other things) before this is really practical.

Solution 2

Probably we should compare apples to apples and oranges to oranges. JRE is a runtime, not a developer kit. We may compare: source size of the development kit, the size of the compiled development kit and the compiled size of the minimal runtime.

The OpenJDK 7 source bundle is 82 MB (download.java.net/openjdk/jdk7) vs the GHC 7 source bundle, which is 23 MB (haskell.org/ghc/download_ghc_7_0_1). GHC is not big here. Runtime size: openjdk-6-jre-headless on Ubuntu is 77 MB uncompressed vs a Haskell hello world, statically linked with its runtime, which is <1 MB. GHC is not big here either.

Where GHC is big is the size of the compiled development kit:

GHC disk usage

GHC itself takes 270 MB, and with all the libraries and utilities that come with it, it takes over 500 MB. And yes, that's a lot, even with base libraries and a build tool/dependency manager. The Java development platform is smaller.

GHC:

$ aptitude show ghc6 | grep Size
Uncompressed Size: 388M

against OpenJDK with dependencies:

$ aptitude show openjdk-6-jdk openjdk-6-jre openjdk-6-jre-headless ant maven2 ivy | grep Size
Uncompressed Size: 34.9M
Uncompressed Size: 905k
Uncompressed Size: 77.3M
Uncompressed Size: 1,585k
Uncompressed Size: 3,736k
Uncompressed Size: 991k

But it is still more than 100 MB, not 26 MB as you write.
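The "more than 100 MB" figure can be checked by adding up the six sizes in the listing above. This is only a sketch: it assumes aptitude's `M` and `k` suffixes differ by a factor of 1000 (the listing does not say whether they are binary units), and `total_mb` is a made-up helper name.

```shell
# Sum "Uncompressed Size" values such as 34.9M or 1,585k and print the total in MB.
# Assumption: 1M = 1000k.
total_mb() {
  awk '{
    v = $NF                        # last field, e.g. "1,585k"
    gsub(/,/, "", v)               # drop thousands separators
    unit = substr(v, length(v))    # trailing unit letter
    n = substr(v, 1, length(v) - 1) + 0
    if (unit == "k") n /= 1000     # convert kB to MB
    sum += n
  } END { printf "%.1f\n", sum }'
}

# Sizes copied from the aptitude output above.
total_mb <<'EOF'
Uncompressed Size: 34.9M
Uncompressed Size: 905k
Uncompressed Size: 77.3M
Uncompressed Size: 1,585k
Uncompressed Size: 3,736k
Uncompressed Size: 991k
EOF
```

Under that assumption this prints 119.4, so the OpenJDK stack with its build tools comes to roughly 119 MB against 388M for ghc6.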

Heavyweight things in ghc6 and ghc6-prof are:

$ dpkg -L ghc6 | grep '\.a$' | xargs ls -1ks | sort -k 1 -n -r | head -3
57048 /usr/lib/ghc-6.12.1/ghc-6.12.1/libHSghc-6.12.1.a
22668 /usr/lib/ghc-6.12.1/Cabal-1.8.0.2/libHSCabal-1.8.0.2.a
21468 /usr/lib/ghc-6.12.1/base-4.2.0.0/libHSbase-4.2.0.0.a
$ dpkg -L ghc6-prof | grep '\.a$' | xargs ls -1ks | sort -k 1 -n -r | head -3
112596 /usr/lib/ghc-6.12.1/ghc-6.12.1/libHSghc-6.12.1_p.a
 33536 /usr/lib/ghc-6.12.1/Cabal-1.8.0.2/libHSCabal-1.8.0.2_p.a
 31724 /usr/lib/ghc-6.12.1/base-4.2.0.0/libHSbase-4.2.0.0_p.a

Note how big libHSghc-6.12.1_p.a is. So the answer seems to be static linking and profiling versions for every library out there.
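To put a number on how much profiling adds, here is a small sketch that divides each profiled archive size by its plain static counterpart. The sizes are the kB values copied from the two listings above, nothing is measured here, and `ratio` is a made-up helper name.

```shell
# Ratio of profiled (_p.a) to plain static (.a) archive size.
# Arguments: plain size, profiled size (both in kB, as printed by ls -ks).
ratio() {
  awk -v plain="$1" -v prof="$2" 'BEGIN { printf "%.2f\n", prof / plain }'
}

ratio 57048 112596   # libHSghc   -> 1.97
ratio 22668 33536    # libHSCabal -> 1.48
ratio 21468 31724    # libHSbase  -> 1.48
```

So on this install the profiled GHC library is about twice the size of the static one, while Cabal and base grow by roughly half.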

Solution 3

My guess -- lots and lots of static linking. Each library needs to statically link its dependencies, which in turn need to statically link theirs, and so forth. All of this is often compiled both with and without profiling, and even without profiling the binaries aren't stripped and so hold lots of debugger information.

Solution 4

Because it bundles gcc and a bunch of libraries, all statically linked.

At least on Windows.

Solution 5

Here's the directory size breakdown on my box:

https://spreadsheets.google.com/ccc?key=0AveoXImmNnZ6dDlQeHY2MmxPcEYzYkpweEtDSS1fUlE&hl=en

It looks like the largest directory (123 MB) is the binaries for compiling the compiler itself. The documents weigh in at an astounding 65 MB. Third place is Cabal at 41 MB.

The bin directory is 33 MB, and I think that only a subset of that is what's technically required to build Haskell applications.
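A breakdown like this can be reproduced with `du`. The sketch below lists the largest entries directly under a directory; the helper name `biggest` and the `GHC_LIBDIR` variable are made up for illustration, and the default path is just the one that appears in the listings in Solution 2, so it will likely differ on other machines.

```shell
# Print the N largest entries directly under a directory, sizes in kB.
# Usage: biggest DIR [N]   (N defaults to 3)
biggest() {
  du -sk "$1"/* 2>/dev/null | sort -k 1 -n -r | head -n "${2:-3}"
}

# Assumption: GHC lives under this prefix; override GHC_LIBDIR if it does not.
biggest "${GHC_LIBDIR:-/usr/lib/ghc-6.12.1}"
```

If the directory does not exist, the call simply prints nothing.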


Author: Christopher Done

Updated on September 06, 2020

Comments

  • Christopher Done
    Christopher Done over 3 years

    Is there a simple answer: Why is GHC so big?

    • OCaml: 2MB
    • Python: 15MB
    • SBCL: 9MB
    • OpenJRE: 26MB
    • GHC: 113MB

    Not interested in evangelism of "Why I shouldn't care about the size if Haskell is the right tool"; this is a technical question.

    • Jacob
      Jacob about 13 years
      Where are you getting this 500MB from? My GHC is nowhere close to that big.
    • Jacob
      Jacob about 13 years
      Unless you count all of the libraries, I guess...
    • Christopher Done
      Christopher Done about 13 years
      Sorry, I was going off a package manager download which includes some deps. I updated it to reflect the download size from the web site. I added an Edit summary but it didn't appear here (yet?). I think the question still stands. It's big.
    • sastanin
      sastanin about 13 years
      Probably we should compare apples to apples and oranges to oranges. JRE is a runtime, not a developer kit. OpenJDK 7 source bundle, 82 MB (download.java.net/openjdk/jdk7) vs GHC 7 source bundle, 23 MB (haskell.org/ghc/download_ghc_7_0_1). Now runtime: openjdk-6-jre-headless on Ubuntu, 77 MB uncompressed vs Haskell helloworld, statically linked with its runtime, <1 MB.
    • AnneTheAgile
      AnneTheAgile over 9 years
      Today I was curious about the sizes now, in 2014. It seems like the argument still holds. I found URLs: 1. GHC haskell.org/ghc/download_ghc_7_8_3 ; 2. OpenJDK packages.ubuntu.com/precise/openjdk-7-jdk
  • John L
    John L about 13 years
    I probably wouldn't mind if GHC switched to a whole-program, recompile almost everything model, similar to jhc. It might even compile faster if it would keep 'ld' from swapping.
  • fuz
    fuz about 13 years
    Let me add something to this: if you take only the bare-bones compiler and strip anything not absolutely needed (like building the compiler unprofiled, stripped, etc.), you can go down to about 5 MB. But try to compare the compiler's size with GCC. (Edited the comment, so I had to delete it... sorry)
  • mcandre
    mcandre about 13 years
    And here I thought it was all the logic that Haskell offers: lazy evaluation, type inference, etc.
  • comonad
    comonad about 13 years
    no, not on linux. it only depends on gcc. because windows has no gcc in its "distribution", it has to come with ghc.
  • Earth Engine
    Earth Engine over 10 years
    So, 113MB / 4 ~= 28MB, still bigger than OpenJRE... But consider GHC is comparable to OpenJDK, not just JRE, it makes me feel better.
  • AnneTheAgile
    AnneTheAgile over 9 years
    Now that I think GHC uses dynamic linking, perhaps Dr. @Simon Marlow's ideas for compression of the four flavors is more practical? Cites: 1.#3658 (Dynamically link GHCi (and use system linker) on platforms that support it) – GHC ghc.haskell.org/trac/ghc/ticket/3658; 2.#8266 (Dynamic linking on Mac) – GHC ghc.haskell.org/trac/ghc/ticket/8266 ; 3.#8376 (Static Executable + GHC API (+ Dynamic Linking?) gives Segfault) – GHC
  • zumalifeguard
    zumalifeguard over 2 years
    ghc Version 9.0.1, Hello.exe is 11,880,549 bytes. Version 8.x was slightly less, hovering around 10MB.
  • zumalifeguard
    zumalifeguard over 2 years
    On ubuntu, it makes an image that's 896,168 in size (static). 16,592 with -dynamic. -dynamic didn't work on Windows.
  • nponeccop
    nponeccop over 2 years
    Well, to be more pedantic, it's because Windows has rpm but not yum, so there is no easy way to fetch dependencies so every application just bundles all its dependencies in Docker/Snap fashion. Also MSI dependencies exist but are hardly ever used due to overengineering of MSI. This may change when and if a new packaging tech is eventually adopted (likely a successor to AppX)
  • nponeccop
    nponeccop over 2 years
    7.6.3 is the last version of GHC that shipped dynamic base. Then dynamic linking was broken due to various reasons, and then they partially fixed it so it's possible to make dynamic hello world now, but you must build your own GHC with support, as it's not production ready yet after so many years since 7.6.3. Also, can you find stripped sizes?