Prune builds #2

Open
begriffs opened this issue Dec 4, 2013 · 12 comments

Comments

@begriffs
Owner

begriffs commented Dec 4, 2013

The GHC installation includes some things that the Heroku instance may not need. If we remove them after building (in the bin/compile script), we have less to copy from the cache on subsequent deploys.

Ideas from @puffnfresh's precompile-binaries.sh

  • Remove haddock and hpc docs
  • Remove duplicate .a, .o, .so, .p_hi, and .dyn_hi libraries
  • Remove the whole share/ subdirectory
  • Apply Unix strip(1) to the binaries
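
A rough sketch of how some of the ideas above might look as a shell function called from bin/compile. The function name prune_ghc and the assumed layout (share/, bin/ and interface files somewhere below the install root) are illustrative assumptions, not taken from precompile-binaries.sh. It leaves the "duplicate .a, .o, .so" item out, since which copies count as duplicates depends on how the app is linked.

```shell
#!/usr/bin/env bash
# Sketch only: prune a GHC install tree after a successful build.
# prune_ghc and the directory layout are assumptions; check them
# against the actual install before deleting anything.
prune_ghc() {
  local ghc_dir="$1"
  [ -d "$ghc_dir" ] || { echo "no such directory: $ghc_dir" >&2; return 1; }

  # Remove the whole share/ subdirectory (docs usually live here).
  rm -rf "$ghc_dir/share"

  # Remove profiling and dynamic interface files. Only reasonable if
  # nothing is later built with profiling or dynamic linking.
  find "$ghc_dir" -type f \( -name '*.p_hi' -o -name '*.dyn_hi' \) -delete

  # Apply strip(1) to files in bin/. Wrapper scripts are not object
  # files, so strip errors on them are ignored.
  if [ -d "$ghc_dir/bin" ]; then
    find "$ghc_dir/bin" -type f -exec strip {} \; 2>/dev/null || true
  fi
}
```

It would be called as prune_ghc followed by the path of the GHC install, for example the vendored GHC directory inside the build directory.
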

@tel

tel commented Jan 18, 2014

Perhaps related to this: why do we slug together the entire GHC install instead of just the compiled artifacts? Right now my slug is clocking in at around 200 MB (two-thirds of the slug limit), while the app itself need only be around 3 MB.

This gives up sandbox caching, but since Anvil can mitigate build times, it could be a trade worth making.

@begriffs
Owner Author

Interesting, can you put together a pull request to illustrate?

@begriffs
Owner Author

(...or just elaborate here if a pull request is too daunting) 😄

@tel

tel commented Feb 27, 2014

Haha, no, devops has just been very low priority for me for the last while.

My broad idea is just that once someone has builds going through vulcan or what-have-you then there are two build artifacts of interest, one large, one small. In particular, you have the actual executable (plus any static assets needed) and the sandbox. Since Heroku has a limit on slug size, we only want to include what's truly necessary and it's likely that the executable+assets is all that needs to be tossed around in the slug.

Perhaps the easiest way to solve this issue is to just run cabal sandbox delete after a successful build and then ship only the dist directory. This would also require changing the run command to hit the executable directly, since cabal run will want to run cabal configure, which will fail because the sandbox is gone.
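
As a sketch of that idea: the function below drops the sandbox from a build directory and prints a process line that starts the binary directly. The name slim_build_dir, the executable name myapp and the choice to print a Procfile-style line are all assumptions for illustration. The rm call stands in for cabal sandbox delete and only covers the default sandbox location.

```shell
#!/usr/bin/env bash
# Sketch only: keep dist/, drop the sandbox, run the binary directly.
# slim_build_dir and the executable name passed in are hypothetical.
slim_build_dir() {
  local build_dir="$1" exe="$2"
  local exe_path="$build_dir/dist/build/$exe/$exe"

  # Refuse to delete anything unless the build product exists.
  [ -x "$exe_path" ] || { echo "missing executable: $exe_path" >&2; return 1; }

  # Stand-in for `cabal sandbox delete` (default sandbox location only),
  # so the sketch does not need cabal on the PATH.
  rm -rf "$build_dir/.cabal-sandbox" "$build_dir/cabal.sandbox.config"

  # Process line that bypasses `cabal run`, which would try to
  # reconfigure against the now-missing sandbox.
  echo "web: ./dist/build/$exe/$exe"
}
```

For example, slim_build_dir "$BUILD_DIR" myapp would print web: ./dist/build/myapp/myapp, which could then go into a Procfile or the buildpack's default process types.
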

The downside is that we lose sandbox caching. That tradeoff might be forced if someone reaches full slug volume, or just preferable if they're worried about it and their build system is sufficiently automated that they're never personally waiting on sandbox rebuilds.

@rehno-lindeque
Contributor

Let me mention that I'm building with anvil every time now - Heroku builds can no longer keep up with the compile time. On anvil thankfully it's incremental though. If the sandbox weren't cached there things would take an intolerable amount of time to compile...

@rehno-lindeque
Contributor

On the other hand, in our project we have

  .cabal-sandbox/ 294.3 MB
  dist/ 141.2 MB

so that definitely seems unsustainable (this is still a small project). Unfortunately it seems like the Putting cache... step starts to fail on anvil close to this size. We know that it's possible to do something about the dist folder (stripping etc) so I'd say figuring out what to do with the sandbox is a more immediate problem...

@rehno-lindeque
Contributor

Total file size for my .cabal-sandbox/**/*.dyn_hi files (via find .cabal-sandbox/ -name "*.dyn_hi" -printf ,%s) was 32 MB, so I decided to try removing those as a quick fix: rehno-lindeque@cdabf0a
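
A minimal sketch of that measurement and removal, assuming GNU find (for -printf) and awk. The function names are made up for illustration, and this is not necessarily what the linked commit does.

```shell
#!/usr/bin/env bash
# Sketch only: measure, then delete, *.dyn_hi files under a directory.
# dyn_hi_bytes and remove_dyn_hi are hypothetical helper names.

# Print the combined size in bytes of all *.dyn_hi files below $1.
# -printf is a GNU find extension; awk sums the per-file sizes.
dyn_hi_bytes() {
  find "$1" -type f -name '*.dyn_hi' -printf '%s\n' |
    awk '{ total += $1 } END { print total + 0 }'
}

# Delete all *.dyn_hi files below $1 and leave everything else alone.
remove_dyn_hi() {
  find "$1" -type f -name '*.dyn_hi' -delete
}
```

Running dyn_hi_bytes .cabal-sandbox before and after remove_dyn_hi .cabal-sandbox shows how much space the removal recovers.
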

@begriffs
Owner Author

I'm tempted to rebase this in tonight...do you think anything depends on these files? Why do they exist?

@rehno-lindeque
Contributor

Honestly, I didn't investigate much. I expect they're for building dynamic libs.

@jferris

jferris commented Sep 26, 2014

I downloaded a slug I just built which weighed in around 238M. After unzipping, I ended up with about 1.4G of data!

Breakdown from du:

  % du -sh dist .cabal .cabal-sandbox vendor/*
  63M     dist
  149M    .cabal
  291M    .cabal-sandbox
  68M     vendor/cabal-install-1.20.0.0
  801M    vendor/ghc-7.8.2
  400K    vendor/ghc-includes
  2.1M    vendor/ghc-libs
  4.5M    vendor/ghc-utils
  21M     vendor/node

Looks like the bulk of it is from GHC itself. Are there any of these we can safely delete after building? Do we need some of these build artifacts for incremental builds?

@jferris

jferris commented Oct 7, 2014

We've since pruned everything but ghc-libs from our vendor. This took our most recent slug down from 310M to 19M. The app still boots fine, so this seems like a good improvement.
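
One way such a pruning step could look, as a sketch: delete every entry under vendor/ except an explicit keep-list. The function name prune_vendor is an assumption, and whether ghc-libs alone is enough (for example, whether the app needs vendor/node at runtime) depends on the app.

```shell
#!/usr/bin/env bash
# Sketch only: remove every entry in a vendor directory except the
# names given as further arguments. prune_vendor is a hypothetical name.
prune_vendor() {
  local vendor_dir="$1"
  shift
  local entry name keep kept

  for entry in "$vendor_dir"/*; do
    [ -e "$entry" ] || continue   # the glob matched nothing
    name=$(basename "$entry")
    kept=no
    for keep in "$@"; do
      [ "$name" = "$keep" ] && kept=yes
    done
    [ "$kept" = yes ] || rm -rf "$entry"
  done
}
```

With the layout from the du listing above, prune_vendor vendor ghc-libs would leave only vendor/ghc-libs in place.
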

@mietek
Contributor

mietek commented Oct 31, 2014

FYI—Haskell on Heroku by default slugs only stripped build products and datadirs.

Omitting portions of the GHC distribution is problematic, and best left to the user by way of e.g. a ghc-post-build-hook.

5 participants