Skip to content
This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

support for reproducible builds #225

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

mark-kubacki
Copy link
Contributor

@mark-kubacki mark-kubacki commented Dec 15, 2016

This contribution implements support for reproducible builds with dgr.

That is, if the scripts and tools used within the builder and target environment (equals "build environment" in our case) allow for such builds, then using dgr will result in a reproducibly built ACI image.

dgr's function is that of a packager, hence its responsibilities are deterministically ordered content in said image, and pinned datetimes – especially modification datetimes. This requires us to upgrade tar to ≥1.29 for its new features, and to expose those arguments to templating of the aci-manifest.yml, which is the precursor of the APPC manifest. Everything else is already done by rkt.

The sole setting which the user needs to set is the build epoch (a fixed datetime), like this:

aci:
  annotations:
    - name: build-date
      value: '2016-12-15T20:00:00+01:00'

I have created a reproducible build as demonstration which you can try from here:
https://github.com/Blitznote/baseimage
Any two runs result in binary identical image.aci files.

@mark-kubacki
Copy link
Contributor Author

mark-kubacki commented Dec 15, 2016

Please note that Github displays my commits in a different order than git or submitted.

This is the output of two runs of the demo:
reproducible-builds-for-rkt

I am sorry, but I will, categorically, not work around any false errors thrown by Travis, the CI system, due to Travis using an outdated or incomplete OS installation. (realpath is part of the Linux/GNU coreutils, and should not be missing.)

Please test on your local machine(s).

@mark-kubacki mark-kubacki force-pushed the reproducible-builds branch 2 times, most recently from 0880443 to c639dbe Compare December 19, 2016 23:41
@mark-kubacki
Copy link
Contributor Author

mark-kubacki commented Dec 19, 2016

rebased, please mind the commit hashes

@mark-kubacki
Copy link
Contributor Author

Another reproducible build as example and motivation for this feature:
https://github.com/wmark/aci-avorion-server

@n0rad
Copy link
Member

n0rad commented Feb 6, 2017

Sorry it's just that I don't have a lot of time this days.
Will get back to it soon.

@mark-kubacki mark-kubacki force-pushed the reproducible-builds branch 6 times, most recently from d15a1de to 2bc337e Compare March 5, 2017 19:26
@mark-kubacki
Copy link
Contributor Author

rebased, fixed an example which timed-out

@Globegitter
Copy link

Would be great to see support for this.

gomake Outdated
@@ -81,6 +81,7 @@ gomake_update() {
clean() {
echo_green "Cleaning"
rm -Rf ${work_path}/${target_name}
git clean -fdx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remove untracked files from the working tree"

hum ? why would we want to auto remove untracked files ?

@@ -10,12 +10,15 @@ deb-src http://www.apache.org/dist/cassandra/debian 30x main
deb http://ftp.debian.org/debian jessie-backports main
EOF

gpg --keyserver pool.sks-keyservers.net --recv-keys F758CE318D77295D || \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, can you put that in another PR, it's not linked.

Also can you squash your commits, 13 commits for a single feature too much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I will issue separate PRs. Once all prerequisites are merged, trim and squash this one.

SupplementaryGIDs []int `json:"supplementaryGIDs,omitempty" yaml:"supplementaryGIDs,omitempty"`
WorkingDirectory string `json:"workingDirectory,omitempty" yaml:"workingDirectory,omitempty"`
Environment types.Environment `json:"environment,omitempty" yaml:"environment,omitempty"`
EventHandlers []types.EventHandler `json:"eventHandlers,omitempty" yaml:"eventHandlers,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event handler is not linked to reproductible build, can you put it in another PR?

Copy link
Contributor Author

@mark-kubacki mark-kubacki Mar 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option to remove anything dgr from the result entails the aci.exclude setting (or build.exclude, wherever this eventually lands), and this in turn needs the blanking/removing of event handlers.

@@ -74,6 +74,8 @@ type BuilderDefinition struct {

type BuildDefinition struct {
MountPoints []MountInfo `json:"mountPoints,omitempty" yaml:"mountPoints,omitempty"`
Exclude []string `json:"exclude,omitempty" yaml:"exclude,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exclude & transform is not linked to reproductible build, can you put it in a separate PR please ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's not meaningful, we don't know what will be excluded or transformed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we need them, the builder scripts goal is to prepare the filesystem to be ready for tar. it should be done there

Copy link
Contributor Author

@mark-kubacki mark-kubacki Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the build, and we need this option to exclude dgr from the resulting tarball as well as intermediate items. For example, you can include a file but exclude its parent directory (tar allows for that!) to relinquish control over it to a previous layer.

im.Dependencies, err = ToAppcDependencies(m.Aci.Dependencies)
if err != nil {
return errs.WithEF(err, fields, "Failed to prepare dependencies for manifest")
}
im.Name = *name
im.Labels = labels

for _, exclusion := range m.Build.Exclude {
if strings.HasPrefix("/dgr/bin", exclusion) {
goto collectionIsComplete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use goto it's unusual and hard to read what you want to do.

Copy link
Member

@n0rad n0rad Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading 10 times, I start to see you want to exclude completly dgr prestart process if /dgr/bin is excluded.

Is it needed for reproductible build or can you add it in another PR ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested or backwards gotos are frowned upon. Forward-gotos (in this case to skip steps not needed) are no bad style. If you still wish I will substitute it, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xkcd

//im.Annotations.Set(*dgrBuilderIdentifier, m.Builder.Image.String())
im.Annotations.Set(*buildDateIdentifier, time.Now().Format(time.RFC3339))

if _, ok := im.Annotations.Get("build-date"); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if setting the build-date in the aci-manifest so it's not generated (and break reproductible build) is the best solution. It may be better just by telling not to set it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok after full read, you are using it to fixe file dates as long as manifest build date. It can stay like that

@@ -268,6 +288,7 @@ func (b *Builder) prepareNspawnArgsAndEnv(commandPath string) ([]string, []strin
if err != nil {
return args, env, errs.WithEF(err, b.fields.WithField("content", string(content)), "Failed to process manifest template")
}
b.fullManifest = aciManifest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is is useless, in the flow it rewritten by writeManifest()

@@ -123,6 +124,8 @@ func (b *Builder) writeManifest() error {
aciManifest.NameAndVersion = *common.NewACFullName(aciManifest.NameAndVersion.Name() + ":" + common.GenerateVersion(b.aciTargetPath))
}

b.fullManifest = aciManifest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing fullManifest in Builder, it should be return by WriteManifest() and passed to Tar().
Here you are storing the result of the manifest processing globally in the middle of the build process. This leads to think it's always available which is not true.

@mark-kubacki
Copy link
Contributor Author

Thanks for your review. I'll address the issues tomorrow.

The result should ideally be independent of the building app (dgr), hence I've made exclude part of this.

Travis, the current CI system, uses a very old Ubuntu distribution whose
package 'coreutils' does not include 'realpath'. Or, 'coreutils' has had
it but the CI image maintainer has stripped it from the final image for
consistency's sake. Anyway, this gets us realpath in one way or another.
In order to ship with the most recent 'tar' – which is necessary for
reproducible builds – we build it ourselves should it be missing.
tar 1.29 comes with option --clamp-mtime which is necessary to create
reproducibly built ACIs.

Its most common use is with a fixed datetime, the so-called "source epoch",
like this:

  epoch="2016-11-27 06:00Z"           # date --utc +"%Y-%m-%d 06:00Z"
  tar \
    --mtime="${epoch}" --clamp-mtime \
    --sort=name \
    -cJf result.aci manifest rootfs
If the manifest is the first file then tools do not need to download
the whole image file in order to detect changes or updates.

Sorted contents of ACIs allow for easier comparison, and usage of
tools such as zsync, and deduplication on the server. The price,
sorting by 'tar', is cheap.

Parameter 'f' for 'tar' must be followed by the filename for recent
versions of 'tar'. Mind the order!

closes blablacar#210
Now that we have a recent version of 'tar' there is no need to use any
outdated from the host.
dgr changes paths and performs needless renames, which result in mayhem
if the process quits prematurely or the timing were off. The solution is
to use tar's `-C` param and transform the filenames.
Enables dgr to work on hosts that don't have any tar. (See also blablacar#217.)
Using these allows for more elaboarated builds, and excluding 'dgr'
from being part of the ACI image. For example, this is how you do it:

  build:
    exclude:
      - /dgr

To create a replacement tarball for the core 'aci-builder' you'd:

  build:
    exclude:
      - /etc/resolv.conf
      - /dgr
    transform:
      - 's@^rootfs@rootfs/dgr@'

If a path starts with a slash then the correct 'rootfs' or its alias
gets prepended automatically.

Please keep in mind that 'tar' applies exclusions first, followed by
transformations.
If the user has provided a build-date (as aci::annotation), then don't
overwrite it. Instead, use it as ceiling to modification datetime of files.

If everything else has been created in a deterministic manner, this
results in completely reproducible builds.

Please note that you can use templating to set this value.
Event handlers which have been set in the `aci-manifest.yml` will be written
to the final `manifest` file. Empty values clear any default handlers.

If a "pre-start" handler has been set, then it will not be overwritten by
dgr's default, "/dgr/bin/prestart".

Iff "/dgr/bin" has been excluded from the image, especially using the prefix
"/dgr", then dgr's defaults won't be used at all.
(Such an image would not start anyway, leaving the user confused about the
cause, whether it's app.exec or any pre-start handler.)
@mark-kubacki
Copy link
Contributor Author

I've begun splitting this in smaller chunks and filing as separate PR, and am waiting for #238 to be merged to rebase and proceed.

@mark-kubacki
Copy link
Contributor Author

Waiting on #238 to rebase and move this forward.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants