[Onweek] - Replace templating with docker args in dockerfile #6177

Draft
wants to merge 8 commits into
base: main

pchila
Member

@pchila pchila commented Dec 2, 2024

What does this PR do?

Remove Go templating from the Dockerfile and replace it with Docker build arguments.

This PR reorganises the Docker build into three or more stages:

  • stage 1: the base OS image (Ubuntu or Wolfi), where packages are installed, users are created, etc.
  • stage 2: collects all the necessary elastic-agent files and the directory structure, and fixes permissions and ownership (it builds on stage 1 so that users and groups stay consistent)
  • stage 3: puts the final image together (at least for the basic variant) and sets env, entrypoint, etc.
  • More stages could be added for the more complex images (cloud, etc.)
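
As a rough illustration of the stage layout described above, the following shell function writes a minimal three-stage Dockerfile. It is a sketch, not the Dockerfile from this PR: the stage names, paths and the `DOCKER_VARIANT` argument are taken from the build output and the stored command line in this description, while the helper name `write_sketch_dockerfile`, the file name `Dockerfile.sketch`, the default value of the argument and the choice of base for each stage are assumptions.

```shell
# write_sketch_dockerfile DIR
#   Writes an illustrative multi-stage Dockerfile to DIR/Dockerfile.sketch.
#   Hypothetical helper; the content is a simplified sketch, not the PR's file.
write_sketch_dockerfile() {
  cat > "$1/Dockerfile.sketch" <<'EOF'
# Stage 1: base OS image. Package installation and user creation would go here.
FROM ubuntu:24.04 AS elastic_agent_base_ubuntu

# Stage 2: collect files, fix permissions and ownership.
# Assumption: it builds on stage 1 so that users and groups are consistent.
FROM elastic_agent_base_ubuntu AS elastic_agent_files
# A build argument replaces what used to be a Go template expression.
ARG DOCKER_VARIANT=basic
COPY beat/ /usr/share/elastic-agent/
RUN [ "${DOCKER_VARIANT}" != "cloud" ] || mkdir -p /opt/agentbeat /opt/filebeat /opt/metricbeat
RUN chown -R 1000:1000 /usr/share/elastic-agent/

# Stage 3: assemble the final image for the basic variant.
FROM elastic_agent_base_ubuntu AS image_basic
COPY --from=elastic_agent_files /usr/share/elastic-agent/ /usr/share/elastic-agent/
WORKDIR /usr/share/elastic-agent/
EOF
}

# Example (writes the sketch into the current directory):
#   write_sketch_dockerfile .
```

With Docker available, a command along the lines of `docker buildx build --target image_basic -f Dockerfile.sketch <context>` should build the last stage, where `<context>` is a directory containing `beat/`.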

Currently tested with the Ubuntu and Wolfi base images. UBI/RHEL needs another base image, but that looks feasible.

Many build arguments are defined because we need a lot of flexibility, but that could probably be simplified.
The build arg values are still evaluated with templating in the docker builder, but they are passed to Docker as plain strings.

Why is it important?

It allows using standardized toolchains that do not necessarily play nicely with Go templating.
It also allows for good IDE integration and syntax checks, so editing should be easier.
Iterating on Elastic Agent Docker images should be easier, since a build can be kicked off with a plain docker command.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

The implementation is not yet complete: a few attributes are missing from the docker specs, such as the base image family or the build target to use for a specific spec. The new build can still be tested by setting a new env var when packaging:

USE_DOCKER_BUILDX=true SNAPSHOT=true EXTERNAL=true PACKAGES="docker" PLATFORMS="linux/amd64" DOCKER_VARIANTS=basic mage package

<bunch of mage package logs here... >

--- CrossBuildGoDaemon Elastic-Agent
--- CrossBuild Elastic-Agent
--- CrossBuild Elastic-Agent
>> buildGoDaemon: Building for linux/amd64
>> golangCrossBuild: Building for linux/amd64
>> Building using: cmd='build/mage-linux-amd64 buildGoDaemon', env=[CC=gcc, CXX=g++, GOARCH=amd64, GOARM=, GOOS=linux, PLATFORM_ID=linux-amd64]
>> Building using: cmd='build/mage-linux-amd64 golangCrossBuild', env=[CC=gcc, CXX=g++, GOARCH=amd64, GOARM=, GOOS=linux, PLATFORM_ID=linux-amd64]
--- Package artifact
>> package: Building linux-amd64
>> package: Building elastic-agent type=docker for platform=linux/amd64
[+] Building 39.0s (37/37) FINISHED                                                                                                                                                                   docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                            0.0s
 => => transferring dockerfile: 11.91kB                                                                                                                                                                         0.0s
 => [internal] load metadata for docker.io/library/ubuntu:24.04                                                                                                                                                 0.0s
 => [context dockerbuild] load .dockerignore                                                                                                                                                                    0.0s
 => => transferring dockerbuild: 2B                                                                                                                                                                             0.0s
 => [internal] load .dockerignore                                                                                                                                                                               0.0s
 => => transferring context: 2B                                                                                                                                                                                 0.0s
 => [elastic_agent_base_ubuntu 1/3] FROM docker.io/library/ubuntu:24.04                                                                                                                                         0.0s
 => [context dockerbuild] load from client                                                                                                                                                                      0.0s
 => => transferring dockerbuild: 53B                                                                                                                                                                            0.0s
 => [internal] load build context                                                                                                                                                                               4.5s
 => => transferring context: 1.68GB                                                                                                                                                                             4.5s
 => CACHED [elastic_agent_base_ubuntu 2/3] RUN touch /var/mail/ubuntu &&     chown ubuntu /var/mail/ubuntu &&     userdel -r ubuntu &&     apt-get update -y &&     DEBIAN_FRONTEND=noninteractive apt-get ins  0.0s
 => CACHED [elastic_agent_base_ubuntu 3/3] RUN set -e ;   TINI_BIN="";   TINI_SHA256="";   TINI_VERSION="v0.19.0";   case "$(arch)" in     x86_64)       TINI_BIN="tini-amd64";       TINI_SHA256="93dcc18adc7  0.0s
 => CACHED [elastic_agent_base_ubuntu_buildtools 1/1] RUN apt-get update -y  && apt-get install --no-install-recommends --yes libcap2-bin && apt-get clean all                                                  0.0s
 => [elastic_agent_files  1/22] COPY beat/ /usr/share/elastic-agent/                                                                                                                                            1.8s
 => [elastic_agent_files  2/22] RUN chmod 0777 /usr/share/elastic-agent/                                                                                                                                        0.3s
 => [elastic_agent_files  3/22] RUN mkdir -p /usr/share/elastic-agent//data /usr/share/elastic-agent//data/elastic-agent-325e6e/logs                                                                            0.3s
 => [elastic_agent_files  4/22] RUN find /usr/share/elastic-agent/ -type d -exec chmod 0755 {} ;                                                                                                                0.4s
 => [elastic_agent_files  5/22] RUN find /usr/share/elastic-agent/ -type f -exec chmod 0644 {} ;                                                                                                                5.7s
 => [elastic_agent_files  6/22] RUN find /usr/share/elastic-agent//data -type d -exec chmod 0777 {} ;                                                                                                           0.4s
 => [elastic_agent_files  7/22] RUN find /usr/share/elastic-agent//data -type f -exec chmod 0666 {} ;                                                                                                           5.5s
 => [elastic_agent_files  8/22] RUN rm /usr/share/elastic-agent//elastic-agent                                                                                                                                  0.3s
 => [elastic_agent_files  9/22] RUN ln -s /usr/share/elastic-agent//data/elastic-agent-325e6e/elastic-agent /usr/share/elastic-agent//elastic-agent                                                             0.3s
 => [elastic_agent_files 10/22] RUN chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/elastic-agent                                                                                                     0.8s
 => [elastic_agent_files 11/22] RUN chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/*beat                                                                                                  1.3s
 => [elastic_agent_files 12/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/osquery* || true)                                                                                     0.5s
 => [elastic_agent_files 13/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/apm-server || true)                                                                                   0.4s
 => [elastic_agent_files 14/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/endpoint-security || true)                                                                            0.3s
 => [elastic_agent_files 15/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/fleet-server || true)                                                                                 0.4s
 => [elastic_agent_files 16/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/pf-elastic-collector || true)                                                                         0.4s
 => [elastic_agent_files 17/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/pf-elastic-symbolizer || true)                                                                        0.4s
 => [elastic_agent_files 18/22] RUN (chmod 0755 /usr/share/elastic-agent//data/elastic-agent-*/components/pf-host-agent || true)                                                                                0.5s
 => [elastic_agent_files 19/22] RUN find /usr/share/elastic-agent//data/elastic-agent-325e6e/components -name "*.yml*" -type f -exec chmod 0644 {} ;                                                            1.2s
 => [elastic_agent_files 20/22] RUN [ "basic" != "cloud" ] || (     mkdir -p /opt/agentbeat /opt/filebeat /opt/metricbeat &&     cp -f /usr/share/elastic-agent//data/cloud_downloads/filebeat.sh /opt/filebea  0.3s
 => [elastic_agent_files 21/22] RUN chown -R 1000:1000 /usr/share/elastic-agent/                                                                                                                                5.0s
 => [elastic_agent_files 22/22] RUN setcap =p /usr/share/elastic-agent//data/elastic-agent-325e6e/elastic-agent &&   ([ -z "${LINUX_CAPABILITIES}" ] || setcap ${LINUX_CAPABILITIES}  $(readlink -f elastic-ag  0.7s
 => [image_basic 1/4] COPY --from=elastic_agent_files /usr/share/elastic-agent/ /usr/share/elastic-agent/                                                                                                       2.8s
 => [image_basic 2/4] RUN mkdir /licenses &&     cp /usr/share/elastic-agent/LICENSE.txt /licenses &&     cp /usr/share/elastic-agent/NOTICE.txt /licenses                                                      0.3s
 => [image_basic 3/4] COPY --from=dockerbuild --chmod=0755 docker-entrypoint.elastic-agent /usr/local/bin/docker-entrypoint                                                                                     0.2s
 => [image_basic 4/4] WORKDIR /usr/share/elastic-agent/                                                                                                                                                         0.1s
 => exporting to image                                                                                                                                                                                          2.1s
 => => exporting layers                                                                                                                                                                                         2.1s
 => => writing image sha256:ba23f4deaccd12e4a62b761198fecf70131019c7705e979c258fb93efba70dc0                                                                                                                    0.0s
 => => naming to docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT                                                                                                                                           0.0s
--- TestPackages, the generated packages (i.e. file modes, owners, groups).
--- TestPackages
>> Testing package contents
package ran for 5m53.003236727s

There is still potential to use the Docker build cache, especially when relaunching a Docker build with the command line stored in the package dir:

➜  elastic-agent git:(onweek-improve-dockerfile) ✗ cat build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker_build_cmd.txt
docker buildx build --build-arg BEAT_COMMIT=325e6e75dd4c5eac4e79d8bc83529ce87087215f --build-arg BEAT_DESCRIPTION=Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host. --build-arg BEAT_ROOT_IMPORT_PATH=https://www.elastic.co/beats/elastic-agent --build-arg BEAT_VCS_REF=github.com/elastic/elastic-agent --build-arg DOCKER_VARIANT=basic --build-arg ELASTIC_AGENT_USER=elastic-agent --build-arg BEAT_COMMIT_SHORT=325e6e --build-arg BEAT_LICENSE=Elastic License --build-arg BEAT_URL=https://www.elastic.co/beats/elastic-agent --build-arg BEAT_VENDOR=Elastic --build-arg BEAT_VERSION=9.0.0-SNAPSHOT --build-arg BUILD_TIMESTAMP=2024-12-02T09:09:29Z --build-context dockerbuild=/home/paolo/dev/elastic-agent/dev-tools/packaging/docker --target image_basic -f /home/paolo/dev/elastic-agent/dev-tools/packaging/docker/Dockerfile -t docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker-build %
➜  elastic-agent git:(onweek-improve-dockerfile) ✗
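
As shown here, the stored command is plain text, so values that contain spaces (for example `BEAT_DESCRIPTION` and `BEAT_LICENSE`) appear unquoted and would be split into separate words if the line were pasted into a shell as-is. Below is a minimal sketch of a re-runnable wrapper, assuming a POSIX shell. The helper names `print_args` and `rebuild_basic` and the `repo_root` parameter are hypothetical, only a subset of the build args is shown, and whether the Dockerfile builds with just that subset is not confirmed.

```shell
# Dry-run helper: prints one argument per line so word boundaries are visible.
print_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# rebuild_basic RUNNER REPO_ROOT
#   RUNNER receives the argument list: "docker" for a real build, or
#   print_args for a dry run. REPO_ROOT is the elastic-agent checkout.
#   Both parameter names are hypothetical.
rebuild_basic() {
  runner=$1
  repo_root=$2
  # Quoting each KEY=VALUE keeps values with spaces as a single argument.
  "$runner" buildx build \
    --build-arg "BEAT_DESCRIPTION=Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host." \
    --build-arg "BEAT_LICENSE=Elastic License" \
    --build-arg "DOCKER_VARIANT=basic" \
    --build-arg "BEAT_VERSION=9.0.0-SNAPSHOT" \
    --build-context "dockerbuild=$repo_root/dev-tools/packaging/docker" \
    --target image_basic \
    -f "$repo_root/dev-tools/packaging/docker/Dockerfile" \
    -t docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT \
    "$repo_root/build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker-build"
}

# Dry run: inspect the argument list without invoking Docker.
rebuild_basic print_args "$PWD"
```

Replacing `print_args` with `docker` in the last line would run the build itself; the remaining build args from the stored command follow the same quoting pattern.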

However, since the mage package process downloads components and compiles the elastic-agent binary again, the cache is easily invalidated even when the exact same components and source code are used. This should be looked into, as better caching would improve packaging in general, especially for development use cases.
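
One way to start looking into this is to check whether the build context really differs between two packaging runs. The sketch below hashes the relative path and content of every file under a directory and ignores timestamps. It assumes GNU `sha256sum` is available, and the function name `context_digest` is hypothetical. It says nothing about how Docker itself computes cache keys.

```shell
# context_digest DIR
#   Prints one SHA-256 digest covering the relative path and content of every
#   regular file under DIR. File metadata such as mtime is not included.
context_digest() {
  (
    cd "$1" || exit 1
    # Hash each file, sort the "hash  path" lines for a stable order,
    # then hash the sorted list to get a single value.
    find . -type f -exec sha256sum {} + | LC_ALL=C sort | sha256sum | cut -d ' ' -f 1
  )
}

# Example: compare the digest before and after a second packaging run.
#   before=$(context_digest build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker-build)
#   ... run the packaging again ...
#   after=$(context_digest build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker-build)
#   [ "$before" = "$after" ] && echo "context content unchanged"
```

If the digest changes between two runs on the same commit, some file content is not reproducible, with rebuilt binaries being a likely candidate. If it stays the same, the cause is more likely file metadata or a build argument that changes between runs, such as `BUILD_TIMESTAMP`.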

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added the enhancement (New feature or request) and Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) labels Dec 2, 2024
@pchila pchila self-assigned this Dec 2, 2024
@pchila pchila requested a review from a team as a code owner December 2, 2024 09:27
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pchila pchila marked this pull request as draft December 2, 2024 09:27
Contributor

mergify bot commented Dec 2, 2024

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Dec 2, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x (Automated backport to the 8.x branch with mergify) label Dec 2, 2024

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

Labels
backport-8.x (Automated backport to the 8.x branch with mergify), enhancement (New feature or request), skip-changelog, Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

2 participants