Use staged builds to minimize final image sizes #1031

Draft
wants to merge 1 commit into
base: main

Contributor

eero-t commented Oct 25, 2024

Description

Use staged image builds so that the final images do not contain redundant things such as:

  • Git tool and its deps (e.g. Perl)
  • Git repo history
  • Test directories

And drop explicit installation of:

  • langchain_core: GenAIComps installs langchain which already depends on that
  • jemalloc & GLX: nothing uses them (in any of the ChatQnA services), and for testing[1] it's trivial to create a separate image that adds them on top
  • File descriptor limit increase in ~/.bashrc (as these images run Python programs directly, not through Bash scripts)

=> This demonstrates that only 2-3 lines in each Dockerfile are unique, and everything preceding them could be replaced with a common base image.

[1] I assume those files were there to test this: https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#switch-memory-allocator
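
For illustration, a staged build along the lines described above might look roughly like the sketch below. This is not the actual diff from this PR: the Python tag, the stage names, the `/home/user` paths, the presence of a top-level `requirements.txt` in GenAIComps, and the `chatqna.py` entry point are all assumptions made for the example.

```dockerfile
# Hypothetical sketch of a staged build; tags, paths and file names are assumptions.
FROM python:3.11-slim AS base

# Throwaway stage: Git (and its dependencies such as Perl) exist only here.
FROM base AS git
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/user
RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git

# Final stage: copy only what is needed, so neither the .git history
# nor the test directories end up in the image.
FROM base AS final
COPY --from=git /home/user/GenAIComps/comps /home/user/GenAIComps/comps
COPY --from=git /home/user/GenAIComps/requirements.txt /home/user/GenAIComps/requirements.txt
RUN pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
ENV PYTHONPATH=/home/user/GenAIComps

# Only the lines below would differ between applications.
COPY ./chatqna.py /home/user/chatqna.py
ENTRYPOINT ["python", "/home/user/chatqna.py"]
```

Because the `git` stage is never referenced by a later `FROM`, nothing installed in it is carried into the final image unless it is copied explicitly with `COPY --from=git`.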

Issues

Fixes: #225

Type of change

  • Others (image size improvement / Dockerfile cleanup)

Dependencies

n/a (this removes redundant Git, Perl, jemalloc, GLX dependencies from final images)

Tests

This is a draft / example for fixing #225.

I have not tested it apart from verifying that images still build.

Notes

In a proper fix, the non-unique part of the Dockerfiles would be a separate base image, generated with a Dockerfile in the GenAIComps repo, and the Dockerfiles in this repository would depend on that image instead of python-slim.

However, that requires co-operation between these two repositories (unless the components base image Dockerfile is also in this repo) and:

  • CI handling this dependency, i.e. building the base image first, when relevant
  • That base image being in a repository accessible for building the application images
    • E.g. in the OPEA Docker Hub project

(I.e. it needs to be done by a member of this project, I cannot do it.)
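
As a rough sketch of what an application Dockerfile could shrink to under that approach: the image name `opea/comps-base`, the `BASE_TAG` argument, and the `chatqna.py` path below are hypothetical placeholders, not something that exists as part of this PR.

```dockerfile
# Hypothetical sketch; the base image name and tag are assumptions.
ARG BASE_TAG=latest
FROM opea/comps-base:${BASE_TAG}

# The application-specific part: the 2-3 lines that are unique per Dockerfile.
COPY ./chatqna.py /home/user/chatqna.py
ENTRYPOINT ["python", "/home/user/chatqna.py"]
```

Making the tag a build argument would let CI build the base image first and then pass the matching tag with `--build-arg BASE_TAG=...` when building the application images.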

Contributor Author

eero-t commented Oct 25, 2024

None of the test failures are due to my changes.

The CodeGen Gaudi test TGI failure is due to it trying to load a HuggingFace model that it has no access rights for:

Access to model meta-llama/CodeLlama-7b-hf is restricted and you are not in the authorized list.
Visit https://huggingface.co/meta-llama/CodeLlama-7b-hf to ask for access.

The CodeGen Xeon test TGI seems to fail due to: Could not import SGMV kernel from Punica, which may be a similar issue.

The VisualQnA Gaudi & Xeon test failures are due to an NPM dependency conflict in its Node.js Svelte UI container build (whose spec is not touched by this PR).

Contributor Author

eero-t commented Nov 14, 2024

Rebased this example to latest main, on the assumption that the CI issues have been fixed in the meantime.

Note: I did not update the Dockerfiles for applications that were added after this PR was created:

  • GraphRAG
  • EdgeCraftRAG

Contributor Author

eero-t commented Nov 18, 2024

Also updated the new ChatQnA/Dockerfile.wrapper to a staged build.

Rebased to latest main, as the previously used main failed in CI.

Contributor Author

eero-t commented Nov 22, 2024

No idea why guardrails times out:

Waiting for deployment "chatqna-tgi" rollout to finish: 0 of 1 updated replicas are available...
deployment "chatqna-tgi-guardrails" successfully rolled out
error: deployment "chatqna-tgi" exceeded its progress deadline
+ echo 'Timeout waiting for chatqna_guardrail pod ready!'
+ exit 1
Timeout waiting for chatqna_guardrail pod ready!

And translation fails:

curl: (18) transfer closed with outstanding read data remaining
Validate Translation failure!!!

The CI output does not provide enough information to tell why.

So that redundant things do not end up in the final image:
- Git repo history
- Test directories
- Git tool and its deps

And drop explicit installation of:
- jemalloc & GLX: nothing uses them (in ChatQnA at least), and
  for testing it's trivial to create image adding those on top:
  https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#switch-memory-allocator
- langchain_core: GenAIComps installs langchain which already depends on that

This demonstrates that only 2-3 lines in the Dockerfiles are unique,
and everything before those can be removed with a common base image.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Contributor Author

eero-t commented Nov 22, 2024

Rebased to main, and also updated the GraphRAG Dockerfile.

EdgeCraftRAG was not updated because it uses the comps-base package from pip instead of cloning the Comps repo.

Contributor Author

eero-t commented Nov 22, 2024

@lvliang-intel CI seems to be in a rather bad state, as CMake is segfaulting on image builds:

 [vllm build 5/7] RUN --mount=type=cache,target=/root/.cache/pip     --mount=type=cache,target=/root/.cache/ccache     --mount=type=bind,source=.git,target=.git     VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel &&     pip install dist/*.whl &&     rm -rf dist:
...
4.853 subprocess.CalledProcessError: Command '['cmake', '/workspace/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cpu', '-DCMAKE_C_COMPILER_LAUNCHER=ccache', '-DCMAKE_CXX_COMPILER_LAUNCHER=ccache', '-DCMAKE_CUDA_COMPILER_LAUNCHER=ccache', '-DCMAKE_HIP_COMPILER_LAUNCHER=ccache', '-DVLLM_PYTHON_EXECUTABLE=/usr/bin/python3', '-DVLLM_PYTHON_PATH=/workspace/vllm:/usr/lib/python310.zip:/usr/lib/python3.10:/usr/lib/python3.10/lib-dynload:/usr/local/lib/python3.10/dist-packages:/usr/lib/python3/dist-packages:/usr/local/lib/python3.10/dist-packages/setuptools/_vendor', '-DFETCHCONTENT_BASE_DIR=/workspace/vllm/.deps', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=152']' returned non-zero exit status 1.
5.371 Segmentation fault (core dumped)

ashahba self-assigned this Nov 22, 2024
Development

Successfully merging this pull request may close these issues.

Why containers use hundreds of MBs for Vim/Perl/OpenGL?