Commit cf0eb3c: Bump and refactor pytorch dockerfile template
Verdi March committed on May 3, 2024 (1 parent: 10f7224)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 3 changed files with 75 additions and 42 deletions.
2.ami_and_containers/containers/pytorch/1.xformers.fragment.dockerfile (new file: 29 additions, 0 deletions)
@@ -0,0 +1,29 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

####################################################################################################
# This is NOT a complete Dockerfile! Attempting to `docker build` this file is guaranteed to fail.
#
# This file provides a sample stanza to build xformers, which you can optionally add to
# 0.nvcr-pytorch-aws.dockerfile should you need a container image with xformers.
#
# NOTE: always check `pip list` to see what is already installed. The base container (specified in
# 0.nvcr-pytorch-aws.dockerfile) comes pre-installed with Transformer Engine, flash attention,
# triton (https://github.com/openai/triton/), etc.
####################################################################################################

# Install xformers from source, because `pip install` either breaks or tries to pull in its own
# PyTorch + CUDA.
#
# Prerequisite: the build node must have enough memory to compile xformers. The comments in the
# stanza give measured memory usage.
RUN export TORCH_CUDA_ARCH_LIST="8.0;9.0+PTX" && \
    # On p4de.24xlarge:
    # - MAX_JOBS=16 => 145GB memory
    # - MAX_JOBS=32 => 241GB memory
    # - MAX_JOBS=48 => 243GB memory, 542.5s
    #
    # NOTE: MAX_JOBS must be exported. For some reason, `MAX_JOBS=16 pip install ...` does not
    # seem to prevent OOM.
    export MAX_JOBS=32 && \
    export NVCC_PREPEND_FLAGS="-t 32" && \
    pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
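The p4de.24xlarge figures above work out to roughly 9GB per compile job at MAX_JOBS=16 (145GB / 16) and about 7.5GB per job at MAX_JOBS=32 (241GB / 32). A hedged sketch, not part of this commit, that derives MAX_JOBS from the build node's RAM instead of hard-coding 32. `GB_PER_JOB=9` is an assumed budget taken from the 16-job measurement, and the cap at `nproc` is likewise an assumption. Note that `/proc/meminfo` normally reports the host's total memory even inside `docker build`, so it does not reflect any cgroup memory limit placed on the build.

```dockerfile
# Hypothetical variant of the xformers stanza: size MAX_JOBS from available RAM.
# GB_PER_JOB=9 is an ASSUMED per-job budget (145GB / 16 jobs on p4de.24xlarge); tune it per node.
RUN export TORCH_CUDA_ARCH_LIST="8.0;9.0+PTX" && \
    GB_PER_JOB=9 && \
    # MemTotal is reported in kB; convert to (approximate) GiB.
    MEM_GB=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo) && \
    JOBS=$(( MEM_GB / GB_PER_JOB )) && \
    NPROC=$(nproc) && \
    # Clamp to [1, nproc]; `if` keeps a false test from breaking the && chain.
    if [ "$JOBS" -gt "$NPROC" ]; then JOBS=$NPROC; fi && \
    if [ "$JOBS" -lt 1 ]; then JOBS=1; fi && \
    # Still exported, per the NOTE in the original stanza.
    export MAX_JOBS=$JOBS && \
    echo "Building xformers with MAX_JOBS=$MAX_JOBS (MemTotal ~${MEM_GB}GiB)" && \
    export NVCC_PREPEND_FLAGS="-t 32" && \
    pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

The `if ... fi` form is used instead of `[ ... ] && JOBS=...` because a failing test inside an `&&` chain would abort the whole `RUN` step.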
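The fragment's header recommends checking `pip list` before adding the xformers stanza, because the base image already ships Transformer Engine, flash attention, and triton. A minimal sketch of how the stanza might sit in a Dockerfile derived from 0.nvcr-pytorch-aws.dockerfile, with a package check before it and an import check after it. The placement and the `grep` pattern for package names are assumptions, and the base image's `FROM` line and earlier stanzas are deliberately not shown.

```dockerfile
# Hypothetical tail of a Dockerfile based on 0.nvcr-pytorch-aws.dockerfile
# (FROM line and earlier stanzas omitted; the grep pattern is an assumption about package names).

# 1. List what the base image already provides; `|| true` keeps the build going if nothing matches.
RUN pip list 2>/dev/null | grep -i -E 'torch|transformer|flash|triton|xformers' || true

# 2. Paste the RUN stanza from 1.xformers.fragment.dockerfile here.

# 3. Import xformers and print versions; the torch version should still match the base image's.
RUN python -c "import torch, xformers; print('torch', torch.__version__, '| xformers', xformers.__version__)"
```

Comparing the `torch` version printed in step 3 with the one listed in step 1 is a quick way to confirm that the source build did not replace the container's own PyTorch.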