This repo implements a simple PyTorch codebase for training classification models with powerful tools including Docker, PyTorchLightning, and Hydra.
- nvidia-docker
- docker-compose
We use Docker to run all experiments. Before running any code, check docker-compose.yml first. The default settings are shown below:
version: "3.9"
services:
  playground:
    container_name: playground
    build:
      context: docker/
      dockerfile: Dockerfile.local
      args:
        - USER_ID=${UID}
        - GROUP_ID=${GID}
        - USER_NAME=${USER_NAME}
    image: pytorch_local
    environment:
      - TZ=Asia/Shanghai
      - TORCH_HOME=/data/torch_model
    ipc: host
    hostname: docker
    working_dir: /code
    command: ['sleep', 'infinity']
    volumes:
      - .:/code
      - /data1/data:/data
      - /data2/data/train_log/outputs:/outputs
You should change the volumes to:
- mount your dataset folders to /data, and
- mount a folder for /outputs (training logs will be written to this folder).
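As a rough illustration of how the `host:container` volume entries above are read, here is a small sketch. The helper name `parse_volume`, the `VolumeMount` type, and the example host paths are hypothetical and not part of this repo. It only covers the simple short syntax shown in docker-compose.yml, not named volumes or other Compose forms.

```python
import os
from typing import NamedTuple, Optional


class VolumeMount(NamedTuple):
    host: str             # path on the host machine
    container: str        # path inside the container
    mode: Optional[str]   # optional access mode such as "ro" or "rw"


def parse_volume(spec: str, project_dir: str = ".") -> VolumeMount:
    """Split a short-syntax entry like '/data1/data:/data' into its parts."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'host:container[:mode]', got {spec!r}")
    host, container = parts[0], parts[1]
    mode = parts[2] if len(parts) == 3 else None
    # Relative host paths such as '.' are resolved against the project directory.
    if not os.path.isabs(host):
        host = os.path.normpath(os.path.join(project_dir, host))
    return VolumeMount(host, container, mode)


# Hypothetical host paths; replace them with your own dataset and log folders.
volumes = [
    ".:/code",
    "/path/to/datasets:/data",
    "/path/to/train_logs:/outputs",
]
mounts = [parse_volume(v, project_dir="/home/me/project") for v in volumes]
for m in mounts:
    print(f"{m.host} -> {m.container}")
```

Only the left-hand (host) side of each entry needs to change for your machine; the container paths /code, /data, and /outputs stay the same.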
Next, simply run:
python core.py env prepare
This command will first build an image based on /docker/Dockerfile.local and then launch a container based on this image.
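The build-then-launch behavior described above can be approximated with plain docker-compose commands. This is a sketch of what such a wrapper might do, not the actual contents of core.py. The function names are hypothetical, and it assumes the `${UID}`, `${GID}`, and `${USER_NAME}` variables in docker-compose.yml are filled from the current host user.

```python
import getpass
import os
import subprocess
from typing import Dict, List


def compose_env() -> Dict[str, str]:
    """Environment for docker-compose with the variables the compose file expects."""
    env = dict(os.environ)
    # UID/GID are often shell variables that are not exported, so set them explicitly.
    env["UID"] = str(os.getuid())
    env["GID"] = str(os.getgid())
    env["USER_NAME"] = getpass.getuser()
    return env


def prepare_commands(rebuild: bool = False) -> List[List[str]]:
    """Commands that build the image if needed and start the container detached."""
    up = ["docker-compose", "up", "-d"]
    if rebuild:
        up.append("--build")  # force the image to be rebuilt before starting
    return [up]


def prepare(rebuild: bool = False, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in prepare_commands(rebuild):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, env=compose_env(), check=True)


prepare(dry_run=True)  # only prints the command; nothing is executed
```

Passing the host user's IDs as build arguments is what lets the container user match the host user, so files written to the mounted folders keep the expected ownership.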
To enter the container, simply run:
python core.py env
The default user inside the container is the same as the host user, which avoids permission issues. You can also enter the container as root:
python core.py env --root
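Entering a running container as the default user or as root can be sketched with `docker exec`. The helper below is hypothetical and is not taken from core.py. It assumes the container name `playground` from docker-compose.yml and that `bash` is available in the image.

```python
import subprocess
from typing import List


def enter_command(container: str = "playground", root: bool = False,
                  shell: str = "bash") -> List[str]:
    """Build the docker exec command that opens an interactive shell."""
    cmd = ["docker", "exec", "-it"]
    if root:
        cmd += ["--user", "root"]  # override the container's default user
    cmd += [container, shell]
    return cmd


def enter(root: bool = False) -> int:
    """Open the shell and return its exit code (requires a running container)."""
    return subprocess.call(enter_command(root=root))


print(" ".join(enter_command()))
print(" ".join(enter_command(root=True)))
```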
Basically, there are four config files:
- /docker/Dockerfile.pytorch defines the basic environment, including CUDA, cuDNN, NCCL, conda, torch, etc. This image has been built as deepbase/pytorch. By default, you don't need to change this.
- /docker/Dockerfile.local defines the logic of building the local image. For example, it installs the packages defined in requirements.txt.
- /docker/requirements.txt defines the Python packages you want to install.
- /docker-compose.yml defines the settings for running the container. For example, the volumes, timezone, etc.
After changing the settings as you want, you can rebuild the local image by running:
python core.py env prepare --build
Enter the environment and run:
python train.py
Read the official documentation of Hydra and PyTorchLightning to learn more:
- Hydra: A powerful and convenient configuration system, and more.
- PyTorchLightning: You mostly only need to write code for models and data. Say goodbye to boilerplate code for pipelines, mixed precision, logging, etc.