Setup: model = resnet18, dataset = ImageNet, epochs = 5, batch_size = 1200, GPUs = 3 × TITAN Xp
```bash
CUDA_VISIBLE_DEVICES=0,1,2 python dataparallel.py
CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 --master_port=23334 distributed.py
CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 --master_port=23334 distributed_syncBN_amp.py
```
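For context, the DataParallel run only needs a single wrap, `model = torch.nn.DataParallel(model).cuda()`, while the two `torch.distributed.launch` commands expect a script that handles the `--local_rank` argument the launcher injects. Below is a minimal sketch of such a script, not the repo's actual `distributed.py`; `FakeData` stands in for the ImageNet loader, the learning rate is illustrative, and the batch size mirrors the setup line above.

```python
# Minimal DDP training skeleton matching the launch commands above
# (a sketch, not the repo's distributed.py). FakeData stands in for ImageNet.
import argparse

import torch
import torch.distributed as dist
import torchvision
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by torch.distributed.launch
args = parser.parse_args()

dist.init_process_group(backend="nccl")  # reads MASTER_ADDR/PORT, RANK, WORLD_SIZE from env
torch.cuda.set_device(args.local_rank)   # one process drives one GPU

model = torchvision.models.resnet18().cuda(args.local_rank)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])

dataset = torchvision.datasets.FakeData(transform=torchvision.transforms.ToTensor())
sampler = DistributedSampler(dataset)                          # shards the data across ranks
loader = DataLoader(dataset, batch_size=400, sampler=sampler)  # 1200 total / 3 GPUs

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # illustrative hyperparameter

for epoch in range(5):
    sampler.set_epoch(epoch)  # gives each epoch a different shuffle
    for images, labels in loader:
        images = images.cuda(args.local_rank)
        labels = labels.cuda(args.local_rank)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()       # DDP all-reduces gradients across ranks during backward
        optimizer.step()
```

Unlike DataParallel, which runs a single process that scatters batches from GPU 0 each step, each DDP process computes on its own data shard and only gradients cross GPUs, which is where the time savings in the table below come from.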
| Method | Memory (MB) | Time (s) | ImageNet Top-1 Acc (%) |
|---|---|---|---|
| DataParallel | 11329 | 7633 | 46.71 |
| DistributedDataParallel | 11329 | 4612 | 46.83 |
| DistributedDataParallel + amp | 8679 | 4680 | 46.74 |
| DistributedDataParallel + amp + SyncBN | 8679 | 8173 | 46.78 |
Note: SyncBN = synchronized BatchNorm (built into PyTorch); amp = automatic mixed-precision training (built into PyTorch)
(1) SyncBN slows down training and adds little for image classification; use it for object detection and image segmentation.
(2) amp substantially reduces GPU memory usage, but the speedup can be negligible for small models; a sketch of enabling both SyncBN and amp follows these notes.
(3) In most cases, plain DistributedDataParallel is all you need.
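Roughly, the additions in `distributed_syncBN_amp.py` look like the following. This is a hedged sketch, not the repo's exact code; it reuses `args`, `loader`, `criterion`, and `optimizer` from the DDP skeleton above.

```python
# Sketch of the SyncBN + amp additions on top of the DDP skeleton above;
# `args`, `loader`, `criterion`, and `optimizer` are defined there.
import torch
import torchvision

model = torchvision.models.resnet18().cuda(args.local_rank)
# SyncBN: swap every BatchNorm for a version that syncs statistics across GPUs.
# This must happen BEFORE wrapping the model in DistributedDataParallel.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])

scaler = torch.cuda.amp.GradScaler()  # loss scaling guards fp16 gradients against underflow

for images, labels in loader:
    images = images.cuda(args.local_rank)
    labels = labels.cuda(args.local_rank)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales gradients; skips the step on inf/NaN
    scaler.update()
```

The extra cross-GPU synchronization of BatchNorm statistics on every forward pass is consistent with the longer wall-clock time for the SyncBN row in the table.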