Tensor Parallelism in a Toy Model

I train a toy model (3 linear layers with a ReLU between the first and second) to better understand tensor parallelism. I train it on a regression task with a synthetic dataset across 2 GPUs. I mostly use code from the amazing Megatron repo. I've made the code easy to follow, but that's mainly because I stripped away details. Interested readers may refer to page 3 of my notes on the Megatron paper to understand how tensor parallelism works in the first two layers of my model. My parallelized ToyModel looks like this:

Figure: Adapted from Fig. 3(a) of the Megatron-LM Paper. I use ReLU and don't use Dropout.
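To make the split concrete, here is a minimal PyTorch sketch of the parallelized model, assuming torch.distributed has already been initialized with 2 ranks. The class name and dimension names are mine, not the repo's; the real training code handles gradients for the collective via a custom autograd function (see Notes).

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ToyParallelModel(nn.Module):
    """Three linear layers: the first two are tensor-parallel, the third is replicated."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert hidden_dim % world_size == 0
        shard = hidden_dim // world_size
        # Layer 1 is column-parallel: each rank owns a slice of the hidden features.
        self.fc1 = nn.Linear(in_dim, shard)
        # Layer 2 is row-parallel: each rank consumes only its own slice of hidden
        # features. Bias is omitted so it isn't added twice after the all-reduce.
        self.fc2 = nn.Linear(shard, hidden_dim, bias=False)
        # Layer 3 is replicated on every rank.
        self.fc3 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is replicated on both ranks; ReLU acts on the local shard only, which is
        # exactly what makes the column-then-row split work without communication.
        h = torch.relu(self.fc1(x))
        partial = self.fc2(h)
        # Sum the partial products so both ranks hold the full layer-2 output.
        # (For training, this collective is wrapped in a torch.autograd.Function,
        # as in Megatron; a sketch is given in the Notes section below.)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return self.fc3(partial)
```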

Usage

Prepare synthetic data for the regression task:

python prepare_data.py 
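(This is not the repo's prepare_data.py; just an illustrative sketch of what synthetic regression data could look like, with made-up sizes, noise scale, and filename.)

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, in_dim, out_dim = 10_000, 32, 1  # illustrative sizes

# Targets come from a random linear map of the features plus a little noise.
X = rng.normal(size=(num_samples, in_dim)).astype(np.float32)
true_w = rng.normal(size=(in_dim, out_dim)).astype(np.float32)
y = X @ true_w + 0.01 * rng.normal(size=(num_samples, out_dim)).astype(np.float32)

np.savez("toy_regression_data.npz", X=X, y=y)  # hypothetical filename
```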

Train the "non-parallel" version of the ToyModel:

bash bash_train.sh

Train the tensor-parallel version of the ToyModel:

bash bash_parallel_train.sh

Notes

You'll see that the training-loss curves for the tensor-parallel and "non-parallel" versions match. There is some work to be done around syncing random-number generation across multiple GPUs for tensor parallelism, but to keep things simple, in this repo I initialize all weight matrices using a simple deterministic scheme (one possible such scheme is sketched below).
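As an illustration (a sketch under my own assumptions, not necessarily the exact scheme used in this repo): build the full weight matrix deterministically from its shape, then hand each tensor-parallel rank its slice, so the sharded and non-sharded runs start from identical values without touching any RNG.

```python
import torch

def deterministic_full_weight(out_features: int, in_features: int) -> torch.Tensor:
    # A fixed pattern that depends only on the matrix shape -- no randomness involved.
    idx = torch.arange(out_features * in_features, dtype=torch.float32)
    return 0.01 * idx.reshape(out_features, in_features) / idx.numel()

def init_column_parallel_(layer: torch.nn.Linear, full_out: int, rank: int, world_size: int) -> None:
    """Copy this rank's slice of the deterministic full weight into a column-parallel layer."""
    full = deterministic_full_weight(full_out, layer.in_features)
    shard = full_out // world_size
    with torch.no_grad():
        layer.weight.copy_(full[rank * shard:(rank + 1) * shard])
        if layer.bias is not None:
            layer.bias.zero_()
```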

Here is the training-loss curve for the "non-parallel" version:

For tensor-parallel training:

Note that we see the same loss on both GPUs (ranks 0 and 1): the same regression labels are sent to both devices, and a reduction step ensures the model output on both devices is identical (refer to page 3 of my Megatron-LM notes).
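That reduction step is essentially Megatron's "g" operator: an all-reduce in the forward pass and an identity in the backward pass, since the gradient arriving from the replicated third layer is already the same on both ranks. A minimal sketch, with illustrative names:

```python
import torch
import torch.distributed as dist

class ReduceFromModelParallelRegion(torch.autograd.Function):
    """All-reduce in forward, identity in backward (Megatron's "g" operator)."""

    @staticmethod
    def forward(ctx, partial_output):
        # Sum the partial layer-2 outputs held by each tensor-parallel rank.
        dist.all_reduce(partial_output, op=dist.ReduceOp.SUM)
        return partial_output

    @staticmethod
    def backward(ctx, grad_output):
        # The incoming gradient is already identical on every rank, so pass it through.
        return grad_output
```

In the parallel forward pass, the second layer's local output would go through ReduceFromModelParallelRegion.apply(...) before being fed to the third layer.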
