Releases · Azure/MS-AMP
Release MS-AMP v0.4.0
MS-AMP Improvements
- Improve GPT-3 performance by optimizing FP8 gradient accumulation with kernel fusion
- Support FP8 in FSDP
- Support DeepSpeed+TE+MSAMP and add cifar10 example
- Support MSAMP+TE+DDP
- Update DeepSpeed to latest version
- Update TransformerEngine to v1.1 and flash-attn to the latest version
- Support CUDA 12.2
- Fix several bugs in DeepSpeed integration
MS-AMP-Examples Improvements
- Improve the documentation for data processing in GPT-3
- Add launch script for pretraining GPT-6b7
- Use new API of TransformerEngine in Megatron-LM
Document Improvements
- Add Docker usage to the Installation page
- Explain how to run the FSDP and DeepSpeed+TE+MSAMP examples on the "Run Examples" page
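The DeepSpeed+TE+MSAMP integration is typically switched on through the DeepSpeed config JSON rather than code changes. A minimal sketch of the relevant fragment, assuming the `msamp` section documented by MS-AMP (values here are illustrative, not prescriptive):

```json
{
  "msamp": {
    "enabled": true,
    "opt_level": "O3"
  }
}
```

The rest of the DeepSpeed config (batch size, ZeRO stage, etc.) stays as usual; see the "Run Examples" page for a complete, working configuration.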
Release MS-AMP v0.3.0
MS-AMP 0.3.0 Release Notes
MS-AMP Improvements
- Integrate latest Transformer Engine into MS-AMP
- Integrate with latest Megatron-LM
- Add a website for MS-AMP and improve documents
- Add a custom DistributedDataParallel which supports FP8 and computation/communication overlap
- Refactor code in dist_op module
- Support UT for distributed testing
- Integrate with MSCCL
MS-AMP-Examples Improvements
- Support pretraining GPT-3 with Megatron-LM and MS-AMP
- Provide a tool that prints the per-second traffic of NVLink and InfiniBand
- Print TFLOPS and throughput metrics in all the examples
Document Improvements
- Add performance numbers to the Introduction page
- Enhance the Usage and Optimization Level pages
- Add a Container Images page
- Add a Developer Guide section
Release MS-AMP v0.2.0
MS-AMP 0.2.0 Release Notes
MS-AMP Improvements
- Add O3 optimization for supporting FP8 in distributed training frameworks
- Support ScalingTensor in functional.linear
- Support customized attributes in FP8Linear
- Improve performance
- Add Dockerfiles for PyTorch 1.14 + CUDA 11.8 and PyTorch 2.1 + CUDA 12.1
- Support PyTorch 2.1
- Add performance result and TE result in homepage
- Cache TE build in pipeline
MS-AMP-Examples Improvements
Add three examples using MS-AMP.
Release MS-AMP v0.1.0
MS-AMP 0.1.0 Release Notes
MS-AMP package
- Support the new FP8 feature introduced by the latest accelerators (e.g., NVIDIA H100).
- Speed up math-intensive operations, such as linear layers, by using Tensor Cores.
- Speed up memory-limited operations by accessing one byte per element instead of the two or four bytes needed for half or single precision.
- Reduce memory requirements for training models, enabling larger models or larger minibatches.
- Speed up communication for distributed models by transmitting lower-precision gradients.
- Support two optimization levels: O1 and O2.
- Support two optimizers: Adam and AdamW.
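The core idea behind these FP8 features is per-tensor scaling: map a tensor's absolute maximum onto the representable FP8 range before casting down, and divide the scale back out on the way up. A library-free sketch of that idea (illustrative only; the names below are not the MS-AMP API, and real FP8 kernels round to the 8-bit grid on the accelerator):

```python
# Per-tensor scaling sketch for FP8 E4M3, the format used for weights
# and activations on H100-class hardware.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def compute_scale(values):
    """Scale factor mapping the tensor's absolute max onto the FP8 range."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0

def quantize(values, scale):
    """Scale and clamp into the FP8 range; rounding to 8 bits is elided."""
    return [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]

def dequantize(values, scale):
    """Undo the scaling to recover values in the original range."""
    return [v / scale for v in values]

grads = [0.001, -0.25, 3.0, -0.0004]
scale = compute_scale(grads)
q = quantize(grads, scale)          # every entry now fits in [-448, 448]
restored = dequantize(q, scale)     # close to the original gradients
```

Because the scale is chosen per tensor, small gradients are not flushed to zero even though each stored element occupies only one byte, which is where the memory and communication savings above come from.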