Releases · Azure/MS-AMP
Release MS-AMP v0.4.0
MS-AMP Improvements
- Improve GPT-3 performance by optimizing FP8 gradient accumulation with kernel fusion
- Support FP8 in FSDP
- Support DeepSpeed+TE+MSAMP and add cifar10 example
- Support MSAMP+TE+DDP
- Update DeepSpeed to latest version
- Update TransformerEngine to v1.1 and flash-attn to the latest version
- Support CUDA 12.2
- Fix several bugs in DeepSpeed integration
MS-AMP-Examples Improvements
- Improve the documentation for data processing in GPT-3
- Add launch script for pretraining GPT-6b7
- Use new API of TransformerEngine in Megatron-LM
Document Improvements
- Add Docker usage to the Installation page
- Explain how to run the FSDP and DeepSpeed+TE+MSAMP examples on the "Run Examples" page
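The DeepSpeed+TE+MSAMP integration is typically switched on through the DeepSpeed config JSON rather than code changes. A minimal sketch of the relevant fragment, assuming the `msamp` section documented by MS-AMP (values here are illustrative, not prescriptive):

```json
{
  "msamp": {
    "enabled": true,
    "opt_level": "O3"
  }
}
```

The rest of the DeepSpeed config (batch size, ZeRO stage, etc.) stays as usual; see the "Run Examples" page for a complete, working configuration.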
Release MS-AMP v0.3.0
MS-AMP 0.3.0 Release Notes
MS-AMP Improvements
- Integrate latest Transformer Engine into MS-AMP
- Integrate with latest Megatron-LM
- Add a website for MS-AMP and improve documents
- Add a custom DistributedDataParallel which supports FP8 and computation/communication overlap
- Refactor code in dist_op module
- Support UT for distributed testing
- Integrate with MSCCL
MS-AMP-Examples Improvements
- Support pretraining GPT-3 with Megatron-LM and MS-AMP
- Provide a tool that prints the per-second traffic of NVLink and InfiniBand
- Print TFLOPS and throughput metrics in all the examples
Document Improvements
- Add performance numbers to the Introduction page
- Enhance the Usage and Optimization Level pages
- Add a Container Images page
- Add a Developer Guide section
Release MS-AMP v0.2.0
MS-AMP 0.2.0 Release Notes
MS-AMP Improvements
- Add O3 optimization for supporting FP8 in distributed training frameworks
- Support ScalingTensor in functional.linear
- Support customized attributes in FP8Linear
- Improve performance
- Add Dockerfiles for PyTorch 1.14 + CUDA 11.8 and PyTorch 2.1 + CUDA 12.1
- Support PyTorch 2.1
- Add performance result and TE result in homepage
- Cache TE build in pipeline
MS-AMP-Examples Improvements
Add three examples using MS-AMP.
Release MS-AMP v0.1.0
MS-AMP 0.1.0 Release Notes
MS-AMP package
- Support the new FP8 feature introduced by the latest accelerators (e.g., NVIDIA H100).
- Speed up math-intensive operations, such as linear layers, by using Tensor Cores.
- Speed up memory-limited operations by accessing one byte per element instead of the two or four bytes needed for half or single precision.
- Reduce memory requirements for training models, enabling larger models or larger minibatches.
- Speed up communication for distributed models by transmitting lower-precision gradients.
- Support two optimization levels: O1 and O2.
- Support two optimizers: Adam and AdamW.
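The core idea behind these FP8 features is per-tensor scaling: map a tensor's absolute maximum onto the representable FP8 range before casting down, and divide the scale back out on the way up. A library-free sketch of that idea (illustrative only; the names below are not the MS-AMP API, and real FP8 kernels round to the 8-bit grid on the accelerator):

```python
# Per-tensor scaling sketch for FP8 E4M3, the format used for weights
# and activations on H100-class hardware.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def compute_scale(values):
    """Scale factor mapping the tensor's absolute max onto the FP8 range."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0

def quantize(values, scale):
    """Scale and clamp into the FP8 range; rounding to 8 bits is elided."""
    return [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]

def dequantize(values, scale):
    """Undo the scaling to recover values in the original range."""
    return [v / scale for v in values]

grads = [0.001, -0.25, 3.0, -0.0004]
scale = compute_scale(grads)
q = quantize(grads, scale)          # every entry now fits in [-448, 448]
restored = dequantize(q, scale)     # close to the original gradients
```

Because the scale is chosen per tensor, small gradients are not flushed to zero even though each stored element occupies only one byte, which is where the memory and communication savings above come from.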