Commit

Update CHANGELOG for ROCm 6.2.x (#1343)
* [CHANGELOG] Update for ROCm 6.2.1

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [CHANGELOG] Add Known issues for ROCm 6.2.1

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Updated 6.2.1 known issues to match the content in develop.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
nileshnegi and corey-derochie-amd authored Oct 23, 2024
1 parent 6f2e07d commit 612add2
Showing 1 changed file with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -2,6 +2,15 @@

Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)

## RCCL 2.20.5 for ROCm 6.2.1
### Fixed
- GDR support flag now set with DMABUF
### Known issues
- On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled, which impacts multi-node RCCL performance.
- This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer.
- Older RCCL versions are also impacted.
- This issue will be addressed in a future ROCm release.

## RCCL 2.20.5 for ROCm 6.2.0
### Changed
- Compatibility with NCCL 2.20.5
@@ -24,12 +33,15 @@ Full documentation for RCCL is available at [https://rccl.readthedocs.io](https:
- New unit test for main kernel stack size
- New -n option for topo_expl to override # of nodes
- Improved debug messages of memory allocations
- Channel shuffling for IB systems
- Channel shuffling for multi-node MI300X systems
### Fixed
- Bug when configuring RCCL for only LL128 protocol
- Scratch memory allocation after API change for MSCCL
- Incorrect minNchannels in multi-node
- GDR support flag now set with DMABUF

## RCCL 2.18.6 for ROCm 6.1.2
### Changed
- Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid overflow

## RCCL 2.18.6 for ROCm 6.1.0
### Changed
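
The Known issues entry for ROCm 6.2.1 ties the disabled GPU-NIC DMA transfers to Linux kernel 6.8.0 or newer. As a minimal sketch (not part of RCCL or of this commit), a Linux host's kernel version could be compared against that threshold roughly as follows. The names `parse_kernel_release` and `affected_by_dma_issue` are hypothetical, and the check looks only at the kernel version, not at the NIC model or whether a RoCE network is in use.

```python
import platform
import re

# Threshold taken from the changelog entry: Linux 6.8.0 or newer.
AFFECTED_FROM = (6, 8, 0)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '6.11') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def affected_by_dma_issue(release):
    """Return True if the kernel version alone is in the range the changelog names."""
    return parse_kernel_release(release) >= AFFECTED_FROM


if __name__ == "__main__":
    # Assumes a Linux host; platform.release() reports other schemes elsewhere.
    release = platform.release()
    print(release, affected_by_dma_issue(release))
```

A `True` result only means the kernel falls in the version range named by the changelog; whether multi-node performance is actually affected still depends on the hardware and network described in the entry.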
