Releases: ROCm/rccl

RCCL 2.22.3 for ROCm 6.4.0

11 Apr 13:35
7b86f83

Added

  • RCCL_SOCKET_REUSEADDR and RCCL_SOCKET_LINGER environment variables
  • Traces for FIFO and data ibv_post_send calls, generated when NCCL_DEBUG=TRACE and NCCL_DEBUG_SUBSYS=VERBS are set
  • --log-trace flag for the install.sh script to enable traces (e.g. ./install.sh --log-trace)
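
As a minimal sketch of how the trace settings above might be applied when launching a job from Python, the helper below copies an environment and sets the two variables named in this release. The function name rccl_trace_env is hypothetical and not part of RCCL; only the variable names and values come from the notes above.

```python
import os


def rccl_trace_env(base=None):
    """Return a copy of an environment with VERBS trace logging requested.

    `base` defaults to the current process environment. The function name
    is hypothetical; only the two variable names and their values are taken
    from the release notes.
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "TRACE"
    env["NCCL_DEBUG_SUBSYS"] = "VERBS"
    return env


if __name__ == "__main__":
    env = rccl_trace_env()
    # Pass `env` to subprocess.run([...], env=env) when starting the application.
    print({key: env[key] for key in ("NCCL_DEBUG", "NCCL_DEBUG_SUBSYS")})
```

Building a copy rather than mutating os.environ keeps the trace settings scoped to the launched process.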

Changed

  • Compatibility with NCCL 2.22.3
  • Support for the rail-optimized tree algorithm on the MI300 series. This feature requires the use of all eight GPUs within
    each node. It limits NIC traffic to GPUs of the same index across nodes and should not impact performance
    on non-rail-optimized network topologies. The original method of building trees can be restored by setting the
    environment variable RCCL_DISABLE_RAIL_TREES=1.
  • Additional debug information about how the trees are built can be logged to the GRAPH logging subsystem by setting
    RCCL_OUTPUT_TREES=1.
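
The same-index rule described above can be illustrated with a short sketch. It is not RCCL's tree-building code: rail_peers and tree_env are hypothetical helpers, and the use of NCCL_DEBUG=INFO with NCCL_DEBUG_SUBSYS=GRAPH to surface the GRAPH output is an assumption based on the usual NCCL logging variables.

```python
GPUS_PER_NODE = 8  # the release note states that all eight GPUs per node are required


def rail_peers(node, gpu, num_nodes, gpus_per_node=GPUS_PER_NODE):
    """List the (node, gpu) endpoints reachable over the NIC from (node, gpu).

    Illustrates only the stated constraint: inter-node traffic stays between
    GPUs that share the same local index. Hypothetical helper, not RCCL code.
    """
    if not (0 <= node < num_nodes and 0 <= gpu < gpus_per_node):
        raise ValueError("node or gpu index out of range")
    return [(other, gpu) for other in range(num_nodes) if other != node]


def tree_env(disable_rail_trees=False, output_trees=False):
    """Build the environment settings named in the release notes."""
    env = {}
    if disable_rail_trees:
        env["RCCL_DISABLE_RAIL_TREES"] = "1"  # fall back to the original tree construction
    if output_trees:
        env["RCCL_OUTPUT_TREES"] = "1"
        # Assumption: GRAPH subsystem messages are shown via the standard
        # NCCL logging variables.
        env["NCCL_DEBUG"] = "INFO"
        env["NCCL_DEBUG_SUBSYS"] = "GRAPH"
    return env


if __name__ == "__main__":
    print(rail_peers(node=0, gpu=3, num_nodes=3))
    print(tree_env(output_trees=True))
```

With three nodes, GPU 3 on node 0 is paired only with GPU 3 on nodes 1 and 2, which is the per-index "rail" the note refers to.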

RCCL 2.21.5 for ROCm 6.3.3

19 Feb 17:47
9a0e6a1

RCCL code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.

RCCL 2.21.5 for ROCm 6.3.2

28 Jan 15:44
9a0e6a1

RCCL code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.

RCCL 2.21.5 for ROCm 6.3.1

20 Dec 16:12
4ab67f5

Changed

  • Enhanced user documentation

Resolved issues

  • Corrected user help strings in install.sh

RCCL 2.21.5 for ROCm 6.3.0

03 Dec 19:49
eef7b29

Added

  • MSCCL++ integration for specific contexts
  • Performance collection to rccl_replayer
  • Tuner Plugin example for MI300
  • Tuning table for a large number of nodes
  • Support for amdclang++
  • New Rome model

Changed

  • Compatibility with NCCL 2.21.5
  • Increased channel count for MI300X multi-node
  • Enabled MSCCL for single-process multi-threaded contexts
  • Enabled gfx12
  • Enabled CPX mode for MI300X
  • Enabled tracing with rocprof
  • Improved version reporting
  • Enabled GDRDMA for Linux kernel 6.4.0+

Resolved issues

  • Fixed model matching with PXN enabled

Known issues

  • MSCCL is temporarily disabled for AllGather collectives.
    • This can roughly double latency (~2x) for in-place messages smaller than 2 MB.
    • Older RCCL versions are not impacted.
    • This issue will be addressed in a future ROCm release.
  • Unit tests do not exit gracefully when running on a single GPU.
    • This issue will be addressed in a future ROCm release.

RCCL 2.20.5 for ROCm 6.2.4

06 Nov 19:55
612add2

RCCL code for ROCm 6.2.4 did not change. The library was rebuilt for the updated ROCm 6.2.4 stack.

RCCL 2.20.5 for ROCm 6.2.2

27 Sep 16:01
d380693

RCCL code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

RCCL 2.20.5 for ROCm 6.2.1

20 Sep 19:58
d380693

RCCL code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.

RCCL 2.20.5 for ROCm 6.2.0

02 Aug 16:15
45b618a

Changed

  • Compatibility with NCCL 2.20.5
  • Compatibility with NCCL 2.19.4
  • Performance tuning for some collective operations on MI300
  • Enabled NVTX code in RCCL
  • Replaced rccl_bfloat16 with hip_bfloat16
  • NPKit updates:
    • Warm-up iterations are no longer removed by default; removal is now opt-in
    • Doubled the size of buffers to accommodate more channels
  • Modified rings to be rail-optimized topology friendly
  • Replaced ROCmSoftwarePlatform links with ROCm links

Added

  • Support for fp8 and rccl_bfloat8
  • Support for using HIP contiguous memory
  • Implemented ROC-TX for host-side profiling
  • Enabled static build
  • New Rome model
  • fp16 and fp8 cases in unit tests
  • New unit test for main kernel stack size
  • New -n option for topo_expl to override the number of nodes
  • Improved debug messages for memory allocations
  • Channel shuffling for IB systems

Fixed

  • Bug when configuring RCCL for only LL128 protocol
  • Scratch memory allocation after API change for MSCCL
  • Incorrect minNchannels in multi-node

RCCL 2.18.6 for ROCm 6.1.5

12 Mar 18:30
2fbe387

RCCL code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.