rccl-2.8.4 for ROCm 4.2.0
Added
- Compatibility with NCCL 2.8.4
Optimizations
- Additional tuning for clique-based kernels
- Enabled GPU direct RDMA read from GPU
- Fixed a potential memory leak when re-creating multiple communicators within the same process
- Improved topology detection
Known issues
- None