MSCCL++ v0.4.3
What's Changed
- Add optional prefix to installation paths by @chhwang in #235
- Fix #235 by @chhwang in #239
- Check `nvidia_peermem` during runtime by @chhwang in #234
- Do not check value of `__HIP_PLATFORM_AMD__` by @chhwang in #240
- Fix crash in static variable deconstructor by @Binyang2014 in #238
- Update interface to let user change fifo size by @Binyang2014 in #243
- Mask each fields of the trigger by @chhwang in #244
- Minor improvement on device syncer by @chhwang in #231
- Remove make pylib-copy command by @Binyang2014 in #249
- Increase MSCCLPP_BITS_REGMEM_HANDLE to 9 by @aashaka in #251
- Add `putWithSignal()` latency tests by @chhwang in #246
- NVLS support. by @saeedmaleki in #250
- Fix wrong offset calculation by @chhwang in #257
- Fix NVLS support by @chhwang in #258
- Allow MSCCL++ CommGroup to take PyTorch tensors in args by @aashaka in #255
- Fix multi-nodes test failure by @Binyang2014 in #262
- Allow semaphores and memory to be registered separately in ProxyService by @aashaka in #264
- Remove cuda-python from project by @Binyang2014 in #245
- Fix the comm.py for nvls by @saeedmaleki in #267
- New packet format & optimizations by @chhwang in #256
- Fix multi-node ci pipeline by @Binyang2014 in #272
- Add launch_bounds for mscclpp_test by @Binyang2014 in #273
- Fix bootstrapping mechanism by @chhwang in #278
- v0.4.3 by @chhwang in #279
Full Changelog: v0.4.2...v0.4.3