Release MSCCL++ v0.4.3 · microsoft/mscclpp

What's Changed

Add optional prefix to installation paths by @chhwang in #235
Fix #235 by @chhwang in #239
Check nvidia_peermem during runtime by @chhwang in #234
Do not check value of __HIP_PLATFORM_AMD__ by @chhwang in #240
Fix crash in static variable destructor by @Binyang2014 in #238
Update interface to let users change the FIFO size by @Binyang2014 in #243
Mask each field of the trigger by @chhwang in #244
Minor improvement on device syncer by @chhwang in #231
Remove make pylib-copy command by @Binyang2014 in #249
Increase MSCCLPP_BITS_REGMEM_HANDLE to 9 by @aashaka in #251
Add putWithSignal() latency tests by @chhwang in #246
NVLS support by @saeedmaleki in #250
Fix wrong offset calculation by @chhwang in #257
Fix NVLS support by @chhwang in #258
Allow MSCCL++ CommGroup to take PyTorch tensors in args by @aashaka in #255
Fix multi-node test failure by @Binyang2014 in #262
Allow semaphores and memory to be registered separately in ProxyService by @aashaka in #264
Remove cuda-python from project by @Binyang2014 in #245
Fix comm.py for NVLS by @saeedmaleki in #267
New packet format & optimizations by @chhwang in #256
Fix multi-node CI pipeline by @Binyang2014 in #272
Add launch_bounds for mscclpp_test by @Binyang2014 in #273
Fix bootstrapping mechanism by @chhwang in #278
v0.4.3 by @chhwang in #279

Full Changelog: v0.4.2...v0.4.3