Where is the AR code? #44
Unanswered
ConnollyLeon
asked this question in
Q&A
Replies: 1 comment
-
Greetings! Adaptive Routing (AR) allows Azure NDv4 VM to automatically detect and avoid network congestion by dynamically selecting more optimal network paths in the InfiniBand networking. It requires cluster level configurations including InfiniBand switch etc. and cannot be controlled by application only. You can leverage its tuning if using Azure HPC image (which contains optimized environment and packages) with multiple NDv4 VMs. Please refer to this article or Mellanox AR documents for more information about adaptive routing. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I have read the report about tutel, and I am curious about the implementation of AlltoAll.
I notice that tutel accelerate AlltoAll using CPU-GPU binding and adaptive routing tuning. I could find the CPU-GPU binding code in
system_init.py
, but fail to find code corresponding to adaptive routing tuning.Could you please help me to locate the code?
Beta Was this translation helpful? Give feedback.
All reactions