This is a close enough assumption a lot of the time, but it starts falling apart for cases with heavy model-parallel communication. For example, the backward pass of an expert MLP has both wgrad and dgrad, but still only has 2x all-to-all.
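
To make the point concrete, here is a rough sketch. It assumes the assumption under discussion is the usual "backward costs 2x forward" rule of thumb, and it reads "2x all-to-all" as meaning the backward pass issues the same two all-to-alls as the forward pass (dispatch and combine). The function name, the timing units, and the no-overlap model are all hypothetical, chosen only for illustration.

```python
def backward_to_forward_ratio(compute_time, alltoall_time,
                              fwd_alltoalls=2, bwd_alltoalls=2):
    """Backward/forward time ratio for one expert MLP layer.

    Assumptions (illustrative only):
      - backward compute is 2x forward compute (wgrad + dgrad),
      - each pass issues a fixed number of all-to-alls of equal cost,
      - communication is not overlapped with compute.
    """
    fwd = compute_time + fwd_alltoalls * alltoall_time
    bwd = 2 * compute_time + bwd_alltoalls * alltoall_time
    return bwd / fwd


if __name__ == "__main__":
    # Sweep the cost of one all-to-all relative to forward compute.
    for comm in (0.0, 0.25, 0.5, 1.0):
        ratio = backward_to_forward_ratio(1.0, comm)
        print(f"alltoall/compute = {comm:.2f} -> bwd/fwd = {ratio:.3f}")
```

Under this model the ratio is exactly 2 only when communication is free, and it drifts toward 1 as the all-to-all cost grows, which is why the 2x estimate gets worse for comm-heavy layers.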