-
Notifications
You must be signed in to change notification settings - Fork 61
reorderShardedAxisPass
for DID loop split
#4256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
!test |
Review updated until commit 50a4ee2 Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
b420957
to
8e64e6e
Compare
!test |
1 similar comment
!test |
f8e6435
to
fb07219
Compare
!test --diff |
1 similar comment
!test --diff |
8803b66
to
58f1039
Compare
!test --diff |
!test |
8fb5af9
to
35090b0
Compare
reorderShardedAxisPass
for DID loop split
d635cb8
to
602de55
Compare
!test |
!test |
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First batch; still reviewing...
csrc/multidevice/utils.h
Outdated
const std::vector<IterDomain*>& domain, | ||
const IterDomain* id); | ||
|
||
// Returns the communication info for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-comment that this assumes expr
has been decomposed.
I'd move this to lower_to_communication.h so it's closer to
Fuser/csrc/host_ir/lower_to_communication.cpp
Line 477 in 671fbe9
std::vector<Expr*> convertSingleOpToCommunication( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can lead to circular dependency since that file import multidevice/utils and multidevice/utils itself needs this function.
In a separate PR, I can extend this function to be more elaborate and use it directly in convertSingleOpToCommunication
to avoid any mismatch between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can lead to circular dependency since that file import multidevice/utils and multidevice/utils itself needs this function.
AFAICT, to avoid circular dependency, only getCommunicationInfo and isCommunicationLayoutCompliant need to go to lower_to_communication. They probably should anyhow because they are all about lowering.
I general, a large "utils.h" header file tends to cause the following issues:
- Increased Compilation Times: A large utils.h header will be included in many source files, leading to a lot of parsing and compilation overhead, significantly increasing build times. Any change to utils.h, even a minor one, requires recompilation of all files that include it, further slowing down development.
- Reduced Encapsulation and Interface Clarity: A large utils.h likely exposes many internal utility functions and data structures to clients that don't need to know about them, violating the principle of information hiding and reducing encapsulation. This makes it harder to understand the actual public interface and can lead to accidental misuse or dependencies on internal implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- lower.cpp and lower_to_communication.cpp are importing each other -> This is easy to resolve.
lower.cpp
does not needlower_to_communication.h
. - lower.cpp can hold the above functions but it imports presegmentation passes. This is fine for now, since
reorderShardedAxisPass
is removed from there. However, previously these 2 files were also importing each other. The same dependency exists between other preseg passes, and this file. The preseg passes querycanLower
whereaslower
calls the preseg passes.
We need a restructuring to avoid this. Moving canLower
to lower_to_communication
seems sufficient but it's a part of HostIrLower
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Circular dependencies among cpps are OK, although still not the best practice. Recall that cpp files are compiled independently. Circular dependencies among header files are much more problematic, but I don't think you are hitting any.
To avoid circular dependencies among cpps 1 for FusionExecutorCache, I think we can:
- Let
preseg
depend onlower_to_communication
. - Don't let
lower_to_communication
depend onlower
. HostIrLowerParams can be avoided by passing in CommunicatorBackend directly.Fuser/csrc/host_ir/lower_to_communication.cpp
Line 320 in 520a14d
HostIrLower::canLower(c), lower
. - Keep
lower
away from FusionExecutorCache. I saw several#include lower.h
in the main stack, but none of them seem necessary.
Footnotes
-
This isn't accurate because we never include a cpp from another cpp. I'm really referring to scenarios like a.cpp include b.h and b.cpp include a.h. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
56ebef4
to
b2122d5
Compare
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
1fd7ea9
to
8c55e43
Compare
!test |
!test |
As a follow-up to #4256 (comment) This makes the test more realistic and gives better coverage. It indeed caught #4642, a new bug.
Issue #3900.
Key changes:
hasShardingChanges
->getGatherOrScatterCommInfo
isInnerResharding
->isCommLayoutCompliant
andisAllocatedOutermost
to support split allocation domains seen in DID loop splitDependency on
canLower
is removed to decouple from Issue ExtendInsertReshardingPass
for loop split. #4382.This PR does not handle ParallelType::Stream right now.
TODO:ReduceScatter
tests after PR #4384