[DRAFT] Consolidate simple_fsdp and compiler_toolkit experiments #2360
+3,401
−0
This PR merges the `simple_fsdp` and `compiler_toolkit` experiments into a new unified experiment called `graph_based_training` (name to be discussed later).

The two experiments shared the same DTensor-based SimpleFSDP model authoring but had separate compilation paths: `simple_fsdp` used JIT compilation (`torch.compile`) and `compiler_toolkit` used AOT joint graph capture. The new experiment unifies them under a single `compile.mode` config field (`"jit"` or `"aot"`), with a shared pass registry that validates pass/mode compatibility.

No existing files in `simple_fsdp/` or `compiler_toolkit/` are modified.
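For reviewers who want a mental model of the pass/mode validation described above, here is a minimal sketch of how such a registry could work. It is not the code in this PR: `register_pass`, `resolve_passes`, `PassEntry`, and the two example pass names are hypothetical and exist only for illustration. The only details taken from the description are the two mode strings, `"jit"` and `"aot"`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List

# The two values of compile.mode named in the PR description.
VALID_MODES: FrozenSet[str] = frozenset({"jit", "aot"})


@dataclass(frozen=True)
class PassEntry:
    """Hypothetical registry record: a pass and the modes it supports."""
    name: str
    fn: Callable
    modes: FrozenSet[str]


# Hypothetical module-level registry keyed by pass name.
_REGISTRY: Dict[str, PassEntry] = {}


def register_pass(name: str, modes: Iterable[str]) -> Callable:
    """Decorator that records which compile modes a pass is valid for."""
    mode_set = frozenset(modes)
    unknown = mode_set - VALID_MODES
    if unknown:
        raise ValueError(f"unknown compile modes for {name!r}: {sorted(unknown)}")

    def decorator(fn: Callable) -> Callable:
        _REGISTRY[name] = PassEntry(name=name, fn=fn, modes=mode_set)
        return fn

    return decorator


def resolve_passes(mode: str, pass_names: Iterable[str]) -> List[Callable]:
    """Return the requested passes, rejecting any that do not support `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"compile.mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    resolved = []
    for name in pass_names:
        entry = _REGISTRY.get(name)
        if entry is None:
            raise KeyError(f"pass {name!r} is not registered")
        if mode not in entry.modes:
            raise ValueError(f"pass {name!r} is not compatible with mode {mode!r}")
        resolved.append(entry.fn)
    return resolved


# Illustrative passes only; they return the graph unchanged.
@register_pass("example_shared_pass", modes=("jit", "aot"))
def example_shared_pass(graph):
    return graph


@register_pass("example_aot_only_pass", modes=("aot",))
def example_aot_only_pass(graph):
    return graph
```

With this shape, requesting `example_aot_only_pass` under `"jit"` fails at config-resolution time with a `ValueError`, before any compilation starts, which is the kind of early check the shared registry is meant to provide.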
File change breakdown

Files copied without changes:
- `simple_fsdp.py`: Copied from `simple_fsdp/simple_fsdp.py`.
- `reshard_after_forward.py`: Copied from `simple_fsdp/reshard_after_forward.py`.
- `cudagraph.py`: Copied from `compiler_toolkit/cudagraph.py`.

Files copied with import path changes only:
- `common_utils.py`: Adapted from `compiler_toolkit/common_utils.py`.
- `graph_utils.py`: Adapted from `compiler_toolkit/graph_utils.py`.
- `jit_backend.py`: Adapted from `simple_fsdp/backend.py`.
- `train.py`: Adapted from `compiler_toolkit/train.py`.
- `llama3/__init__.py`: Adapted from `simple_fsdp/llama3/__init__.py`.
- `llama3/model.py`: Adapted from `simple_fsdp/llama3/model.py`.
- `deepseek_v3/__init__.py`: Adapted from `simple_fsdp/deepseek_v3/__init__.py`.
- `deepseek_v3/model.py`: Adapted from `simple_fsdp/deepseek_v3/model.py`.

Files adapted with non-trivial changes: