@H-Huang H-Huang commented Sep 18, 2025

Option 2 of #1682

In our custom overlap_f_b function we define run_forward() and run_backward(). run_backward() runs on a separate thread so that the forward and backward passes execute side by side. It looks like this:

image
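
For readers without the image, the threading structure described above can be sketched roughly as follows. This is a minimal, pure-Python illustration, not the PR's code: the overlap_f_b signature, the results dictionary, and the error handling are assumptions, and the two lambdas stand in for the real forward and backward passes.

```python
import threading

def overlap_f_b(run_forward, run_backward):
    """Hypothetical sketch: run the backward pass on a helper thread
    while the forward pass runs on the calling thread."""
    results = {}
    errors = []

    def backward_worker():
        # Capture exceptions so they can be re-raised on the caller's thread.
        try:
            results["backward"] = run_backward()
        except BaseException as exc:
            errors.append(exc)

    backward_thread = threading.Thread(target=backward_worker, name="backward")
    backward_thread.start()
    try:
        results["forward"] = run_forward()
    finally:
        # Always wait for the backward thread, even if forward raised.
        backward_thread.join()

    if errors:
        raise errors[0]
    return results["forward"], results["backward"]

# Stand-in callables; the real ones would run model forward/backward.
out = overlap_f_b(lambda: "fwd done", lambda: "bwd done")
```

Joining in a finally block keeps the helper thread from outliving the call if the forward pass fails.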

We added hooks before and after the expert-parallel dispatch and combine steps to mark boundary points, so the timeline now looks like this:
image

Now, in each of these red blocks, both threads just need to call wait() on a shared threading.Barrier(2) so that comm and compute are scheduled in lock-step.
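
The barrier idea can be illustrated with a small, self-contained sketch. Everything here except threading.Barrier is hypothetical: boundary_hook stands in for the hooks placed around dispatch and combine, NUM_BOUNDARIES is an assumed count, and the timeline list exists only to make the ordering visible. The sketch assumes both threads hit the same number of boundary points; if they did not, the thread with extra boundaries would block in wait().

```python
import threading

NUM_BOUNDARIES = 4  # assumption: before/after dispatch, before/after combine

# One barrier shared by both threads. A fresh Barrier(2) created at each
# call site would never see a second party and would block forever.
sync_point = threading.Barrier(2)

timeline = []
timeline_lock = threading.Lock()

def boundary_hook(thread_name, index):
    """Hypothetical hook body: wait for the other thread, then record progress."""
    sync_point.wait()
    with timeline_lock:
        timeline.append((index, thread_name))

def run_forward():
    for i in range(NUM_BOUNDARIES):
        boundary_hook("forward", i)

def run_backward():
    for i in range(NUM_BOUNDARIES):
        boundary_hook("backward", i)

backward_thread = threading.Thread(target=run_backward)
backward_thread.start()
run_forward()
backward_thread.join()

# Neither thread can pass boundary i+1 until both have reached it,
# so the recorded boundary indices never go backwards.
boundary_order = [index for index, _ in timeline]
```

Within one boundary the two threads may record in either order, but a thread can never run a full boundary ahead of the other, which is the lock-step property the description relies on.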

@meta-cla meta-cla bot added the CLA Signed label Sep 18, 2025
@H-Huang H-Huang force-pushed the deepseek-v3-new-methods branch from 3a61b86 to 0f7a7c9 on September 22, 2025 21:52

H-Huang commented Sep 22, 2025

Running with:

TORCH_NCCL_TRACE_BUFFER_SIZE=2000 TORCH_NCCL_DUMP_ON_TIMEOUT=true TORCH_FR_DUMP_TEMP_FILE=./nccl_trace_rank_ NGPU=4 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh

With CUDA_LAUNCH_BLOCKING=1:

TORCH_NCCL_TRACE_BUFFER_SIZE=2000 TORCH_NCCL_DUMP_ON_TIMEOUT=true TORCH_FR_DUMP_TEMP_FILE=./nccl_trace_rank_ NGPU=4 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" CUDA_LAUNCH_BLOCKING=1 ./run_train.sh

@H-Huang H-Huang force-pushed the deepseek-v3-new-methods branch from 0f7a7c9 to 6584aac on September 24, 2025 21:56