[Reorgnize] Reorganize intra-node examples add add tests #52

Open: Rachmanino wants to merge 3 commits into main from wt/reorg

Conversation

Rachmanino (Collaborator) commented Feb 13, 2026

  • Reorganize intra-node examples and add tests for them in CI
    Intranode examples were not previously covered by CI. I also think we need to sort the examples into intranode, IPC-based, and others.
  • Fix bugs in allocators and let CI pass
  • Investigate rdc-related issues

Summary by CodeRabbit

  • Tests

    • Introduced comprehensive intranode distributed tests validating multi-process execution across overlapped computation scenarios with CUDA hardware compatibility requirements.
  • Chores

    • Removed distributed all-gather example file.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

The PR removes a distributed all-gather GEMM example file and introduces a new test suite for intranode distributed examples (allgather_gemm_overlapped, gemm_rs_overlapped, and sp_ag_attention_intra_node) that spawn multi-process CUDA execution on compute capability 9.0 hardware.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Deleted Example: `examples/distributed/example_allgather_gemm.py` | Removed 113-line distributed all-gather GEMM example including kernel construction, distributed orchestration, and runtime validation logic. |
| New Test Suite: `examples/distributed/intranode/test_intranode.py` | Added three intranode distributed tests using torch.multiprocessing.spawn, each targeting a different overlapped example module with 2-process execution on CUDA compute 9.0. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • [Example] Add CP example #37: Modifies the exact example modules (example_gemm_rs_overlapped and example_sp_ag_attention_intra_node) that are now being tested by the new intranode test suite.

Poem

🐰 Old gemm hops away,
New overlapped tests come to play,
Two processes dance,
On GPU's expanse,
Distributed dreams brighten the day! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title contains a typo and is partially related to the changeset but fails to accurately summarize it. | Fix the typo '[Reorgnize]' to '[Reorganize]' and clarify the title to accurately reflect the main changes, such as 'Reorganize intra-node examples and add tests'. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/distributed/intranode/test_intranode.py`:
- Around line 13-14: The test passes args=(2, None) which sends None into the
main functions and causes AttributeError when they access attributes; update the
test (test_example_allgather_gemm_overlapped) to pass a minimal valid args
object (e.g., an argparse.Namespace or simple object) containing the attributes
expected by example_allgather_gemm_overlapped.main,
example_gemm_rs_overlapped.main and example_sp_ag_attention_intra_node.main (at
least persistent for the first two, and batch_size, q_head, etc. for the
attention example), or alternately add defensive checks inside those main
functions to handle args is None before accessing attributes — pick one approach
and implement it consistently for the three mains named above.
🧹 Nitpick comments (1)
examples/distributed/intranode/test_intranode.py (1)

10-28: Consider pytest.mark.parametrize to reduce boilerplate.

All three tests share the same decorator stack and spawn logic, differing only in the imported module. A parametrized test would reduce duplication:

♻️ Optional refactor
+@tilelang.testing.requires_distributed
+@tilelang.testing.requires_cuda
+@tilelang.testing.requires_cuda_compute_version_eq(9, 0)
+@pytest.mark.parametrize("module", [
+    example_allgather_gemm_overlapped,
+    example_gemm_rs_overlapped,
+    example_sp_ag_attention_intra_node,
+])
+def test_intranode_example(module):
+    torch.multiprocessing.spawn(module.main, args=(2, None), nprocs=2)
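A self-contained sketch of the parametrized form, runnable without GPUs. The three `*_main` functions and `fake_spawn` are hypothetical stand-ins (not taken from the PR) for the example modules' `main` functions and for `torch.multiprocessing.spawn`. The `ids=` argument is an addition here so that each case is reported by name.

```python
import pytest


# Hypothetical stand-ins for the three example modules' main(rank, world_size, args).
def allgather_gemm_main(rank, world_size, args):
    return ("allgather_gemm", rank, world_size)


def gemm_rs_main(rank, world_size, args):
    return ("gemm_rs", rank, world_size)


def sp_ag_attention_main(rank, world_size, args):
    return ("sp_ag_attention", rank, world_size)


def fake_spawn(fn, args, nprocs):
    """In-process stand-in for torch.multiprocessing.spawn, which calls fn(i, *args)."""
    return [fn(i, *args) for i in range(nprocs)]


ENTRY_POINTS = [allgather_gemm_main, gemm_rs_main, sp_ag_attention_main]


# One test body covers all entry points; ids makes each case show up by function name.
@pytest.mark.parametrize("entry", ENTRY_POINTS, ids=lambda fn: fn.__name__)
def test_intranode_example(entry):
    results = fake_spawn(entry, args=(2, None), nprocs=2)
    # Every rank should have run exactly once with world_size == 2.
    assert [(rank, world) for _, rank, world in results] == [(0, 2), (1, 2)]
```

In the real test file the stand-ins would presumably be replaced by the imported modules' `main` functions and the actual `spawn` call, with the existing `tilelang.testing` decorators kept on top.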

Comment on lines +13 to +14
def test_example_allgather_gemm_overlapped():
    torch.multiprocessing.spawn(example_allgather_gemm_overlapped.main, args=(2, None), nprocs=2)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the location of the example modules
find . -name "example_allgather_gemm_overlapped.py" -o -name "example_gemm_rs_overlapped.py" -o -name "example_sp_ag_attention_intra_node.py" | head -20

Repository: tile-ai/tilescale

Length of output: 264


🏁 Script executed:

#!/bin/bash
# Extract main function definitions with context
for f in ./examples/distributed/intranode/example_allgather_gemm_overlapped.py ./examples/distributed/intranode/example_gemm_rs_overlapped.py ./examples/distributed/intranode/example_sp_ag_attention_intra_node.py; do
  echo "=== $(basename $f) ==="
  sed -n '/^def main/,/^def \|^class \|^$/p' "$f" | head -40
done

Repository: tile-ai/tilescale

Length of output: 931


🏁 Script executed:

#!/bin/bash
# Get the full main function for each module (up to 60 lines)
for f in ./examples/distributed/intranode/example_allgather_gemm_overlapped.py ./examples/distributed/intranode/example_gemm_rs_overlapped.py ./examples/distributed/intranode/example_sp_ag_attention_intra_node.py; do
  echo "=== $(basename $f) ==="
  sed -n '/^def main/,/^def \|^class \|^$/p' "$f" | head -60
  echo ""
done

Repository: tile-ai/tilescale

Length of output: 934


🏁 Script executed:

#!/bin/bash
# Check the test file to confirm args pattern
echo "=== test_intranode.py ==="
cat -n examples/distributed/intranode/test_intranode.py

echo ""
echo "=== Checking for any args usage in example_sp_ag_attention_intra_node.py ==="
grep -n "args\." examples/distributed/intranode/example_sp_ag_attention_intra_node.py || echo "No args. accesses found"

Repository: tile-ai/tilescale

Length of output: 1792


All three main functions will crash with AttributeError when args=None is passed.

The tests call torch.multiprocessing.spawn(..., args=(2, None), nprocs=2). Because spawn invokes each target as fn(i, *args), every main function receives (rank, 2, None), so None arrives as its third argument. However:

  • example_allgather_gemm_overlapped.main and example_gemm_rs_overlapped.main unconditionally access args.persistent on line 5.
  • example_sp_ag_attention_intra_node.main unconditionally accesses args.batch_size, args.q_head, and multiple other attributes (lines 293-304, 409).

All three will fail with AttributeError: 'NoneType' object has no attribute .... Either pass valid args or add None checks to all attribute accesses in these functions.
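The two suggested fixes can be sketched as follows, without torch or a GPU. `main`, `main_defensive`, and `call_like_spawn` are hypothetical stand-ins; only the `fn(i, *args)` calling convention and the `args.persistent` access come from the discussion above, and the default `persistent=False` is an assumption chosen for illustration.

```python
import argparse


def main(rank, world_size, args):
    """Hypothetical stand-in for an example's main(); reads args.persistent unconditionally."""
    return f"rank {rank}/{world_size}, persistent={args.persistent}"


def call_like_spawn(fn, args, nprocs):
    """Mimics spawn's calling convention in-process: fn(i, *args) for i in range(nprocs)."""
    return [fn(i, *args) for i in range(nprocs)]


# Passing None as the last element of args reproduces the reported failure.
error_message = None
try:
    call_like_spawn(main, args=(2, None), nprocs=2)
except AttributeError as exc:
    error_message = str(exc)

# Option 1: the test passes a minimal Namespace carrying the attributes main() reads.
test_args = argparse.Namespace(persistent=False)  # assumed default, not from the PR
results = call_like_spawn(main, args=(2, test_args), nprocs=2)


# Option 2: main() tolerates args=None by falling back to a default.
def main_defensive(rank, world_size, args=None):
    persistent = getattr(args, "persistent", False)  # getattr on None returns the default
    return f"rank {rank}/{world_size}, persistent={persistent}"


defensive_results = call_like_spawn(main_defensive, args=(2, None), nprocs=2)
```

Option 1 keeps the example modules untouched but requires the test to track every attribute each `main` reads. Option 2 centralizes the defaults in the examples themselves, which is more work for the attention example given the number of attributes it is reported to access.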
