Refine NVSHMEM path and codgen, build_nvshmem.sh and docs.#51
Refine NVSHMEM path and codgen, build_nvshmem.sh and docs.#51rqfeng930 wants to merge 7 commits intotile-ai:mainfrom
Conversation
Remove USE_DISTRIBUTED gating on NVSHMEM path detection properties. USE_DISTRIBUTED controls the IPC backend, not NVSHMEM. Setting only USE_NVSHMEM=1 previously skipped path detection silently. Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
Remove _get_nvshmem_include_path() which duplicated the detection logic already in env.Environment._find_nvshmem_paths(). Use the centralized env.NVSHMEM_INCLUDE_DIR property instead. Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
Accept "1", "true", "on" (case-insensitive) in the C++ use_distributed() helper, matching the Python env.USE_DISTRIBUTED property. Previously only "1" was recognized, causing mismatches when users set the env var to "true" or "on". Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
Delegate directly to env.USE_NVSHMEM / env.USE_DISTRIBUTED which already return bool. The local wrappers in profiler and cython_wrapper were re-parsing str(bool).lower() unnecessarily. Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
Remove duplicated header included in CodeGenTileLangCUDA::Finish(). Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
Remove BarrierAll, Quiet, Fence, SyncAll each had a duplicated else-if branch in VisitExpr_(CallNode). Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughDocs and scripts clarify NVSHMEM install and verification; NVSHMEM discovery was refactored to probe on first access; nvshmem-specific CUDA codegen emissions were removed; environment-flag normalization was replaced by returning raw env values. Changes
Sequence Diagram(s)(silently omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/env.py (1)
297-339:⚠️ Potential issue | 🟡 MinorNVSHMEM_HOME / NVSHMEM_SRC paths returned without verifying subdirs exist.
For the first two priority levels (Lines 304-312), the function returns
include_dir/lib_pathafter only checking that the root directory exists. If theincludeorlibsubdirs are missing (e.g., partial install or build not yet run), downstream code will get non-existent paths. The third-party and pip fallbacks (Lines 321, 333) do verify subdirectory existence.Suggested fix for NVSHMEM_HOME branch
nvshmem_home = os.environ.get("NVSHMEM_HOME", "") if nvshmem_home and os.path.exists(nvshmem_home): - include_dir = os.path.join(nvshmem_home, "include") - lib_path = os.path.join(nvshmem_home, "lib") + candidate_inc = os.path.join(nvshmem_home, "include") + candidate_lib = os.path.join(nvshmem_home, "lib") + if os.path.exists(candidate_inc) and os.path.exists(candidate_lib): + include_dir = candidate_inc + lib_path = candidate_lib else: nvshmem_src = os.environ.get("NVSHMEM_SRC", "") if nvshmem_src and os.path.exists(nvshmem_src): - include_dir = os.path.join(nvshmem_src, "build/src/include") - lib_path = os.path.join(nvshmem_src, "build/src/lib") + candidate_inc = os.path.join(nvshmem_src, "build/src/include") + candidate_lib = os.path.join(nvshmem_src, "build/src/lib") + if os.path.exists(candidate_inc) and os.path.exists(candidate_lib): + include_dir = candidate_inc + lib_path = candidate_libWithout this, a user who sets
NVSHMEM_HOMEto a directory without the expected layout will silently get broken include/lib paths, and the fallback to 3rdparty/pip won't be reached.
🤖 Fix all issues with AI agents
In `@docs/get_started/Installation.md`:
- Around line 63-81: After running build_nvshmem.sh the user's shell is left
inside "${NVSHMEM_SRC}/build", so update the docs to explicitly return to the
project root before Step 2: add a short instruction after the build step telling
the user to cd back to the project root (so subsequent cd
tilelang/distributed/pynvshmem will resolve correctly); reference the
build_nvshmem.sh step and NVSHMEM_SRC to make it clear where the user currently
is and where to return.
In `@examples/distributed/README.md`:
- Around line 14-35: The README's NVSHMEM build/install steps assume the shell
returns to the repo root after running build_nvshmem.sh; after the "source
build_nvshmem.sh" step (in the build section referencing build_nvshmem.sh) add
an explicit change-directory command (e.g., "cd -" or "cd <repo-root>") before
the subsequent "cd tilelang/distributed/pynvshmem" line so the pynvshmem install
runs from the expected repo root.
In `@tilelang/distributed/build_nvshmem.sh`:
- Line 40: The tar extraction line incorrectly forces gzip with the -z flag when
extracting the archive referenced by the TARBALL variable; change the tar
invocation that currently runs against "${TARBALL}" to remove the -z flag so tar
auto-detects compression (i.e., use a tar command without the z option) to
support .txz (xz) and .tar.gz transparently.
- Around line 3-6: This script is intended to be sourced but currently uses a
global "set -e" and "exit" which will affect the caller's shell; modify
build_nvshmem.sh so it does not change the caller's shell state: save the
existing errexit state at start, use a subshell or a main function to run the
body (e.g., wrap logic in a function like main() or a subshell) so "set -e" is
local and restored afterward, and replace any "exit 0" used for the "--help" or
error paths with "return 0" (or "return 1" for errors) so sourcing does not
terminate the caller; ensure the script still exports NVSHMEM_SRC and
NVSHMEM_PATH after the localized work completes.
🧹 Nitpick comments (4)
src/target/codegen_cuda.h (1)
26-35:std::toloweron plaincharis technically UB for negative values.When
charis signed and the env var contains non-ASCII bytes,std::tolower(c)receives a negativeint, which is undefined per the C/C++ standard. The fix is trivial:Proposed fix
- for (auto &c : val) c = std::tolower(c); + for (auto &c : val) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));In practice, environment variable values are almost always ASCII, so this is a pedantic concern.
tilelang/env.py (1)
265-272: Inconsistent truthy set:USE_NVSHMEM/USE_DISTRIBUTEDomit"yes".These two properties accept
("1", "true", "on"), but every other boolean helper in this class (e.g.,is_cache_globally_disabled,use_gemm_v1,get_default_verbose) accepts("1", "true", "yes", "on"). A user settingTILELANG_USE_NVSHMEM=yeswould be silently ignored. Consider aligning the truthy set (and the C++ counterpart incodegen_cuda.h).tilelang/distributed/build_nvshmem.sh (2)
17-23: Unknown arguments are silently swallowed.The catch-all
*) shift ;;discards unrecognised flags without warning. A typo like--arch90(missing space) will be eaten andARCHwill remain empty, leading to a confusing build.Suggested improvement
- *) shift ;; + *) echo "Unknown option: $1" >&2; return 1 2>/dev/null || exit 1 ;;
33-34: Hidden-file glob.*can match.and..on older shells.Line 34 (
rm -rf "${NVSHMEM_SRC}/".* 2>/dev/null || true) and Line 44 (mv nvshmem_src/.* ...) rely on GNU coreutils refusing to operate on./... On non-GNU systems (e.g., some minimal containers), this could be harmful. Consider usingfindorshopt -s dotglobinstead.
| ```bash | ||
| pip install mpich # building NVSHMEM needs MPI | ||
| export NVSHMEM_SRC="your_custom_nvshmem_dir" # default to 3rdparty/nvshmem_src | ||
| pip install mpich # NVSHMEM build requires MPI | ||
| cd tilelang/distributed | ||
| # Optional: set NVSHMEM_SRC to a custom path; default is ../../3rdparty/nvshmem_src | ||
| # For H100 (sm_90), add: source build_nvshmem.sh --arch 90 | ||
| source build_nvshmem.sh | ||
| ``` | ||
| You also need to install the `pynvshmem` package, which provides wrapped host-side Python API for NVSHMEM. | ||
|
|
||
| The script downloads the NVSHMEM source tarball from NVIDIA; you may need to be logged in at [NVIDIA Developer](https://developer.nvidia.com) for the download to succeed. | ||
|
|
||
| **2. Install pynvshmem (host-side Python API)** | ||
|
|
||
| From the project root (ensure `NVSHMEM_SRC` is set, e.g. from step 1 in the same shell): | ||
|
|
||
| ```bash | ||
| cd ./pynvshmem | ||
| cd tilelang/distributed/pynvshmem | ||
| python setup.py install | ||
| export LD_LIBRARY_PATH="$NVSHMEM_SRC/build/src/lib:$LD_LIBRARY_PATH" | ||
| export LD_LIBRARY_PATH="${NVSHMEM_SRC}/build/src/lib:$LD_LIBRARY_PATH" | ||
| ``` |
There was a problem hiding this comment.
Working directory left inside ${NVSHMEM_SRC}/build after sourcing the script.
build_nvshmem.sh performs cd "${NVSHMEM_SRC}" && cd build, so after step 1 finishes the user's shell is no longer at the project root. Step 2 asks the user to cd tilelang/distributed/pynvshmem, which will fail (or resolve to the wrong path) from the build directory.
Add an explicit cd back to the project root between the two steps:
source build_nvshmem.sh- After building, return to the project root:
-
- cd - # or: cd /path/to/tilescale
-
<details>
<summary>🤖 Prompt for AI Agents</summary>
In @docs/get_started/Installation.md around lines 63 - 81, After running
build_nvshmem.sh the user's shell is left inside "${NVSHMEM_SRC}/build", so
update the docs to explicitly return to the project root before Step 2: add a
short instruction after the build step telling the user to cd back to the
project root (so subsequent cd tilelang/distributed/pynvshmem will resolve
correctly); reference the build_nvshmem.sh step and NVSHMEM_SRC to make it clear
where the user currently is and where to return.
</details>
<!-- fingerprinting:phantom:medusa:phoenix -->
<!-- This is an auto-generated comment by CodeRabbit -->
| **Build NVSHMEM from source (from repo root):** | ||
|
|
||
| ```bash | ||
| export NVSHMEM_SRC="your_custom_nvshmem_dir" # default to 3rdparty/nvshmem_src | ||
| pip install mpich | ||
| cd tilelang/distributed | ||
| source build_nvshmem.sh | ||
| source build_nvshmem.sh # optional: --arch 90 for H100 (sm_90) | ||
| ``` | ||
| You also need to install the `pynvshmem` package, which provides wrapped host-side Python API for NVSHMEM. | ||
|
|
||
| **Or install the prebuilt NVSHMEM package:** | ||
|
|
||
| ```bash | ||
| cd ./pynvshmem | ||
| pip install nvidia-nvshmem-cu12 | ||
| ``` | ||
|
|
||
| **Install pynvshmem and set library path:** | ||
|
|
||
| ```bash | ||
| cd tilelang/distributed/pynvshmem | ||
| python setup.py install | ||
| # If you built NVSHMEM from source: | ||
| export LD_LIBRARY_PATH="$NVSHMEM_SRC/build/src/lib:$LD_LIBRARY_PATH" | ||
| ``` |
There was a problem hiding this comment.
Same working-directory caveat as Installation.md.
After source build_nvshmem.sh, the shell's CWD is inside ${NVSHMEM_SRC}/build. The subsequent cd tilelang/distributed/pynvshmem on Line 31 assumes the user is back at the repo root. Consider adding a cd instruction (e.g., cd - or cd <repo-root>) between the build and pynvshmem install steps.
🤖 Prompt for AI Agents
In `@examples/distributed/README.md` around lines 14 - 35, The README's NVSHMEM
build/install steps assume the shell returns to the repo root after running
build_nvshmem.sh; after the "source build_nvshmem.sh" step (in the build section
referencing build_nvshmem.sh) add an explicit change-directory command (e.g.,
"cd -" or "cd <repo-root>") before the subsequent "cd
tilelang/distributed/pynvshmem" line so the pynvshmem install runs from the
expected repo root.
| # Usage: source build_nvshmem.sh [--arch 90] [--jobs N] [--force-download] | ||
| # Override at runtime: NVSHMEM_SRC, NVSHMEM_* (see below), CMAKE, NVSHMEM_VERSION. | ||
|
|
||
| if [ -z "${NVSHMEM_SRC}" ]; then | ||
| export NVSHMEM_SRC="$(realpath ../../3rdparty/nvshmem_src)" | ||
| echo "NVSHMEM_SRC not set, defaulting to ${NVSHMEM_SRC}" | ||
| else | ||
| NVSHMEM_SRC="$(realpath ${NVSHMEM_SRC})" | ||
| echo "Using NVSHMEM_SRC=${NVSHMEM_SRC}" | ||
| fi | ||
|
|
||
| if [ -d "${NVSHMEM_SRC}" ]; then | ||
| if [ "$(ls -A ${NVSHMEM_SRC})" ]; then | ||
| echo "NVSHMEM_SRC directory (${NVSHMEM_SRC}) is not empty, cleaning it..." | ||
| rm -rf "${NVSHMEM_SRC}/"* | ||
| rm -rf "${NVSHMEM_SRC}/".* 2>/dev/null || true | ||
| fi | ||
| else | ||
| mkdir -p "${NVSHMEM_SRC}" | ||
| fi | ||
|
|
||
| wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz | ||
| tar zxvf nvshmem_src_3.2.5-1.txz | ||
| rm -rf nvshmem_src_3.2.5-1.txz | ||
|
|
||
| mkdir -p "${NVSHMEM_SRC}" | ||
| mv nvshmem_src/* "${NVSHMEM_SRC}/" | ||
| mv nvshmem_src/.* "${NVSHMEM_SRC}/" 2>/dev/null || true | ||
| rmdir nvshmem_src | ||
|
|
||
|
|
||
| export NVSHMEM_PATH="${NVSHMEM_SRC}" | ||
| set -e |
There was a problem hiding this comment.
exit and set -e in a sourced script are dangerous.
The usage line says source build_nvshmem.sh, but:
- Line 18:
exit 0on--helpwill terminate the caller's shell session, not just the script. - Line 6:
set -ewill persist in the caller's shell after sourcing, causing subsequent non-zero commands to kill the session.
If the script is meant to be sourced (to export NVSHMEM_SRC/NVSHMEM_PATH), use return instead of exit and save/restore the set -e state, or wrap the body in a subshell function and only export the variables afterward.
Minimal fix for the exit/set-e issue
-set -e
+# Avoid set -e when sourced; it leaks into the caller's shell.
+if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
+ set -e
+fi
...
- -h|--help) echo "source build_nvshmem.sh [--arch ARCH] [--jobs N] [--force-download]"; exit 0 ;;
+ -h|--help) echo "source build_nvshmem.sh [--arch ARCH] [--jobs N] [--force-download]"; return 0 2>/dev/null || exit 0 ;;🤖 Prompt for AI Agents
In `@tilelang/distributed/build_nvshmem.sh` around lines 3 - 6, This script is
intended to be sourced but currently uses a global "set -e" and "exit" which
will affect the caller's shell; modify build_nvshmem.sh so it does not change
the caller's shell state: save the existing errexit state at start, use a
subshell or a main function to run the body (e.g., wrap logic in a function like
main() or a subshell) so "set -e" is local and restored afterward, and replace
any "exit 0" used for the "--help" or error paths with "return 0" (or "return 1"
for errors) so sourcing does not terminate the caller; ensure the script still
exports NVSHMEM_SRC and NVSHMEM_PATH after the localized work completes.
| fi | ||
| cd "${SCRIPT_DIR}" | ||
| [[ -f "${TARBALL}" ]] || { wget -q --show-progress "${URL}" -O "${TARBALL}" || { echo "Download failed (login at developer.nvidia.com?)." >&2; exit 1; }; } | ||
| tar -zxf "${TARBALL}" |
There was a problem hiding this comment.
Wrong tar decompression flag for .txz (xz) archive.
tar -zxf invokes gzip decompression, but .txz files are xz-compressed. This will fail at extraction time.
Fix
- tar -zxf "${TARBALL}"
+ tar xf "${TARBALL}"Using tar xf (no explicit compressor flag) lets modern GNU tar auto-detect the compression format, which works for both .txz and .tar.gz.
🤖 Prompt for AI Agents
In `@tilelang/distributed/build_nvshmem.sh` at line 40, The tar extraction line
incorrectly forces gzip with the -z flag when extracting the archive referenced
by the TARBALL variable; change the tar invocation that currently runs against
"${TARBALL}" to remove the -z flag so tar auto-detects compression (i.e., use a
tar command without the z option) to support .txz (xz) and .tar.gz
transparently.
8399f60 to
603a499
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tilelang/distributed/build_nvshmem.sh`:
- Line 34: The rm/mv invocations that use the ".*" glob (referencing
NVSHMEM_SRC) may match "." and ".."; replace the unsafe ".*" usage with two safe
globs that exclude "." and ".." (one that matches names starting with a dot but
not dot/dotdot, and one that matches dot-dot-prefixed names with at least one
additional character) in both places (the rm at the line using NVSHMEM_SRC and
the similar mv on the other affected line) so the commands no longer risk
including "." or ".." while preserving dotfile handling.
🧹 Nitpick comments (3)
src/target/codegen_cuda.h (1)
26-37:std::tolower(char)is technically undefined behavior for negativecharvalues.When
charis signed (common on x86), passing a negative value tostd::tolower(int)is UB per the C/C++ standard — the argument must be representable asunsigned charor equalEOF. While env-var values like"true"/"on"are safely within ASCII range, the idiomatic safe cast is:♻️ Suggested safe cast
- for (auto &c : val) - c = std::tolower(c); + for (auto &c : val) + c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));tilelang/distributed/build_nvshmem.sh (2)
22-22: Unknown arguments are silently swallowed.
*) shift ;;discards any unrecognized flag without warning. A typo like--forc-downloadwould be silently ignored. Consider printing a warning or erroring out.Suggested
- *) shift ;; + *) echo "Warning: unknown option '$1'" >&2; shift ;;
62-67: Stale CMake cache may ignore updated options.
[[ -f CMakeCache.txt ]] || ${CMAKE} ..skips reconfiguration if the cache exists, even ifARCHor other options have changed. Consider adding a note in the usage comment that--force-downloador manually deletingbuild/CMakeCache.txtis needed to reconfigure, or alternatively always re-run cmake (it's fast when nothing changed).
| if [[ -n "$FORCE_DL" ]] || [[ ! -f "${NVSHMEM_SRC}/CMakeLists.txt" ]]; then | ||
| if [[ -f "${NVSHMEM_SRC}/CMakeLists.txt" ]]; then | ||
| rm -rf "${NVSHMEM_SRC:?}/"* | ||
| rm -rf "${NVSHMEM_SRC}/".* 2>/dev/null || true |
There was a problem hiding this comment.
.* glob matches . and .. — use a safer pattern.
On lines 34 and 44, the .* glob expands to include . and ... While GNU rm/mv usually refuse to operate on these, POSIX doesn't guarantee it, and some environments may behave differently.
Safer glob alternatives
- rm -rf "${NVSHMEM_SRC}/".* 2>/dev/null || true
+ rm -rf "${NVSHMEM_SRC}"/.[!.]* "${NVSHMEM_SRC}"/..?* 2>/dev/null || true- mv nvshmem_src/.* "${NVSHMEM_SRC}/" 2>/dev/null || true
+ mv nvshmem_src/.[!.]* nvshmem_src/..?* "${NVSHMEM_SRC}/" 2>/dev/null || trueThe pattern .[!.]* matches single-dot-prefixed names and ..?* matches double-dot-prefixed names with at least one more character, safely excluding . and ...
Also applies to: 44-44
🤖 Prompt for AI Agents
In `@tilelang/distributed/build_nvshmem.sh` at line 34, The rm/mv invocations that
use the ".*" glob (referencing NVSHMEM_SRC) may match "." and ".."; replace the
unsafe ".*" usage with two safe globs that exclude "." and ".." (one that
matches names starting with a dot but not dot/dotdot, and one that matches
dot-dot-prefixed names with at least one additional character) in both places
(the rm at the line using NVSHMEM_SRC and the similar mv on the other affected
line) so the commands no longer risk including "." or ".." while preserving
dotfile handling.
- build_nvshmem.sh: skip download when source exists, NVSHMEM_* and NVSHMEM_VERSION/NVSHMEM_SOURCE_URL overridable at runtime - Installation.md, examples/distributed/README: NVSHMEM install options - lower.py: require only USE_NVSHMEM for NVSHMEM include/lib Signed-off-by: Ruiqing Feng <fengrq@mail.ustc.edu.cn>
603a499 to
4abfdfb
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tilelang/distributed/build_nvshmem.sh`:
- Around line 2-3: Update the header comment in build_nvshmem.sh to instruct
users to run the script with "bash build_nvshmem.sh" (or "./build_nvshmem.sh")
instead of "source build_nvshmem.sh"; the script uses exit and changes the
working directory so it must not be sourced—modify the top-two-line comment that
currently reads "Run from repo root or set NVSHMEM_SRC." and "Usage: source
build_nvshmem.sh [--arch 90] [--jobs N] [--force-download]" to reflect the
correct invocation (bash) and keep the existing usage flags unchanged.
🧹 Nitpick comments (2)
tilelang/distributed/build_nvshmem.sh (2)
60-60: Unquoted${CMAKE}can break on paths with spaces.If
CMAKEis set to a path containing spaces (e.g.,/opt/my tools/cmake), the unquoted expansion will word-split.-[[ -f CMakeCache.txt ]] || ${CMAKE} .. \ +[[ -f CMakeCache.txt ]] || "${CMAKE}" .. \
19-20: Unknown flags are silently swallowed.Line 20 shifts any unrecognized argument without warning. A typo like
--forced-downloadwould be silently ignored. Consider emitting a warning or erroring out.Suggested fix
- *) shift ;; + *) echo "Warning: unknown option '$1'" >&2; shift ;;
| # Build NVSHMEM from source for TileLang device-side use. Run from repo root or set NVSHMEM_SRC. | ||
| # Usage: source build_nvshmem.sh [--arch 90] [--jobs N] [--force-download] |
There was a problem hiding this comment.
Inconsistent usage guidance: comment says source, help says bash.
Line 2 says "Run from repo root or set NVSHMEM_SRC" and line 3 says source build_nvshmem.sh, but the --help output on line 16 says bash build_nvshmem.sh. The Installation docs also use bash. Since the script uses exit (not return) and changes cwd, it should not be sourced. Update the header comment to match.
-# Usage: source build_nvshmem.sh [--arch 90] [--jobs N] [--force-download]
+# Usage: bash build_nvshmem.sh [--arch 90] [--jobs N] [--force-download]🤖 Prompt for AI Agents
In `@tilelang/distributed/build_nvshmem.sh` around lines 2 - 3, Update the header
comment in build_nvshmem.sh to instruct users to run the script with "bash
build_nvshmem.sh" (or "./build_nvshmem.sh") instead of "source
build_nvshmem.sh"; the script uses exit and changes the working directory so it
must not be sourced—modify the top-two-line comment that currently reads "Run
from repo root or set NVSHMEM_SRC." and "Usage: source build_nvshmem.sh [--arch
90] [--jobs N] [--force-download]" to reflect the correct invocation (bash) and
keep the existing usage flags unchanged.
Summary by CodeRabbit
Documentation
Bug Fixes
Chores