Merged

35 commits
3bf6f64  Change runner from gpumode-nvidia-arc to Nvidia-A100  (msaroufim, Oct 13, 2025)
5f40e36  Update nvidia-arc-health.yml  (msaroufim, Oct 13, 2025)
e3ac730  Update nvidia-arc-health.yml  (msaroufim, Oct 13, 2025)
c60090b  Feat: run health on b200  (S1ro1, Nov 1, 2025)
2a69a10  tmp  (S1ro1, Nov 1, 2025)
9a6c08d  tmp  (S1ro1, Nov 1, 2025)
aa2f894  tmp  (S1ro1, Nov 1, 2025)
fbc28ad  feat  (S1ro1, Nov 1, 2025)
6437e19  feat  (S1ro1, Nov 1, 2025)
6c4bde0  feat  (S1ro1, Nov 1, 2025)
a3e045c  replace nvidia workflow to point to our b200 cluster  (alexzhang13, Nov 1, 2025)
844d3bf  Fix: container  (S1ro1, Nov 1, 2025)
3275924  Fix: python->python3  (S1ro1, Nov 1, 2025)
b19b59b  Fix: add back deps  (S1ro1, Nov 1, 2025)
3e8eb6f  Fix: python->python3  (S1ro1, Nov 1, 2025)
998cf42  Fix: python->python3  (S1ro1, Nov 1, 2025)
1de31fd  Add nvidia-smi  (S1ro1, Nov 1, 2025)
d754094  split profiling into rocm/ncu  (ngc92, Nov 1, 2025)
394e234  profile each benchmark individually for cleaner traces  (ngc92, Nov 9, 2025)
0e51cf5  profile in tempdir  (ngc92, Nov 9, 2025)
3e6a59c  send profile results as attached files  (ngc92, Nov 9, 2025)
f31e4bb  don't spam alerts  (ngc92, Nov 9, 2025)
00c215a  include default ncu report  (ngc92, Nov 9, 2025)
b014b79  attempt at filtered ncu  (ngc92, Nov 9, 2025)
f328eba  formatting fix  (ngc92, Nov 9, 2025)
eaa54f7  fix tests  (ngc92, Nov 9, 2025)
e83b0f4  Fix: good error for profile via api  (S1ro1, Nov 10, 2025)
716aca9  Fix: remove nvidia-smi from workflow  (S1ro1, Nov 10, 2025)
cb880a7  Fix: polling time to 15s  (S1ro1, Nov 10, 2025)
2621ca1  limit profiling report length  (ngc92, Nov 10, 2025)
af80b61  limit number of kernels to be profiled  (ngc92, Nov 10, 2025)
2931fd4  stricter matching for kernel name lines  (ngc92, Nov 10, 2025)
110386e  add an additional safety limit to ncu reports  (ngc92, Nov 10, 2025)
8a4c6b2  fix  (ngc92, Nov 10, 2025)
c9786fb  Fix: style  (S1ro1, Nov 10, 2025)
16 changes: 2 additions & 14 deletions .github/workflows/nvidia-arc-health.yml
@@ -6,27 +6,15 @@ on:
     - cron: '0 2 * * *'
   workflow_dispatch:
   push:
     branches: [main]

 jobs:
   health-check:
-    runs-on: [gpumode-nvidia-arc]
+    runs-on: [nvidia-docker-b200-8-x86-64]
     timeout-minutes: 5
     container:
       image: nvidia/cuda:12.4.0-devel-ubuntu22.04

     steps:
Copilot AI commented (Nov 10, 2025):

The workflow tries to import torch without installing it first. The previous version included steps to set up Python and install PyTorch, but these steps have been removed. This will cause the health check to fail. Consider adding back the installation steps, or ensure torch is pre-installed in the runner environment.

Suggested change:
-    steps:
+    steps:
+      - name: Install PyTorch
+        run: pip3 install torch
-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.10'
-
-      - name: Install PyTorch
-        run: |
-          pip install torch
-
       - name: GPU Health Check
-        run: python -c "import torch; torch.randn(5, device='cuda')"
+        run: python3 -c "import torch; torch.randn(5, device='cuda')"

     env:
       CUDA_VISIBLE_DEVICES: 0
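
For context, the health check boils down to the one-liner above: allocate a tensor on the GPU and see whether CUDA answers. A slightly more verbose sketch of the same idea (illustrative only, not part of this diff) would be:

```python
# Hypothetical expanded health check; the workflow above only runs the one-liner.
import torch

assert torch.cuda.is_available(), "CUDA is not visible to torch"
print("device:", torch.cuda.get_device_name(0))

x = torch.randn(5, device="cuda")   # allocate and run a kernel on GPU 0
torch.cuda.synchronize()            # surface any asynchronous CUDA errors
print("GPU health check passed:", x.sum().item())
```
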
38 changes: 6 additions & 32 deletions .github/workflows/nvidia_workflow.yml
@@ -19,23 +19,11 @@ run-name: 'NVIDIA Job - ${{ github.event.inputs.run_id }}'

 jobs:
   run:
-    runs-on: [gpumode-nvidia-arc]
+    runs-on: [nvidia-docker-b200-8-x86-64]
     timeout-minutes: 10
-    container:
-      image: nvidia/cuda:12.4.0-devel-ubuntu22.04
     steps:
       - uses: actions/checkout@v3

-      - name: Setup Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.10'
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v3
-        with:
-          version: "latest"
-
       - name: Create input files
         shell: bash
Comment on lines 27 to 28:

Copilot AI commented (Nov 10, 2025):

The "Create input files" step runs apt-get commands without sudo and without a container context. The previous version used a container (image: nvidia/cuda:12.4.0-devel-ubuntu22.04), which provided a root environment. Without a container, these commands will likely fail on the self-hosted runner unless the runner is configured to run as root (which is a security risk). Consider restoring the container setup, using sudo for the apt-get commands, or ensuring jq is pre-installed on the runner.
       run: |
@@ -49,30 +37,18 @@ jobs:
           # Now write to file (won't be logged since it's masked)
           echo "$PAYLOAD" > payload.json

-      - name: Install uv
-        uses: astral-sh/setup-uv@v3
-        with:
-          version: "latest"
-
-      - name: Setup Python environment
+      - name: Setup Virtual Environment and Install Dependencies
         shell: bash
         run: |
           uv venv .venv
           echo "VIRTUAL_ENV=$PWD/.venv" >> $GITHUB_ENV
           echo "$PWD/.venv/bin" >> $GITHUB_PATH
-          pip install --upgrade pip
-          pip install -r "requirements.txt"
-          pip install -e .

           if [[ -n "${{ github.event.inputs.requirements }}" ]]; then
             cat > "requirements.txt" <<'EOL'
           ${{ github.event.inputs.requirements }}
           EOL
             uv pip install -r "requirements.txt"
           fi
+          uv pip install -e .

       - name: Run script
         shell: bash
         run: |
-          python src/runners/github-runner.py
+          python3 src/runners/github-runner.py

       - name: Upload training artifacts
         uses: actions/upload-artifact@v4
@@ -88,5 +64,3 @@ jobs:
           name: profile-data
           path: profile_data/*
           retention-days: 1
-    env:
-      CUDA_VISIBLE_DEVICES: 0
33 changes: 29 additions & 4 deletions examples/eval.py
@@ -500,9 +500,9 @@ def run_benchmarking(logger: PopcornOutput, pool: multiprocessing.Pool, tests: l
         return 112


-def _run_single_profile(test: TestCase) -> str:
+def _run_single_profile_torch(test: TestCase) -> str:
     """
-    Runs a single test case. Do not call directly
+    Profiles a single benchmark using the torch profiler.
     """
     from submission import custom_kernel
     from torch.profiler import profile, ProfilerActivity
@@ -511,14 +511,36 @@ def _run_single_profile(test: TestCase) -> str:
     data = generate_input(**test.args)
     torch.cuda.synchronize()

+    cloned = _clone_data(data, 0)
     with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
         with nvtx_range("custom_kernel"):
-            submission_output = custom_kernel(_clone_data(data, 0))
+            submission_output = custom_kernel(cloned)
Copilot AI commented (Nov 10, 2025):

Variable submission_output is not used.
     torch.cuda.synchronize()

     return prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=20)


+def _run_single_profile_ncu(test: TestCase) -> str:
+    """
+    Profiles a single benchmark using ncu. Note: this does not
+    invoke NCU; instead, it is expected that eval is launched
+    under NCU, and this function will rurnthe kernel excactly
Copilot AI commented (Nov 10, 2025):

Typo in the docstring: "rurnthe" should be "run the".

Suggested change:
-    under NCU, and this function will rurnthe kernel excactly
+    under NCU, and this function will run the kernel excactly
+    once in the 'custom_kernel' nvtx range.
+    """
+    from submission import custom_kernel
+
+    with nvtx_range("generate input"):
+        data = generate_input(**test.args)
+    torch.cuda.synchronize()
+
+    cloned = _clone_data(data, 0)
+    with nvtx_range("custom_kernel"):
+        submission_output = custom_kernel(cloned)
Copilot AI commented (Nov 10, 2025):

Variable submission_output is not used.

Suggested change:
-    submission_output = custom_kernel(cloned)
+    custom_kernel(cloned)
+    torch.cuda.synchronize()
+
+    return ""


 def _run_distributed_profile(test: TestCase, rank: int) -> "EventList":
     """
     Runs a single profiling case. Do not call directly
@@ -610,7 +632,10 @@ def run_single_profile(test: TestCase, pool: multiprocessing.Pool) -> str:
     """
     world_size = test.args.get("world_size", None)
     if world_size is None:
-        return pool.apply(_run_single_profile, (test,))
+        if bool(os.getenv("POPCORN_NCU", "0")):
Copilot AI commented (Nov 10, 2025):

The condition bool(os.getenv("POPCORN_NCU", "0")) will always evaluate to True because os.getenv() returns a string (either the env var value or the default "0"), and bool("0") is True. To properly check for a truthy environment variable, use

    if os.getenv("POPCORN_NCU", "0") != "0":

or

    if os.getenv("POPCORN_NCU", "") in ("1", "true", "True"):

Suggested change:
-        if bool(os.getenv("POPCORN_NCU", "0")):
+        if os.getenv("POPCORN_NCU", "") in ("1", "true", "True"):
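
A minimal standalone reproduction of the pitfall the comment describes (the POPCORN_NCU values here are hypothetical):

```python
import os

# Any non-empty string is truthy, so with the default "0" the buggy
# check succeeds even when the variable is unset.
os.environ.pop("POPCORN_NCU", None)
assert bool(os.getenv("POPCORN_NCU", "0"))         # always True

# A string comparison behaves as intended:
assert not os.getenv("POPCORN_NCU", "0") != "0"    # unset -> False
os.environ["POPCORN_NCU"] = "1"
assert os.getenv("POPCORN_NCU", "0") != "0"        # set -> True
```
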
+            return pool.apply(_run_single_profile_ncu, (test,))
+        else:
+            return pool.apply(_run_single_profile_torch, (test,))
     else:
         return run_multi_gpu_profile(pool, test, world_size)

4 changes: 2 additions & 2 deletions scripts/ci_test_cuda.py
@@ -19,12 +19,12 @@ def run_cuda_helper(sources: dict, headers: dict = None, arch=None, **kwargs):
         headers = header_files

     eval_result = run_cuda_script(
-        make_system_info(),
         sources,
         headers,
         arch=arch,
         mode=SubmissionMode.TEST.value,
         tests="size: 256; seed: 42\n",
+        system=make_system_info(),
         **kwargs,
     )
     return eval_result.compilation, eval_result.run
@@ -195,12 +195,12 @@ def test_include_dirs(tmp_path: Path):

     # can also use generic flags argument
     result = run_cuda_script(
-        make_system_info(),
         {"eval.cu": eval_cu, "submission.cu": sub},
         header_files,
         flags=["-I.", f"-I{tmp_path}"],
         mode=SubmissionMode.TEST.value,
         tests="size: 256; seed: 42\n",
+        system=make_system_info(),
     )

     assert result.compilation.success is True
4 changes: 2 additions & 2 deletions scripts/ci_test_python.py
@@ -12,11 +12,11 @@

 def run_pytorch_helper(sources: dict, tests=None, **kwargs):
     result = run_pytorch_script(
-        make_system_info(),
         sources,
         "eval.py",
         mode=SubmissionMode.TEST.value,
         tests=tests or "size: 256; seed: 42\n",
+        system=make_system_info(),
         **kwargs,
     )
     return result.run
@@ -45,7 +45,7 @@ def custom_kernel(input):

     run = run_pytorch_helper({**files, "submission.py": sub})
     assert run.success is True
     assert run.passed is False
-    assert "python eval.py test" in run.command
+    assert "python3 eval.py test" in run.command
     assert run.stdout == ""
     assert run.stderr == ""
17 changes: 10 additions & 7 deletions src/kernelbot/api/api_utils.py
@@ -189,6 +189,8 @@ async def display_report(self, title: str, report: RunResultReport):
         elif isinstance(part, Log):
             self.long_report += f"\n\n## {part.header}:\n"
             self.long_report += f"```\n{part.content}```"
+
+
 # ruff: noqa: C901
 async def to_submit_info(
     user_info: Any,
@@ -197,14 +199,12 @@
     leaderboard_name: str,
     gpu_type: str,
     db_context: LeaderboardDB,
-) -> tuple[SubmissionRequest, SubmissionMode]: # noqa: C901
+) -> tuple[SubmissionRequest, SubmissionMode]:  # noqa: C901
     user_name = user_info["user_name"]
     user_id = user_info["user_id"]

     try:
-        submission_mode_enum: SubmissionMode = SubmissionMode(
-            submission_mode.lower()
-        )
+        submission_mode_enum: SubmissionMode = SubmissionMode(submission_mode.lower())
     except ValueError:
         raise HTTPException(
             status_code=400,
@@ -222,6 +222,11 @@
         SubmissionMode.BENCHMARK,
         SubmissionMode.LEADERBOARD,
     ]
+    if submission_mode_enum == SubmissionMode.PROFILE:
+        raise HTTPException(
+            status_code=400,
+            detail="Profile submissions are not currently supported via API, use Discord instead.",
+        )
Comment on lines +225 to +229:

Copilot AI commented (Nov 10, 2025):

Duplicate check for SubmissionMode.PROFILE on lines 214 and 225. The first check already raises an exception if the mode is PROFILE, so the second check on line 225 is unreachable. Consider removing the duplicate check on lines 225-229.

Suggested change:
-    if submission_mode_enum == SubmissionMode.PROFILE:
-        raise HTTPException(
-            status_code=400,
-            detail="Profile submissions are not currently supported via API, use Discord instead.",
-        )
     if submission_mode_enum not in allowed_modes:
         raise HTTPException(
             status_code=400,
@@ -263,9 +268,7 @@ async def to_submit_info(
     except HTTPException:
         raise
     except Exception as e:
-        raise HTTPException(
-            status_code=400, detail=f"Error reading submission file: {e}"
-        ) from e
+        raise HTTPException(status_code=400, detail=f"Error reading submission file: {e}") from e

     try:
         submission_code = submission_content.decode("utf-8")
8 changes: 7 additions & 1 deletion src/kernelbot/discord_reporter.py
@@ -1,7 +1,8 @@
 import discord
-from discord_utils import _send_split_log
+from discord_utils import _send_file, _send_split_log

 from libkernelbot.report import (
+    File,
     Link,
     Log,
     MultiProgressReporter,
@@ -70,6 +71,11 @@ async def display_report(self, title: str, report: RunResultReport):
             message += part.text
         elif isinstance(part, Log):
             message = await _send_split_log(thread, message, part.header, part.content)
+        elif isinstance(part, File):
+            if len(message) > 0:
+                await thread.send(message)
+            await _send_file(thread, part.message, part.name, part.content)
+            message = ""
         elif isinstance(part, Link):
             if len(message) > 0:
                 await thread.send(message)
9 changes: 7 additions & 2 deletions src/kernelbot/discord_utils.py
@@ -1,5 +1,6 @@
 import functools
 import logging
+from io import BytesIO

 import discord
@@ -124,7 +125,7 @@ async def _send_split_log(thread: discord.Thread, partial_message: str, header:
         else:
             if partial_message != "":
                 chunks.append(partial_message)
-                partial_message = line
+                partial_message = line + "\n"

     if partial_message != "":
         chunks.append(partial_message)
@@ -133,6 +134,10 @@
     for i, chunk in enumerate(chunks):
         partial_message = f"\n\n## {header} ({i+1}/{len(chunks)}):\n"
         partial_message += f"```\n{limit_length(chunk, 1900)}```"
-        await thread.send(partial_message)
+        await thread.send(partial_message, silent=True)

     return ""


+async def _send_file(thread: discord.Thread, message: str, name: str, file: bytes):
+    await thread.send(message, file=discord.File(BytesIO(file), filename=name), silent=True)
4 changes: 2 additions & 2 deletions src/libkernelbot/launchers/github.py
@@ -143,7 +143,7 @@ async def run_submission(  # noqa: C901
         # Update profile artifact to the actual download URL.
         # For the GitHub launcher the profile_artifact currently just contains
         # the name of the artifact.
-        if profile_res is not None:
+        if profile_res is not None and "profile-data" in index:
             profile_res.download_url = index["profile-data"].public_download_url

         res = EvalResult(
@@ -344,7 +344,7 @@ async def wait_for_completion(
                 return

             await callback(self)
-            await asyncio.sleep(20)  # Yield control while waiting
+            await asyncio.sleep(15)  # Yield control while waiting
         except TimeoutError:
             raise  # Re-raise the specific TimeoutError from the timeout block
         except Exception as e:
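
The loop above is the usual poll-with-deadline pattern: call back, sleep, and let the surrounding timeout block raise. A generic standalone sketch of that pattern (the function name, interval, and timeout here are illustrative, not the library's API):

```python
import asyncio
import time

async def poll_until_done(check, interval: float = 15.0, timeout: float = 600.0):
    """Call `check()` every `interval` seconds until it returns a result,
    raising TimeoutError once `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = await check()
        if result is not None:
            return result
        await asyncio.sleep(interval)  # yield control while waiting
    raise TimeoutError("polling timed out")
```
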