23 changes: 12 additions & 11 deletions .github/workflows/atom-test.yaml
@@ -78,6 +78,7 @@ jobs:
runs-on: ${{ matrix.runner }}
env:
ATOM_BASE_NIGTHLY_IMAGE: rocm/atom-dev:latest
CONTAINER_NAME: atom_test_${{ strategy.job-index }}

Copilot AI Feb 16, 2026

strategy.job-index is not a valid GitHub Actions context variable. GitHub Actions does not provide a built-in job index for matrix jobs. Consider using a combination of matrix values to create unique container names instead. For example:

  • Use github.run_id with a sanitized version of matrix.model_name
  • Or use github.run_id with github.run_attempt and sanitized matrix values

Example: CONTAINER_NAME: atom_test_${{ github.run_id }}_${{ github.run_attempt }}_${{ matrix.model_name }}

Note: You'll need to sanitize matrix.model_name to remove special characters that are invalid in container names (e.g., replace spaces and special chars with underscores).

Suggested change
CONTAINER_NAME: atom_test_${{ strategy.job-index }}
CONTAINER_NAME: atom_test_${{ github.run_id }}_${{ github.run_attempt }}_${{ toLower(replace(replace(replace(replace(matrix.model_name, ' ', '_'), '/', '_'), ':', '_'), '.', '_')) }}
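As far as I know, GitHub Actions expressions do not provide `replace` or `toLower` functions, so the sanitization may need to happen in a shell step instead. A minimal sketch, assuming Docker's usual container-name character set (`[a-zA-Z0-9_.-]`); `sanitize_name` is a hypothetical helper and the input string is an example value, not one taken from this workflow's matrix:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every character outside [A-Za-z0-9_.-]
# with '_' and lowercase the result.
sanitize_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_.-' '_' | tr '[:upper:]' '[:lower:]'
}

# Example value only; a real workflow would pass its matrix value here.
sanitize_name "Org/Some Model:v1"
```

The result could then be exported for later steps, for example by appending a `CONTAINER_NAME=...` line to `$GITHUB_ENV`.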

Copilot uses AI. Check for mistakes.
Collaborator

Following up on Copilot's comment, consider changing CONTAINER_NAME: atom_test_${{ strategy.job-index }}
to:
CONTAINER_NAME: atom_test_${{ github.run_id }}_${{ strategy.job-index }}

The reason is that strategy.job-index is unique only within a single workflow run. If two PRs trigger CI at the same time and are scheduled on the same runner, both runs may have job-index=0, so both generate atom_test_0 and the names collide. Adding github.run_id avoids this: even if two workflow runs execute concurrently on the same runner, their container names will not collide.
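The collision argument can be illustrated with a small shell sketch. The `container_name` function and the sample run IDs are hypothetical; in the workflow the two arguments would come from `github.run_id` and `strategy.job-index`:

```shell
#!/usr/bin/env bash
# Sketch: build a per-run, per-job container name.
# Both arguments are required; the function aborts if either is missing.
container_name() {
  local run_id="${1:?run id required}"
  local job_index="${2:?job index required}"
  printf 'atom_test_%s_%s\n' "$run_id" "$job_index"
}

# Two concurrent runs that share job index 0 still get distinct names.
container_name 1001 0
container_name 1002 0
```

With only the job index in the name, both calls would produce `atom_test_0`; the run ID is what keeps them apart.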

GITHUB_REPO_URL: ${{ github.event.pull_request.head.repo.clone_url || 'https://github.com/ROCm/ATOM.git' }}
GITHUB_COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.event.head_commit.id }}

@@ -151,8 +152,8 @@ jobs:
- name: Start CI container
run: |
echo "Clean up containers..."
(docker ps -aq -f name=atom_test | xargs -r docker stop) || true
(docker ps -aq -f name=atom_test | xargs -r docker rm) || true
(docker ps -aq -f name="^${CONTAINER_NAME}$" | xargs -r docker stop) || true
(docker ps -aq -f name="^${CONTAINER_NAME}$" | xargs -r docker rm) || true
Comment on lines +155 to +156

Copilot AI Feb 16, 2026

Docker's --filter name= does not support regex anchors like ^ and $. The filter performs a substring match by default. To prevent matching containers like atom_test_1_backup when looking for atom_test_1, you should rely on exact matching by ensuring unique enough names. However, since the base approach using strategy.job-index is not valid, this will need to be reconsidered once the container naming strategy is corrected.

Suggested change
(docker ps -aq -f name="^${CONTAINER_NAME}$" | xargs -r docker stop) || true
(docker ps -aq -f name="^${CONTAINER_NAME}$" | xargs -r docker rm) || true
(docker ps -aq -f name="${CONTAINER_NAME}" | xargs -r docker stop) || true
(docker ps -aq -f name="${CONTAINER_NAME}" | xargs -r docker rm) || true
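One way to get exact-name matching without depending on how `--filter name=` interprets its pattern is to list container names and compare them literally. A minimal sketch; `exact_name_match` is a hypothetical helper and the sample names are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: keep only the names that equal the target exactly.
# Reads one container name per line on stdin.
# -F treats the target as a fixed string, -x requires a whole-line match;
# '|| true' keeps a "no match" result from failing the step.
exact_name_match() {
  grep -Fx -- "$1" || true
}

# Sample names only; in the workflow the input could come from
#   docker ps -a --format '{{.Names}}'
printf '%s\n' atom_test_1 atom_test_1_backup atom_test_10 | exact_name_match atom_test_1
```

In the cleanup step this might be used as `docker ps -a --format '{{.Names}}' | exact_name_match "$CONTAINER_NAME" | xargs -r docker rm -f`, which removes only the container whose name is exactly `$CONTAINER_NAME`.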


if [ -f "/etc/podinfo/gha-render-devices" ]; then
DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices)
@@ -191,21 +192,21 @@ jobs:
-e ATOM_DISABLE_MMAP=true \
-v "${{ github.workspace }}:/workspace" \
-w /workspace \
--name atom_test \
--name "$CONTAINER_NAME" \
atom_test:ci

env:
GITHUB_WORKSPACE: ${{ github.workspace }}

- name: Check shm size
run: |
docker exec atom_test df -h /dev/shm
docker exec "$CONTAINER_NAME" df -h /dev/shm

- name: Download models
run: |
if [ -d "/models" ]; then
echo "/models directory found, downloading model to /models/${{ matrix.model_path }}"
if ! docker exec -e HF_TOKEN=${{ secrets.AMD_HF_TOKEN }} atom_test bash -lc "hf download ${{ matrix.model_path }} --local-dir /models/${{ matrix.model_path }}"; then
if ! docker exec -e HF_TOKEN=${{ secrets.AMD_HF_TOKEN }} "$CONTAINER_NAME" bash -lc "hf download ${{ matrix.model_path }} --local-dir /models/${{ matrix.model_path }}"; then
echo "Model download failed for '${{ matrix.model_path }}'. Aborting."
exit 1
fi
@@ -231,12 +232,12 @@ jobs:
ls -la $model_path || true
# Print debug logs
echo "========= Runner debug logs ==============="
ps aux
rocm-smi --showmemuse
rocm-smi --showpids
ps aux
docker ps -a
echo "========= End runner debug logs ==============="
docker exec atom_test bash -lc "
docker exec "$CONTAINER_NAME" bash -lc "
set -euo pipefail
python3 -m atom.examples.simple_inference \
--model \"$model_path\" \
@@ -275,12 +276,12 @@ jobs:
else
model_path="${{ matrix.model_path }}"
fi
docker exec atom_test bash -lc "
docker exec "$CONTAINER_NAME" bash -lc "
.github/scripts/atom_test.sh launch $model_path ${{ matrix.extraArgs }}
"
echo ""
echo "========== Running accuracy test =========="
docker exec atom_test bash -lc "
docker exec "$CONTAINER_NAME" bash -lc "
.github/scripts/atom_test.sh accuracy $model_path
" 2>&1 | tee atom_accuracy_output.txt

@@ -307,5 +308,5 @@ jobs:
if [[ ${{ matrix.runner }} == atom-mi355-8gpu.predownload ]]; then
docker run --rm -v "${GITHUB_WORKSPACE:-$PWD}":/workspace -w /workspace --privileged rocm/pytorch:latest bash -lc "rm -rf /workspace/atom/ /workspace/aiter/" || true
fi
docker stop atom_test || true
docker rm atom_test || true
docker stop "$CONTAINER_NAME" || true
docker rm "$CONTAINER_NAME" || true