Skip to content

Conversation

@anjannath
Copy link
Member

@anjannath anjannath commented Nov 4, 2025

with the self sufficient changes for the bundle
have to make sure it runs well with CRC as well
as via the cloud-init path with an user-data

for crc we have a test in openshift-ci that run
the bundle with CRC once it is generated

this adds a shell script, which we'll run on CI
for every PR to catch any regressions

Assisted-By: Gemini

Needs: cfergeau/podman#37

Summary by CodeRabbit

  • New Features
    • Added an automated provisioning script for end-to-end local cluster orchestration: VM provisioning, SSH key handling, dependency validation, cloud-init generation, DNS/port exposure, kubeconfig retrieval, and cluster readiness verification.
  • Chores
    • CI workflow updated to use the new provisioning flow, streamlining local VM setup while retaining test execution and artifact preparation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 4, 2025

Walkthrough

Adds a new Bash orchestration script that provisions a CRC-based VM via MacAdam (SSH keypair, dependencies, cloud-init, disk extraction, MacAdam boot, gvproxy port forwarding, DNS, kubeconfig retrieval) and updates CI to invoke this script instead of the previous CRC bootstrap flow.

Changes

Cohort / File(s) Summary
New orchestration script
run-with-macadam.sh
Adds a complete Bash orchestrator with 14 functions: lifecycle (cleanup, die, main), SSH key generation, dependency checks, resource/secret loading, cloud-init generation, CRC disk extraction, MacAdam presence/start, gvproxy port forwarding, API DNS setup, kubeconfig retrieval, and cluster readiness checks; includes trap-based cleanup and progressive logging.
CI entrypoint update
ci.sh
Replaces previous CRC bootstrap/build/start sequence with a streamlined invocation of run-with-macadam.sh; sets CRC_BUNDLE_PATH and PULL_SECRET_PATH, preserves test execution steps and artifact preparation, and removes earlier crc build/move steps.

Sequence Diagram

sequenceDiagram
    actor User
    participant Script as run-with-macadam.sh
    participant SSH as SSH Keypair
    participant Deps as Host deps (gvproxy, macadam, oc, jq)
    participant CRC as CRC bundle/disk image
    participant MacAdam as MacAdam VM runtime
    participant GV as gvproxy
    participant DNS as DNS config
    participant K8s as Kubernetes API

    User->>Script: run main()
    Script->>Script: setup trap/cleanup
    Script->>SSH: generate_ssh_keypair()
    SSH-->>Script: keys ready
    Script->>Deps: ensure_deps()
    Deps-->>Script: deps verified
    Script->>CRC: extract_disk_img()
    CRC-->>Script: disk image
    Script->>MacAdam: ensure_macadam_exists()
    MacAdam-->>Script: ready
    Script->>MacAdam: start_macadam_vm(disk, cloud-init)
    MacAdam-->>Script: VM running
    Script->>MacAdam: wait for SSH
    Script->>GV: forward_port_gvproxy(api ports)
    GV-->>Script: ports exposed
    Script->>DNS: add_api_server_dns()
    DNS-->>Script: DNS configured
    Script->>K8s: wait for API & get_kubeconfig()
    K8s-->>Script: kubeconfig / API ready
    Script->>K8s: check_cluster_status()
    K8s-->>Script: cluster stable
    Script-->>User: done
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • trap-based cleanup and resource teardown correctness
    • error handling via die() across steps and exit-code interpretations
    • security of SSH key generation and secret loading paths
    • cloud-init templating and variable interpolation
    • interactions with external tooling (macadam, gvproxy, oc) and their assumptions
    • polling/wait loops for SSH, API readiness, and cluster stability

Suggested reviewers

  • gbraad
  • praveenkumar

Poem

🐰 I dug a tunnel, scripts in paw,

Keys and clouds and DNS to draw,
MacAdam hums, gvproxy lights the way,
Kubernetes wakes at break of day,
Hoppy deploys — a bashy hooray! 🚀

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'wip: ssf: add script to start bundle with macadam' accurately reflects the main change: adding a shell script (run-with-macadam.sh) to start a CRC bundle with macadam as part of self-sufficient bundle changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (2)
run-with-macadam.sh (2)

284-292: Fragile regex extraction of SSH port from gvproxy.

Similar to the socket path extraction, the sed regex sed -n 's/.*-ssh-port \([0-9]*\).*/\1/p' is brittle. If the command-line format changes, this fails silently.

Verify the format of gvproxy's -ssh-port flag and consider extracting both socket and port using a more robust approach (e.g., structured parsing or environment variables).


277-278: Hard-coded 180-second sleep inflexible for varying infra.

The sleep 180 assumes all VMs take exactly 3 minutes to boot, which may not hold across different hardware or CI runners. For slower infra, the wait is insufficient; for faster infra, it wastes time. Consider polling with timeout instead.

echo "Waiting for VM to start (up to 10 minutes)..."
local timeout=600
local elapsed=0
while [[ $elapsed -lt $timeout ]]; do
    if macadam ssh crc-ng -- exit 0 2>/dev/null; then
        echo "VM is reachable."
        break
    fi
    sleep 5
    ((elapsed += 5))
done
if [[ $elapsed -ge $timeout ]]; then
    die "VM did not become reachable within ${timeout}s."
fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2914074 and 8c6368f.

📒 Files selected for processing (1)
  • run-with-macadam.sh (1 hunks)

@praveenkumar
Copy link
Member

This script should be added as part of CI run.

GVPROXY_SOCKET_PATH=$(echo "${gvproxy_cmd}" | sed -n 's/.*-services unix:\/\/\([^ ]*\).*/\1/p')
if [[ -z "${GVPROXY_SOCKET_PATH}" ]]; then
die "Could not find the gvproxy socket path."
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is / should be exposed in macadam inspect

content: '$PASS_DEVELOPER'
permissions: '0644'
- path: /opt/crc/ocp-custom-domain.service.done
permissions: '0644'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to this PR, but hopefully handling of custom domains can be improved so that this is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ocp-custom-domain.service has ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh, will it be run with macadam?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While running with macadam with a cloud-init user-data it is running in self-sufficient mode (i.e CRC_SEFL_SUFFICIENT=1 and systemd services doing the setup)

so this service for domain name change will also run, I think we could add another config variable in the /etc/sysconfig/crc-env to enable/disable it and provide a proper way to control it, for now this is a hack to skip the service

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
run-with-macadam.sh (6)

17-22: cleanup() exits early on macadam failure, blocking file cleanup.

The macadam rm command at line 21 fails if macadam is unavailable or the VM doesn't exist. Since cleanup() runs under set -e, it exits with error status before reaching the rm -f statements, leaving temporary files behind and violating the trap contract.

Apply this diff:

  function cleanup() {
      echo "--- Cleaning up ---"
      rm -f user-data crc.img crc.qcow2 bundle.tar kubeconfig "${PRIV_KEY_PATH}" "${PUB_KEY_PATH}"
-     macadam rm crc-ng --force
+     macadam rm crc-ng --force 2>/dev/null || true
  }

149-164: macadam version not pinned; "latest" tag breaks reproducibility and complicates CI debugging.

Line 151 uses the mutable latest release tag, which allows different CI runs to download different versions. This prevents reproducible builds and complicates root-cause analysis when failures occur. Additionally, lines 155 and 159 invoke sudo without verifying passwordless access; if the CI environment requires a password, the script will hang or fail silently.

Pin the macadam version and verify passwordless sudo:

+ # Ensure passwordless sudo is available for macadam installation
+ if ! sudo -n true 2>/dev/null; then
+     die "This script requires passwordless sudo access to install macadam to /opt/macadam."
+ fi
+
- local version="latest"
+ # Pin to a specific stable version for reproducibility; update when macadam releases compatible changes
+ local version="v0.0.1"  # Replace with current stable version
  local url="https://github.com/crc-org/macadam/releases/download/${version}/macadam-linux-amd64"

Alternatively, install to a user-writable directory (e.g., $HOME/.local/bin) to avoid sudo altogether.


58-72: xtrace disabled but never restored, impairing script debugging.

Line 61 disables xtrace to protect secrets, but there is no corresponding restoration. If the script is invoked with bash -x for debugging, tracing remains disabled for all subsequent code, hindering troubleshooting.

Apply this diff to capture and restore xtrace state:

  function load_resources() {
      # Unset xtrace to prevent secrets from being exposed if the script is run with -x
+     local xtrace_was_set=0
+     if [[ "$-" == *x* ]]; then
+         xtrace_was_set=1
+     fi
      set +x
      echo "--- Loading resources ---"
      if [[ -z "${PULL_SECRET_PATH}" || ! -f "${PULL_SECRET_PATH}" ]]; then
          die "Path to pull secret file must be set via PULL_SECRET_PATH, and the file must exist."
      fi
      if [[ ! -f "${PUB_KEY_PATH}" ]]; then
          die "Public key file not found at ${PUB_KEY_PATH}. This should have been generated."
      fi
  
      PULL_SECRET=$(cat "${PULL_SECRET_PATH}")
      PUB_KEY=$(cat "${PUB_KEY_PATH}")
+     if [[ $xtrace_was_set -eq 1 ]]; then
+         set -x
+     fi
  }

180-182: macadam start failure does not halt execution, causing silent cascade failures.

When macadam start crc-ng fails (line 180), the script only echoes a message and continues. Downstream functions (DNS setup, port forwarding, kubeconfig retrieval) will execute against a non-running VM, causing confusing failures downstream instead of a clear error at the point of failure.

Replace with immediate termination:

- if ! macadam start crc-ng --log-level debug; then
-     echo "Machine didn't come up in time"
- fi
+ if ! macadam start crc-ng --log-level debug; then
+     die "Machine didn't come up in time"
+ fi

230-237: Regex for gvproxy socket path extraction is fundamentally broken.

The sed regex at line 234 ('s/.*-services unix:\/\/\([^ ]*\).*/\1/p') does not match actual gvproxy command-line output. Gvproxy uses --listen unix:///path (double dash, triple slashes), but the regex looks for -services unix:// (single dash, double slashes). This regex will silently fail to extract the socket path, causing the check at line 235 to die with a generic error message that masks the real problem.

Replace with correct extraction targeting the --listen flag:

- GVPROXY_SOCKET_PATH=$(echo "${gvproxy_cmd}" | sed -n 's/.*-services unix:\/\/\([^ ]*\).*/\1/p')
+ GVPROXY_SOCKET_PATH=$(echo "${gvproxy_cmd}" | sed -n 's/.*--listen unix:\/\/\/\([^ ]*\).*/\1/p')

Verify this against actual gvproxy process output (e.g., pgrep -af gvproxy) to confirm the format matches.


74-114: xtrace disabled but never restored; credentials written with world-readable permissions.

Line 79 disables xtrace to protect secrets in the cloud-init heredoc, but it is never restored, compounding the issue from load_resources(). Additionally, sensitive credentials at lines 100–108 (pull-secret, pass_kubeadmin, pass_developer) are written with permissions: '0644', making them world-readable. These should be restricted to 0600 (owner only).

Apply this diff:

  function gen_cloud_init() {
      echo "--- Generating cloud-init user-data ---"
      rm -f user-data
      # Unset xtrace to prevent secrets from being exposed if the script is run with -x
+     local xtrace_was_set=0
+     if [[ "$-" == *x* ]]; then
+         xtrace_was_set=1
+     fi
      set +x
      cat <<EOF >user-data
  #cloud-config
  runcmd:
    - systemctl enable --now kubelet
  write_files:
  - path: /home/core/.ssh/authorized_keys
    content: '$PUB_KEY'
    owner: core
    permissions: '0600'
  - path: /opt/crc/id_rsa.pub
    content: '$PUB_KEY'
    owner: root:root
    permissions: '0644'
  - path: /etc/sysconfig/crc-env
    content: |
      CRC_SELF_SUFFICIENT=1
      CRC_NETWORK_MODE_USER=1
    owner: root:root
    permissions: '0644'
  - path: /opt/crc/pull-secret
    content: |
      $PULL_SECRET
-   permissions: '0644'
+   permissions: '0600'
  - path: /opt/crc/pass_kubeadmin
    content: '$PASS_KUBEADMIN'
-   permissions: '0644'
+   permissions: '0600'
  - path: /opt/crc/pass_developer
    content: '$PASS_DEVELOPER'
-   permissions: '0644'
+   permissions: '0600'
  - path: /opt/crc/ocp-custom-domain.service.done
    permissions: '0644'
  EOF
      USER_DATA_PATH="$(pwd)/user-data"
      echo "cloud-init user-data file created."
+     if [[ $xtrace_was_set -eq 1 ]]; then
+         set -x
+     fi
  }
🧹 Nitpick comments (2)
run-with-macadam.sh (2)

257-298: SSH port extraction relies on fragile regex pattern.

The regex at line 272 ('s/.*-ssh-port \([0-9]*\).*/\1/p') extracts the SSH port from gvproxy command-line output. While this pattern is simpler and likely more robust than the socket path extraction, it still assumes a fixed flag format and no embedded spaces. If gvproxy's command-line interface evolves, this extraction will silently fail. At a minimum, verify this regex against actual pgrep -af gvproxy output for your environment.

Consider validating the extracted port (ensure it's a valid port number between 1–65535) to catch extraction failures early:

if [[ -z "${ssh_port}" || ! "${ssh_port}" =~ ^[0-9]+$ ]] || (( ssh_port < 1 || ssh_port > 65535 )); then
    die "Invalid SSH port extracted from gvproxy: ${ssh_port}"
fi

1-16: Script setup and global variables are well-organized.

The use of set -euo pipefail enables strict error handling. Constants are properly marked readonly, and mutable globals are clearly documented. The default bundle path includes an architecture-specific reference (arm64); consider making this configurable or architecture-agnostic for broader CI portability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6368f and 93712c9.

📒 Files selected for processing (1)
  • run-with-macadam.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-02T13:13:28.229Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/dnsmasq.sh.template:3-12
Timestamp: 2025-10-02T13:13:28.229Z
Learning: The file `/etc/sysconfig/crc-env` in the crc-org/snc repository contains configuration variables and file paths (e.g., `CRC_PASS_KUBEADMIN_PATH`, `CRC_PUB_KEY_PATH`, `CRC_NETWORK_MODE_USER`, `APPS_DOMAIN`), not the actual secrets. The secrets themselves are stored in separate files referenced by these paths. Sourcing `crc-env` with xtrace enabled does not leak sensitive information.

Applied to files:

  • run-with-macadam.sh
📚 Learning: 2025-10-07T07:03:06.884Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/crc-self-sufficient-env.sh:10-17
Timestamp: 2025-10-07T07:03:06.884Z
Learning: In the crc-org/snc repository, the CRC_CLOUD environment variable has been removed from the codebase. The self-sufficient environment check in systemd/crc-self-sufficient-env.sh now only validates CRC_SELF_SUFFICIENT=1, not CRC_CLOUD.

Applied to files:

  • run-with-macadam.sh
🔇 Additional comments (5)
run-with-macadam.sh (5)

31-42: SSH keypair generation is well-implemented.

The function correctly generates a temporary RSA keypair with an empty passphrase, suitable for CI automation. The cleanup function (line 20) properly removes these temporary files on exit.


44-56: Dependency check and installation is straightforward and pragmatic for CI.

The function correctly uses command -v to detect missing dependencies and attempts installation via yum if needed. This is appropriate for CI environments where tools may not be pre-installed.


116-133: Disk image extraction is robust with proper validation.

The function correctly uses tar --zstd to decompress and stream the qcow2 disk image from the bundle. The validation check at line 128 (-s) ensures the file is non-empty, catching extraction failures early.


185-197: Port forwarding via gvproxy HTTP API is well-implemented.

The function correctly uses curl with --unix-socket to communicate with gvproxy and expose the API server port. Error handling is in place with the -f flag.


300-331: Cluster status validation and main orchestration flow are well-structured.

The cluster readiness check uses appropriate timeout and stability thresholds. The main orchestration flow correctly chains dependency checks, resource preparation, VM provisioning, cluster setup, and validation. The trap for cleanup ensures proper resource teardown on exit.

@anjannath anjannath force-pushed the ssf-test branch 2 times, most recently from d3bd852 to 4f7813e Compare November 5, 2025 12:01
@anjannath anjannath self-assigned this Nov 12, 2025
@anjannath anjannath moved this to Work In Progress in Project planning: crc Nov 12, 2025
@cfergeau
Copy link
Contributor

this adds a shell script, which we'll run on CI
for every PR to catch any regressions

Since it’s preferable not to maintain such a shell script in the long run, if we add it to CI, we’ll have to remember to replace it with something else at some point.

@anjannath anjannath moved this from Work In Progress to Hold in Project planning: crc Nov 24, 2025
@openshift-ci
Copy link

openshift-ci bot commented Dec 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from anjannath. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anjannath
Copy link
Member Author

this adds a shell script, which we'll run on CI
for every PR to catch any regressions

Since it’s preferable not to maintain such a shell script in the long run, if we add it to CI, we’ll have to remember to replace it with something else at some point.

once we have the self sufficient bundle/macadam support with crc (crc-ng) then we can just remove this script, as then test will be with crc itself

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
run-with-macadam.sh (5)

59-72: Restore xtrace setting after loading secrets.

The function disables xtrace at line 61 but never restores it, so tracing remains disabled if the script was invoked with bash -x. This breaks debugging for the remainder of the script execution.

Apply this diff to capture and restore the original trace state:

 function load_resources() {
     # Unset xtrace to prevent secrets from being exposed if the script is run with -x
+    local xtrace_state
+    xtrace_state=$([[ $- == *x* ]] && echo "set -x" || echo "set +x")
     set +x
     echo "--- Loading resources ---"
     if [[ -z "${PULL_SECRET_PATH}" || ! -f "${PULL_SECRET_PATH}" ]]; then
         die "Path to pull secret file must be set via PULL_SECRET_PATH, and the file must exist."
     fi
     if [[ ! -f "${PUB_KEY_PATH}" ]]; then
         die "Public key file not found at ${PUB_KEY_PATH}. This should have been generated."
     fi
 
     PULL_SECRET=$(cat "${PULL_SECRET_PATH}")
     PUB_KEY=$(cat "${PUB_KEY_PATH}")
+    eval "$xtrace_state"
 }

75-117: Restore xtrace and fix credentials file permissions.

Two issues in this function:

  1. Trace restoration (line 79): Like load_resources(), xtrace is disabled but never restored, breaking debugging.
  2. World-readable secrets (lines 100-108): Files containing credentials (pull-secret, pass_kubeadmin, pass_developer) are written with 0644 permissions, making them readable by any user on the system. These should be 0600.

Apply this diff to fix both issues:

 function gen_cloud_init() {
     echo "--- Generating cloud-init user-data ---"
     rm -f user-data
     # Unset xtrace to prevent secrets from being exposed if the script is run with -x
+    local xtrace_state
+    xtrace_state=$([[ $- == *x* ]] && echo "set -x" || echo "set +x")
     set +x
     cat <<EOF >user-data
 #cloud-config
 runcmd:
   - systemctl enable --now kubelet
 write_files:
 - path: /home/core/.ssh/authorized_keys
   content: '$PUB_KEY'
   owner: core
   permissions: '0600'
 - path: /opt/crc/id_rsa.pub
   content: '$PUB_KEY'
   owner: root:root
   permissions: '0644'
 - path: /etc/sysconfig/crc-env
   content: |
     CRC_SELF_SUFFICIENT=1
     CRC_NETWORK_MODE_USER=1
   owner: root:root
   permissions: '0644'
 - path: /opt/crc/pull-secret
   content: |
     $PULL_SECRET
-  permissions: '0644'
+  permissions: '0600'
 - path: /opt/crc/pass_kubeadmin
   content: '$PASS_KUBEADMIN'
-  permissions: '0644'
+  permissions: '0600'
 - path: /opt/crc/pass_developer
   content: '$PASS_DEVELOPER'
-  permissions: '0644'
+  permissions: '0600'
 - path: /opt/crc/ocp-custom-domain.service.done
   permissions: '0644'
 EOF
     USER_DATA_PATH="$(pwd)/user-data"
     if ! cloud-init schema -c "${USER_DATA_PATH}" --annotate; then
         die "Invalid cloud-config generated."
     fi
     echo "cloud-init user-data file created."
+    eval "$xtrace_state"
 }

139-167: Pin macadam version and verify passwordless sudo access.

Two issues in this function:

  1. Mutable version tag (lines 154–155): Using "latest" is not reproducible and can introduce breaking changes between CI runs. Pin to a specific stable release.
  2. Unverified sudo privilege (lines 158–162): The script calls sudo mkdir -p /opt/macadam and sudo mv without checking if passwordless sudo is available. In CI environments where sudo requires a password, this will hang or fail silently.

Apply this diff to pin the version and verify sudo privileges:

     echo "macadam not found, downloading..."
 
-    local version="latest"
+    local version="v0.1.0"  # Pin to a stable version; update when new releases are validated
     local url="https://github.com/crc-org/macadam/releases/download/${version}/macadam-linux-amd64"
 
     echo "Downloading from ${url}"
+    if ! sudo -n true 2>/dev/null; then
+        die "This script requires passwordless sudo access. Please configure sudo NOPASSWD for this user."
+    fi
     sudo mkdir -p /opt/macadam
     if ! curl -fL -o /tmp/macadam "${url}"; then
         die "Failed to download macadam."
     fi
     sudo mv /tmp/macadam /opt/macadam/macadam
     sudo chmod +x /opt/macadam/macadam
 
     export PATH="/opt/macadam:$PATH"
     echo "macadam downloaded to /opt/macadam and added to PATH."

Documentation note: After determining the stable macadam version, update the version variable and document it in the PR description or a configuration file.


198-214: Exit on macadam VM startup failure.

When macadam start fails (line 211), the script logs a message but continues execution, causing downstream operations to silently fail. This should halt the script immediately.

Apply this diff to exit on failure:

 function start_macadam_vm() {
     echo "--- Creating VM ---"
     macadam init \
         "${DISK_IMAGE_PATH}" \
         --disk-size 31 \
         --memory 11264 \
         --name crc-ng \
         --username core \
         --ssh-identity-path "${PRIV_KEY_PATH}" \
         --cpus 6 \
         --cloud-init "${USER_DATA_PATH}" \
         --log-level debug
 
     if ! macadam start crc-ng --log-level debug; then
-        echo "Machine didn't come up in time"
+        die "Machine didn't come up in time"
     fi
 }

274-325: Fix gvproxy socket path extraction regex.

The regex at line 304 is fundamentally broken and will not match actual gvproxy command-line output:

# Current (incorrect):
sed -n 's/.*-services unix:\/\/\([^ ]*\).*/\1/p'

# Expected gvproxy format:
gvproxy --services --listen unix:///tmp/network.sock

Issues:

  • Looks for -services (single dash) but the flag is --services (double dash)
  • Looks for unix:// (two slashes) but the actual format uses unix:/// (three slashes for UNIX socket paths)
  • The regex will fail silently and extract nothing, causing the script to die with a confusing error

Apply this diff to extract the socket path from the correct --listen flag:

     local gvproxy_cmd
     if ! gvproxy_cmd=$(pgrep -af gvproxy); then
         die "The gvproxy process is not running."
     fi
 
-    GVPROXY_SOCKET_PATH=$(echo "${gvproxy_cmd}" | sed -n 's/.*-services unix:\/\/\([^ ]*\).*/\1/p')
+    # Extract the socket path from the --listen flag (format: --listen unix:///path/to/socket)
+    GVPROXY_SOCKET_PATH=$(echo "${gvproxy_cmd}" | sed -n 's/.*--listen unix:\/\/\/\([^ ]*\).*/\1/p')
     if [[ -z "${GVPROXY_SOCKET_PATH}" ]]; then
         die "Could not find the gvproxy socket path."
     fi

For additional robustness, consider validating that the extracted path exists and is a UNIX socket:

if [[ ! -S "${GVPROXY_SOCKET_PATH}" ]]; then
    die "Extracted gvproxy socket path does not exist or is not a UNIX socket: ${GVPROXY_SOCKET_PATH}"
fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f7813e and 2fee6c1.

📒 Files selected for processing (2)
  • ci.sh (1 hunks)
  • run-with-macadam.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-02T13:13:28.229Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/dnsmasq.sh.template:3-12
Timestamp: 2025-10-02T13:13:28.229Z
Learning: The file `/etc/sysconfig/crc-env` in the crc-org/snc repository contains configuration variables and file paths (e.g., `CRC_PASS_KUBEADMIN_PATH`, `CRC_PUB_KEY_PATH`, `CRC_NETWORK_MODE_USER`, `APPS_DOMAIN`), not the actual secrets. The secrets themselves are stored in separate files referenced by these paths. Sourcing `crc-env` with xtrace enabled does not leak sensitive information.

Applied to files:

  • ci.sh
  • run-with-macadam.sh
📚 Learning: 2025-10-07T07:03:06.884Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/crc-self-sufficient-env.sh:10-17
Timestamp: 2025-10-07T07:03:06.884Z
Learning: In the crc-org/snc repository, the CRC_CLOUD environment variable has been removed from the codebase. The self-sufficient environment check in systemd/crc-self-sufficient-env.sh now only validates CRC_SELF_SUFFICIENT=1, not CRC_CLOUD.

Applied to files:

  • ci.sh
  • run-with-macadam.sh
🪛 Shellcheck (0.11.0)
ci.sh

[warning] 59-59: Brace expansions and globs are literal in assignments. Quote it or use an array.

(SC2125)

🔇 Additional comments (8)
run-with-macadam.sh (8)

328-368: Approve kubeconfig retrieval logic.

The kubeconfig retrieval function handles SSH connection establishment, API server readiness, and kubeconfig update with reasonable retry logic and error handling. The ssh-port extraction from gvproxy process arguments follows a sensible pattern and the subsequent certificate rotation wait is appropriate.

Verify that the gvproxy -ssh-port flag format (line 342) matches your gvproxy version's output. If you update the --listen unix:/// extraction logic per the earlier comment, cross-check that all pgrep-based flag extractions are consistent.


371-377: Cluster status check looks good.

The cluster stability verification with appropriate timeouts and error handling is correct.


380-401: Main orchestration flow is logical.

The function properly sets up cleanup via EXIT trap and sequences the provisioning steps in the correct order. The final cluster status validation provides good end-to-end verification.

Verify that all critical issues flagged in earlier comments are resolved before merging, as they directly impact the success of this orchestration pipeline.


18-42: SSH keypair generation and cleanup functions look good.

cleanup() properly handles macadam failures with || true to ensure all cleanup steps complete, and generate_ssh_keypair() correctly generates and validates the temporary SSH keypair.


45-56: Dependency checking and installation is solid.

The ensure_deps() function correctly checks for required tools and attempts installation via yum with appropriate error handling.


120-136: Disk image extraction with validation is correct.

The function properly extracts the disk image from the CRC bundle using tar with zstd decompression and validates that the extraction succeeded by checking file size.


217-253: gvproxy capability verification is comprehensive.

The function properly locates gvproxy, checks for the required capability, sets it if missing, and verifies success with appropriate error messages at each step.


256-271: Port forwarding logic is correct.

The function properly constructs gvproxy API requests to expose the required ports (6443, 443, 80) with proper error handling for each exposure attempt.

@anjannath anjannath moved this from Hold to Work In Progress in Project planning: crc Dec 3, 2025
@anjannath
Copy link
Member Author

/test e2e-snc

@gbraad
Copy link
Collaborator

gbraad commented Dec 8, 2025

I hope we do not replace this with 'crc' itself to test with, as that would re-introduce a dependency on what the script should run with. this exists as a smoke test, not a full test; even if crc would be used, 20 to 30% would still exists from this script, like download, etc. so if it is needed, better go to as python script instead to collect the pieces and feed macadam.

@praveenkumar
Copy link
Member

I hope we do not replace this with 'crc' itself to test with, as that would re-introduce a dependency on what the script should run with. this exists as a smoke test, not a full test; even if crc would be used, 20 to 30% would still exists from this script, like download, etc. so if it is needed, better go to as python script instead to collect the pieces and feed macadam.

This is not going to remove crc itself from testing the generating bundle which should be same as before. I think we should first do the smoke test with this script which will download bundle and put is the location then crc test also reuse it. As of now since bundle is generated on same machine where test is going to run so no download needed just the path.

with the self sufficient changes for the bundle
have to make sure it runs well with CRC as well
as via the cloud-init path with an user-data

for crc we have a test in openshift-ci that run
the bundle with CRC once it is generated

this adds a shell script, which we'll run on CI
for every PR to catch any regressions

Assisted-By: Gemini
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
ci.sh (1)

59-62: [Duplicate] Fix unsafe glob expansion and mask return value issue on line 60.

This issue was previously flagged in code review and remains unresolved. Line 60 uses an unquoted glob with ls, which fails silently if zero or multiple bundles exist, and export masks the ls exit code (ShellCheck SC2155).

Apply this diff to safely resolve the glob and validate exactly one bundle exists:

 # run-with-macadam smoke test
-export CRC_BUNDLE_PATH=$(ls ./crc_libvirt_*.crcbundle)
+shopt -s nullglob
+bundles=(./crc_libvirt_*.crcbundle)
+if [[ ${#bundles[@]} -ne 1 ]]; then
+  echo "Error: Expected exactly one bundle file, found ${#bundles[@]}" >&2
+  exit 1
+fi
+export CRC_BUNDLE_PATH="${bundles[0]}"
 export PULL_SECRET_PATH="${HOME}/pull-secret"
🧹 Nitpick comments (1)
ci.sh (1)

64-72: Consider removing or documenting the commented-out CRC code block.

Lines 64–72 contain the previous CRC clone/build/config/start workflow, now replaced by run-with-macadam.sh. Since this is a WIP PR, the comments may be temporary scaffolding. Once the macadam approach is validated and stable, remove these lines to reduce visual clutter and avoid confusion about fallback paths.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9abf500 and e4c2591.

📒 Files selected for processing (1)
  • ci.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-07T07:03:06.884Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/crc-self-sufficient-env.sh:10-17
Timestamp: 2025-10-07T07:03:06.884Z
Learning: In the crc-org/snc repository, the CRC_CLOUD environment variable has been removed from the codebase. The self-sufficient environment check in systemd/crc-self-sufficient-env.sh now only validates CRC_SELF_SUFFICIENT=1, not CRC_CLOUD.

Applied to files:

  • ci.sh
📚 Learning: 2025-10-02T13:13:28.229Z
Learnt from: kpouget
Repo: crc-org/snc PR: 1168
File: systemd/dnsmasq.sh.template:3-12
Timestamp: 2025-10-02T13:13:28.229Z
Learning: The file `/etc/sysconfig/crc-env` in the crc-org/snc repository contains configuration variables and file paths (e.g., `CRC_PASS_KUBEADMIN_PATH`, `CRC_PUB_KEY_PATH`, `CRC_NETWORK_MODE_USER`, `APPS_DOMAIN`), not the actual secrets. The secrets themselves are stored in separate files referenced by these paths. Sourcing `crc-env` with xtrace enabled does not leak sensitive information.

Applied to files:

  • ci.sh
🪛 Shellcheck (0.11.0)
ci.sh

[warning] 60-60: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (1)
ci.sh (1)

74-76: Verify KUBECONFIG path matches output of run-with-macadam.sh; add defensive checks.

Line 75 hardcodes KUBECONFIG to ${HOME}/.crc/machines/crc/kubeconfig, assuming run-with-macadam.sh (invoked on line 62) creates kubeconfig at this exact location. If the macadam workflow outputs it elsewhere, the tests on line 76 will fail with a confusing error.

Verify that run-with-macadam.sh guarantees this output path and add a defensive check:

 mkdir -p crc-tmp-install-data/test-artifacts
 export KUBECONFIG="${HOME}"/.crc/machines/crc/kubeconfig
+if [[ ! -f "${KUBECONFIG}" ]]; then
+  echo "Error: KUBECONFIG not found at ${KUBECONFIG}" >&2
+  exit 1
+fi

Alternatively, if run-with-macadam.sh exports KUBECONFIG, source or capture that value instead of hardcoding the path.

@openshift-ci
Copy link

openshift-ci bot commented Dec 9, 2025

@anjannath: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-snc e4c2591 link true /test e2e-snc

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@anjannath anjannath moved this from Work In Progress to Hold in Project planning: crc Dec 17, 2025
@anjannath anjannath moved this from Hold to Ready for review in Project planning: crc Jan 7, 2026
@anjannath anjannath moved this from Ready for review to Work In Progress in Project planning: crc Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Work In Progress

Development

Successfully merging this pull request may close these issues.

4 participants