
Add unified k3d local development with helm mode #485

Draft
a5anka wants to merge 8 commits into main from feat/unified-k3d-helm-mode

Conversation


@a5anka a5anka commented Mar 4, 2026

Summary

The current local dev setup requires Docker Compose for AMP services and a separate k3d cluster for OpenChoreo, bridging networks via host.docker.internal hacks and custom kubeconfig generation. This adds a DEV_MODE=helm option that runs all services inside a single k3d cluster, matching the production Kubernetes topology.

Docker Compose remains the default mode — existing make dev-up workflows are unchanged.

Also adds dev-pause / dev-resume targets to stop and start Colima + k3d, freeing laptop resources when not coding.

Key developer workflows:

  • DEV_MODE=helm make setup — one-time setup with images built and deployed
  • make helm-sync-api — rebuild API + import + restart (~35s)
  • make helm-sync-console — rebuild Console + import + restart (~80s)
  • make dev-pause / make dev-resume — save laptop resources between sessions
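The mode switch behind these workflows can be sketched as a small POSIX-shell dispatch. This is a hypothetical illustration (`dispatch_mode` is an invented helper name); the PR implements the equivalent logic with Makefile conditionals on DEV_MODE:

```shell
# Hypothetical sketch of the DEV_MODE dispatch; "compose" stays the default,
# mirroring the behavior described above.
dispatch_mode() {
    mode="${1:-compose}"
    case "$mode" in
        helm)    echo "helm: k3d cluster + build/import images + helm upgrade --install" ;;
        compose) echo "compose: docker compose up" ;;
        *)       echo "unknown DEV_MODE: $mode" >&2; return 1 ;;
    esac
}

dispatch_mode          # no argument: falls back to compose
dispatch_mode helm
```

Keeping compose as the fallback is what makes existing `make dev-up` workflows unchanged for developers who never set DEV_MODE.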

Changes

  • deployments/dev-cluster-config.yaml: New k3d config extending single-cluster-config.yaml with AMP port mappings (3000, 9000, 9243, 9098, OTel)
  • deployments/scripts/build-and-import.sh: New script to build Docker images from production Dockerfiles and import into k3d (accepts component names as args)
  • deployments/scripts/helm-deploy-amp.sh: New script for helm upgrade --install / --uninstall with local chart and values
  • deployments/scripts/env.sh: Add AMP_NAMESPACE, AMP_RELEASE_NAME, AMP_IMAGE_TAG variables
  • deployments/scripts/setup-k3d.sh: Use dev-cluster-config.yaml when DEV_MODE=helm; validate AMP port mappings on existing clusters
  • deployments/values/values-local.yaml: Local image repos (amp-api, amp-console) with pullPolicy: Never, in-cluster service URLs (no host.docker.internal)
  • Makefile: DEV_MODE dispatch for setup/dev-up/dev-down/dev-rebuild/dev-logs + helm-* targets + dev-pause/dev-resume
  • CLAUDE.md: Document both development modes and helm-sync workflow
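The port mappings listed for dev-cluster-config.yaml would appear in k3d's Simple-config schema roughly as follows. This fragment is illustrative only (schema version and port annotations are assumptions); the authoritative values live in deployments/dev-cluster-config.yaml:

```yaml
# Illustrative fragment; real mappings are in deployments/dev-cluster-config.yaml.
apiVersion: k3d.io/v1alpha5
kind: Simple
ports:
  - port: 3000:3000        # one of the AMP ports listed above
    nodeFilters:
      - loadbalancer
  - port: 9000:9000        # one of the AMP ports listed above
    nodeFilters:
      - loadbalancer
```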

Summary by CodeRabbit

  • New Features

    • Added Helm-based development mode with full lifecycle commands (build/import/deploy, upgrade, sync, logs) and k3d support
    • New build-and-import flow and Helm deploy flow for local clusters
    • Pause/resume cluster controls for resource-aware development
  • Enhancements

    • Mode-aware setup, help, and daily workflows with clearer mode messaging
    • Improved deployment reliability with server-side apply, readiness waits, and better status/logging
  • Chores

    • Added local Helm values and environment configuration for Helm dev mode


coderabbitai bot commented Mar 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 647aad54-a8f5-455e-8991-09e4f68a09a4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


Adds a Helm-based development mode (DEV_MODE=helm) alongside Compose mode: k3d cluster config and Helm values for local dev, scripts to build/import Docker images into k3d, Helm deployment/uninstall for AMP, and Makefile targets and mode-aware flows to drive Helm vs Compose workflows.

Changes

  • Makefile Mode Support (Makefile): Adds public DEV_MODE (default compose) and many Helm-aware targets (helm-build, helm-import, helm-install/upgrade/sync/status/logs, dev-pause/dev-resume, status, etc.). Implements mode-aware help/setup/daily-dev dispatch and messaging.
  • k3d Cluster Configuration (deployments/dev-cluster-config.yaml, deployments/scripts/setup-k3d.sh): New k3d cluster config for Helm mode with extensive port mappings and registry mirror; setup-k3d.sh selects config via K3D_CONFIG, validates AMP ports when DEV_MODE=helm, and adjusts cluster create/validation flows.
  • Build & Import Script (deployments/scripts/build-and-import.sh): New script to build component images (api, console, traces-observer, evaluation-job), tag them with AMP_IMAGE_TAG, and import them into a k3d cluster; accepts a subset or all components and aggregates failures.
  • Helm Deploy Script (deployments/scripts/helm-deploy-amp.sh): New Helm deployment/uninstall script for AMP: verifies cluster access, creates the namespace, updates chart deps, runs helm upgrade --install, waits for deployments, and prints status/endpoints.
  • Environment Defaults (deployments/scripts/env.sh): Adds AMP_NAMESPACE, AMP_RELEASE_NAME, and AMP_IMAGE_TAG environment variables for Helm-driven workflows.
  • Helm Values (deployments/values/values-local.yaml): New local Helm values for dev: image repos/tags (pullPolicy: Never), LoadBalancer services, cluster-local endpoints, auth/logging overrides, and PostgreSQL persistence settings.
  • OpenChoreo Setup Changes (deployments/scripts/setup-openchoreo.sh): Replaced many client-side kubectl apply calls with server-side kubectl apply --server-side --force-conflicts -f; added CA-remediation annotation/patch steps, explicit readiness/wait loops, additional logging, and non-fatal fallbacks.

Sequence Diagram

sequenceDiagram
    actor Dev as Developer
    participant Makefile
    participant BuildScript as build-and-import.sh
    participant Docker
    participant k3d as k3d Cluster
    participant Helm
    participant kubectl

    Dev->>Makefile: make helm-build DEV_MODE=helm
    Makefile->>BuildScript: invoke build-and-import.sh
    BuildScript->>Docker: docker build (api, console, observers...)
    Docker-->>BuildScript: images built
    BuildScript->>k3d: k3d image import (api, console...)
    k3d-->>BuildScript: images available

    Dev->>Makefile: make helm-install DEV_MODE=helm
    Makefile->>Helm: invoke helm-deploy-amp.sh
    Helm->>kubectl: create namespace / apply resources
    kubectl-->>Helm: resources created
    Helm->>kubectl: helm upgrade --install AMP
    kubectl->>k3d: schedule deployments
    k3d-->>kubectl: pods/objects created
    kubectl-->>Helm: readiness checks pass
    Helm-->>Dev: AMP deployed / status output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through Makefile, k3d, and chart,
Built images, imported each tiny part.
Helm or Compose — choose your dev delight,
Pods wake, ports map, and everything’s alright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning: The PR description is incomplete against the required template. It lacks Purpose/Goals, User stories, Release notes, Documentation, Training, Certification, Marketing, Tests, Security, Samples, Related PRs, Migrations, Test environment, and Learning sections. Resolution: complete the PR description using the template; add Purpose with issue links, Goals, Approach, User stories, Release notes, Documentation links, and other required template sections.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main change: adding a unified k3d local development environment with helm mode support.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
Makefile (1)

248-249: Avoid hard-coded k3d cluster name in Helm targets.

These lines hard-code openchoreo-local-v0.14.0, which can drift from shared env values and break image imports after version bumps.

Suggested fix
 CLUSTER_CONTEXT := k3d-openchoreo-local-v0.14.0
+CLUSTER_NAME := $(patsubst k3d-%,%,$(CLUSTER_CONTEXT))
...
-	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)
...
-	@k3d image import amp-api:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-api:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)
...
-	@k3d image import amp-console:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-console:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)

Also applies to: 270-270, 279-279

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 248 - 249, The Makefile uses a hard-coded k3d cluster
name in the k3d image import command which will break when the cluster version
changes; replace the literal "openchoreo-local-v0.14.0" with a configurable
variable (e.g., K3D_CLUSTER or K3D_CLUSTER_NAME) and reference that variable in
the k3d image import invocation and the other occurrences (the lines with the
same import command at the noted locations), provide a sensible default for the
variable if unset, and update any related Helm targets to use this variable so
cluster name changes are centralized and no longer hard-coded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a734be60-f635-47e1-bb65-32c33eaa2243

📥 Commits

Reviewing files that changed from the base of the PR and between a947bd4 and 130917c.

📒 Files selected for processing (8)
  • CLAUDE.md
  • Makefile
  • deployments/dev-cluster-config.yaml
  • deployments/scripts/build-and-import.sh
  • deployments/scripts/env.sh
  • deployments/scripts/helm-deploy-amp.sh
  • deployments/scripts/setup-k3d.sh
  • deployments/values/values-local.yaml

CLAUDE.md Outdated
Comment on lines +93 to +109
```
┌─────────────────────┐     ┌────────────────────────┐
│  Console (React)    │────▶│  Agent Manager API     │
│  Port 3000          │     │  (Go) Port 8080        │
└─────────────────────┘     └──────────┬─────────────┘
                                       │
        ┌──────────────────────────────┼───────────────────────────────┐
        ▼                              ▼                               ▼
┌───────────────┐            ┌─────────────────┐             ┌─────────────────┐
│  PostgreSQL   │            │   OpenChoreo    │             │ Traces Observer │
│  Port 5432    │            │  (Kubernetes)   │             │   Port 9098     │
└───────────────┘            └─────────────────┘             └────────┬────────┘
                                                                      │
                                                             ┌────────▼────────┐
                                                             │   OpenSearch    │
                                                             └─────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add a language hint to the architecture fence.

Line 93 uses a fenced block without a language, which triggers markdownlint (MD040).

Suggested fix
-```
+```text
 ┌─────────────────────┐     ┌────────────────────────┐
 │  Console (React)    │────▶│  Agent Manager API     │
 │  Port 3000          │     │  (Go) Port 8080        │
 └─────────────────────┘     └──────────┬─────────────┘
                                        │
         ┌──────────────────────────────┼───────────────────────────────┐
         ▼                              ▼                               ▼
 ┌───────────────┐            ┌─────────────────┐             ┌─────────────────┐
 │  PostgreSQL   │            │   OpenChoreo    │             │ Traces Observer │
 │  Port 5432    │            │  (Kubernetes)   │             │   Port 9098     │
 └───────────────┘            └─────────────────┘             └────────┬────────┘
                                                                       │
                                                              ┌────────▼────────┐
                                                              │   OpenSearch    │
                                                              └─────────────────┘
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 93 - 109, The fenced ASCII-art block (the
triple-backtick fence containing the architecture diagram) lacks a language hint
which triggers markdownlint MD040; update the opening fence by adding a language
tag (e.g., text) immediately after the three backticks for that ASCII diagram so
it becomes ```text to silence the linter and preserve formatting.

Comment on lines +18 to +30
declare -A COMPONENT_CONTEXT=(
[api]="agent-manager-service"
[console]="console"
[traces-observer]="traces-observer-service"
[evaluation-job]="evaluation-job"
)

declare -A COMPONENT_IMAGE=(
[api]="amp-api"
[console]="amp-console"
[traces-observer]="amp-traces-observer"
[evaluation-job]="amp-evaluation-job"
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n deployments/scripts/build-and-import.sh | head -n 35

Repository: wso2/agent-manager

Length of output: 1357


🏁 Script executed:

# Check what Bash version shows declare -A support issue
# First, let's see if the file exists and its content
file deployments/scripts/build-and-import.sh
wc -l deployments/scripts/build-and-import.sh

Repository: wso2/agent-manager

Length of output: 164


🏁 Script executed:

# Run the verification script from the review comment
echo "Checking /bin/bash version..."
/bin/bash --version | head -n1

echo
echo "Checking associative array support in /bin/bash..."
if /bin/bash -lc 'declare -A m=([k]=v); echo ok' >/dev/null 2>&1; then
  echo "PASS: /bin/bash supports associative arrays."
else
  echo "FAIL: /bin/bash does not support associative arrays."
fi

echo
echo "Locating associative-array usage in repo:"
rg -n 'declare -A' deployments/scripts/build-and-import.sh || echo "No matches found"

Repository: wso2/agent-manager

Length of output: 355


Add Bash 4+ version guard to prevent script failure on macOS.

The script uses declare -A (associative arrays) on lines 18 and 25, which requires Bash 4+. Default macOS /bin/bash is version 3.2 (due to Apple's GPL licensing constraints), causing the script to fail immediately on standard macOS systems before any build/import work runs.

Add this check after sourcing env.sh
 source "$SCRIPT_DIR/env.sh"

+# Require Bash 4+ for associative arrays.
+if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
+    echo "This script requires Bash 4+ (found ${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]})."
+    echo "Use a newer bash (e.g., via Homebrew) or refactor to avoid associative arrays."
+    exit 1
+fi
+
 # Components and their build contexts (relative to ROOT_DIR)
 declare -A COMPONENT_CONTEXT=(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/build-and-import.sh` around lines 18 - 30, After sourcing
env.sh, add a Bash version guard that checks the current shell is Bash >= 4 and
exits with a clear error if not; specifically validate BASH_VERSINFO (or bash
--version) before the script reaches the declare -A usage for COMPONENT_CONTEXT
and COMPONENT_IMAGE so the script fails with a helpful message on macOS rather
than crashing at the declare -A lines, and include guidance to run with a newer
bash (e.g., brew-installed bash) or using env to find the correct interpreter.

Comment on lines +57 to +63
kubectl wait --for=condition=Available deployment --all \
-n "$AMP_NAMESPACE" \
--context "${CLUSTER_CONTEXT}" \
--timeout=300s 2>/dev/null || true

echo ""
echo "AMP deployed successfully!"

⚠️ Potential issue | 🟠 Major

Don’t suppress deployment readiness failures.

Line 60 uses || true, so the script can print “AMP deployed successfully!” even when deployments are not actually Available.

Suggested fix
 kubectl wait --for=condition=Available deployment --all \
     -n "$AMP_NAMESPACE" \
     --context "${CLUSTER_CONTEXT}" \
-    --timeout=300s 2>/dev/null || true
+    --timeout=300s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/helm-deploy-amp.sh` around lines 57 - 63, The kubectl
wait call that uses "kubectl wait --for=condition=Available deployment --all -n
\"$AMP_NAMESPACE\" --context \"${CLUSTER_CONTEXT}\" --timeout=300s 2>/dev/null
|| true" is suppressing failures; remove the "|| true" and stop swallowing
stderr so the script can detect non-zero exit: run the kubectl wait without
trailing "|| true", keep or remove the 2>/dev/null as desired, and if the
kubectl wait command fails, log an error (including AMP_NAMESPACE and
CLUSTER_CONTEXT) and exit with a non-zero status so the script does not print
"AMP deployed successfully!" on failed readiness.

Comment on lines +67 to +71
for PORT in 3000 9000; do
if ! docker port "k3d-${CLUSTER_NAME}-serverlb" "${PORT}/tcp" &>/dev/null; then
MISSING_PORTS="${MISSING_PORTS} ${PORT}"
fi
done

⚠️ Potential issue | 🟠 Major

Helm port validation is incomplete.

At Line 67, the check only verifies 3000 and 9000. Helm mode also relies on 9243, 9098, and OTel ports (21893, 22893, 22894), so an existing cluster can pass this check but still be misconfigured.

Suggested fix
-        for PORT in 3000 9000; do
+        for PORT in 3000 9000 9243 9098 21893 22893 22894; do
             if ! docker port "k3d-${CLUSTER_NAME}-serverlb" "${PORT}/tcp" &>/dev/null; then
                 MISSING_PORTS="${MISSING_PORTS} ${PORT}"
             fi
         done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-k3d.sh` around lines 67 - 71, The port-check loop
only tests 3000 and 9000; expand it to include all Helm-required ports (9243,
9098, 21893, 22893, 22894) so the check properly detects missing mappings:
update the for PORT in ... loop that references CLUSTER_NAME and appends to
MISSING_PORTS (the docker port "k3d-${CLUSTER_NAME}-serverlb" "${PORT}/tcp"
call) to iterate over the full set of ports and keep the same MISSING_PORTS
behavior when a port is absent.

Comment on lines 221 to +224
 dev-migrate:
-	@echo "🗄️ Running database migrations..."
+	@echo "Running database migrations..."
 	@docker exec agent-manager-service sh -c "cd /go/src && make dev-migrate"
-	@echo "✅ Migrations completed"
+	@echo "Migrations completed"

⚠️ Potential issue | 🟠 Major

dev-migrate is not mode-aware and fails in Helm mode.

Line 223 always runs docker exec agent-manager-service ..., which won’t exist when DEV_MODE=helm.

Suggested fix (explicit mode gate)
 dev-migrate:
+ifeq ($(DEV_MODE),helm)
+	@echo "dev-migrate is currently Compose-only. Use Helm-native migration flow."
+	@exit 1
+else
 	@echo "Running database migrations..."
 	@docker exec agent-manager-service sh -c "cd /go/src && make dev-migrate"
 	@echo "Migrations completed"
+endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 221 - 224, The dev-migrate Makefile target
unconditionally runs the docker command which fails in Helm mode; update the
dev-migrate target (symbol: dev-migrate) to check the DEV_MODE variable and run
the docker exec "docker exec agent-manager-service sh -c 'cd /go/src && make
dev-migrate'" only when DEV_MODE is not "helm" (or when DEV_MODE equals
"docker"/local), otherwise run the Helm-appropriate migration command or no-op
for Helm mode; ensure the conditional uses the existing DEV_MODE variable and
preserves the echo messages around the chosen branch.

Comment on lines +260 to +265
helm-sync: helm-build
@echo "Restarting AMP deployments..."
@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) 2>/dev/null || true
@echo "Waiting for rollout..."
@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
@echo "Sync complete."

⚠️ Potential issue | 🟠 Major

Rollout failures are currently hidden.

Lines 264 and 289 append || true, so failed rollouts can still print “Sync complete.” / “Restart complete.”

Suggested fix
 	@echo "Waiting for rollout..."
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Sync complete."
...
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Restart complete."

Also applies to: 286-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 260 - 265, The Makefile silences failures for the
kubectl rollout restart/status commands in the helm-sync target (and the similar
target that prints “Restart complete.”), so failed rollouts still show “Sync
complete.”; remove the trailing "|| true" from the kubectl commands (kubectl
rollout restart and kubectl rollout status) in the helm-sync rule and the other
restart rule so failures propagate and the Make target fails, or alternatively
replace "|| true" with explicit error-handling that logs the failure and exits
non-zero; update the commands referenced (kubectl rollout restart deployment -n
$(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) and kubectl rollout status
deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s)
accordingly.

a5anka added 2 commits March 4, 2026 15:54
Enable all AMP services to run inside the same k3d cluster as OpenChoreo,
eliminating the need for Docker Compose network bridging and
host.docker.internal hacks. Docker Compose remains the default mode.

New files:
- deployments/dev-cluster-config.yaml: k3d config with AMP port mappings
- deployments/scripts/build-and-import.sh: build images and import into k3d
- deployments/scripts/helm-deploy-amp.sh: helm install/upgrade/uninstall

Modified files:
- Makefile: DEV_MODE dispatch + helm-* targets (sync, build, logs, status)
- deployments/scripts/env.sh: AMP_NAMESPACE, AMP_RELEASE_NAME, AMP_IMAGE_TAG
- deployments/scripts/setup-k3d.sh: use dev config when DEV_MODE=helm
- deployments/values/values-local.yaml: local image repos + in-cluster URLs
- CLAUDE.md: document Helm mode commands
@a5anka a5anka force-pushed the feat/unified-k3d-helm-mode branch from 6b0636b to d9e1cca on March 4, 2026 10:24

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
Makefile (3)

264-269: ⚠️ Potential issue | 🟠 Major

Rollout failures are hidden by || true.

Lines 266 and 268 suppress failures, so a failed rollout will still print "Sync complete." This masks deployment issues that developers need to see.

Suggested fix
 helm-sync: helm-build
 	@echo "Restarting AMP deployments..."
-	@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) 2>/dev/null || true
+	@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT)
 	@echo "Waiting for rollout..."
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Sync complete."

Note: helm-sync-api and helm-sync-console correctly propagate errors without || true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 264 - 269, The helm-sync target currently silences
kubectl errors by appending "|| true" to the kubectl rollout restart and kubectl
rollout status commands, so failures are masked; update the helm-sync recipe to
remove "|| true" (leave stderr redirection 2>/dev/null only if you want to hide
noise, but do not swallow non-zero exit codes) so that kubectl rollout restart
and kubectl rollout status will propagate errors (identify the helm-sync target
and the two kubectl commands: "kubectl rollout restart deployment -n
$(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT)" and "kubectl rollout status
deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s") and
ensure the Makefile target fails when those commands fail.

289-294: ⚠️ Potential issue | 🟠 Major

Rollout status failure is hidden in helm-restart.

Line 293 uses || true, so a failed rollout will still show "Restart complete."

Suggested fix
 helm-restart:
 	@echo "Restarting all AMP deployments..."
 	@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT)
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Restart complete."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 289 - 294, The helm-restart target masks rollout
failures by appending "|| true" to the kubectl rollout status command; replace
that masking with explicit error handling so failures surface and the Makefile
exits non-zero. Modify the helm-restart recipe (target name: helm-restart) to
run kubectl rollout status -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT)
--timeout=300s and, if it fails, print an error to stderr (e.g., "Rollout status
check failed for namespace $(AMP_NAMESPACE)") and exit 1 (for example: if !
kubectl rollout status ...; then echo ... >&2; exit 1; fi) instead of using "||
true".

225-228: ⚠️ Potential issue | 🟠 Major

dev-migrate is not mode-aware and will fail in Helm mode.

This target unconditionally runs docker exec agent-manager-service, which doesn't exist when DEV_MODE=helm. The Helm deployment runs the service in a pod, not a Docker container.

Suggested fix
 dev-migrate:
+ifeq ($(DEV_MODE),helm)
+	@echo "Running database migrations (Helm mode)..."
+	@kubectl exec -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) \
+		$$(kubectl get pod -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) -l "app.kubernetes.io/component=agent-manager-service" -o jsonpath='{.items[0].metadata.name}') \
+		-- sh -c "cd /go/src && make dev-migrate"
+else
 	@echo "Running database migrations..."
 	@docker exec agent-manager-service sh -c "cd /go/src && make dev-migrate"
+endif
 	@echo "Migrations completed"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 225 - 228, The dev-migrate Makefile target
unconditionally uses docker exec against agent-manager-service and fails in Helm
mode; update the dev-migrate target to check DEV_MODE and branch: when
DEV_MODE=helm, run the migration inside the running Kubernetes pod (use kubectl
exec against the agent-manager-service pod discovered by label/selector) and
when not helm, keep the existing docker exec agent-manager-service invocation;
modify the dev-migrate target to perform this conditional logic and ensure it
still runs the same command string ("cd /go/src && make dev-migrate") inside the
selected runtime.
🧹 Nitpick comments (1)
Makefile (1)

7-11: Cluster name is hardcoded in multiple places; consider consolidating.

CLUSTER_NAME is defined at line 397, but k3d commands at lines 252, 274, and 283 use the hardcoded string openchoreo-local-v0.14.0. If the version changes, you'll need to update multiple locations.

♻️ Suggested consolidation

Move CLUSTER_NAME near the other AMP variables and use it consistently:

 # AMP variables (keep in sync with deployments/scripts/env.sh)
 AMP_NAMESPACE := wso2-amp
 AMP_RELEASE_NAME := amp
 AMP_IMAGE_TAG := 0.0.0-dev
+CLUSTER_NAME := openchoreo-local-v0.14.0
 CLUSTER_CONTEXT := k3d-$(CLUSTER_NAME)

Then update the k3d commands to use the variable:

-	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)

And remove the duplicate definition at line 397.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 7 - 11, Hardcoded cluster name strings
"openchoreo-local-v0.14.0" are used in several k3d commands while CLUSTER_NAME
is defined elsewhere; move a single CLUSTER_NAME definition next to the AMP
variables (near AMP_NAMESPACE/AMP_RELEASE_NAME/AMP_IMAGE_TAG/CLUSTER_CONTEXT),
set it to the current cluster identifier, update all k3d commands that use the
literal "openchoreo-local-v0.14.0" to reference $(CLUSTER_NAME) instead, and
remove the duplicate CLUSTER_NAME definition later in the file so there is one
canonical variable used everywhere.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9795ec20-53aa-4fac-8366-c0784e95427a

📥 Commits

Reviewing files that changed from the base of the PR and between 130917c and 8e377c7.

📒 Files selected for processing (1)
  • Makefile

Use server-side apply with --force-conflicts on all kubectl apply calls
to prevent field ownership conflicts with Helm on re-runs. Add a wait
for the data plane agent pod before reading its logs.
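The two fixes this commit describes can be sketched as a small helper. This is a hedged sketch, not the script's actual code; the namespace, label selector, and manifest path are placeholders:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Apply a manifest with server-side apply, then wait for the agent pod to be
# Ready before reading its logs. All names here are placeholder assumptions.
apply_and_read_agent_logs() {
    local ns="$1" selector="$2" manifest="$3"
    # --force-conflicts lets this apply take ownership of fields previously
    # managed by Helm, so re-runs don't fail with field ownership errors.
    kubectl apply --server-side --force-conflicts -f "$manifest" -n "$ns"
    # Without this wait, the logs call can fail while the pod is still
    # in ContainerCreating.
    kubectl wait --for=condition=Ready pod -l "$selector" -n "$ns" --timeout=120s
    kubectl logs -l "$selector" -n "$ns" --tail=10
}
```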
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
deployments/scripts/build-and-import.sh (1)

15-25: ⚠️ Potential issue | 🔴 Critical

Add a Bash 4+ guard before associative arrays.

declare -A will break on default macOS /bin/bash (3.2), so this script can fail before doing any work.

Suggested fix
 source "$SCRIPT_DIR/env.sh"
 
+# Require Bash 4+ for associative arrays.
+if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
+    echo "❌ This script requires Bash 4+ (found ${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]})."
+    echo "   Run with a newer bash (e.g., Homebrew bash) and retry."
+    exit 1
+fi
+
 # Components and their build contexts (relative to ROOT_DIR)
 declare -A COMPONENT_CONTEXT=(

Verification:

#!/bin/bash
set -euo pipefail

echo "Shebang:"
sed -n '1p' deployments/scripts/build-and-import.sh
echo

echo "Associative-array usage:"
grep -n 'declare -A' deployments/scripts/build-and-import.sh
echo

echo "Current /bin/bash version:"
/bin/bash --version | head -n1
echo

echo "Associative-array support test in /bin/bash:"
if /bin/bash -lc 'declare -A t=([k]=v); echo ok' >/dev/null 2>&1; then
  echo "PASS: /bin/bash supports declare -A"
else
  echo "FAIL: /bin/bash does not support declare -A"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/build-and-import.sh` around lines 15 - 25, Add a Bash 4+
guard before any use of associative arrays (the declare -A lines, e.g.,
COMPONENT_CONTEXT and COMPONENT_IMAGE) so the script fails early on macOS
default bash 3.2; implement a check right after the shebang that either verifies
BASH_VERSION major >= 4 or runs a quick test like attempting declare -A in a
subshell, and if it fails emit a clear error instructing to run the script with
a Bash 4+ interpreter (or change the shebang to /usr/bin/env bash) and exit
nonzero.
Makefile (2)

264-269: ⚠️ Potential issue | 🟠 Major

Don’t swallow rollout failures in Helm sync/restart.

|| true here can print success even when rollout failed.

Suggested fix
 	@echo "Restarting AMP deployments..."
-	@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) 2>/dev/null || true
+	@kubectl rollout restart deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT)
 	@echo "Waiting for rollout..."
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Sync complete."
@@
-	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s 2>/dev/null || true
+	@kubectl rollout status deployment -n $(AMP_NAMESPACE) --context $(CLUSTER_CONTEXT) --timeout=300s
 	@echo "Restart complete."

Also applies to: 293-294

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 264 - 269, The helm-sync Makefile target is swallowing
kubectl rollout failures by appending "|| true" to the kubectl rollout restart
and kubectl rollout status commands; remove the "|| true" (or replace it with an
explicit failure handler like "|| { echo 'rollout failed'; exit 1; }") so
non-zero exit codes propagate and make fails on rollout errors, and apply the
same change to the other occurrences noted (lines 293-294) to ensure both
kubectl rollout restart and kubectl rollout status in the Makefile surface
failures instead of printing success when rollout fails.

225-228: ⚠️ Potential issue | 🟠 Major

Make dev-migrate mode-aware.

In Helm mode there is no agent-manager-service Compose container, so this target fails.

Suggested fix
 dev-migrate:
+ifeq ($(DEV_MODE),helm)
+	@echo "dev-migrate is Compose-only right now. Use the Helm-native migration flow."
+	@exit 1
+else
 	@echo "Running database migrations..."
 	@docker exec agent-manager-service sh -c "cd /go/src && make dev-migrate"
 	@echo "Migrations completed"
+endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 225 - 228, The dev-migrate Makefile target assumes a
docker Compose container named agent-manager-service exists and fails in Helm
mode; update the dev-migrate target to be mode-aware by checking a mode
indicator (e.g., an env var like HELM_MODE or K8S_MODE) or detecting the
agent-manager-service container, and branch: if not Helm/Compose, keep the
existing docker exec into agent-manager-service; if Helm/K8S, run the equivalent
migration via kubectl exec into the running pod or invoke a new target (e.g.,
helm-migrate) that runs the migration in the Helm deployment. Ensure references
to the agent-manager-service container name and the new helm alternative are
used so reviewers can find the change.
🧹 Nitpick comments (3)
deployments/scripts/setup-openchoreo.sh (1)

140-144: Keep log collection aligned with the label fallback.

Readiness checks both app=cluster-agent-dataplane and app=cluster-agent, but logs are fetched only for app=cluster-agent, which can miss logs when only the first label exists.

Suggested fix
-kubectl logs -n openchoreo-data-plane -l app=cluster-agent --tail=10 2>/dev/null || true
+kubectl logs -n openchoreo-data-plane -l app=cluster-agent-dataplane --tail=10 2>/dev/null || \
+kubectl logs -n openchoreo-data-plane -l app=cluster-agent --tail=10 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 140 - 144, The
readiness check uses two label selectors (kubectl wait ... -l
app=cluster-agent-dataplane and ... -l app=cluster-agent) but kubectl logs only
fetches logs for -l app=cluster-agent, so when only cluster-agent-dataplane pods
exist you miss logs; update the log collection to try the same label fallback as
the wait (attempt kubectl logs -l app=cluster-agent-dataplane || kubectl logs -l
app=cluster-agent || true) so logs are fetched for whichever label succeeded,
keeping namespace (-n openchoreo-data-plane) and tail behavior consistent with
the readiness check.
Makefile (1)

252-252: Use $(CLUSTER_NAME) instead of hardcoded cluster literals.

Hardcoding the versioned cluster name in multiple targets is drift-prone.

Suggested fix
-	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG) amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)
@@
-	@k3d image import amp-api:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-api:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)
@@
-	@k3d image import amp-console:$(AMP_IMAGE_TAG) -c openchoreo-local-v0.14.0
+	@k3d image import amp-console:$(AMP_IMAGE_TAG) -c $(CLUSTER_NAME)

Also applies to: 274-274, 283-283

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 252, Replace the hardcoded cluster literal
"openchoreo-local-v0.14.0" used in k3d image import invocations with the
Makefile variable $(CLUSTER_NAME); update every occurrence (including the
instances at the other k3d image import targets) to use $(CLUSTER_NAME) so the
image import commands (the lines invoking "k3d image import
amp-api:$(AMP_IMAGE_TAG) amp-console:$(AMP_IMAGE_TAG)
amp-traces-observer:$(AMP_IMAGE_TAG) amp-evaluation-job:$(AMP_IMAGE_TAG) -c
...") reference the cluster variable instead of the versioned literal, and
ensure CLUSTER_NAME is defined elsewhere in the Makefile if not already.
deployments/values/values-local.yaml (1)

10-10: Avoid repeating the image tag in three locations.

This is easy to desync from the build/import tag source and can make “sync” workflows appear successful while running stale images.

One-file consolidation option
+# Shared local tag
+x-amp-image-tag: &ampImageTag "0.0.0-dev"
+
 agentManagerService:
   image:
     repository: amp-api
-    tag: "0.0.0-dev"
+    tag: *ampImageTag
@@
 console:
   image:
     repository: amp-console
-    tag: "0.0.0-dev"
+    tag: *ampImageTag
@@
 dbMigration:
   image:
     repository: amp-api
-    tag: "0.0.0-dev"
+    tag: *ampImageTag

Also applies to: 39-39, 51-51

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/values/values-local.yaml` at line 10, Multiple occurrences of the
image tag (the YAML key "tag" set to "0.0.0-dev") are repeated across this
values file; replace the repeated literal with a single authoritative value and
reference it elsewhere (for example, consolidate the tag under one top-level key
like image.tag and update other places to use that single value or a YAML
anchor/alias) so updates only need to be made in one place and cannot desync;
locate the repeated "tag" entries and change them to reference the consolidated
symbol instead of duplicating the literal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f3bc412-1823-4289-9751-3fa0b6f196d6

📥 Commits

Reviewing files that changed from the base of the PR and between 8e377c7 and 0aed56d.

📒 Files selected for processing (8)
  • Makefile
  • deployments/dev-cluster-config.yaml
  • deployments/scripts/build-and-import.sh
  • deployments/scripts/env.sh
  • deployments/scripts/helm-deploy-amp.sh
  • deployments/scripts/setup-k3d.sh
  • deployments/scripts/setup-openchoreo.sh
  • deployments/values/values-local.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • deployments/scripts/helm-deploy-amp.sh
  • deployments/scripts/setup-k3d.sh

 kubectl create namespace openchoreo-observability-plane --dry-run=client -o yaml | kubectl apply -f -

-kubectl apply -f $1/deployments/values/oc-collector-configmap.yaml -n openchoreo-observability-plane
+kubectl apply --server-side --force-conflicts -f $1/deployments/values/oc-collector-configmap.yaml -n openchoreo-observability-plane

⚠️ Potential issue | 🟡 Minor

Use PROJECT_ROOT and quote the path.

This line bypasses the existing PROJECT_ROOT variable and leaves path expansion unquoted.

Suggested fix
-    kubectl apply --server-side --force-conflicts -f $1/deployments/values/oc-collector-configmap.yaml -n openchoreo-observability-plane
+    kubectl apply --server-side --force-conflicts -f "$PROJECT_ROOT/deployments/values/oc-collector-configmap.yaml" -n openchoreo-observability-plane
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-kubectl apply --server-side --force-conflicts -f $1/deployments/values/oc-collector-configmap.yaml -n openchoreo-observability-plane
+kubectl apply --server-side --force-conflicts -f "$PROJECT_ROOT/deployments/values/oc-collector-configmap.yaml" -n openchoreo-observability-plane
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` at line 254, The kubectl apply line
bypasses PROJECT_ROOT and leaves the path unquoted; update the command that runs
kubectl (the line using kubectl apply --server-side --force-conflicts -f
$1/deployments/values/oc-collector-configmap.yaml -n
openchoreo-observability-plane) to use the PROJECT_ROOT variable and quote the
path (e.g., "$PROJECT_ROOT/deployments/values/oc-collector-configmap.yaml") so
path expansion is robust and safe.

Comment on lines 333 to +336

 openchoreo-up:
-	@echo "🚀 Starting OpenChoreo cluster..."
-	@docker start openchoreo-local-control-plane openchoreo-local-worker 2>/dev/null || (echo "⚠️ Cluster not found. Run 'make setup-k3d setup-openchoreo' first." && exit 1)
-	@echo "Waiting for nodes to be ready..."
+	@echo "Starting OpenChoreo cluster..."
+	@docker start openchoreo-local-control-plane openchoreo-local-worker 2>/dev/null || (echo "Cluster not found. Run 'make setup-k3d setup-openchoreo' first." && exit 1)
+	@echo "Waiting for nodes to be ready..."

⚠️ Potential issue | 🟠 Major

OpenChoreo lifecycle targets are still wired to kind, not k3d.

These commands start/inspect kind-openchoreo-local containers/context while the rest of this Makefile is using k3d variables; that inconsistency will break lifecycle commands in the new flow.

Suggested direction
 openchoreo-up:
 	@echo "Starting OpenChoreo cluster..."
-	@docker start openchoreo-local-control-plane openchoreo-local-worker 2>/dev/null || (echo "Cluster not found. Run 'make setup-k3d setup-openchoreo' first." && exit 1)
+	@k3d cluster start $(CLUSTER_NAME) 2>/dev/null || (echo "Cluster not found. Run 'make setup-k3d setup-openchoreo' first." && exit 1)
@@
-		kubectl get nodes --context kind-openchoreo-local >/dev/null 2>&1 && \
-		kubectl wait --for=condition=Ready nodes --all --timeout=10s --context kind-openchoreo-local >/dev/null 2>&1 && break || sleep 10; \
+		kubectl get nodes --context $(CLUSTER_CONTEXT) >/dev/null 2>&1 && \
+		kubectl wait --for=condition=Ready nodes --all --timeout=10s --context $(CLUSTER_CONTEXT) >/dev/null 2>&1 && break || sleep 10; \
 	done

Also applies to: 355-357, 359-365

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 333 - 336, The openchoreo lifecycle targets (e.g., the
openchoreo-up target and the other lifecycle targets around it) are still
referencing kind-specific names/contexts; update these targets to use the
k3d-centric names and variables used elsewhere in the Makefile (for example
replace any hardcoded kind-openchoreo-local context/container checks with the
k3d cluster name variable (e.g., $(K3D_CLUSTER_NAME)) and the actual k3d
container names like openchoreo-local-control-plane/openchoreo-local-worker),
and switch any kind commands/inspections to their k3d equivalents so the
start/inspect/wait logic consistently uses the k3d flow across openchoreo-up and
the related lifecycle targets.

a5anka added 5 commits March 4, 2026 16:24
The build plane cluster-agent pod mounts cluster-gateway-ca but nothing
creates it in the openchoreo-build-plane namespace. Copy it from the
control plane namespace before installing the build plane chart.
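The copy step this commit describes can be sketched as follows. The jq-based stripping of server-managed metadata is an assumed mechanism, not necessarily what the script does:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy a ConfigMap into another namespace, stripping the server-managed
# metadata fields so `kubectl apply` accepts the object. A sketch only.
copy_configmap() {
    local name="$1" src_ns="$2" dst_ns="$3"
    kubectl get configmap "$name" -n "$src_ns" -o json \
        | jq --arg ns "$dst_ns" \
            'del(.metadata.resourceVersion, .metadata.uid,
                 .metadata.creationTimestamp, .metadata.managedFields)
             | .metadata.namespace = $ns' \
        | kubectl apply -f -
}

# e.g. copy_configmap cluster-gateway-ca openchoreo-control-plane openchoreo-build-plane
```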
On re-runs, helm upgrade resets cluster-gateway-ca to a placeholder.
The extractor job replaces it with the real cert, but the build plane
copy job can race ahead and copy the placeholder. Fix by polling
until the CA has a real certificate before proceeding.
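The polling fix could look roughly like this. The ConfigMap data key (`ca.crt`) is an assumption, and `get_ca` is factored out only so the loop is easy to exercise:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read the current CA value; the `ca.crt` key name is an assumption.
get_ca() {
    kubectl get configmap cluster-gateway-ca -n "$1" \
        -o jsonpath='{.data.ca\.crt}' 2>/dev/null || true
}

# Poll until the ConfigMap holds a real PEM certificate instead of the
# placeholder Helm writes on upgrade. Returns non-zero on timeout.
wait_for_real_ca() {
    local ns="$1" timeout="${2:-60}" i=0 ca
    while [ "$i" -lt "$timeout" ]; do
        ca=$(get_ca "$ns")
        case "$ca" in
            *"BEGIN CERTIFICATE"*) return 0 ;;
        esac
        sleep 1
        i=$((i + 1))
    done
    echo "Timed out waiting for a real CA certificate in $ns" >&2
    return 1
}
```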
Add waits for copy-ca jobs and agent pods in build plane and
observability plane verification sections. Prevents failures from
reading logs of pods that are still in ContainerCreating state.
The CA extractor job uses kubectl apply (client-side) to write the real
certificate, which claims field ownership from Helm. On subsequent helm
upgrade calls, this causes a conflict. Fix by removing the
kubectl-client-side-apply field manager before running helm upgrade.
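One way to drop a single field manager's entries is to fetch the object, filter `.metadata.managedFields` with jq, and write it back (the API server accepts updates that explicitly set `managedFields`). Whether the script uses exactly this mechanism is an assumption:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove one field manager's entries from an object's managedFields so a
# subsequent `helm upgrade` does not hit ownership conflicts. A sketch only.
strip_field_manager() {
    local kind="$1" name="$2" ns="$3" manager="${4:-kubectl-client-side-apply}"
    kubectl get "$kind" "$name" -n "$ns" -o json \
        | jq --arg m "$manager" \
            '.metadata.managedFields |= map(select(.manager != $m))' \
        | kubectl replace -f -
}
```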
helm upgrade resets the CA ConfigMap to a placeholder but doesn't
recreate completed jobs. The extractor and copy-ca jobs need to run
again to populate the real certificate. Delete them before each helm
install/upgrade so they get recreated.
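That cleanup can be sketched as a small helper; the job and namespace names in the usage comment are taken from the surrounding discussion and may not match the script exactly:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Delete completed CA jobs so the next helm install/upgrade recreates them
# and the real certificate is re-extracted. --ignore-not-found makes the
# first run (when the jobs don't exist yet) a no-op.
delete_ca_jobs() {
    local ns="$1"; shift
    local job
    for job in "$@"; do
        kubectl delete job "$job" -n "$ns" --ignore-not-found
    done
}

# e.g. delete_ca_jobs openchoreo-control-plane cluster-gateway-ca-extractor copy-ca
```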
@a5anka a5anka marked this pull request as draft March 4, 2026 11:13
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
deployments/scripts/setup-openchoreo.sh (1)

300-300: ⚠️ Potential issue | 🟡 Minor

Reuse PROJECT_ROOT and quote the path at Line 300.

This still uses $1 directly and unquoted in the kubectl apply path. Use "$PROJECT_ROOT/..." for consistent and safe path handling.

Suggested fix
-    kubectl apply --server-side --force-conflicts -f $1/deployments/values/oc-collector-configmap.yaml -n openchoreo-observability-plane
+    kubectl apply --server-side --force-conflicts -f "$PROJECT_ROOT/deployments/values/oc-collector-configmap.yaml" -n openchoreo-observability-plane
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` at line 300, The kubectl apply
invocation uses an unquoted positional parameter ($1) which is unsafe; change it
to use the PROJECT_ROOT variable and quote the full path so spaces/special chars
are handled. Replace the path fragment
`$1/deployments/values/oc-collector-configmap.yaml` with
`"$PROJECT_ROOT/deployments/values/oc-collector-configmap.yaml"` in the kubectl
apply command (the line invoking kubectl apply --server-side --force-conflicts
-f ... -n openchoreo-observability-plane) and ensure PROJECT_ROOT is exported or
set earlier in the script.
🧹 Nitpick comments (1)
deployments/scripts/setup-openchoreo.sh (1)

237-245: Use the matched copy-ca job name for kubectl wait instead of a broad label.

Current logic detects copy-ca by name but waits on job -l app=cluster-agent, which may target unrelated jobs or none.

Suggested fix
-if kubectl get jobs -n openchoreo-build-plane --no-headers 2>/dev/null | grep -q copy-ca; then
-    kubectl wait -n openchoreo-build-plane --for=condition=complete --timeout=120s job -l app=cluster-agent 2>/dev/null || true
+COPY_CA_JOB=$(kubectl get jobs -n openchoreo-build-plane -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' 2>/dev/null | grep 'copy-ca' | head -n1)
+if [ -n "$COPY_CA_JOB" ]; then
+    kubectl wait -n openchoreo-build-plane --for=condition=complete --timeout=120s "job/${COPY_CA_JOB}" 2>/dev/null || true
 fi

Apply the same pattern in the observability-plane block.

Also applies to: 369-377

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 237 - 245, The script
detects the copy-ca job by name but then waits on jobs selected by label, which
can miss or target the wrong job; change the kubectl wait that currently uses
"job -l app=cluster-agent" to wait specifically on the detected job name (e.g.,
use "job/copy-ca" in the --for=condition=complete call) in the
openchoreo-build-plane block (referencing copy-ca, app=cluster-agent,
openchoreo-build-plane) and make the same replacement in the observability-plane
block so both use the matched copy-ca job name instead of the broad label
selector.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 54d53737-2dc4-4126-bb73-732a74056661

📥 Commits

Reviewing files that changed from the base of the PR and between 0aed56d and d2a36dc.

📒 Files selected for processing (1)
  • deployments/scripts/setup-openchoreo.sh

Comment on lines +46 to +60
# On re-runs, the CA extractor job uses kubectl apply (client-side) to write the
# real cert into cluster-gateway-ca, which claims field ownership. The next helm
# upgrade then conflicts. Fix by removing the client-side-apply field manager
# before upgrading, so Helm can take ownership cleanly.
if kubectl get configmap cluster-gateway-ca -n openchoreo-control-plane &>/dev/null; then
    kubectl annotate configmap cluster-gateway-ca -n openchoreo-control-plane \
        kubectl.kubernetes.io/last-applied-configuration- --overwrite 2>/dev/null || true
    # Remove the kubectl-client-side-apply managed field entry
    FIELD_INDEX=$(kubectl get configmap cluster-gateway-ca -n openchoreo-control-plane \
        --show-managed-fields -o json | jq '.metadata.managedFields | to_entries[] | select(.value.manager == "kubectl-client-side-apply") | .key' 2>/dev/null)
    if [ -n "$FIELD_INDEX" ]; then
        kubectl patch configmap cluster-gateway-ca -n openchoreo-control-plane \
            --type=json -p="[{\"op\":\"remove\",\"path\":\"/metadata/managedFields/${FIELD_INDEX}\"}]" 2>/dev/null || true
    fi
fi
⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

#!/bin/bash
# Verify jq usage exists and prerequisite checks don't include jq.
rg -n '\bjq\b' deployments/scripts/setup-openchoreo.sh
rg -n 'command -v' deployments/scripts/setup-openchoreo.sh

Repository: wso2/agent-manager
Exit code: 0

stdout:

60:        --show-managed-fields -o json | jq '.metadata.managedFields | to_entries[] | select(.value.manager == "kubectl-client-side-apply") | .key' 2>/dev/null)
17:if ! command -v helm &> /dev/null; then

Script executed:

head -30 deployments/scripts/setup-openchoreo.sh
sed -n '46,70p' deployments/scripts/setup-openchoreo.sh

Repository: wso2/agent-manager
Exit code: 0

stdout:

#!/bin/bash
set -e

# Get the absolute directory of this script
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Change to script directory to ensure consistent working directory
cd "$SCRIPT_DIR"

source "$SCRIPT_DIR/env.sh"

PROJECT_ROOT="$1"

echo "=== Installing OpenChoreo on k3d ==="

# Check prerequisites
if ! command -v helm &> /dev/null; then
    echo "❌ Helm is not installed. Please install it first:"
    echo "   brew install helm"
    exit 1
fi

if ! kubectl cluster-info --context $CLUSTER_CONTEXT &> /dev/null; then
    echo "❌ K3d cluster '$CLUSTER_CONTEXT' is not running."
    echo "   Run: ./setup-k3d.sh"
    exit 1
fi

echo "🔧 Setting kubectl context to $CLUSTER_CONTEXT..."
kubectl config use-context $CLUSTER_CONTEXT
# On re-runs, the CA extractor job uses kubectl apply (client-side) to write the
# real cert into cluster-gateway-ca, which claims field ownership. The next helm
# upgrade then conflicts. Fix by removing the client-side-apply field manager
# before upgrading, so Helm can take ownership cleanly.
# On re-runs, fix two issues:
# 1. The CA extractor job uses kubectl apply (client-side) to write the real cert,
#    which claims field ownership. Remove the field manager so helm upgrade won't conflict.
# 2. helm upgrade resets the CA ConfigMap to a placeholder, but the extractor job
#    won't re-run because Helm doesn't recreate completed jobs. Delete it so Helm
#    recreates it and it extracts the real cert again.
if kubectl get configmap cluster-gateway-ca -n openchoreo-control-plane &>/dev/null; then
    kubectl annotate configmap cluster-gateway-ca -n openchoreo-control-plane \
        kubectl.kubernetes.io/last-applied-configuration- --overwrite 2>/dev/null || true
    FIELD_INDEX=$(kubectl get configmap cluster-gateway-ca -n openchoreo-control-plane \
        --show-managed-fields -o json | jq '.metadata.managedFields | to_entries[] | select(.value.manager == "kubectl-client-side-apply") | .key' 2>/dev/null)
    if [ -n "$FIELD_INDEX" ]; then
        kubectl patch configmap cluster-gateway-ca -n openchoreo-control-plane \
            --type=json -p="[{\"op\":\"remove\",\"path\":\"/metadata/managedFields/${FIELD_INDEX}\"}]" 2>/dev/null || true
    fi
fi
kubectl delete job cluster-gateway-ca-extractor -n openchoreo-control-plane 2>/dev/null || true

helm upgrade --install openchoreo-control-plane oci://ghcr.io/openchoreo/helm-charts/openchoreo-control-plane \
--version ${OPENCHOREO_PATCH_VERSION} \
--namespace openchoreo-control-plane \

Script executed:

cat deployments/scripts/env.sh

Repository: wso2/agent-manager
Exit code: 0

stdout:

# Shared cluster environment variables — sourced by all scripts in this directory.
OPENCHOREO_VERSION="0.14.0"
OPENCHOREO_PATCH_VERSION="0.0.0-b53c6dc3"
CLUSTER_NAME="openchoreo-local-v${OPENCHOREO_VERSION}"
CLUSTER_CONTEXT="k3d-${CLUSTER_NAME}"

# AMP (Agent Management Platform) variables
AMP_NAMESPACE="wso2-amp"
AMP_RELEASE_NAME="amp"
AMP_IMAGE_TAG="0.0.0-dev"

Add a prerequisite check for jq before its first use at line 60.

jq is required at line 60 to parse the kubectl JSON output, but it is not validated in the prerequisite checks. With set -e, a missing jq will abort the script mid-execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 46-60, the script
uses jq to parse kubectl JSON when computing FIELD_INDEX in the block that
handles cluster-gateway-ca, but jq is not validated in the prerequisites and its
absence will cause failures; add a prerequisite check (e.g., test for command -v
jq or which jq) alongside the existing tool checks early in the script so the
script exits with a clear error if jq is missing, and ensure the check runs
before the code that computes FIELD_INDEX and invokes jq.
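
A guard of that shape could sit next to the existing helm check near the top of the script; a minimal sketch (the `check_prereq` helper, its wording, and its placement are assumptions, not code from the PR):

```shell
#!/bin/bash
# Hypothetical prerequisite guard, modeled on the script's existing helm check.
# The helper name `check_prereq` is an assumption, not the PR's actual code.
check_prereq() {
    if ! command -v "$1" &> /dev/null; then
        echo "❌ $1 is not installed. Please install it first:"
        echo "   brew install $1"
        return 1
    fi
    echo "✅ found $1"
}

# Call this with `|| exit 1` before any code that pipes kubectl output into jq.
check_prereq jq || true   # demo only; the real script would use `|| exit 1`
```

Using `command -v` keeps the check portable across shells, unlike `which`, whose output and exit status vary between platforms.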

Comment on lines +75 to +92
# Verify the CA extractor has replaced the placeholder with a real certificate.
# The Helm chart deploys a placeholder ConfigMap and a Job that extracts the real
# CA from a TLS secret. On re-runs, helm upgrade resets the ConfigMap to the
# placeholder, so we must wait for the extractor to overwrite it again.
echo "⏳ Waiting for cluster-gateway-ca to contain a real certificate..."
for i in $(seq 1 30); do
    CA_CONTENT=$(kubectl get configmap cluster-gateway-ca -n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' 2>/dev/null)
    if echo "$CA_CONTENT" | grep -q "BEGIN CERTIFICATE"; then
        echo "✅ cluster-gateway-ca has a valid certificate"
        break
    fi
    if [ "$i" -eq 30 ]; then
        echo "⚠️  Timeout waiting for real CA certificate. The extractor job may need to be re-run:"
        echo "   kubectl delete job cluster-gateway-ca-extractor -n openchoreo-control-plane"
        echo "   Then re-run: make setup-openchoreo"
    fi
    sleep 5
done
⚠️ Potential issue | 🟠 Major

Fail fast when CA extraction never produces a real cert.

From line 87 onward, the timeout only logs a warning and continues, which can leave a partially configured environment with misleading “ready” progress.

Suggested fix
-    if [ "$i" -eq 30 ]; then
-        echo "⚠️  Timeout waiting for real CA certificate. The extractor job may need to be re-run:"
-        echo "   kubectl delete job cluster-gateway-ca-extractor -n openchoreo-control-plane"
-        echo "   Then re-run: make setup-openchoreo"
-    fi
+    if [ "$i" -eq 30 ]; then
+        echo "❌ Timeout waiting for real CA certificate."
+        echo "   Re-run:"
+        echo "   kubectl delete job cluster-gateway-ca-extractor -n openchoreo-control-plane"
+        echo "   make setup-openchoreo"
+        exit 1
+    fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 75-92, the timeout
loop that checks the cluster-gateway-ca ConfigMap currently only prints a
warning and continues, leaving deployment in a partial state; modify the script
in the for-loop that reads CA_CONTENT (the cluster-gateway-ca configmap check)
so that when the final iteration (when i reaches the timeout) is reached you log
the warning/helpful re-run commands and then exit with a non-zero status (e.g.,
exit 1) to fail fast; reference the ConfigMap name cluster-gateway-ca and the
extractor Job cluster-gateway-ca-extractor when updating the message so the
script fails immediately instead of proceeding on missing certificate.

Comment on lines +177 to +181
echo "⏳ Waiting for Data Plane agent to be ready..."
kubectl wait --for=condition=Ready pod -l app=cluster-agent-dataplane -n openchoreo-data-plane --timeout=120s 2>/dev/null || \
kubectl wait --for=condition=Ready pod -l app=cluster-agent -n openchoreo-data-plane --timeout=120s 2>/dev/null || \
echo "⚠️ Data Plane agent pods may still be starting"
kubectl logs -n openchoreo-data-plane -l app=cluster-agent --tail=10 2>/dev/null || true
⚠️ Potential issue | 🟡 Minor

Align log selector fallback with the wait selector fallback.

Lines 178-179 fall back between `app=cluster-agent-dataplane` and `app=cluster-agent`, but line 181 fetches logs only for `app=cluster-agent`, so the relevant pod logs are missed when only the dataplane label exists.

Suggested fix
-kubectl logs -n openchoreo-data-plane -l app=cluster-agent --tail=10 2>/dev/null || true
+kubectl logs -n openchoreo-data-plane -l app=cluster-agent-dataplane --tail=10 2>/dev/null || \
+kubectl logs -n openchoreo-data-plane -l app=cluster-agent --tail=10 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 177-181, the kubectl
logs call only targets -l app=cluster-agent while the preceding kubectl wait
falls back between -l app=cluster-agent-dataplane and -l app=cluster-agent;
update the logging step to mirror that fallback by attempting to fetch logs for
-l app=cluster-agent-dataplane first and then -l app=cluster-agent (or combine
selectors) so logs are retrieved regardless of which label the ready pod uses;
modify the kubectl logs invocation(s) that follow the kubectl wait commands to
match the same selector logic.
