Skip to content

Migrate to OC-v1.0.0-rc1#554

Merged
hanzjk merged 20 commits intowso2:mainfrom
hanzjk:oc-rc1
Mar 19, 2026
Merged

Migrate to OC-v1.0.0-rc1#554
hanzjk merged 20 commits intowso2:mainfrom
hanzjk:oc-rc1

Conversation

@hanzjk
Copy link
Contributor

@hanzjk hanzjk commented Mar 17, 2026

Purpose

Describe the problems, issues, or needs driving this feature/fix and include links to related issues in the following format: Resolves issue1, issue2, etc.

resolves #533

Goals

Describe the solutions that this feature/fix will introduce to resolve the problems described above

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI (email documentation@wso2.com to review all UI text). Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change>

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter �N/A� plus brief explanation of why there�s no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type �Sent� when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to certification@wso2.com and NOT pasted in this PR. If there is no impact on certification exams, type �N/A� and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

  • New Features

    • OpenBao-based secret management and OAuth support for workload-publishing and observer operations.
    • Deterministic workflow run naming for monitored executions.
  • Infrastructure Updates

    • Upgraded OpenChoreo to v1.0.0-rc.1 and replaced Build Plane with Workflow Plane (cluster-scoped resources).
    • Observability, registry, and deployment pipeline flows updated to workflow-run patterns.
  • Documentation

    • Installation and quick-start docs updated for OpenBao, Workflow Plane, and RBAC changes.
  • Chores

    • Helm charts, scripts, and values updated for new plane, secrets, and API changes.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR migrates OpenChoreo integration to v1.0.0-rc.1: moves many resources from namespace- to cluster-scoped kinds, rewrites client APIs to workflow-run/queries, integrates OpenBao for secrets, updates Helm templates/values, and adjusts install scripts and manifests for the workflow plane and observability changes.

Changes

Cohort / File(s) Summary
Makefiles & Version Pins
Makefile, agent-manager-service/Makefile, deployments/scripts/env.sh
Switched OpenChoreo release references to v1.0.0-rc.1; adjusted dev-migrate invocation to run make in agent-manager-service instead of docker exec.
OpenChoreo Client: Builds & Workflows
agent-manager-service/clients/openchoreosvc/client/builds.go, .../generic-workflows.go, .../constants.go, .../client.go
Replaced component-workflow APIs with workflow-run APIs (Create/List/Get WorkflowRun); moved parameters into nested workflow.parameters.repository.*; added workflow status extraction from conditions; new Name field for CreateWorkflowRun requests.
OpenChoreo Client: Components & Deployments
agent-manager-service/clients/openchoreosvc/client/components.go, .../deployments.go, .../projects.go, .../secret_references.go
Switched ComponentType -> ClusterComponentType; moved repository data to workflow Parameters; env/secret handling now uses SecretKeyRef; deployment code now looks up ComponentRelease and extracts images/endpoints from workload; deployment pipeline refs now nested objects.
Observability Client & Logs/Metrics
agent-manager-service/clients/observabilitysvc/client.go
Replaced many direct Get* endpoints with QueryLogs/QueryMetrics flows; unified LogsQueryRequest/response handling; added structured JSON log parsing, new converters and helper functions for log levels/sort order and metric point types.
Agent Manager Service Changes
agent-manager-service/services/agent_manager.go, agent-manager-service/services/monitor_executor.go
Initial build trigger in CreateAgent is non-fatal (warning on failure); monitor workflow run name is deterministically generated and sent in CreateWorkflowRun request.
Config & Models
agent-manager-service/config/config.go, agent-manager-service/config/config_loader.go, agent-manager-service/models/infra_resources.go
Removed TLSConfig.HTTPPort; loader no longer reads TLS_HTTP_PORT; TargetEnvironmentRef dropped RequiresApproval field.
Helm: Build → Workflow Plane renames
deployments/helm-charts/.../wso2-amp-build-extension/..., deployments/helm-charts/.../_helpers.tpl
Renamed helper includes and template references from build-plane to workflow-plane and updated related image/template includes.
Helm: Component Types & Traits (CRD shape changes)
deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/*, .../component-traits/*
Many ComponentType → ClusterComponentType and Trait → ClusterTrait changes; replaced ad-hoc parameters with openAPIV3Schema environmentConfigs/parameters; updated route generation, trait wiring, and environment config defaults.
Helm: Component Workflows → ClusterWorkflow
deployments/helm-charts/.../component-workflows/*, deployments/helm-charts/.../cluster-workflow-templates/*
Converted ComponentWorkflow → ClusterWorkflow, added workflowPlaneRef, moved systemParameters → parameters.openAPIV3Schema, added oauth/workload-publish secret wiring and workflowrun-name/oauth-secret-name parameters; updated step args and ExternalSecret resources.
Helm: Build/Publish/Registry & Values
deployments/helm-charts/.../values.yaml, deployments/helm-charts/.../publish-image.yaml, .../google-buildpack-build.yaml
Added oauth and apiServer config blocks; switched registry/template references to workflow-plane helpers; updated image builder includes.
Helm: Platform Resources & Authz
deployments/helm-charts/.../project.yaml, .../environment.yaml, .../authz-cluster-role-binding.yaml, .../label-default-namespace-job.yaml
Changed dataPlaneRef kind → ClusterDataPlane; deploymentPipelineRef now an object with name; renamed role-binding and adjusted mapping structure; updated default namespace label key.
Quick-start / Install / Uninstall scripts
deployments/quick-start/install.sh, .../install-helpers.sh, .../uninstall.sh, deployments/scripts/*
Replaced BuildPlane references with Workflow Plane; integrated OpenBao/ClusterSecretStore (Vault) for secrets; adjusted plane registration to cluster-scoped kinds; added RBAC for WSO2 gateway CRDs; added prerequisites checks and port validation.
Deployment scripts & utilities
deployments/scripts/setup-*, deployments/scripts/utils.sh
Added port availability checks and version checks; adapted registration manifests to ClusterDataPlane/ClusterWorkflowPlane; rewired observability ExternalSecret keys and CA handling; increased observability timeouts and chart sources/versions.
Values & Single-cluster changes
deployments/single-cluster/*, deployments/quick-start/*, deployments/k3d-local-config.yaml
Removed some build-plane single-cluster values; disabled gateway TLS in single-cluster values; updated k3d config comments and quick-start to reference workflow-plane and updated OpenChoreo installer version.
OpenBao / OpenSearch / Observability
deployments/helm-charts/wso2-amp-thunder-extension/*, deployments/single-cluster/values-openbao.yaml, docs/install/*, docs/quick-start.md
Added OpenBao helm bootstrap and postStart Vault init script; added observer/workload-publisher client scripts; docs updated to include OpenBao and RBAC steps; observability charts bumped and traces chart added.
Traces Observer: OpenSearch field renames
traces-observer-service/opensearch/queries.go, .../queries_test.go, .../composite_queries_test.go
Removed .keyword suffix from OpenSearch field references across queries and tests (traceId/spanId/parentSpanId and resource UID fields).

Sequence Diagram(s)

sequenceDiagram
    participant WF as WorkflowRun Script
    participant OAuth as OpenBao (Token Endpoint)
    participant APISrv as Kubernetes API Server
    participant Workload as Generated Workload
    participant WR as WorkflowRun (controller)

    rect rgba(200,200,255,0.5)
    WF->>OAuth: POST token (client_id, client_secret)
    OAuth-->>WF: access_token
    end

    rect rgba(200,255,200,0.5)
    WF->>APISrv: POST Workload (Authorization: Bearer)
    alt 409 Conflict
        APISrv-->>WF: 409
        WF->>APISrv: PUT Workload (update)
        APISrv-->>WF: 2xx
    else 2xx
        APISrv-->>WF: 2xx
    end
    end

    rect rgba(255,200,200,0.5)
    WF->>APISrv: PUT annotate WorkflowRun with workloadRef
    APISrv-->>WR: annotation persisted
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Poem

🐰 I hopped from builds to workflow land,
Cluster scopes now at my command.
OpenBao sings a secret tune,
Logs and runs aligned by noon.
Thump—new plane, new hop, upgrade's grand! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description uses the template structure but provides minimal content. While the Purpose section correctly references the linked issue #533, most required sections (Goals, Approach, User stories, Release note, Documentation, Certification, etc.) lack substantive details and retain placeholder text. Expand the description with concrete details: articulate the key migration goals (API changes, new CRD kinds, resource restructuring), explain the implementation approach with examples, specify acceptance criteria, address documentation/migration/testing aspects, and confirm security checks.
Linked Issues check ❓ Inconclusive The linked issue #533 contains only a title ('Move Agent Manager to OC 1.0-RC1') with no detailed acceptance criteria or requirements. The PR implements comprehensive changes aligning with an OC 1.0-RC1 migration, but the issue provides minimal guidance for validation. The issue lacks detailed requirements; however, the PR changes (API updates, CRD restructuring, new features) demonstrate migration work. Request clarification on specific acceptance criteria, testing scope, and migration validation procedures from the linked issue to confirm completeness.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Migrate to OC-v1.0.0-rc1' directly and clearly describes the main change in the changeset: migrating the codebase to OpenChoreo version 1.0.0-rc1. This is specific, concise, and matches the primary objective.
Out of Scope Changes check ✅ Passed The changeset is focused on migrating Agent Manager to OC v1.0.0-rc1, affecting OpenChoreo API client code, Helm charts, configuration, and deployment scripts. Changes align with a migration objective (e.g., ComponentType→ClusterComponentType, API endpoint updates, secret management overhaul). No unrelated feature development or refactoring outside the migration scope is evident.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hanzjk hanzjk marked this pull request as ready for review March 18, 2026 23:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
deployments/k3d-local-config.yaml (1)

8-9: ⚠️ Potential issue | 🟠 Major

Explicitly pin kubeAPI.host to restrict API server binding.

When kubeAPI.host is omitted from the k3d config, k3d defaults to binding the Kubernetes API on 0.0.0.0 (all interfaces) rather than localhost. Add an explicit host value to restrict access:

Suggested fix
 kubeAPI:
+  host: 127.0.0.1
   hostPort: "6550"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/k3d-local-config.yaml` around lines 8 - 9, kubeAPI currently only
sets hostPort, leaving kubeAPI.host unspecified which causes k3d to bind the API
server to 0.0.0.0; update the k3d config to explicitly set kubeAPI.host (e.g.,
"127.0.0.1" or "localhost") alongside kubeAPI.hostPort to restrict the API
server binding to localhost and prevent exposure on all interfaces.
agent-manager-service/Makefile (1)

59-70: ⚠️ Potential issue | 🟠 Major

Fail fast on spec download errors before codegen.

Lines 59, 69, and 79 use curl -sL without --fail, which can succeed on HTTP error responses and pass bad payloads into oapi-codegen. Add --fail (and ideally retries) to avoid brittle generation failures.

Suggested hardening
-	`@curl` -sL "https://raw.githubusercontent.com/openchoreo/openchoreo/v1.0.0-rc.1/openapi/openchoreo-api.yaml" -o /tmp/openchoreo-api.yaml
+	`@curl` -fsSL --retry 3 --retry-connrefused --retry-delay 2 "https://raw.githubusercontent.com/openchoreo/openchoreo/v1.0.0-rc.1/openapi/openchoreo-api.yaml" -o /tmp/openchoreo-api.yaml
-	`@curl` -sL "https://raw.githubusercontent.com/openchoreo/openchoreo/v1.0.0-rc.1/openapi/observer-api.yaml" -o /tmp/observer-api.yaml
+	`@curl` -fsSL --retry 3 --retry-connrefused --retry-delay 2 "https://raw.githubusercontent.com/openchoreo/openchoreo/v1.0.0-rc.1/openapi/observer-api.yaml" -o /tmp/observer-api.yaml
-	`@curl` -sL "https://raw.githubusercontent.com/wso2/api-platform/refs/heads/main/platform-api/src/resources/openapi.yaml" -o /tmp/gw-api.yaml
+	`@curl` -fsSL --retry 3 --retry-connrefused --retry-delay 2 "https://raw.githubusercontent.com/wso2/api-platform/refs/heads/main/platform-api/src/resources/openapi.yaml" -o /tmp/gw-api.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/Makefile` around lines 59 - 70, The Makefile's curl
invocations used by the OpenChoreo client and gen-observer-client targets
currently use `-sL` which can treat HTTP errors as success; update those curl
commands (the lines that download openchoreo-api.yaml and observer-api.yaml) to
include `--fail` and add simple retry options like `--retry 3 --retry-delay 2`
(keeping `-sL`) so downloads fail fast and return non-zero on HTTP errors;
ensure the rest of the target continues to rely on curl's exit code to prevent
running `oapi-codegen` with a bad payload.
agent-manager-service/clients/openchoreosvc/client/generic-workflows.go (1)

35-42: ⚠️ Potential issue | 🟡 Minor

Remove unused Phase field from WorkflowRunResponse.

The Phase field in WorkflowRunResponse (line 39) is never populated by convertWorkflowRunToResponse and is not used by any callers. Remove it from the struct definition unless it's needed for future API compatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/generic-workflows.go`
around lines 35 - 42, Remove the unused Phase field from the WorkflowRunResponse
struct: edit the type WorkflowRunResponse to delete the Phase string field and
any associated JSON/tag definitions, then update convertWorkflowRunToResponse to
stop populating Phase (remove any assignment or placeholder) and adjust any
callers/tests that reference WorkflowRunResponse.Phase to use Status or other
fields instead; ensure compilation by running go vet/build after removing the
field.
agent-manager-service/clients/openchoreosvc/client/infrastructure.go (1)

318-326: ⚠️ Potential issue | 🟡 Minor

Address pipeline failure: codegen formatting check failed.

The pipeline indicates a formatting issue around line 315. The blank line at 318 or surrounding code may need reformatting.

Run make codegenfmt-check locally and commit any formatting changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/infrastructure.go` around
lines 318 - 326, The build failed codegen formatting around the return of
models.EnvironmentResponse; run the codegen formatter (make codegenfmt-check)
locally, fix any whitespace/blank-line issues around the block that constructs
&models.EnvironmentResponse (ensure there are no stray blank lines or misaligned
indentation before/after the composite literal), re-run the formatter until it
passes, and commit the formatted changes.
agent-manager-service/clients/openchoreosvc/client/builds.go (1)

113-132: ⚠️ Potential issue | 🟠 Major

Validate that the fetched WorkflowRun belongs to the requested agent/project.

GetWorkflowRunWithResponse is scoped only by namespace and name. Without checking the run labels before mapping, this can return another component's build under the caller's agentName and projectName.

🛠️ Suggested fix
 	if resp.JSON200 == nil {
 		return nil, fmt.Errorf("empty response from get build")
 	}
+	if resp.JSON200.Metadata.Labels == nil ||
+		(*resp.JSON200.Metadata.Labels)[string(LabelKeyComponentName)] != componentName ||
+		(*resp.JSON200.Metadata.Labels)[string(LabelKeyProjectName)] != projectName {
+		return nil, fmt.Errorf("%w: workflow run %q does not belong to component %q", utils.ErrNotFound, buildName, componentName)
+	}

 	return toBuildDetailsResponse(resp.JSON200, componentName, projectName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/builds.go` around lines
113 - 132, The GetBuild function currently calls GetWorkflowRunWithResponse and
maps resp.JSON200 without verifying the run actually belongs to the requested
agent/project; add a validation step after the resp.JSON200 nil-check and before
calling toBuildDetailsResponse to inspect the workflow run's metadata.labels (or
other identifying fields) on resp.JSON200 (e.g., labels["agentName"],
labels["projectName"] or the component label) and ensure they match the provided
componentName and projectName (agentName if applicable); if labels do not match
return a not-found or permission error (e.g., fmt.Errorf or the existing
handleErrorResponse pattern) instead of mapping an unrelated run.
agent-manager-service/clients/openchoreosvc/client/components.go (1)

515-516: ⚠️ Potential issue | 🟠 Major

Use the new autoscaling field name consistently.

The reader/defaulting path in this PR now uses targetCPUUtilizationPercentage, but this update path still writes cpuUtilizationPercentage. Target CPU changes will not round-trip and can be ignored by the new schema.

🛠️ Suggested fix
 		if req.AutoScaling.TargetCPUUtilizationPercentage != nil {
-			autoscaling["cpuUtilizationPercentage"] = *req.AutoScaling.TargetCPUUtilizationPercentage
+			delete(autoscaling, "cpuUtilizationPercentage")
+			autoscaling["targetCPUUtilizationPercentage"] = *req.AutoScaling.TargetCPUUtilizationPercentage
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/components.go` around
lines 515 - 516, The autoscaling map is being written with the old key
"cpuUtilizationPercentage" instead of the new schema key
"targetCPUUtilizationPercentage", so update the write in the patch that handles
req.AutoScaling.TargetCPUUtilizationPercentage to use
"targetCPUUtilizationPercentage" (i.e., change the map assignment where
autoscaling["cpuUtilizationPercentage"] =
*req.AutoScaling.TargetCPUUtilizationPercentage to
autoscaling["targetCPUUtilizationPercentage"] =
*req.AutoScaling.TargetCPUUtilizationPercentage) ensuring the reader/defaulting
path and update path use the same field name.
🧹 Nitpick comments (9)
deployments/single-cluster/values-dp.yaml (1)

8-8: Clarify that TLS is disabled for local profile only.

enabled: false is reasonable for k3d local setup, but it’s easy to copy this into non-local environments. Add an explicit local-only warning to prevent accidental plaintext deployment.

Suggested tweak
 gateway:
   tls:
+    # Local single-cluster (k3d) only. Keep TLS enabled in shared/staging/prod.
     enabled: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/single-cluster/values-dp.yaml` at line 8, The TLS setting
"enabled: false" in deployments/single-cluster/values-dp.yaml should be
annotated to indicate it is intended for local k3d profiles only; update the
YAML comment above the "enabled: false" entry (the TLS enable flag) to clearly
state "LOCAL ONLY / do not copy to non-local environments — disables TLS and
uses plaintext" and recommend enabling TLS in production profiles (or point to
the production TLS configuration docs) so reviewers/users won't accidentally
deploy plaintext in non-local clusters.
deployments/helm-charts/wso2-amp-platform-resources-extension/templates/authz-cluster-role-binding.yaml (1)

18-21: Inconsistent default for role name.

Line 21 has an empty string default for role, but the template's own comment (line 3) states this binding "Grants admin role to the amp-api-client service account." The values.yaml explicitly provides role: admin, which mitigates the risk in normal deployments. However, the empty default contradicts the documented intent.

For consistency and to avoid surprises in non-standard deployments, consider aligning the default with the documented behavior:

Suggested fix
  roleMappings:
    - roleRef:
        kind: ClusterAuthzRole
-       name: {{ .Values.authz.ampApiClient.role | default "" }}
+       name: {{ .Values.authz.ampApiClient.role | default "admin" }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@deployments/helm-charts/wso2-amp-platform-resources-extension/templates/authz-cluster-role-binding.yaml`
around lines 18 - 21, The role name default is an empty string for roleRef.name
under roleMappings (ClusterAuthzRole), which contradicts the template comment
and values.yaml; update the Helm template reference
.Values.authz.ampApiClient.role to use a default of "admin" instead of "" so
roleRef.name defaults to admin when not provided (look for the roleMappings ->
roleRef -> name entry and change the default).
deployments/scripts/utils.sh (3)

65-71: Potential portability issue with sort -V on older macOS.

The sort -V (version sort) option is a GNU extension and may not be available on older macOS systems with BSD sort. Consider adding a fallback or documenting the requirement.

Alternative: manual version comparison for broader compatibility
version_gte() {
    local ver1="$1"
    local ver2="$2"
    # Try sort -V first, fall back to manual comparison
    if sort --version 2>/dev/null | grep -q "GNU"; then
        [ "$(printf '%s\n%s' "$ver2" "$ver1" | sort -V | head -n1)" = "$ver2" ]
    else
        # Manual comparison for major.minor
        local v1_major v1_minor v2_major v2_minor
        IFS='.' read -r v1_major v1_minor <<< "$ver1"
        IFS='.' read -r v2_major v2_minor <<< "$ver2"
        [ "$v1_major" -gt "$v2_major" ] || \
        { [ "$v1_major" -eq "$v2_major" ] && [ "$v1_minor" -ge "$v2_minor" ]; }
    fi
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/utils.sh` around lines 65 - 71, The version_gte function
relies on GNU-only sort -V which may be missing on older macOS; update
version_gte to detect if GNU sort is available (e.g., test sort --version or
grep for "GNU") and use sort -V when present, otherwise implement a portable
fallback manual comparison (split by dots and compare numeric components like
major/minor/patch) so version_gte("$1","$2") returns the correct boolean on both
GNU and BSD systems.

372-381: Consider adding error handling for CA extraction.

If the kubectl get secret or base64 -d commands fail, CA_CRT may be empty or malformed, which could silently propagate an invalid ConfigMap. Consider adding validation.

Proposed fix with error handling
   # 3. Copy cluster-gateway-ca ConfigMap from control-plane to desired namespace
-  CA_CRT=$(kubectl get secret cluster-gateway-ca \
-    -n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}'| base64 -d)
+  CA_CRT=$(kubectl get secret cluster-gateway-ca \
+    -n openchoreo-control-plane -o jsonpath='{.data.ca\.crt}' | base64 -d) || {
+    echo "❌ Failed to extract CA certificate from cluster-gateway-ca secret"
+    return 1
+  }
+
+  if [ -z "$CA_CRT" ]; then
+    echo "❌ CA certificate is empty"
+    return 1
+  fi

   kubectl create configmap cluster-gateway-ca \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/utils.sh` around lines 372 - 381, The CA extraction step
assigns CA_CRT via "kubectl get secret ... | base64 -d" but lacks validation;
detect failures by checking the exit status and content of the CA_CRT variable
after the command (e.g., ensure it's non-empty and well-formed), log a clear
error via stderr/process logger if it's empty or if the kubectl command failed,
and exit non-zero to avoid creating a bad ConfigMap; update the block that sets
CA_CRT and the subsequent "kubectl create configmap cluster-gateway-ca ... -n
\"$PLANE_NAMESPACE\"" to perform this validation and early exit on error.

1-8: Add a shebang to specify the shell interpreter.

This file is sourced by other scripts but should still include a shebang for clarity and to satisfy static analysis tools. Additionally, lsof may not be available on all systems (e.g., minimal Linux containers).

Proposed fix
+#!/bin/bash
+
 # Util: Check if a port is in use
 is_port_in_use() {
     local port="$1"
-    if lsof -i :"$port" -sTCP:LISTEN &>/dev/null; then
+    # Try lsof first, fall back to /dev/tcp or ss
+    if command -v lsof &>/dev/null; then
+        lsof -i :"$port" -sTCP:LISTEN &>/dev/null && return 0
+    elif command -v ss &>/dev/null; then
+        ss -tuln | grep -q ":$port " && return 0
+    fi
+    return 1
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/utils.sh` around lines 1 - 8, Add a shebang at the top
(e.g. #!/usr/bin/env sh or bash) and update is_port_in_use to detect
availability of lsof (using command -v lsof) and fall back to a portable
alternative such as ss or netstat if lsof is missing; modify the function
is_port_in_use to choose the appropriate tool at runtime and use that tool's
flags to test for a listening TCP port so the check works in minimal containers
and passes static analysis.
Makefile (1)

145-145: Use $(MAKE) for recursive make invocation.

Line 145 should use $(MAKE) instead of make to preserve Make flags and parallel/jobserver behavior.

Suggested change
-	`@cd` agent-manager-service && make dev-migrate
+	`@cd` agent-manager-service && $(MAKE) dev-migrate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 145, Replace the recursive invocation "make dev-migrate" in
the Makefile recipe (currently written as "@cd agent-manager-service && make
dev-migrate") with "$(MAKE) dev-migrate" so that Make preserves flags,
jobserver/parallelism and other variables when delegating to the
agent-manager-service makefile; keep the surrounding "cd agent-manager-service
&&" and the leading "@" unchanged.
deployments/single-cluster/values-openbao.yaml (1)

36-39: Wildcard service account binding may be overly permissive.

The bound_service_account_names="*" grants the writer role to any service account in the specified namespaces. While acceptable for development, this should be scoped to specific service accounts in production.

Consider documenting this as a development-only configuration or restricting to specific service accounts:

       bao write auth/kubernetes/role/openchoreo-secret-writer-role \
-        bound_service_account_names="*" \
+        bound_service_account_names="workflow-controller,argo-workflows" \
         bound_service_account_namespaces="openbao,openchoreo-workflow-plane" \
         policies=openchoreo-secret-writer-policy ttl=20m
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/single-cluster/values-openbao.yaml` around lines 36 - 39, The
current Vault role creation uses bound_service_account_names="*" which is overly
permissive; update the bao command that creates role
openchoreo-secret-writer-role (with policy openchoreo-secret-writer-policy and
namespaces openbao, openchoreo-workflow-plane) to list explicit service account
names instead of "*" (e.g., bound_service_account_names="svc-a,svc-b") or, if
you must keep "*" for development, add a comment in values-openbao.yaml marking
this block as development-only and include a TODO to replace with specific
service accounts for production.
deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/generate-workload.yaml (1)

150-161: Add --fail-with-body to OAuth token request for consistent error handling.

The OAuth token request uses -s --fail-with-body but the -s (silent) flag may suppress useful error information. Consider using --fail-with-body alone or adding explicit HTTP code checking like the later API calls.

However, the subsequent check on lines 158-161 validates the access token, so this is a minor concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/generate-workload.yaml`
around lines 150 - 161, The OAuth token curl invocation using TOKEN_RESPONSE
currently includes the -s flag which can hide useful error details; update the
TOKEN_RESPONSE call that posts to OAUTH_URL (using OAUTH_HOST, CLIENT_ID,
CLIENT_SECRET) to either remove -s and keep --fail-with-body or capture and
check the HTTP status explicitly (e.g., use curl -w to append status or separate
--write-out) and then validate the HTTP code before extracting ACCESS_TOKEN;
ensure on failure you log the full TOKEN_RESPONSE and HTTP status and exit the
same way the ACCESS_TOKEN null check currently does.
agent-manager-service/clients/observabilitysvc/client.go (1)

346-434: Consider extracting duplicated JSON log parsing logic.

convertComponentLogEntry and convertWorkflowLogEntry share identical JSON parsing logic for extracting msg, level, and time fields. Consider extracting this into a shared helper.

♻️ Suggested refactor
// parseJSONLogLine extracts structured fields from JSON-formatted log lines
func parseJSONLogLine(logLine string, entry *models.LogEntry) {
	if !strings.HasPrefix(strings.TrimSpace(logLine), "{") {
		return
	}
	var parsed map[string]interface{}
	if err := json.Unmarshal([]byte(logLine), &parsed); err != nil {
		return
	}
	if msg, ok := parsed["msg"]; ok {
		if msgStr, ok := msg.(string); ok && msgStr != "" {
			entry.Log = msgStr
		}
	}
	if entry.LogLevel == "" {
		if lvl, ok := parsed["level"]; ok {
			if lvlStr, ok := lvl.(string); ok {
				entry.LogLevel = strings.ToUpper(lvlStr)
			}
		}
	}
	if entry.Timestamp.IsZero() {
		if ts, ok := parsed["time"]; ok {
			if tsStr, ok := ts.(string); ok {
				if t, err := time.Parse(time.RFC3339, tsStr); err == nil {
					entry.Timestamp = t
				} else if t, err := time.Parse("2006-01-02T15:04:05", tsStr); err == nil {
					entry.Timestamp = t
				}
			}
		}
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/observabilitysvc/client.go` around lines 346 -
434, Both convertComponentLogEntry and convertWorkflowLogEntry duplicate the
JSON log parsing logic; extract that into a shared helper (e.g.,
parseJSONLogLine(logLine string, entry *models.LogEntry)) and call it from both
converters (pass the raw *log.Log string and &entry). The helper should perform
the existing checks (Trim/Prefix "{", json.Unmarshal), set entry.Log from
parsed["msg"], set entry.LogLevel only if empty from parsed["level"]
(uppercased), and set entry.Timestamp only if zero by parsing parsed["time"]
with RFC3339 then fallback format — preserving the current behavior of
convertComponentLogEntry and convertWorkflowLogEntry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/clients/observabilitysvc/client.go`:
- Around line 154-158: The call to searchScope.FromWorkflowSearchScope(...)
currently ignores its error return which can leave searchScope invalid; update
the code where searchScope is defined (variable searchScope and its
FromWorkflowSearchScope call) to capture the returned error, check it, and
handle it appropriately (e.g., return the error up the call chain or log and
return a wrapped error) before proceeding to downstream API calls so an
invalid/empty search scope cannot be used.
- Around line 271-277: The call to FromComponentSearchScope on searchScope (type
gen.LogsQueryRequest_SearchScope) currently ignores its returned error; update
the code to capture and check the error from
searchScope.FromComponentSearchScope(gen.ComponentSearchScope{...}) and handle
it appropriately (e.g., return the error from the enclosing function or log and
wrap it) so invalid search scopes are surfaced rather than ignored; ensure you
reference the same symbols: searchScope, FromComponentSearchScope, and
gen.ComponentSearchScope when adding the error variable and handling logic.
- Around line 191-195: The call to FromWorkflowSearchScope currently ignores its
returned error when populating searchScope (gen.LogsQueryRequest_SearchScope);
update the code around the searchScope variable to capture the error (e.g., err
:= searchScope.FromWorkflowSearchScope(...)), check err and handle it
consistently with the surrounding code flow (return the error or log and wrap it
as other functions in this file do) so failures constructing the search scope
are not silently dropped.

In `@agent-manager-service/clients/openchoreosvc/client/builds.go`:
- Around line 63-72: The code currently writes commitID into params["commit"],
but the workflow expects the commit at parameters.repository.revision.commit;
update the logic that builds params (the block handling
component.Spec.Workflow.Parameters and commitID) to set the nested key
"repository.revision.commit" with the commitID instead of the top-level "commit"
(also apply the same change in the other occurrence that manipulates
params/commitID). Locate usages around the params variable and the commitID
assignment in builds.go and replace the top-level key write with a nested
repository.revision.commit entry so the workflow reads the requested SHA.
- Around line 575-592: The loop over run.Status.Conditions in builds.go
currently returns on the first matching True condition which depends on slice
order; instead, scan all conditions (e.g., in the function that iterates over
run.Status.Conditions) without early returns, record flags for
WorkflowCompleted, WorkflowSucceeded, WorkflowFailed, and WorkflowRunning (and
capture Completed->Reason == "WorkflowSucceeded"), then after the loop decide
and return the status with the correct precedence (completed/succeeded/failed
before running). Update the code that references cond.Type ==
"WorkflowCompleted"/"WorkflowRunning"/"WorkflowSucceeded"/"WorkflowFailed" to
set booleans and make a single deterministic return based on those flags.
- Around line 228-237: The current block unconditionally writes repository
fields and therefore overwrites workflow defaults with empty values (branch ==
"" and appPath normalized to "/"); update the logic that builds
workflowParams["repository"] so it only sets "appPath" and the nested
"revision.branch" when the corresponding req.Repository.AppPath and
req.Repository.Branch are non-empty (and avoid calling normalizePath on an empty
string), e.g. construct the map for repository conditionally: always include
"url", then add "appPath" only if req.Repository.AppPath != "", and add
"revision":{"branch":...} only if req.Repository.Branch != ""; this prevents
overwriting defaults; refer to workflowParams, req.Repository, normalizePath
when making the change.

In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 564-565: The call to getEnvConfigDefaultsFromComponentType
currently swallows errors and returns an empty default, so change its signature
and callers (including GetEnvResourceConfigs and the other usage around the
second occurrence) to propagate errors: update
getEnvConfigDefaultsFromComponentType to return (ConfigDefaults, error) (or
pointer+error), return any non-200 or fetch/error as an error, and in
GetEnvResourceConfigs (and the other caller) check the error and return it
instead of treating it as "no defaults"; ensure callers handle the error path
and preserve existing success responses.
- Around line 221-228: The code always writes empty repository fields into
params, which overrides ClusterWorkflow defaults; change the logic that builds
params (the block handling req.Repository in components.go) to only set
"appPath" and the nested "revision.branch" when req.Repository.AppPath and
req.Repository.Branch are non-empty (and only create the "revision" map if
Branch is set), and keep adding "repository" to params only when req.Repository
is non-nil; this preserves the schema defaults for repository.revision.branch
("main") and repository.appPath (".") when those values are omitted.

In `@console/apps/webapp/public/config.js`:
- Line 2: Update the copyright header in public/config.js so the year is not
regressed to 2025; change the lone "2025" to the correct current year "2026" or
use a range like "2025-2026" to reflect both years, ensuring the header string
matches the repository's copyright convention.

In
`@deployments/helm-charts/wso2-amp-build-extension/templates/workload-publisher-authz-binding.yaml`:
- Around line 1-17: The template for ClusterAuthzRoleBinding (metadata name
amp-workflow-plane-workload-publisher-binding) is entirely commented out so the
binding never gets created; restore it by uncommenting or by converting it into
an active template block that conditionally renders when needed (e.g., when a
new value like global.enableWorkloadPublisherBinding or when
.Values.global.oauth.clientId is set), ensuring the resource kind
ClusterAuthzRoleBinding, spec.effect: allow, entitlement.claim: sub with value
from .Values.global.oauth.clientId, and the roleMappings pointing to
ClusterAuthzRole name workload-publisher are present so the workload-publisher
authorization is actually created at deploy time.

In
`@deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-traits/api-management-trait.yaml`:
- Around line 17-19: The template advertises a configurable "context" parameter
but the RestApi resource still derives spec.context from metadata; update the
code that constructs the RestApi (the logic that sets spec.context on the
RestApi object) to use the provided parameters.context (i.e.,
${parameters.context}) when present instead of reading from metadata; ensure you
reference the "context" parameter in the template/templating function that
creates the RestApi so spec.context = parameters.context (falling back to the
previous metadata-derived value only if parameters.context is empty).

In `@deployments/scripts/setup-openchoreo.sh`:
- Around line 454-456: The displayed header text is inconsistent: update the
echo statement that currently prints "--- Build Plane ---" so it reads "---
Workflow Plane ---" to match the kubectl namespace reference
(`openchoreo-workflow-plane`) and the migration naming; change the string inside
the echo call that prints the header accordingly (look for the echo "--- Build
Plane ---" line).

In `@deployments/scripts/utils.sh`:
- Around line 224-236: The YAML under the gateway ingress block has inconsistent
indentation for the external key; align the indentation so that "external:" is a
child of "ingress:" with the same indentation level as the other ingress
children (i.e., adjust spacing around the keys "gateway", "ingress", "external",
and their nested "name", "namespace", "http", and "https" blocks) to produce a
consistent hierarchy (gateway → ingress → external → {name, namespace, http,
https}).

In `@docs/install/managed-cluster.md`:
- Around line 554-590: The two ExternalSecret resources are inconsistent:
observer-secret uses "property: value" while opensearch-admin-credentials does
not; update the opensearch-admin-credentials ExternalSecret (the resource named
opensearch-admin-credentials and its data entries for secretKey: username and
secretKey: password) to include "property: value" under each remoteRef to match
how OpenBao stores secrets (or alternatively remove property: value from
observer-secret if you confirm secrets are plain values); ensure both
ExternalSecret specs use the same remoteRef structure so key extraction works
consistently.

In `@docs/install/self-hosted-cluster.md`:
- Around line 521-557: The opensearch-admin-credentials ExternalSecret is
missing the `property: value` entries that observer-secret uses; update the
ExternalSecret spec for the ExternalSecret named opensearch-admin-credentials to
add `property: value` to each remoteRef (matching how OpenBao stores secrets) so
keys like opensearch-username and opensearch-password include `property: value`
just like in the observer-secret ExternalSecret.

In `@docs/quick-start.md`:
- Line 55: The doc uses outdated term "Build Plane" in the "OpenChoreo
Installation" bullet causing inconsistency; update that occurrence of "Build
Plane" to "Workflow Plane" so it matches the earlier line and OpenChoreo
v1.0.0-rc1 naming. Locate the bullet containing "OpenChoreo Installation" and
replace "Build Plane" with "Workflow Plane" in the sentence that lists Control
Plane, Data Plane, Build Plane, and Observability Plane (ensure surrounding
punctuation/spacing remains unchanged).

---

Outside diff comments:
In `@agent-manager-service/clients/openchoreosvc/client/builds.go`:
- Around line 113-132: The GetBuild function currently calls
GetWorkflowRunWithResponse and maps resp.JSON200 without verifying the run
actually belongs to the requested agent/project; add a validation step after the
resp.JSON200 nil-check and before calling toBuildDetailsResponse to inspect the
workflow run's metadata.labels (or other identifying fields) on resp.JSON200
(e.g., labels["agentName"], labels["projectName"] or the component label) and
ensure they match the provided componentName and projectName (agentName if
applicable); if labels do not match return a not-found or permission error
(e.g., fmt.Errorf or the existing handleErrorResponse pattern) instead of
mapping an unrelated run.

In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 515-516: The autoscaling map is being written with the old key
"cpuUtilizationPercentage" instead of the new schema key
"targetCPUUtilizationPercentage", so update the write in the patch that handles
req.AutoScaling.TargetCPUUtilizationPercentage to use
"targetCPUUtilizationPercentage" (i.e., change the map assignment where
autoscaling["cpuUtilizationPercentage"] =
*req.AutoScaling.TargetCPUUtilizationPercentage to
autoscaling["targetCPUUtilizationPercentage"] =
*req.AutoScaling.TargetCPUUtilizationPercentage) ensuring the reader/defaulting
path and update path use the same field name.

In `@agent-manager-service/clients/openchoreosvc/client/generic-workflows.go`:
- Around line 35-42: Remove the unused Phase field from the WorkflowRunResponse
struct: edit the type WorkflowRunResponse to delete the Phase string field and
any associated JSON/tag definitions, then update convertWorkflowRunToResponse to
stop populating Phase (remove any assignment or placeholder) and adjust any
callers/tests that reference WorkflowRunResponse.Phase to use Status or other
fields instead; ensure compilation by running go vet/build after removing the
field.

In `@agent-manager-service/clients/openchoreosvc/client/infrastructure.go`:
- Around line 318-326: The build failed codegen formatting around the return of
models.EnvironmentResponse; run the codegen formatter (make codegenfmt-check)
locally, fix any whitespace/blank-line issues around the block that constructs
&models.EnvironmentResponse (ensure there are no stray blank lines or misaligned
indentation before/after the composite literal), re-run the formatter until it
passes, and commit the formatted changes.

In `@agent-manager-service/Makefile`:
- Around line 59-70: The Makefile's curl invocations used by the OpenChoreo
client and gen-observer-client targets currently use `-sL` which can treat HTTP
errors as success; update those curl commands (the lines that download
openchoreo-api.yaml and observer-api.yaml) to include `--fail` and add simple
retry options like `--retry 3 --retry-delay 2` (keeping `-sL`) so downloads fail
fast and return non-zero on HTTP errors; ensure the rest of the target continues
to rely on curl's exit code to prevent running `oapi-codegen` with a bad
payload.

In `@deployments/k3d-local-config.yaml`:
- Around line 8-9: kubeAPI currently only sets hostPort, leaving kubeAPI.host
unspecified which causes k3d to bind the API server to 0.0.0.0; update the k3d
config to explicitly set kubeAPI.host (e.g., "127.0.0.1" or "localhost")
alongside kubeAPI.hostPort to restrict the API server binding to localhost and
prevent exposure on all interfaces.

---

Nitpick comments:
In `@agent-manager-service/clients/observabilitysvc/client.go`:
- Around line 346-434: Both convertComponentLogEntry and convertWorkflowLogEntry
duplicate the JSON log parsing logic; extract that into a shared helper (e.g.,
parseJSONLogLine(logLine string, entry *models.LogEntry)) and call it from both
converters (pass the raw *log.Log string and &entry). The helper should perform
the existing checks (Trim/Prefix "{", json.Unmarshal), set entry.Log from
parsed["msg"], set entry.LogLevel only if empty from parsed["level"]
(uppercased), and set entry.Timestamp only if zero by parsing parsed["time"]
with RFC3339 then fallback format — preserving the current behavior of
convertComponentLogEntry and convertWorkflowLogEntry.

In
`@deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/generate-workload.yaml`:
- Around line 150-161: The OAuth token curl invocation using TOKEN_RESPONSE
currently includes the -s flag which can hide useful error details; update the
TOKEN_RESPONSE call that posts to OAUTH_URL (using OAUTH_HOST, CLIENT_ID,
CLIENT_SECRET) to either remove -s and keep --fail-with-body or capture and
check the HTTP status explicitly (e.g., use curl -w to append status or separate
--write-out) and then validate the HTTP code before extracting ACCESS_TOKEN;
ensure on failure you log the full TOKEN_RESPONSE and HTTP status and exit the
same way the ACCESS_TOKEN null check currently does.

In
`@deployments/helm-charts/wso2-amp-platform-resources-extension/templates/authz-cluster-role-binding.yaml`:
- Around line 18-21: The role name default is an empty string for roleRef.name
under roleMappings (ClusterAuthzRole), which contradicts the template comment
and values.yaml; update the Helm template reference
.Values.authz.ampApiClient.role to use a default of "admin" instead of "" so
roleRef.name defaults to admin when not provided (look for the roleMappings ->
roleRef -> name entry and change the default).

In `@deployments/scripts/utils.sh`:
- Around line 65-71: The version_gte function relies on GNU-only sort -V which
may be missing on older macOS; update version_gte to detect if GNU sort is
available (e.g., test sort --version or grep for "GNU") and use sort -V when
present, otherwise implement a portable fallback manual comparison (split by
dots and compare numeric components like major/minor/patch) so
version_gte("$1","$2") returns the correct boolean on both GNU and BSD systems.
- Around line 372-381: The CA extraction step assigns CA_CRT via "kubectl get
secret ... | base64 -d" but lacks validation; detect failures by checking the
exit status and content of the CA_CRT variable after the command (e.g., ensure
it's non-empty and well-formed), log a clear error via stderr/process logger if
it's empty or if the kubectl command failed, and exit non-zero to avoid creating
a bad ConfigMap; update the block that sets CA_CRT and the subsequent "kubectl
create configmap cluster-gateway-ca ... -n \"$PLANE_NAMESPACE\"" to perform this
validation and early exit on error.
- Around line 1-8: Add a shebang at the top (e.g. #!/usr/bin/env sh or bash) and
update is_port_in_use to detect availability of lsof (using command -v lsof) and
fall back to a portable alternative such as ss or netstat if lsof is missing;
modify the function is_port_in_use to choose the appropriate tool at runtime and
use that tool's flags to test for a listening TCP port so the check works in
minimal containers and passes static analysis.

In `@deployments/single-cluster/values-dp.yaml`:
- Line 8: The TLS setting "enabled: false" in
deployments/single-cluster/values-dp.yaml should be annotated to indicate it is
intended for local k3d profiles only; update the YAML comment above the
"enabled: false" entry (the TLS enable flag) to clearly state "LOCAL ONLY / do
not copy to non-local environments — disables TLS and uses plaintext" and
recommend enabling TLS in production profiles (or point to the production TLS
configuration docs) so reviewers/users won't accidentally deploy plaintext in
non-local clusters.

In `@deployments/single-cluster/values-openbao.yaml`:
- Around line 36-39: The current Vault role creation uses
bound_service_account_names="*" which is overly permissive; update the bao
command that creates role openchoreo-secret-writer-role (with policy
openchoreo-secret-writer-policy and namespaces openbao,
openchoreo-workflow-plane) to list explicit service account names instead of "*"
(e.g., bound_service_account_names="svc-a,svc-b") or, if you must keep "*" for
development, add a comment in values-openbao.yaml marking this block as
development-only and include a TODO to replace with specific service accounts
for production.

In `@Makefile`:
- Line 145: Replace the recursive invocation "make dev-migrate" in the Makefile
recipe (currently written as "@cd agent-manager-service && make dev-migrate")
with "$(MAKE) dev-migrate" so that Make preserves flags, jobserver/parallelism
and other variables when delegating to the agent-manager-service makefile; keep
the surrounding "cd agent-manager-service &&" and the leading "@" unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2c23769e-1927-442c-a43b-ed01df651934

📥 Commits

Reviewing files that changed from the base of the PR and between cdb51f6 and 00cfbb2.

⛔ Files ignored due to path filters (5)
  • agent-manager-service/clients/observabilitysvc/gen/client.gen.go is excluded by !**/gen/**
  • agent-manager-service/clients/observabilitysvc/gen/types.gen.go is excluded by !**/gen/**
  • agent-manager-service/clients/openchoreosvc/gen/client.gen.go is excluded by !**/gen/**
  • agent-manager-service/clients/openchoreosvc/gen/types.gen.go is excluded by !**/gen/**
  • console/common/config/rush/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (50)
  • Makefile
  • agent-manager-service/Makefile
  • agent-manager-service/clients/observabilitysvc/client.go
  • agent-manager-service/clients/openchoreosvc/client/builds.go
  • agent-manager-service/clients/openchoreosvc/client/components.go
  • agent-manager-service/clients/openchoreosvc/client/deployments.go
  • agent-manager-service/clients/openchoreosvc/client/generic-workflows.go
  • agent-manager-service/clients/openchoreosvc/client/infrastructure.go
  • agent-manager-service/clients/openchoreosvc/client/projects.go
  • agent-manager-service/clients/openchoreosvc/client/secret_references.go
  • agent-manager-service/services/agent_manager.go
  • console/apps/webapp/public/config.js
  • deployments/helm-charts/wso2-amp-build-extension/templates/_helpers.tpl
  • deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/balleina-buildpack-build.yaml
  • deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/generate-workload.yaml
  • deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/google-buildpack-build.yaml
  • deployments/helm-charts/wso2-amp-build-extension/templates/cluster-workflow-templates/publish-image.yaml
  • deployments/helm-charts/wso2-amp-build-extension/templates/workload-publisher-authz-binding.yaml
  • deployments/helm-charts/wso2-amp-build-extension/values.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/authz-cluster-role-binding.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-traits/api-management-trait.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-traits/hpa-trait.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-traits/instrumentation-trait-env-injection.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-traits/python-otel-instrumentation-trait.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/external-agent-api.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-workflows/amp-ballerina-buildpack.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-workflows/amp-docker.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-workflows/amp-google-cloud-buildpacks.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/environment.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/label-default-namespace-job.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/project.yaml
  • deployments/helm-charts/wso2-amp-platform-resources-extension/values.yaml
  • deployments/helm-charts/wso2-amp-thunder-extension/templates/amp-thunder-bootstrap.yaml
  • deployments/helm-charts/wso2-amp-thunder-extension/values.yaml
  • deployments/k3d-local-config.yaml
  • deployments/quick-start/install.sh
  • deployments/scripts/env.sh
  • deployments/scripts/setup-colima.sh
  • deployments/scripts/setup-k3d.sh
  • deployments/scripts/setup-openchoreo.sh
  • deployments/scripts/setup-prerequisites.sh
  • deployments/scripts/utils.sh
  • deployments/single-cluster/values-bp.yaml
  • deployments/single-cluster/values-dp.yaml
  • deployments/single-cluster/values-op.yaml
  • deployments/single-cluster/values-openbao.yaml
  • docs/install/managed-cluster.md
  • docs/install/self-hosted-cluster.md
  • docs/quick-start.md
💤 Files with no reviewable changes (1)
  • deployments/single-cluster/values-bp.yaml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent-manager-service/clients/openchoreosvc/client/infrastructure.go (1)

318-325: ⚠️ Potential issue | 🟠 Major

DNSPrefix is dropped from environment responses but still expected downstream.

The converter in infrastructure.go lines 318-325 does not set DNSPrefix in the returned EnvironmentResponse, yet environment_service.go still forwards it to callers (lines 86, 135), and controllers consume it (gateway_controller.go lines 645, 698; environment_controller.go lines 86, 260).

OpenChoreo RC1 exposes DNS information through the Environment Gateway structure (Ingress/EgressExternal/InternalHttp/Https/TlsHost fields), but this converter does not extract or map it to DNSPrefix. Either map the appropriate Gateway.Host value here or deprecate DNSPrefix end-to-end in this same PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/infrastructure.go` around
lines 318 - 325, The converter in infrastructure.go that builds
models.EnvironmentResponse currently omits DNSPrefix, causing downstream
consumers (environment_service.go and
gateway_controller.go/environment_controller.go) to receive nil/missing values;
update the converter that returns &models.EnvironmentResponse
(infrastructure.go) to extract the appropriate Gateway host from the
Environment's Gateway tree (check env.Gateway.Ingress/Egress → External/Internal
→ Http/Https/Tls → Host) and set DNSPrefix to that host (or the chosen fallback
if multiple are present), so DNSPrefix is populated consistently for callers;
alternatively, if deprecation is intended, remove forwarding/usages of DNSPrefix
end-to-end instead, but do one of these two actions in this change.
♻️ Duplicate comments (1)
agent-manager-service/clients/openchoreosvc/client/components.go (1)

220-228: ⚠️ Potential issue | 🟠 Major

Preserve ClusterWorkflow defaults when repo fields are omitted.

This always sends repository.appPath and repository.revision.branch, so empty request values override the rc1 defaults instead of letting the workflow fall back to them.

🛠️ Proposed fix
  if req.Repository != nil {
-		params["repository"] = map[string]any{
-			"url":     req.Repository.URL,
-			"appPath": req.Repository.AppPath,
-			"revision": map[string]any{
-				"branch": req.Repository.Branch,
-			},
-		}
+		repo := map[string]any{
+			"url": req.Repository.URL,
+		}
+		if req.Repository.AppPath != "" {
+			repo["appPath"] = req.Repository.AppPath
+		}
+		if req.Repository.Branch != "" {
+			repo["revision"] = map[string]any{
+				"branch": req.Repository.Branch,
+			}
+		}
+		params["repository"] = repo
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/components.go` around
lines 220 - 228, The code currently always sets repository.appPath and
repository.revision.branch which can override ClusterWorkflow defaults; update
the block that builds params["repository"] (the code referencing req.Repository,
params, and keys "repository", "appPath", "revision", "branch") to only add
"appPath" when req.Repository.AppPath is non-empty and only add a "revision" map
containing "branch" when req.Repository.Branch is non-empty, so omitted fields
are not sent and the workflow can fall back to its defaults.
🧹 Nitpick comments (5)
deployments/scripts/utils.sh (2)

73-76: Consider adding --client flag for kubectl version check.

kubectl version without --client attempts to contact the cluster, which may timeout or produce extra output. While the grep extraction still works, adding --client would make it faster and more reliable.

Proposed fix
 # Util: Extract version number from command output
 get_version() {
-    "$1" version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+' | head -1
+    case "$1" in
+        kubectl) kubectl version --client 2>/dev/null | grep -oE '[0-9]+\.[0-9]+' | head -1 ;;
+        *)       "$1" version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+' | head -1 ;;
+    esac
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/utils.sh` around lines 73 - 76, The get_version() helper
calls "$1" version which may try to contact the cluster; update get_version to
invoke the CLI with the client-only flag (e.g. use "$1" version --client or the
equivalent flag for the tool) so it does not attempt network calls and returns
quickly; locate the get_version function and change the invoked command string
that currently reads "$1" version to include the --client option and keep the
existing stderr redirection and grep extraction logic intact.

1-8: Add a shebang to specify the target shell.

This file is sourced by other scripts, but adding a shebang helps static analysis tools (Shellcheck SC2148) and editors provide correct syntax highlighting and linting. Since the sourcing scripts use #!/bin/bash, this file should declare the same.

Proposed fix
+#!/bin/bash
+
 # Util: Check if a port is in use
 is_port_in_use() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/utils.sh` around lines 1 - 8, Add a bash shebang to the
top of the script so editors and linters recognize the target shell; prepend
"#!/bin/bash" as the first line of the file that defines is_port_in_use() to
match the sourcing scripts and satisfy ShellCheck SC2148 while preserving the
script’s ability to be sourced.
agent-manager-service/clients/observabilitysvc/client.go (3)

326-350: Silent failure when both union variants fail to parse.

If resp.Logs is not nil but both AsLogsQueryResponseLogs0() and AsLogsQueryResponseLogs1() return errors, the function silently returns an empty logs array. This could mask API changes or unexpected response formats.

Consider logging a warning when neither variant parses successfully:

♻️ Proposed logging for debugging
 	// Try to parse as workflow logs
 	workflowLogs, err := resp.Logs.AsLogsQueryResponseLogs1()
-	if err == nil && len(workflowLogs) > 0 {
+	if err == nil {
 		for _, log := range workflowLogs {
 			entry := convertWorkflowLogEntry(&log)
 			result.Logs = append(result.Logs, entry)
 		}
+	} else {
+		// Both parsing attempts failed - log for debugging
+		slog.Warn("Failed to parse logs response as either component or workflow logs")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/observabilitysvc/client.go` around lines 326 -
350, When resp.Logs is non-nil but both AsLogsQueryResponseLogs0() and
AsLogsQueryResponseLogs1() fail, add a warning log so the failure isn't silent:
detect the case where both errs are non-nil (after calling
resp.Logs.AsLogsQueryResponseLogs0() and resp.Logs.AsLogsQueryResponseLogs1()),
and call your logger (e.g. processLogger or the existing logger in scope) to
emit a warning that includes both error messages and a short representation of
resp.Logs (or its raw value) to aid debugging; then return result as before.
Reference the parsing calls AsLogsQueryResponseLogs0 and
AsLogsQueryResponseLogs1 and the result.Logs slice to locate where to insert the
log.

352-440: Consider extracting shared JSON parsing logic.

Both convertComponentLogEntry and convertWorkflowLogEntry contain nearly identical JSON parsing blocks for extracting msg, level, and time fields. While the input types differ, a helper function could reduce duplication:

♻️ Optional helper extraction
// parseStructuredLogFields extracts msg, level, and time from JSON log lines
func parseStructuredLogFields(logLine string) (msg string, level string, timestamp time.Time, ok bool) {
	if !strings.HasPrefix(strings.TrimSpace(logLine), "{") {
		return "", "", time.Time{}, false
	}
	var parsed map[string]interface{}
	if err := json.Unmarshal([]byte(logLine), &parsed); err != nil {
		return "", "", time.Time{}, false
	}
	if m, ok := parsed["msg"]; ok {
		if msgStr, ok := m.(string); ok {
			msg = msgStr
		}
	}
	if lvl, ok := parsed["level"]; ok {
		if lvlStr, ok := lvl.(string); ok {
			level = strings.ToUpper(lvlStr)
		}
	}
	if ts, ok := parsed["time"]; ok {
		if tsStr, ok := ts.(string); ok {
			if t, err := time.Parse(time.RFC3339, tsStr); err == nil {
				timestamp = t
			} else if t, err := time.Parse("2006-01-02T15:04:05", tsStr); err == nil {
				timestamp = t
			}
		}
	}
	return msg, level, timestamp, true
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/observabilitysvc/client.go` around lines 352 -
440, Both convertComponentLogEntry and convertWorkflowLogEntry duplicate the
JSON parsing logic; extract that into a helper (e.g., parseStructuredLogFields)
that accepts the raw log string and returns (msg string, level string, timestamp
time.Time, ok bool). Replace the duplicated blocks in convertComponentLogEntry
and convertWorkflowLogEntry by calling parseStructuredLogFields(*log.Log) and
then set entry.Log, entry.LogLevel, and entry.Timestamp only when the helper
returns values (respecting existing fallback rules and zero checks). Ensure the
helper handles trimming, JSON unmarshal, extracting "msg"/"level"/"time" (with
RFC3339 and the fallback format) and uppercases level as in the originals.

490-499: Direct type cast without validation.

The function directly casts input strings to gen.LogsQueryRequestLogLevels without validating against allowed enum values. Invalid levels will be passed to the API and may cause unclear error responses.

This is acceptable if the upstream API validates inputs, but consider documenting expected values or adding validation for clearer error messages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/observabilitysvc/client.go` around lines 490 -
499, convertLogLevels currently casts strings to gen.LogsQueryRequestLogLevels
without validation; update convertLogLevels to validate each input string
against the allowed enum values (the gen.LogsQueryRequestLogLevels constants),
and return a clear error when any input is invalid instead of silently casting;
implement this by building a set/map of allowed enum string values, iterating
over levels, collecting valid enums into the result slice and returning nil plus
an error if an invalid value is encountered (also update the function signature
to return (*[]gen.LogsQueryRequestLogLevels, error) and update callers
accordingly), or alternatively filter out invalid entries and log a warning if
you prefer non-failing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 1375-1382: The rollout timestamp uses time.RFC3339 which only has
second precision; update both occurrences in UpdateReleaseBindingEnvVars and
RemoveReleaseBindingEnvVars to use time.RFC3339Nano so the restartedAt value has
sub-second precision. Locate where
releaseBinding.Spec.ComponentTypeEnvironmentConfigs is initialized and where
(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] is set,
and replace time.RFC3339 with time.RFC3339Nano, leaving the map creation and
assignment logic unchanged.
- Around line 756-766: applyResourceDefaults currently reads properties directly
from resourcesProp without dereferencing possible $ref, so when resources is a
reference the requests/limits defaults are missing; before extracting
resourceProps do resourcesProp = resolveRef(resourcesProp, defs) (or call
resolveRef on the incoming resourcesProp) so that resourceProps, the requests
block, and subsequent applyQuantityDefaults for resources.Requests/limits
operate on the resolved schema; update applyResourceDefaults to resolve the
resourcesProp using resolveRef(defs) prior to casting to map[string]interface{}
and then continue as before.

In `@agent-manager-service/clients/openchoreosvc/client/infrastructure.go`:
- Around line 224-228: The mapping loop that builds models.TargetEnvironmentRef
(in infrastructure.go where pp.TargetEnvironmentRefs is iterated) only sets Name
and leaves RequiresApproval zeroed; either remove RequiresApproval from the
internal models.TargetEnvironmentRef type if OC no longer provides approval
metadata, or locate the new approval metadata on the OC type (the OC-generated
TargetEnvironmentRef or its parent) and assign it into
models.TargetEnvironmentRef.RequiresApproval inside the loop so the internal
model reflects the real approval state; update the mapping where targetRefs[j]
is constructed to set RequiresApproval from the correct OC field or remove the
field from the model if obsolete.

---

Outside diff comments:
In `@agent-manager-service/clients/openchoreosvc/client/infrastructure.go`:
- Around line 318-325: The converter in infrastructure.go that builds
models.EnvironmentResponse currently omits DNSPrefix, causing downstream
consumers (environment_service.go and
gateway_controller.go/environment_controller.go) to receive nil/missing values;
update the converter that returns &models.EnvironmentResponse
(infrastructure.go) to extract the appropriate Gateway host from the
Environment's Gateway tree (check env.Gateway.Ingress/Egress → External/Internal
→ Http/Https/Tls → Host) and set DNSPrefix to that host (or the chosen fallback
if multiple are present), so DNSPrefix is populated consistently for callers;
alternatively, if deprecation is intended, remove forwarding/usages of DNSPrefix
end-to-end instead, but do one of these two actions in this change.

---

Duplicate comments:
In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 220-228: The code currently always sets repository.appPath and
repository.revision.branch which can override ClusterWorkflow defaults; update
the block that builds params["repository"] (the code referencing req.Repository,
params, and keys "repository", "appPath", "revision", "branch") to only add
"appPath" when req.Repository.AppPath is non-empty and only add a "revision" map
containing "branch" when req.Repository.Branch is non-empty, so omitted fields
are not sent and the workflow can fall back to its defaults.

---

Nitpick comments:
In `@agent-manager-service/clients/observabilitysvc/client.go`:
- Around line 326-350: When resp.Logs is non-nil but both
AsLogsQueryResponseLogs0() and AsLogsQueryResponseLogs1() fail, add a warning
log so the failure isn't silent: detect the case where both errs are non-nil
(after calling resp.Logs.AsLogsQueryResponseLogs0() and
resp.Logs.AsLogsQueryResponseLogs1()), and call your logger (e.g. processLogger
or the existing logger in scope) to emit a warning that includes both error
messages and a short representation of resp.Logs (or its raw value) to aid
debugging; then return result as before. Reference the parsing calls
AsLogsQueryResponseLogs0 and AsLogsQueryResponseLogs1 and the result.Logs slice
to locate where to insert the log.
- Around line 352-440: Both convertComponentLogEntry and convertWorkflowLogEntry
duplicate the JSON parsing logic; extract that into a helper (e.g.,
parseStructuredLogFields) that accepts the raw log string and returns (msg
string, level string, timestamp time.Time, ok bool). Replace the duplicated
blocks in convertComponentLogEntry and convertWorkflowLogEntry by calling
parseStructuredLogFields(*log.Log) and then set entry.Log, entry.LogLevel, and
entry.Timestamp only when the helper returns values (respecting existing
fallback rules and zero checks). Ensure the helper handles trimming, JSON
unmarshal, extracting "msg"/"level"/"time" (with RFC3339 and the fallback
format) and uppercases level as in the originals.
- Around line 490-499: convertLogLevels currently casts strings to
gen.LogsQueryRequestLogLevels without validation; update convertLogLevels to
validate each input string against the allowed enum values (the
gen.LogsQueryRequestLogLevels constants), and return a clear error when any
input is invalid instead of silently casting; implement this by building a
set/map of allowed enum string values, iterating over levels, collecting valid
enums into the result slice and returning nil plus an error if an invalid value
is encountered (also update the function signature to return
(*[]gen.LogsQueryRequestLogLevels, error) and update callers accordingly), or
alternatively filter out invalid entries and log a warning if you prefer
non-failing behavior.

In `@deployments/scripts/utils.sh`:
- Around line 73-76: The get_version() helper calls "$1" version which may try
to contact the cluster; update get_version to invoke the CLI with the
client-only flag (e.g. use "$1" version --client or the equivalent flag for the
tool) so it does not attempt network calls and returns quickly; locate the
get_version function and change the invoked command string that currently reads
"$1" version to include the --client option and keep the existing stderr
redirection and grep extraction logic intact.
- Around line 1-8: Add a bash shebang to the top of the script so editors and
linters recognize the target shell; prepend "#!/bin/bash" as the first line of
the file that defines is_port_in_use() to match the sourcing scripts and satisfy
ShellCheck SC2148 while preserving the script’s ability to be sourced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d79b939f-6266-4a79-82b8-8bb9d2e9c148

📥 Commits

Reviewing files that changed from the base of the PR and between 00cfbb2 and 34c0f95.

📒 Files selected for processing (12)
  • agent-manager-service/clients/observabilitysvc/client.go
  • agent-manager-service/clients/openchoreosvc/client/builds.go
  • agent-manager-service/clients/openchoreosvc/client/components.go
  • agent-manager-service/clients/openchoreosvc/client/constants.go
  • agent-manager-service/clients/openchoreosvc/client/infrastructure.go
  • console/apps/webapp/public/config.js
  • deployments/quick-start/install.sh
  • deployments/scripts/setup-openchoreo.sh
  • deployments/scripts/utils.sh
  • docs/install/managed-cluster.md
  • docs/install/self-hosted-cluster.md
  • docs/quick-start.md
💤 Files with no reviewable changes (1)
  • console/apps/webapp/public/config.js
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/install/self-hosted-cluster.md
  • docs/quick-start.md
  • deployments/quick-start/install.sh
  • docs/install/managed-cluster.md
  • agent-manager-service/clients/openchoreosvc/client/builds.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deployments/quick-start/uninstall.sh (1)

347-365: ⚠️ Potential issue | 🟠 Major

Full uninstall still omits the new OpenBao bootstrap.

deployments/quick-start/install.sh, Lines 766-814 now add an openbao Helm release, an openbao namespace, and ClusterSecretStore/default, but the full-uninstall path still does not clean any of them up. That means ./uninstall.sh cannot actually return the cluster to a clean post-quick-start state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/quick-start/uninstall.sh` around lines 347 - 365, The uninstall
path omits the new OpenBao resources so ./uninstall.sh can't fully clean the
cluster; update the namespace list in the AMP_ONLY branch loop to include
"openbao", and add cleanup steps to remove the Helm release and
ClusterSecretStore: after detecting/deleting the "openbao" namespace ensure you
run a helm uninstall of the "openbao" release in the openbao namespace (or
attempt helm uninstall with --namespace openbao) and delete the
ClusterSecretStore named "default" (kubectl delete clustersecretstore default ||
true); modify the section that checks AMP_ONLY and the namespace loop (the for
ns in ... list and the kubectl/helm delete calls) and reuse the existing
log_info/log_warning behavior for failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployments/quick-start/install.sh`:
- Around line 788-819: The ClusterSecretStore creation for OpenBao
(serviceAccount external-secrets-openbao and ClusterSecretStore named default)
is a hard prerequisite but currently treated as non-fatal; change the failure
branch so that when the kubectl --context ${CLUSTER_CONTEXT} apply -f - for the
ClusterSecretStore block fails you call a fatal handler (e.g., log_error) and
exit with non-zero (exit 1) to stop the install rather than log_warning and
continue, ensuring subsequent ExternalSecret creation (e.g.,
opensearch-admin-credentials, observer-secret) does not run when
ClusterSecretStore/default failed to apply.

In `@deployments/quick-start/uninstall.sh`:
- Around line 260-269: The uninstall currently only removes the
"openchoreo-workflow-plane" Helm release but does not remove the standalone
"registry" release installed into the same namespace; update the uninstall logic
that handles the openchoreo-workflow-plane release to also check for and
uninstall the "registry" Helm release in the "openchoreo-workflow-plane"
namespace (i.e., mirror the helm status + helm uninstall pattern used for
"openchoreo-workflow-plane" to run a helm status "registry" -n
"openchoreo-workflow-plane" check and helm uninstall "registry" -n
"openchoreo-workflow-plane", and log success/warning using the existing
log_info/log_success/log_warning helpers).
- Around line 220-238: The uninstall script's full-uninstall branch (when
AMP_ONLY is false) currently deletes ClusterDataPlane and ClusterWorkflowPlane
but omits the cluster-scoped ClusterObservabilityPlane, leaving the CR behind;
update the block guarded by AMP_ONLY to also detect and delete
ClusterObservabilityPlane using the same pattern as
ClusterDataPlane/ClusterWorkflowPlane (use kubectl get clusterobservabilityplane
-A &>/dev/null 2>&1 to check, then kubectl delete clusterobservabilityplane
--all --all-namespaces --timeout=30s &>/dev/null || true, and mirror the
corresponding log_info/log_success messages) so full uninstall removes the
observability-plane CR.

---

Outside diff comments:
In `@deployments/quick-start/uninstall.sh`:
- Around line 347-365: The uninstall path omits the new OpenBao resources so
./uninstall.sh can't fully clean the cluster; update the namespace list in the
AMP_ONLY branch loop to include "openbao", and add cleanup steps to remove the
Helm release and ClusterSecretStore: after detecting/deleting the "openbao"
namespace ensure you run a helm uninstall of the "openbao" release in the
openbao namespace (or attempt helm uninstall with --namespace openbao) and
delete the ClusterSecretStore named "default" (kubectl delete clustersecretstore
default || true); modify the section that checks AMP_ONLY and the namespace loop
(the for ns in ... list and the kubectl/helm delete calls) and reuse the
existing log_info/log_warning behavior for failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43b41e6a-6c12-4c2d-abda-17c6a3db67fe

📥 Commits

Reviewing files that changed from the base of the PR and between 34c0f95 and 4c17ac4.

📒 Files selected for processing (6)
  • agent-manager-service/clients/openchoreosvc/client/components.go
  • agent-manager-service/models/infra_resources.go
  • deployments/quick-start/install-helpers.sh
  • deployments/quick-start/install.sh
  • deployments/quick-start/k3d-config.yaml
  • deployments/quick-start/uninstall.sh
✅ Files skipped from review due to trivial changes (1)
  • deployments/quick-start/k3d-config.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • agent-manager-service/clients/openchoreosvc/client/components.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deployments/quick-start/install.sh (1)

923-945: ⚠️ Potential issue | 🟡 Minor

Inconsistent version references in raw GitHub URLs.

Line 923 references release-v0.16 for the Docker Registry values file, while line 945 references v1.0.0-rc.1 for the Workflow Plane values file. This version mismatch could lead to compatibility issues.

Consider updating line 923 to use the same version tag:

♻️ Suggested fix
-        --values https://raw.githubusercontent.com/openchoreo/openchoreo/release-v0.16/install/k3d/single-cluster/values-registry.yaml \
+        --values https://raw.githubusercontent.com/openchoreo/openchoreo/v1.0.0-rc.1/install/k3d/single-cluster/values-registry.yaml \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/quick-start/install.sh` around lines 923 - 945, The two raw
GitHub values URLs use different release tags causing mismatch; update the first
URL (the one pointing to values-registry.yaml used when installing the Docker
Registry) to use the same tag as the later helm_install_idempotent call
(v1.0.0-rc.1) so both URLs reference the same release, ensuring compatibility
between the registry installation block and the helm_install_idempotent
invocation.
♻️ Duplicate comments (2)
agent-manager-service/clients/openchoreosvc/client/components.go (2)

220-228: ⚠️ Potential issue | 🟠 Major

Preserve ClusterWorkflow defaults by omitting empty repository fields.

This still writes appPath and revision.branch even when empty, which overrides schema defaults instead of inheriting them.

🛠️ Proposed fix
  // Add repository details in nested format expected by ClusterWorkflow
  if req.Repository != nil {
-		params["repository"] = map[string]any{
-			"url":     req.Repository.URL,
-			"appPath": req.Repository.AppPath,
-			"revision": map[string]any{
-				"branch": req.Repository.Branch,
-			},
-		}
+		repo := map[string]any{
+			"url": req.Repository.URL,
+		}
+		if req.Repository.Branch != "" {
+			repo["revision"] = map[string]any{
+				"branch": req.Repository.Branch,
+			}
+		}
+		if req.Repository.AppPath != "" {
+			repo["appPath"] = req.Repository.AppPath
+		}
+		params["repository"] = repo
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/components.go` around
lines 220 - 228, The repository map currently sets "appPath" and
"revision.branch" even when empty, which overrides ClusterWorkflow defaults;
update the code that builds params["repository"] (the block using req.Repository
and params["repository"]) to only include "appPath" when req.Repository.AppPath
is non-empty and only include the "revision" map (and its "branch" key) when
req.Repository.Branch is non-empty — build the repository map conditionally (or
add keys conditionally) so empty strings are not written and ClusterWorkflow can
use its defaults.

1380-1387: ⚠️ Potential issue | 🟠 Major

Use sub-second rollout stamps to avoid missed restart triggers.

Using time.RFC3339 can collide within the same second and skip rollout detection when updates are close together.

🛠️ Proposed fix
-	(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] = time.Now().Format(time.RFC3339)
+	(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] = time.Now().UTC().Format(time.RFC3339Nano)
-	(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] = time.Now().Format(time.RFC3339)
+	(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] = time.Now().UTC().Format(time.RFC3339Nano)
#!/bin/bash
# Verify restartedAt uses high-precision timestamp format in this file.
rg -n 'restartedAt.*RFC3339' agent-manager-service/clients/openchoreosvc/client/components.go
# Expected after fix: no matches for RFC3339 on restartedAt assignments.

Also applies to: 1557-1561

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/clients/openchoreosvc/client/components.go` around
lines 1380 - 1387, The timestamp used for the rollout trigger should include
sub-second precision to avoid missed restarts; update the assignment that sets
(*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"] to use
time.Now().Format(time.RFC3339Nano) instead of time.RFC3339 (apply the same
change to the other occurrence around the second block mentioned), keeping the
nil-map initialization logic intact so the pointer is set before assignment;
after the change, verify no remaining uses of RFC3339 for restartedAt.
🧹 Nitpick comments (2)
deployments/quick-start/install.sh (1)

1285-1317: Consider using explicit verbs instead of wildcard.

The ClusterRole uses verbs: ["*"] which grants all permissions. While functional, using explicit verbs (get, list, watch, create, update, patch, delete) improves auditability and follows the principle of least privilege.

♻️ Optional refinement for least-privilege RBAC
 rules:
   - apiGroups: ["gateway.api-platform.wso2.com"]
     resources: ["restapis", "apigateways"]
-    verbs: ["*"]
+    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
   - apiGroups: ["gateway.kgateway.dev"]
     resources: ["backends"]
-    verbs: ["*"]
+    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/quick-start/install.sh` around lines 1285 - 1317, Replace the
wildcard permissions in the ClusterRole named wso2-api-platform-gateway-module
by enumerating explicit verbs instead of verbs: ["*"]; for the rules that target
apiGroups "gateway.api-platform.wso2.com" (resources: "restapis","apigateways")
and "gateway.kgateway.dev" (resources: "backends"), change verbs to the minimal
required set (e.g., "get","list","watch","create","update","patch","delete") or
a smaller subset appropriate for the controller's operations, and then reapply
the manifest; ensure you only grant the specific verbs each resource actually
needs to follow least-privilege principles.
deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml (1)

225-256: Internal routes lack request timeout configuration.

The external HTTPRoute (line 222-223) specifies timeouts.request: 30s, but the internal route has no timeout. This may be intentional to allow longer-running internal operations, but unbounded timeouts can lead to resource exhaustion if internal endpoints hang.

Consider whether a timeout (possibly longer than 30s) should be added for consistency and resilience.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml`
around lines 225 - 256, The internal HTTPRoute template (id: httproute-internal
/ kind: HTTPRoute) is missing a request timeout; add a timeouts.request entry
under spec (mirroring the external route's timeouts.request: 30s or a configured
longer value) so internal endpoints cannot hang indefinitely — update the
HTTPRoute spec in the template to include the timeout value (consider exposing a
variable if you want different internal timeout than the external one).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 1745-1753: When constructing endpointURLs, don't drop an endpoint
just because the preferred TLS variant is missing: in the selection logic around
ep.ExternalURLs and config.GetConfig().TLSConfig.EnableTLS, try the preferred
(Https when TLS enabled, Http otherwise) and if that is nil fall back to the
other variant (Http or Https) before skipping the endpoint; then call
buildEndpointURLString. Update buildEndpointURLString to defensively handle nil
or empty EndpointURL fields: treat a nil Scheme as a sensible default ("https"
if TLS enabled or "http" otherwise, or simply "http" if TLS context isn't
available), validate that Host is non-empty (skip/return error or empty string
if missing), include Port only when present, and never emit a leading "://host"
by always prefixing with a non-empty scheme. Reference symbols: ep.ExternalURLs,
endpointURL, endpointURLs map, config.GetConfig().TLSConfig.EnableTLS, and
buildEndpointURLString.

---

Outside diff comments:
In `@deployments/quick-start/install.sh`:
- Around line 923-945: The two raw GitHub values URLs use different release tags
causing mismatch; update the first URL (the one pointing to values-registry.yaml
used when installing the Docker Registry) to use the same tag as the later
helm_install_idempotent call (v1.0.0-rc.1) so both URLs reference the same
release, ensuring compatibility between the registry installation block and the
helm_install_idempotent invocation.

---

Duplicate comments:
In `@agent-manager-service/clients/openchoreosvc/client/components.go`:
- Around line 220-228: The repository map currently sets "appPath" and
"revision.branch" even when empty, which overrides ClusterWorkflow defaults;
update the code that builds params["repository"] (the block using req.Repository
and params["repository"]) to only include "appPath" when req.Repository.AppPath
is non-empty and only include the "revision" map (and its "branch" key) when
req.Repository.Branch is non-empty — build the repository map conditionally (or
add keys conditionally) so empty strings are not written and ClusterWorkflow can
use its defaults.
- Around line 1380-1387: The timestamp used for the rollout trigger should
include sub-second precision to avoid missed restarts; update the assignment
that sets (*releaseBinding.Spec.ComponentTypeEnvironmentConfigs)["restartedAt"]
to use time.Now().Format(time.RFC3339Nano) instead of time.RFC3339 (apply the
same change to the other occurrence around the second block mentioned), keeping
the nil-map initialization logic intact so the pointer is set before assignment;
after the change, verify no remaining uses of RFC3339 for restartedAt.

---

Nitpick comments:
In
`@deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml`:
- Around line 225-256: The internal HTTPRoute template (id: httproute-internal /
kind: HTTPRoute) is missing a request timeout; add a timeouts.request entry
under spec (mirroring the external route's timeouts.request: 30s or a configured
longer value) so internal endpoints cannot hang indefinitely — update the
HTTPRoute spec in the template to include the timeout value (consider exposing a
variable if you want different internal timeout than the external one).

In `@deployments/quick-start/install.sh`:
- Around line 1285-1317: Replace the wildcard permissions in the ClusterRole
named wso2-api-platform-gateway-module by enumerating explicit verbs instead of
verbs: ["*"]; for the rules that target apiGroups
"gateway.api-platform.wso2.com" (resources: "restapis","apigateways") and
"gateway.kgateway.dev" (resources: "backends"), change verbs to the minimal
required set (e.g., "get","list","watch","create","update","patch","delete") or
a smaller subset appropriate for the controller's operations, and then reapply
the manifest; ensure you only grant the specific verbs each resource actually
needs to follow least-privilege principles.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 73c5b8df-cd42-49db-aca8-8d4d1f9ff85e

📥 Commits

Reviewing files that changed from the base of the PR and between 4c17ac4 and 4e9656a.

📒 Files selected for processing (13)
  • agent-manager-service/clients/openchoreosvc/client/components.go
  • agent-manager-service/config/config.go
  • agent-manager-service/config/config_loader.go
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml
  • deployments/quick-start/install.sh
  • deployments/quick-start/uninstall.sh
  • deployments/scripts/setup-openchoreo.sh
  • deployments/single-cluster/values-op.yaml
  • docs/install/managed-cluster.md
  • docs/install/self-hosted-cluster.md
  • traces-observer-service/opensearch/composite_queries_test.go
  • traces-observer-service/opensearch/queries.go
  • traces-observer-service/opensearch/queries_test.go
💤 Files with no reviewable changes (2)
  • agent-manager-service/config/config.go
  • agent-manager-service/config/config_loader.go
✅ Files skipped from review due to trivial changes (1)
  • traces-observer-service/opensearch/composite_queries_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • docs/install/managed-cluster.md
  • deployments/quick-start/uninstall.sh
  • docs/install/self-hosted-cluster.md
  • deployments/single-cluster/values-op.yaml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent-manager-service/services/monitor_executor.go`:
- Around line 188-193: The workflowRunName construction can violate DNS-1123 and
exceed 63 chars; replace the current fmt.Sprintf usage with a sanitization +
truncation routine: lower-case monitor.Name, replace any non [a-z0-9-]
characters with '-', collapse repeated '-' and strip leading/trailing '-' so it
starts/ends with an alphanumeric, then truncate the sanitized monitor name to
maxBaseLen = 63 - 1 - 8 = 54 characters so appending "-" + runID.String()[:8]
yields ≤63 chars; if truncation or sanitization yields an empty base, use a safe
fallback like "run"; finally set CreateWorkflowRunRequest.Name to the
sanitized+truncated base + "-" + runID.String()[:8] (referencing
workflowRunName, monitor.Name, runID.String(), and
CreateWorkflowRunRequest.Name).

In
`@deployments/helm-charts/wso2-amp-evaluation-extension/templates/workflow-templates/workflow-monitor-evaluation.yaml`:
- Line 45: The template uses the variable ${metadata.namespace} but the template
variable documentation block (the comment header above the template) does not
list it; update that documentation block to add a short entry describing
${metadata.namespace} (what it holds — the target namespace for the release) so
the variable reference is documented alongside the other template variables
referenced in this file, e.g., mention ${metadata.namespace} in the same comment
block that lists the other variables.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 418d5061-c430-4f67-a2f1-37f573ed2bca

📥 Commits

Reviewing files that changed from the base of the PR and between 4e9656a and 65105d8.

📒 Files selected for processing (4)
  • agent-manager-service/clients/openchoreosvc/client/client.go
  • agent-manager-service/clients/openchoreosvc/client/generic-workflows.go
  • agent-manager-service/services/monitor_executor.go
  • deployments/helm-charts/wso2-amp-evaluation-extension/templates/workflow-templates/workflow-monitor-evaluation.yaml
✅ Files skipped from review due to trivial changes (1)
  • agent-manager-service/clients/openchoreosvc/client/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • agent-manager-service/clients/openchoreosvc/client/generic-workflows.go

@hanzjk hanzjk merged commit bff3514 into wso2:main Mar 19, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move Agent Manager to OC 1.0-RC1

2 participants