@Thealisyed Thealisyed commented Dec 19, 2025

feat: Use builtin claude-code agent for net-edge evals
Summary:

  • Add Claude Code agent configuration for running NetEdge eval scenarios
  • Include 5 eval definitions matching the Gemini agent structure. 6th eval needs more investigation
  • Update README

Assisted with Claude Code

Summary by CodeRabbit

  • New Features

    • Support for running evaluations with a Claude Code agent and pattern-based eval files.
    • Added several Claude Code evaluation profiles covering common net-edge scenarios.
    • Optional skip-permissions flag for agent runs.
  • Documentation

    • README updated with "Running with Claude Code" steps and example evaluation commands; parity added alongside existing agent workflows.
  • Chores

    • Server config accepts an additional server input flag for evaluation runs.


coderabbitai bot commented Dec 19, 2025

Walkthrough

Adds a builtin Claude Code agent and five new Eval YAMLs to examples/net-edge, updates README to document running Claude Code, appends an mcp server arg, and exposes/wires a new DangerouslySkipPermissions flag into the agent configuration and runtime.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: examples/net-edge/README.md | Switched to eval_*.yaml pattern, added "Running with Claude Code" section and example CLI invocation referencing claude-code-agent evals. |
| Claude Code eval configs: examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml, examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml, examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml, examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml, examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml | Added five Eval YAMLs using config.agent.type: "builtin.claude-code", referencing ../mcp-config.yaml, each pointing to scenario taskSets and asserting netedge tool usage with min/max tool call limits. |
| MCP configuration: examples/net-edge/mcp-config.yaml | Appended -s ../gen-mcp/examples/netedge-tools/mcpserver.yaml to mcpServers.netedge.args. |
| Agent public config: pkg/agent/config.go | Added optional DangerouslySkipPermissions *bool to AgentCommands (dangerouslySkipPermissions JSON key). |
| Agent runtime wiring: pkg/agent/runner.go | Propagated DangerouslySkipPermissions into the template rendering data and included it in the rendered run command when set. |
| Claude Code agent implementation: pkg/agent/claude_code.go | Added DangerouslySkipPermissions field to defaults and updated RunPrompt template to include -p "{{ .Prompt }}", conditional --dangerously-skip-permissions, and always append --output-format stream-json --verbose. |
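
For illustration, here is a minimal sketch of why an optional `*bool` is a reasonable shape for this setting: a pointer lets the config distinguish "key absent" from "explicitly false". The `AgentCommands` struct below is a trimmed-down, hypothetical stand-in for the real one in pkg/agent/config.go, and the `omitempty` tag and the `skipPermissions` helper are assumptions, not code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentCommands is a hypothetical, trimmed-down stand-in for the struct in
// pkg/agent/config.go; only the new field is shown, and omitempty is assumed.
type AgentCommands struct {
	DangerouslySkipPermissions *bool `json:"dangerouslySkipPermissions,omitempty"`
}

// skipPermissions reports whether the flag was explicitly enabled.
// A nil pointer (key absent from the config) is treated as false.
func skipPermissions(c AgentCommands) bool {
	return c.DangerouslySkipPermissions != nil && *c.DangerouslySkipPermissions
}

// parse decodes a JSON document into AgentCommands.
func parse(raw string) (AgentCommands, error) {
	var c AgentCommands
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	inputs := []string{
		`{}`,
		`{"dangerouslySkipPermissions": false}`,
		`{"dangerouslySkipPermissions": true}`,
	}
	for _, raw := range inputs {
		c, err := parse(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-40s set=%t enabled=%t\n",
			raw, c.DangerouslySkipPermissions != nil, skipPermissions(c))
	}
}
```

With this shape, an eval config that never mentions the key keeps the safe behavior, and only an explicit `true` opts in.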

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • Cali0707
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add Claude Code agent for net-edge evals' accurately and clearly summarizes the main change—adding Claude Code agent configurations for NetEdge evaluation scenarios. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
examples/net-edge/README.md (3)

1-4: Update description to reflect dual-agent support.

The title and description focus on "Codex GPT-5 coding agent" but the document now covers both Codex and Claude Code agents. Consider making this more generic or mentioning both agents.

Proposed update
 # NetEdge Scenario 1 (Service Selector Mismatch)
 
 Evaluate the NetEdge gen-mcp server **Route → Service selector mismatch** scenario with the `gevals`
-framework and the Codex GPT-5 coding agent.
+framework using Codex GPT-5 or Claude Code agents.

26-62: Consider adding Claude Code prerequisites.

The prerequisites section only covers Codex-specific requirements (API key, config.toml, etc.). Consider adding a separate subsection for Claude Code prerequisites, such as:

  • Claude CLI installation and authentication
  • Any required environment variables or configuration

85-87: Use agent-agnostic language in shared workflow description.

Line 86 states "The Codex agent must diagnose and repair the mismatch" but this description applies to both Codex and Claude Code workflows since it describes what happens after running either eval.

Proposed update
 `setup.sh` deploys the hello workload, then intentionally breaks the Service selector so the Route loses its
-endpoints. The Codex agent must diagnose and repair the mismatch, after which `verify.sh` confirms the selector
+endpoints. The agent must diagnose and repair the mismatch, after which `verify.sh` confirms the selector
 and endpoints are healthy. Results are written to `gevals-netedge-selector-mismatch-out.json` by default.
examples/net-edge/claude-code-agent/agent.yaml (1)

59-83: Consider potential race condition in cleanup trap.

The cleanup function receives $? as an argument, but the trap is set as trap 'cleanup $?' EXIT. If any command in the cleanup function itself fails (before set -euo pipefail triggers), the original exit status could be lost. The trap - EXIT at line 80 followed by exit "${exit_status}" is the correct pattern, but consider using a subshell or capturing the status earlier.

Additionally, the .kube removal at lines 70-72 is good for security, but the debug directory could still contain sensitive data in claude-code-home/.config/gcloud.

🔎 Consider also removing gcloud credentials from debug output
       if [[ -d "${DEBUG_DIR}/claude-code-home/.kube" ]]; then
         rm -rf "${DEBUG_DIR}/claude-code-home/.kube"
       fi
+      if [[ -d "${DEBUG_DIR}/claude-code-home/.config/gcloud" ]]; then
+        rm -rf "${DEBUG_DIR}/claude-code-home/.config/gcloud"
+      fi
gevals-claude-code-netedge-selector-mismatch-out.json (1)

1-6: Consider removing/anonymizing absolute paths and sensitive data from example output.

The taskPath field (line 4) contains an absolute path including a username (/home/alsyed/gevals/...). Additionally, the taskOutput field contains session IDs, cluster URLs, and IP addresses that are specific to a particular test run. If this file is intended as example output for documentation, consider:

  1. Replacing the absolute path with a relative path or placeholder
  2. Sanitizing or replacing real cluster URLs and IPs with example values
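
One way to do this sanitization is sketched below in Go. It is only an illustration: the function name `sanitize`, the `<home>/` placeholder, and the sample path are hypothetical, and the IPv4 pattern is deliberately loose. The replacement address 192.0.2.1 comes from the range reserved for documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// homePrefix matches a user-specific home directory such as /home/<user>/
	// or /Users/<user>/ wherever it appears in the text.
	homePrefix = regexp.MustCompile(`/(?:home|Users)/[^/\s"]+/`)
	// ipv4 is a loose dotted-quad matcher; it does not validate octet ranges.
	ipv4 = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
)

// sanitize replaces home-directory prefixes with a placeholder and IPv4
// addresses with a documentation-range example address.
func sanitize(s string) string {
	s = homePrefix.ReplaceAllString(s, "<home>/")
	return ipv4.ReplaceAllString(s, "192.0.2.1")
}

func main() {
	// The path below is a made-up example, not taken from the PR's output files.
	fmt.Println(sanitize(`"taskPath": "/home/alice/gevals/examples/task.yaml"`))
	fmt.Println(sanitize("dial tcp 10.0.12.7:443: connection refused"))
}
```

Running something like this over the `*-out.json` files before committing would cover the username and IP cases; session IDs and cluster URLs would need their own patterns.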
gevals-claude-code-netedge-networkpolicy-block-out.json (1)

1-6: Same path/data sanitization concern as other output files.

Contains absolute path with username in taskPath. Consider sanitizing for public documentation.

gevals-claude-code-netedge-nxdomain-host-out.json (1)

1-6: Same path sanitization concern applies.

The taskPath contains an absolute path with username. Consider anonymizing for the example output.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca93d0f and 441a797.

📒 Files selected for processing (13)
  • examples/net-edge/README.md (2 hunks)
  • examples/net-edge/claude-code-agent/agent.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1 hunks)
  • examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml (1 hunks)
  • examples/net-edge/mcp-config.yaml (1 hunks)
  • gevals-claude-code-netedge-networkpolicy-block-out.json (1 hunks)
  • gevals-claude-code-netedge-nxdomain-host-out.json (1 hunks)
  • gevals-claude-code-netedge-selector-mismatch-out.json (1 hunks)
  • netedge-selector-mismatch-error.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

Applied to files:

  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
  • examples/net-edge/README.md
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
  • examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml
  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
🔇 Additional comments (17)
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/README.md (2)

14-17: Documentation structure looks good.

The layout section correctly documents the new claude-code-agent directory and eval_*.yaml pattern for both agent types.


63-83: Clear documentation for both agent workflows.

The separation into "Running with Codex" and "Running with Claude Code" sections makes it easy for users to follow the appropriate workflow.

examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1)

1-15: All referenced files in eval_5_loadbalancer.yaml exist and are properly configured:

  • agent.yaml
  • ../mcp-config.yaml
  • ../tasks/loadbalancer-missing/loadbalancer-missing.yaml
examples/net-edge/mcp-config.yaml (1)

8-9: The -s flag for genmcp run is undocumented; verify it's supported in gen-mcp.

The configuration uses an undocumented -s flag that does not appear in the official genmcp CLI documentation or in the README.md example (which only shows -f). Additionally, the referenced files (mcpfile.yaml and mcpserver.yaml) are in an external ../gen-mcp/ directory outside this repository and cannot be verified here.

examples/net-edge/claude-code-agent/agent.yaml (5)

9-15: LGTM on prerequisite validation.

The script uses set -euo pipefail for strict error handling and properly validates the jq dependency before proceeding.


32-48: Kubeconfig handling preserves cluster access correctly.

The logic properly prioritizes an explicit KUBECONFIG environment variable, falling back to copying the original HOME's .kube directory. The 2>/dev/null || true pattern gracefully handles missing files.


50-56: GCP credential preservation for Vertex AI authentication.

The conditional copying of ~/.config/gcloud when GOOGLE_APPLICATION_CREDENTIALS is not set ensures Vertex AI authentication works in the temporary HOME context.


95-100: Model override implementation looks correct.

The optional CLAUDE_MODEL environment variable allows flexibility in model selection without requiring changes to the agent configuration.


102-108: The tee command does not mask the claude exit code; pipefail is already enabled.

Line 11 sets set -euo pipefail, which ensures the pipeline claude "${CLAUDE_ARGS[@]}" 2>&1 | tee -a returns the claude command's exit code, not tee's. Combined with set -e, any failure from claude will trigger the cleanup trap. Exit code handling is correct.

Likely an incorrect or invalid review comment.

gevals-claude-code-netedge-selector-mismatch-out.json (1)

20-149: Call history structure captures MCP tool interactions correctly.

The callHistory section properly records tool calls with timestamps, request/response data, and success status. This provides good traceability for debugging evaluation scenarios.

gevals-claude-code-netedge-networkpolicy-block-out.json (1)

7-19: Assertion results structure is well-defined.

The assertionResults object with toolsUsed, minToolCalls, and maxToolCalls provides clear pass/fail indicators for evaluation criteria.
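
As a rough sketch of how these three indicators could be derived from a call history, consider the Go snippet below. The types and the `evaluate` function are hypothetical and are not the gevals schema or implementation; in particular, treating toolsUsed as "every listed tool was called at least once" is an assumption.

```go
package main

import "fmt"

// ToolCall, Assertions and AssertionResults are hypothetical types for this
// sketch; they only borrow field names from the output file.
type ToolCall struct {
	Tool    string
	Success bool
}

type Assertions struct {
	ToolsUsed    []string // tools that must each appear at least once (assumed semantics)
	MinToolCalls int
	MaxToolCalls int
}

type AssertionResults struct {
	ToolsUsed    bool
	MinToolCalls bool
	MaxToolCalls bool
}

// evaluate checks the call history against the assertions and returns one
// pass/fail flag per criterion.
func evaluate(a Assertions, history []ToolCall) AssertionResults {
	seen := make(map[string]bool, len(history))
	for _, c := range history {
		seen[c.Tool] = true
	}
	allUsed := true
	for _, t := range a.ToolsUsed {
		if !seen[t] {
			allUsed = false
			break
		}
	}
	return AssertionResults{
		ToolsUsed:    allUsed,
		MinToolCalls: len(history) >= a.MinToolCalls,
		MaxToolCalls: len(history) <= a.MaxToolCalls,
	}
}

func main() {
	history := []ToolCall{
		{Tool: "inspect_route", Success: true},
		{Tool: "probe_dns_local", Success: true},
	}
	a := Assertions{ToolsUsed: []string{"inspect_route"}, MinToolCalls: 1, MaxToolCalls: 20}
	fmt.Printf("%+v\n", evaluate(a, history))
}
```

Keeping each criterion as its own boolean, rather than a single pass/fail, makes it easier to see whether a run failed because the agent never used the tools or because it used them too often.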

gevals-claude-code-netedge-nxdomain-host-out.json (1)

84-146: Multi-tool call history demonstrates DNS probing capability.

The call history shows both inspect_route and probe_dns_local tool invocations, validating the agent's ability to use multiple MCP tools for diagnosis. The DNS probe correctly queries public DNS (8.8.8.8) for A records.

@Cali0707
Contributor

Hey @Thealisyed is there a reason the builtin claude code agent doesn't work for your use case?

@Thealisyed
Author

Hi @Cali0707!
Brett asked me to port this using the same pattern as his Gemini agent PR (#69) for consistency across agents.
Hmm, we could alternatively try tweaking the builtin claude code agent instead, as that would be a smaller change.
I'll let @bentito weigh in on which approach he prefers.

@bentito
Contributor

bentito commented Jan 5, 2026

> Hi @Cali0707! Brett asked me to port this using the same pattern as his Gemini agent PR (#69) for consistency across agents. Hmm, we could alternatively try tweaking the builtin claude code agent instead, as that would be a smaller change. I'll let @bentito weigh in on which approach he prefers.

I didn't know there was a built in claude code agent!! 😄 @Thealisyed just give us a set of evals that uses the built in Claude agent then, I think

@bentito
Contributor

bentito commented Jan 5, 2026

/assign

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/agent/claude_code.go (1)

27-36: Consider adding a timeout for the gcloud command.

The validation logic helpfully checks for GCP credentials needed for Vertex AI, but the gcloud auth application-default print-access-token command could potentially hang or take significant time if gcloud is misconfigured.

🔎 Suggested improvement with timeout

Note that exec.Cmd has no Timeout field, so the deadline has to come from a context. Use context.WithTimeout together with exec.CommandContext (this also requires importing context and time):

 	// Check for GCP credentials (for Vertex AI users)
 	if os.Getenv("GOOGLE_APPLICATION_CREDENTIALS") == "" {
 		if _, err := exec.LookPath("gcloud"); err == nil {
 			// gcloud exists, check if ADC is configured
+			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
-			cmd := exec.Command("gcloud", "auth", "application-default", "print-access-token")
+			cmd := exec.CommandContext(ctx, "gcloud", "auth", "application-default", "print-access-token")
 			if err := cmd.Run(); err != nil {
 				fmt.Fprintf(os.Stderr, "Warning: No GCP credentials found. If using Vertex AI, run 'gcloud auth application-default login'\n")
 			}
 		}
 	}
gevals-claude-code-netedge-selector-mismatch-out.json (1)

1-151: Consider whether evaluation output files should be committed to the repository.

This file contains detailed execution traces and outputs from running evaluations. A few considerations:

  1. Repository size: These JSON output files are quite large and will accumulate over time
  2. Absolute paths: Line 4 contains an absolute path /home/alsyed/gevals/... with a username, which might not be intended for the repository
  3. Maintenance: Output files may become stale as code evolves

Consider one of these approaches:

  • Move output files to a separate examples/artifacts directory with a note they're for reference
  • Add *-out.json to .gitignore and document how to generate them locally
  • Keep only one example output file per scenario type for documentation purposes

If keeping these files is intentional for documentation/examples, that's fine—just wanted to flag for consideration.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 441a797 and 29da4a5.

📒 Files selected for processing (12)
  • examples/net-edge/README.md
  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
  • examples/net-edge/mcp-config.yaml
  • gevals-claude-code-netedge-networkpolicy-block-out.json
  • gevals-claude-code-netedge-nxdomain-host-out.json
  • gevals-claude-code-netedge-selector-mismatch-out.json
  • netedge-selector-mismatch-error.txt
  • pkg/agent/claude_code.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
  • examples/net-edge/mcp-config.yaml
  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

Applied to files:

  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
  • examples/net-edge/README.md
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
🔇 Additional comments (9)
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1)

1-14: LGTM! Well-structured evaluation configuration.

The eval configuration follows the established pattern for Claude Code agent evaluations. The use of builtin.claude-code aligns with the maintainer's direction from PR comments, and the tool call constraints (1-20) are appropriate for the load balancer missing scenario.

pkg/agent/claude_code.go (2)

5-5: LGTM! Necessary import for environment variable checks.

The os package is correctly imported to support the new GCP credential validation logic.


52-52: Verify the necessity of --dangerously-skip-permissions flag.

The --dangerously-skip-permissions flag bypasses safety checks, which could have security implications. While this may be necessary for automated eval execution in a controlled environment, ensure this is intentional and documented.

The flag name suggests it's meant for specific scenarios where user prompts are not required. Confirm that:

  1. The eval sandbox environment is sufficiently isolated
  2. The evaluated code is trusted or the environment is disposable
  3. This aligns with the security model for automated evaluations

If this is standard practice for eval execution, consider documenting this in the README or eval configuration guide.

examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1)

1-14: LGTM! Clean eval configuration.

The eval definition is well-structured and correctly uses the builtin.claude-code agent type as directed in the PR discussion. The assertions and path references are appropriate.

examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1)

1-14: LGTM! Consistent eval structure.

This eval definition maintains consistency with the other Claude Code eval configurations in this PR while correctly referencing the reencrypt-tls scenario task.

examples/net-edge/README.md (3)

14-16: LGTM! Clear documentation of the new structure.

The layout section accurately reflects the addition of Claude Code eval configurations and helpfully notes that they use the builtin agent.


62-71: LGTM! Clear separation of Codex instructions.

The section rename and path update appropriately distinguish Codex-specific instructions from the new Claude Code section.


78-80: The documentation is already correct and clearly conveys that GCP authentication is conditional. The code confirms this: the Claude Code agent checks for GOOGLE_APPLICATION_CREDENTIALS and runs gcloud auth application-default print-access-token, but only issues a warning if credentials are missing—it does not fail. The agent continues to execute without GCP credentials, relying on Claude CLI's default authentication methods. The README's "If using Vertex AI" language appropriately indicates this is optional.

netedge-selector-mismatch-error.txt (1)

1-9: Remove this error log file from the repository.

This file contains error output from a local test run (GCP credential failure) and should not be committed. As previously noted, this appears to be an accidental inclusion.

Please remove this file and consider adding a .gitignore pattern for error log files:

*-error.txt
*.log

Likely an incorrect or invalid review comment.

@Cali0707
Contributor

@Thealisyed could you resolve the merge conflict here?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/agent/claude_code.go (1)

48-52: Make --dangerously-skip-permissions opt-in; this flag should not be hardcoded as default.

The --dangerously-skip-permissions flag is hardcoded in RunPrompt (line 52), applying to all users of the claude-code agent by default. Anthropic documents this flag as dangerous and intended only for isolated/headless environments (Docker, CI). Defaulting to permission skipping changes the security posture for all uses of the builtin agent. Expose this as a configurable option or make it evaluation-only, not default for all executions.

The CLI flags -p, --dangerously-skip-permissions, and --output-format stream-json are documented in the Claude Code CLI. Note that --strict-mcp-config and --verbose are not explicitly listed in Anthropic's official documentation—verify these are supported in the targeted Claude Code version.

🤖 Fix all issues with AI agents
In `@pkg/agent/claude_code.go`:
- Around line 3-7: Update the gcloud ADC probe to use a 3-second context
timeout: wrap the probe call in context.WithTimeout(…) with a 3s deadline and
run the process via exec.CommandContext instead of exec.Command so the process
is killed when the timeout elapses; ensure you defer cancel(), capture the
command output and error as before, and handle context.DeadlineExceeded
separately to avoid blocking startup in the probe logic (apply the same change
to the other probe call in the same file referenced around lines 27-35).
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 169175a and 4c939c2.

📒 Files selected for processing (8)
  • examples/net-edge/README.md
  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
  • examples/net-edge/mcp-config.yaml
  • pkg/agent/claude_code.go
✅ Files skipped from review due to trivial changes (1)
  • examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • examples/net-edge/mcp-config.yaml
  • examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
  • examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
  • examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
  • examples/net-edge/README.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

Applied to files:

  • examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
🧬 Code graph analysis (1)
pkg/agent/claude_code.go (1)
functional/servers/agent/agent.go (1)
  • Run (20-83)
🔇 Additional comments (1)
examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1)

1-14: LGTM — eval config is consistent with net-edge task/MCP wiring.
Assertions and tool-call bounds look sensible for this scenario.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@examples/net-edge/README.md`:
- Around line 87-96: The "Running with Claude Code" section omits the required
authentication setup; add a step before running the eval to ensure the
ANTHROPIC_API_KEY environment variable is set (e.g., "Ensure ANTHROPIC_API_KEY
environment variable is set") so the ./gevals eval
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml command can
authenticate; update the README section heading "Running with Claude Code" to
include this prerequisite.

In `@pkg/agent/claude_code.go`:
- Line 41: The RunPrompt template currently hardcodes the unsafe flag
--dangerously-skip-permissions; change this to be opt-in by introducing a
boolean option (e.g. DangerouslySkipPermissions) on the Claude agent/config and
only append the flag to the RunPrompt command when that option is true (e.g.
conditionally include the template token like {{ if .DangerouslySkipPermissions
}}--dangerously-skip-permissions{{ end }}). Ensure the new field defaults to
false and set the corresponding value wherever the template data for RunPrompt
is constructed so existing behavior remains safe unless explicitly enabled.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/agent/claude_code.go`:
- Around line 32-43: The default for DangerouslySkipPermissions is currently set
to true; change the local variable dangerouslySkipPermissions to false so
AgentCommands.DangerouslySkipPermissions points to a false value (i.e.,
`dangerouslySkipPermissions := false`), leaving AgentSpec, AgentMetadata,
RunPrompt and the template conditional unchanged so the prompt only injects
`--dangerously-skip-permissions` when the pointer is true.

func (a *ClaudeCodeAgent) GetDefaults(model string) (*AgentSpec, error) {
separator := ","
useVirtualHome := false
dangerouslySkipPermissions := false
Contributor

@Thealisyed what was the need for this flag?
