feat: add Claude Code agent for net-edge evals #72

Thealisyed · 2025-12-19T14:22:23Z

feat: Use builtin claude-code agent for net-edge evals
Summary:

- Add Claude Code agent configuration for running NetEdge eval scenarios
- Include 5 eval definitions matching the Gemini agent structure. 6th eval needs more investigation
- Update README

Assisted with Claude Code

Summary by CodeRabbit

New Features
- Support for running evaluations with a Claude Code agent and pattern-based eval files.
- Added several Claude Code evaluation profiles covering common net-edge scenarios.
- Optional skip-permissions flag for agent runs.
Documentation
- README updated with "Running with Claude Code" steps and example evaluation commands; parity added alongside existing agent workflows.
Chores
- Server config accepts an additional server input flag for evaluation runs.


coderabbitai · 2025-12-19T14:22:33Z

Walkthrough

Adds a builtin Claude Code agent and five new Eval YAMLs to examples/net-edge, updates README to document running Claude Code, appends an mcp server arg, and exposes/wires a new DangerouslySkipPermissions flag into the agent configuration and runtime.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `examples/net-edge/README.md` | Switched to `eval_*.yaml` pattern, added "Running with Claude Code" section and example CLI invocation referencing `claude-code-agent` evals. |
| Claude Code eval configs `examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml`, `examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml`, `examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml`, `examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml`, `examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml` | Added five Eval YAMLs using `config.agent.type: "builtin.claude-code"`, referencing `../mcp-config.yaml`, each pointing to scenario `taskSet`s and asserting `netedge` tool usage with min/max tool call limits. |
| MCP configuration `examples/net-edge/mcp-config.yaml` | Appended `-s ../gen-mcp/examples/netedge-tools/mcpserver.yaml` to `mcpServers.netedge.args`. |
| Agent public config `pkg/agent/config.go` | Added optional `DangerouslySkipPermissions *bool` to `AgentCommands` (`dangerouslySkipPermissions` JSON key). |
| Agent runtime wiring `pkg/agent/runner.go` | Propagated `DangerouslySkipPermissions` into the template rendering data and included it in the rendered run command when set. |
| Claude Code agent implementation `pkg/agent/claude_code.go` | Added `DangerouslySkipPermissions` field to defaults and updated RunPrompt template to include `-p "{{ .Prompt }}"`, conditional `--dangerously-skip-permissions`, and always append `--output-format stream-json --verbose`. |
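
The optional `DangerouslySkipPermissions *bool` field listed for `pkg/agent/config.go` can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `AgentCommands` struct here is a cut-down stand-in, and the `omitempty` tag option and the `skipPermissions` and `parse` helpers are assumptions. Only the field name, its pointer type, and the `dangerouslySkipPermissions` JSON key come from the walkthrough. The pointer lets a loader distinguish "key absent" from an explicit `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentCommands is a hypothetical, cut-down stand-in for the struct in
// pkg/agent/config.go; only this one field is taken from the walkthrough.
type AgentCommands struct {
	DangerouslySkipPermissions *bool `json:"dangerouslySkipPermissions,omitempty"`
}

// skipPermissions reports whether the flag was explicitly enabled.
// A nil pointer (key absent) is treated the same as false.
func skipPermissions(c AgentCommands) bool {
	return c.DangerouslySkipPermissions != nil && *c.DangerouslySkipPermissions
}

// parse decodes a JSON document into AgentCommands (helper for this sketch).
func parse(raw string) (AgentCommands, error) {
	var c AgentCommands
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	inputs := []string{
		`{}`,
		`{"dangerouslySkipPermissions": false}`,
		`{"dangerouslySkipPermissions": true}`,
	}
	for _, raw := range inputs {
		c, err := parse(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s set=%t enabled=%t\n", raw, c.DangerouslySkipPermissions != nil, skipPermissions(c))
	}
}
```

Treating a missing key as "disabled" keeps the permission bypass strictly opt-in, which matches the direction the later review comments push toward.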

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Built-in Agent Types & Inline Configuration #38: Overlapping additions for a builtin claude-code agent and related agent flag/config changes.

Suggested reviewers

Cali0707

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add Claude Code agent for net-edge evals' accurately and clearly summarizes the main change: adding Claude Code agent configurations for NetEdge evaluation scenarios. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

examples/net-edge/README.md (3)
1-4: Update description to reflect dual-agent support.

The title and description focus on "Codex GPT-5 coding agent" but the document now covers both Codex and Claude Code agents. Consider making this more generic or mentioning both agents.
Proposed update
 # NetEdge Scenario 1 (Service Selector Mismatch)
 
 Evaluate the NetEdge gen-mcp server **Route → Service selector mismatch** scenario with the `gevals`
-framework and the Codex GPT-5 coding agent.
+framework using Codex GPT-5 or Claude Code agents.
26-62: Consider adding Claude Code prerequisites.

The prerequisites section only covers Codex-specific requirements (API key, config.toml, etc.). Consider adding a separate subsection for Claude Code prerequisites, such as:

- Claude CLI installation and authentication
- Any required environment variables or configuration

85-87: Use agent-agnostic language in shared workflow description.

Line 86 states "The Codex agent must diagnose and repair the mismatch" but this description applies to both Codex and Claude Code workflows since it describes what happens after running either eval.
Proposed update
 `setup.sh` deploys the hello workload, then intentionally breaks the Service selector so the Route loses its
-endpoints. The Codex agent must diagnose and repair the mismatch, after which `verify.sh` confirms the selector
+endpoints. The agent must diagnose and repair the mismatch, after which `verify.sh` confirms the selector
 and endpoints are healthy. Results are written to `gevals-netedge-selector-mismatch-out.json` by default.
examples/net-edge/claude-code-agent/agent.yaml (1)
59-83: Consider potential race condition in cleanup trap.

The cleanup function receives $? as an argument, but the trap is set as trap 'cleanup $?' EXIT. If any command in the cleanup function itself fails (before set -euo pipefail triggers), the original exit status could be lost. The trap - EXIT at line 80 followed by exit "${exit_status}" is the correct pattern, but consider using a subshell or capturing the status earlier.

Additionally, the .kube removal at line 70-72 is good for security, but the debug directory could still contain sensitive data in claude-code-home/.config/gcloud.
🔎 Consider also removing gcloud credentials from debug output
       if [[ -d "${DEBUG_DIR}/claude-code-home/.kube" ]]; then
         rm -rf "${DEBUG_DIR}/claude-code-home/.kube"
       fi
+      if [[ -d "${DEBUG_DIR}/claude-code-home/.config/gcloud" ]]; then
+        rm -rf "${DEBUG_DIR}/claude-code-home/.config/gcloud"
+      fi
gevals-claude-code-netedge-selector-mismatch-out.json (1)

1-6: Consider removing/anonymizing absolute paths and sensitive data from example output.

The taskPath field (line 4) contains an absolute path including a username (/home/alsyed/gevals/...). Additionally, the taskOutput field contains session IDs, cluster URLs, and IP addresses that are specific to a particular test run. If this file is intended as example output for documentation, consider:

- Replacing the absolute path with a relative path or placeholder
- Sanitizing or replacing real cluster URLs and IPs with example values
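
One way to act on the path suggestion above is to rewrite `taskPath` relative to the repository root before the output is written or committed. The sketch below is only an illustration: `sanitizeTaskPath`, the `<redacted>` placeholder, and both example paths are hypothetical and do not come from gevals or from this PR.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sanitizeTaskPath rewrites an absolute path relative to repoRoot so that
// usernames and home directories do not leak into committed output.
// Paths outside repoRoot are reduced to a placeholder plus the file name.
func sanitizeTaskPath(taskPath, repoRoot string) string {
	rel, err := filepath.Rel(repoRoot, taskPath)
	outside := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	if err != nil || outside {
		return filepath.Join("<redacted>", filepath.Base(taskPath))
	}
	return rel
}

func main() {
	// Both paths are made-up examples, not values from the PR.
	fmt.Println(sanitizeTaskPath("/home/someuser/repo/examples/task.yaml", "/home/someuser/repo"))
	fmt.Println(sanitizeTaskPath("/tmp/other/task.yaml", "/home/someuser/repo"))
}
```

Cluster URLs, session IDs, and IP addresses inside `taskOutput` would need separate handling; this sketch covers only the path field.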

gevals-claude-code-netedge-networkpolicy-block-out.json (1)

1-6: Same path/data sanitization concern as other output files.

Contains absolute path with username in taskPath. Consider sanitizing for public documentation.

gevals-claude-code-netedge-nxdomain-host-out.json (1)

1-6: Same path sanitization concern applies.

The taskPath contains an absolute path with username. Consider anonymizing for the example output.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca93d0f and 441a797.

📒 Files selected for processing (13)

examples/net-edge/README.md (2 hunks)
examples/net-edge/claude-code-agent/agent.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1 hunks)
examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml (1 hunks)
examples/net-edge/mcp-config.yaml (1 hunks)
gevals-claude-code-netedge-networkpolicy-block-out.json (1 hunks)
gevals-claude-code-netedge-nxdomain-host-out.json (1 hunks)
gevals-claude-code-netedge-selector-mismatch-out.json (1 hunks)
netedge-selector-mismatch-error.txt (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

📚 Learning: 2025-11-18T20:44:43.077Z

Applied to files:

examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
examples/net-edge/README.md
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml

🔇 Additional comments (17)

examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/claude-code-agent/eval_6_referencegrant.yaml (1)

1-15: Configuration structure looks good.

The eval definition follows the established pattern and is consistent with the other Claude Code eval files in this PR.

examples/net-edge/README.md (2)

14-17: Documentation structure looks good.

The layout section correctly documents the new claude-code-agent directory and eval_*.yaml pattern for both agent types.

63-83: Clear documentation for both agent workflows.

The separation into "Running with Codex" and "Running with Claude Code" sections makes it easy for users to follow the appropriate workflow.

examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1)

1-15: All referenced files in eval_5_loadbalancer.yaml exist and are properly configured:

agent.yaml ✓

../mcp-config.yaml ✓

../tasks/loadbalancer-missing/loadbalancer-missing.yaml ✓

examples/net-edge/mcp-config.yaml (1)

8-9: The -s flag for genmcp run is undocumented; verify it's supported in gen-mcp.

The configuration uses an undocumented -s flag that does not appear in the official genmcp CLI documentation or in the README.md example (which only shows -f). Additionally, the referenced files (mcpfile.yaml and mcpserver.yaml) are in an external ../gen-mcp/ directory outside this repository and cannot be verified here.

examples/net-edge/claude-code-agent/agent.yaml (5)

9-15: LGTM on prerequisite validation.

The script uses set -euo pipefail for strict error handling and properly validates the jq dependency before proceeding.

32-48: Kubeconfig handling preserves cluster access correctly.

The logic properly prioritizes an explicit KUBECONFIG environment variable, falling back to copying the original HOME's .kube directory. The 2>/dev/null || true pattern gracefully handles missing files.

50-56: GCP credential preservation for Vertex AI authentication.

The conditional copying of ~/.config/gcloud when GOOGLE_APPLICATION_CREDENTIALS is not set ensures Vertex AI authentication works in the temporary HOME context.

95-100: Model override implementation looks correct.

The optional CLAUDE_MODEL environment variable allows flexibility in model selection without requiring changes to the agent configuration.

102-108: The tee command does not mask the claude exit code; pipefail is already enabled.

Line 11 sets set -euo pipefail, which ensures the pipeline claude "${CLAUDE_ARGS[@]}" 2>&1 | tee -a returns the claude command's exit code, not tee's. Combined with set -e, any failure from claude will trigger the cleanup trap. Exit code handling is correct.

Likely an incorrect or invalid review comment.

gevals-claude-code-netedge-selector-mismatch-out.json (1)

20-149: Call history structure captures MCP tool interactions correctly.

The callHistory section properly records tool calls with timestamps, request/response data, and success status. This provides good traceability for debugging evaluation scenarios.

gevals-claude-code-netedge-networkpolicy-block-out.json (1)

7-19: Assertion results structure is well-defined.

The assertionResults object with toolsUsed, minToolCalls, and maxToolCalls provides clear pass/fail indicators for evaluation criteria.
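
As a rough illustration of how the three indicators could be derived from a call history, consider the sketch below. It assumes, without confirmation from the gevals source, that `toolsUsed` passes when at least one call targets the named MCP server and that the min/max checks compare the total number of calls against the configured bounds. The `toolCall` and `assertionResult` types and the `evaluate` function are hypothetical.

```go
package main

import "fmt"

// toolCall is a hypothetical, simplified record of one MCP tool invocation.
type toolCall struct {
	Server string
	Tool   string
}

// assertionResult loosely mirrors the toolsUsed/minToolCalls/maxToolCalls
// fields mentioned above; the exact semantics in gevals are an assumption.
type assertionResult struct {
	ToolsUsed    bool
	MinToolCalls bool
	MaxToolCalls bool
}

// evaluate checks that at least one call went to the given server and that
// the total number of calls lies within [minCalls, maxCalls].
func evaluate(calls []toolCall, server string, minCalls, maxCalls int) assertionResult {
	used := false
	for _, c := range calls {
		if c.Server == server {
			used = true
			break
		}
	}
	return assertionResult{
		ToolsUsed:    used,
		MinToolCalls: len(calls) >= minCalls,
		MaxToolCalls: len(calls) <= maxCalls,
	}
}

func main() {
	calls := []toolCall{
		{Server: "netedge", Tool: "inspect_route"},
		{Server: "netedge", Tool: "probe_dns_local"},
	}
	fmt.Printf("%+v\n", evaluate(calls, "netedge", 1, 20))
}
```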

gevals-claude-code-netedge-nxdomain-host-out.json (1)

84-146: Multi-tool call history demonstrates DNS probing capability.

The call history shows both inspect_route and probe_dns_local tool invocations, validating the agent's ability to use multiple MCP tools for diagnosis. The DNS probe correctly queries public DNS (8.8.8.8) for A records.

netedge-selector-mismatch-error.txt

Cali0707 · 2025-12-22T16:49:13Z

Hey @Thealisyed is there a reason the builtin claude code agent doesn't work for your use case?

Thealisyed · 2025-12-29T10:01:20Z

Hi @Cali0707!
Brett asked me to port this using the same pattern as his Gemini agent PR (#69) for consistency across agents.
Hmm that is a possibility that we could alternatively try and tweak the builtin claude code agent instead as that would be a smaller change.
I'll let @bentito weigh in on which approach he prefers.

bentito · 2026-01-05T13:17:58Z

> Hi @Cali0707!
> Brett asked me to port this using the same pattern as his Gemini agent PR (#69) for consistency across agents.
> Hmm that is a possibility that we could alternatively try and tweak the builtin claude code agent instead as that would be a smaller change.
> I'll let @bentito weigh in on which approach he prefers.

I didn't know there was a built in claude code agent!! 😄 @Thealisyed just give us a set of evals that uses the built in Claude agent then, I think

bentito · 2026-01-05T18:57:17Z

/assign

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

pkg/agent/claude_code.go (1)
27-36: Consider adding a timeout for the gcloud command.

The validation logic helpfully checks for GCP credentials needed for Vertex AI, but the gcloud auth application-default print-access-token command could potentially hang or take significant time if gcloud is misconfigured.
🔎 Suggested improvement with timeout

exec.Cmd has no Timeout field, so the deadline has to come from a context passed to exec.CommandContext (this also requires importing context and time):
 	// Check for GCP credentials (for Vertex AI users)
 	if os.Getenv("GOOGLE_APPLICATION_CREDENTIALS") == "" {
 		if _, err := exec.LookPath("gcloud"); err == nil {
 			// gcloud exists, check if ADC is configured
+			// Bound the probe so a misconfigured gcloud cannot hang startup
+			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
-			cmd := exec.Command("gcloud", "auth", "application-default", "print-access-token")
+			cmd := exec.CommandContext(ctx, "gcloud", "auth", "application-default", "print-access-token")
 			if err := cmd.Run(); err != nil {
 				fmt.Fprintf(os.Stderr, "Warning: No GCP credentials found. If using Vertex AI, run 'gcloud auth application-default login'\n")
 			}
 		}
 	}
gevals-claude-code-netedge-selector-mismatch-out.json (1)

1-151: Consider whether evaluation output files should be committed to the repository.

This file contains detailed execution traces and outputs from running evaluations. A few considerations:

- Repository size: These JSON output files are quite large and will accumulate over time
- Absolute paths: Line 4 contains an absolute path /home/alsyed/gevals/... with a username, which might not be intended for the repository
- Maintenance: Output files may become stale as code evolves

Consider one of these approaches:

- Move output files to a separate examples/artifacts directory with a note they're for reference
- Add *-out.json to .gitignore and document how to generate them locally
- Keep only one example output file per scenario type for documentation purposes

If keeping these files is intentional for documentation/examples, that's fine—just wanted to flag for consideration.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 441a797 and 29da4a5.

📒 Files selected for processing (12)

examples/net-edge/README.md
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
examples/net-edge/mcp-config.yaml
gevals-claude-code-netedge-networkpolicy-block-out.json
gevals-claude-code-netedge-nxdomain-host-out.json
gevals-claude-code-netedge-selector-mismatch-out.json
netedge-selector-mismatch-error.txt
pkg/agent/claude_code.go

🚧 Files skipped from review as they are similar to previous changes (3)

examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
examples/net-edge/mcp-config.yaml
examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

📚 Learning: 2025-11-18T20:44:43.077Z

Applied to files:

examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
examples/net-edge/README.md
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml

🔇 Additional comments (9)

examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml (1)

1-14: LGTM! Well-structured evaluation configuration.

The eval configuration follows the established pattern for Claude Code agent evaluations. The use of builtin.claude-code aligns with the maintainer's direction from PR comments, and the tool call constraints (1-20) are appropriate for the load balancer missing scenario.

pkg/agent/claude_code.go (2)

5-5: LGTM! Necessary import for environment variable checks.

The os package is correctly imported to support the new GCP credential validation logic.

52-52: Verify the necessity of --dangerously-skip-permissions flag.

The --dangerously-skip-permissions flag bypasses safety checks, which could have security implications. While this may be necessary for automated eval execution in a controlled environment, ensure this is intentional and documented.

The flag name suggests it's meant for specific scenarios where user prompts are not required. Confirm that:

- The eval sandbox environment is sufficiently isolated
- The evaluated code is trusted or the environment is disposable
- This aligns with the security model for automated evaluations

If this is standard practice for eval execution, consider documenting this in the README or eval configuration guide.

examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml (1)

1-14: LGTM! Clean eval configuration.

The eval definition is well-structured and correctly uses the builtin.claude-code agent type as directed in the PR discussion. The assertions and path references are appropriate.

examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml (1)

1-14: LGTM! Consistent eval structure.

This eval definition maintains consistency with the other Claude Code eval configurations in this PR while correctly referencing the reencrypt-tls scenario task.

examples/net-edge/README.md (3)

14-16: LGTM! Clear documentation of the new structure.

The layout section accurately reflects the addition of Claude Code eval configurations and helpfully notes that they use the builtin agent.

62-71: LGTM! Clear separation of Codex instructions.

The section rename and path update appropriately distinguish Codex-specific instructions from the new Claude Code section.

78-80: The documentation is already correct and clearly conveys that GCP authentication is conditional. The code confirms this: the Claude Code agent checks for GOOGLE_APPLICATION_CREDENTIALS and runs gcloud auth application-default print-access-token, but only issues a warning if credentials are missing—it does not fail. The agent continues to execute without GCP credentials, relying on Claude CLI's default authentication methods. The README's "If using Vertex AI" language appropriately indicates this is optional.
netedge-selector-mismatch-error.txt (1)
1-9: Remove this error log file from the repository.

This file contains error output from a local test run (GCP credential failure) and should not be committed. As previously noted, this appears to be an accidental inclusion.

Please remove this file and consider adding a .gitignore pattern for error log files:
*-error.txt
*.log
Likely an incorrect or invalid review comment.

Cali0707 · 2026-01-15T15:23:47Z

@Thealisyed could you resolve the merge conflict here?

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/agent/claude_code.go (1)

48-52: Make --dangerously-skip-permissions opt-in; this flag should not be hardcoded as default.

The --dangerously-skip-permissions flag is hardcoded in RunPrompt (line 52), applying to all users of the claude-code agent by default. Anthropic documents this flag as dangerous and intended only for isolated/headless environments (Docker, CI). Defaulting to permission skipping changes the security posture for all uses of the builtin agent. Expose this as a configurable option or make it evaluation-only, not default for all executions.

The CLI flags -p, --dangerously-skip-permissions, and --output-format stream-json are documented in the Claude Code CLI. Note that --strict-mcp-config and --verbose are not explicitly listed in Anthropic's official documentation—verify these are supported in the targeted Claude Code version.

🤖 Fix all issues with AI agents

In `@pkg/agent/claude_code.go`:
- Around line 3-7: Update the gcloud ADC probe to use a 3-second context
timeout: wrap the probe call in context.WithTimeout(…) with a 3s deadline and
run the process via exec.CommandContext instead of exec.Command so the process
is killed when the timeout elapses; ensure you defer cancel(), capture the
command output and error as before, and handle context.DeadlineExceeded
separately to avoid blocking startup in the probe logic (apply the same change
to the other probe call in the same file referenced around lines 27-35).
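
The probe described in the prompt above could look roughly like the following sketch. It is not the code from `pkg/agent/claude_code.go`: the `probe` helper and its parameters are hypothetical, introduced so the timeout behaviour can be exercised without `gcloud` installed. The calls it relies on (`context.WithTimeout`, `exec.CommandContext`, `errors.Is`) are standard library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// probe runs a command with a deadline and reports whether it timed out.
// name and args are parameters so the sketch can be tried with any binary.
func probe(timeout time.Duration, name string, args ...string) (timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the child process once ctx is done.
	cmd := exec.CommandContext(ctx, name, args...)
	err = cmd.Run()

	// Report a timeout separately from ordinary command failures.
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return true, ctx.Err()
	}
	return false, err
}

func main() {
	timedOut, err := probe(3*time.Second, "gcloud", "auth", "application-default", "print-access-token")
	switch {
	case timedOut:
		fmt.Println("warning: gcloud credential probe timed out; continuing")
	case err != nil:
		fmt.Println("warning: no GCP credentials found (or gcloud is not installed):", err)
	default:
		fmt.Println("GCP application-default credentials look usable")
	}
}
```

Keeping the timeout case distinct from other failures lets the caller emit a more accurate warning while still never blocking startup.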

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 169175a and 4c939c2.

📒 Files selected for processing (8)

examples/net-edge/README.md
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml
examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
examples/net-edge/mcp-config.yaml
pkg/agent/claude_code.go

✅ Files skipped from review due to trivial changes (1)

examples/net-edge/claude-code-agent/eval_4_reencrypt-tls.yaml

🚧 Files skipped from review as they are similar to previous changes (5)

examples/net-edge/mcp-config.yaml
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml
examples/net-edge/claude-code-agent/eval_2_nxdomain.yaml
examples/net-edge/claude-code-agent/eval_5_loadbalancer.yaml
examples/net-edge/README.md

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: Cali0707
Repo: genmcp/gevals PR: 39
File: .claude/skills/create-eval/SKILL.md:20-20
Timestamp: 2025-11-18T20:44:43.077Z
Learning: In the .claude/skills/create-eval/SKILL.md file, the eval creation instructions reference documentation files (.md) that explain each component (tasks.md, mcpConfig.md, agent.md, eval.md), not the actual YAML configuration files. The eval.md file contains documentation describing how to create eval.yaml files.

📚 Learning: 2025-11-18T20:44:43.077Z

Applied to files:

examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml

🧬 Code graph analysis (1)

pkg/agent/claude_code.go (1)

functional/servers/agent/agent.go (1)

Run (20-83)

🔇 Additional comments (1)

examples/net-edge/claude-code-agent/eval_3_networkpolicy.yaml (1)

1-14: LGTM — eval config is consistent with net-edge task/MCP wiring.
Assertions and tool-call bounds look sensible for this scenario.


pkg/agent/claude_code.go

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@examples/net-edge/README.md`:
- Around line 87-96: The "Running with Claude Code" section omits the required
authentication setup; add a step before running the eval to ensure the
ANTHROPIC_API_KEY environment variable is set (e.g., "Ensure ANTHROPIC_API_KEY
environment variable is set") so the ./gevals eval
examples/net-edge/claude-code-agent/eval_1_selector-mismatch.yaml command can
authenticate; update the README section heading "Running with Claude Code" to
include this prerequisite.

In `@pkg/agent/claude_code.go`:
- Line 41: The RunPrompt template currently hardcodes the unsafe flag
--dangerously-skip-permissions; change this to be opt-in by introducing a
boolean option (e.g. DangerouslySkipPermissions) on the Claude agent/config and
only append the flag to the RunPrompt command when that option is true (e.g.
conditionally include the template token like {{ if .DangerouslySkipPermissions
}}--dangerously-skip-permissions{{ end }}). Ensure the new field defaults to
false and set the corresponding value wherever the template data for RunPrompt
is constructed so existing behavior remains safe unless explicitly enabled.
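
The conditional template token described above can be sketched with Go's `text/template`. The template string, the `promptData` type, and the `render` helper are illustrative assumptions rather than the contents of `pkg/agent/claude_code.go`; the flags themselves are the ones named in the walkthrough. The sketch also does no shell quoting of the prompt, which real code would need.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// runPromptTmpl is an illustrative template, not the exact string from the PR.
// The skip-permissions flag is emitted only when the field is true.
const runPromptTmpl = `claude -p "{{ .Prompt }}"{{ if .DangerouslySkipPermissions }} --dangerously-skip-permissions{{ end }} --output-format stream-json --verbose`

// promptData is a hypothetical stand-in for the runner's template data.
type promptData struct {
	Prompt                     string
	DangerouslySkipPermissions bool
}

// render executes the template against d and returns the command line.
func render(d promptData) (string, error) {
	t, err := template.New("run").Parse(runPromptTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	for _, skip := range []bool{false, true} {
		out, err := render(promptData{Prompt: "diagnose the route", DangerouslySkipPermissions: skip})
		if err != nil {
			fmt.Println("render error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Because the zero value of the boolean is `false`, template data that never sets the field renders the safe command by default.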

examples/net-edge/README.md

pkg/agent/claude_code.go

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@pkg/agent/claude_code.go`:
- Around line 32-43: The default for DangerouslySkipPermissions is currently set
to true; change the local variable dangerouslySkipPermissions to false so
AgentCommands.DangerouslySkipPermissions points to a false value (i.e.,
`dangerouslySkipPermissions := false`), leaving AgentSpec, AgentMetadata,
RunPrompt and the template conditional unchanged so the prompt only injects
`--dangerously-skip-permissions` when the pointer is true.

pkg/agent/claude_code.go


Cali0707 · 2026-01-22T13:33:44Z

pkg/agent/claude_code.go

 func (a *ClaudeCodeAgent) GetDefaults(model string) (*AgentSpec, error) {
 	separator := ","
 	useVirtualHome := false
+	dangerouslySkipPermissions := false


@Thealisyed what was the need for this flag?

coderabbitai bot reviewed Dec 19, 2025

netedge-selector-mismatch-error.txt (outdated)

Thealisyed force-pushed the main branch from 441a797 to 29da4a5 on January 6, 2026 16:06

coderabbitai bot reviewed Jan 6, 2026

Thealisyed force-pushed the main branch from 29da4a5 to 169175a on January 6, 2026 16:14

Thealisyed force-pushed the main branch from a5547eb to 4c939c2 on January 16, 2026 15:11

coderabbitai bot reviewed Jan 16, 2026

pkg/agent/claude_code.go

Cali0707 reviewed Jan 16, 2026

pkg/agent/claude_code.go (outdated)

Thealisyed force-pushed the main branch from 4c939c2 to dd3b4b2 on January 22, 2026 11:30

coderabbitai bot reviewed Jan 22, 2026

examples/net-edge/README.md

pkg/agent/claude_code.go (outdated)

Thealisyed force-pushed the main branch from dd3b4b2 to 2774696 on January 22, 2026 11:42

coderabbitai bot reviewed Jan 22, 2026

pkg/agent/claude_code.go (outdated)

feat: Use builtin claude-code agent for net-edge evals

24cf22e

Summary:

- Add Claude Code agent configuration for running NetEdge eval scenarios
- Include 5 eval definitions matching the Gemini agent structure. 6th eval needs more investigation
- Update README

Assisted with Claude Code

Thealisyed force-pushed the main branch from 2774696 to 24cf22e on January 22, 2026 11:52

Cali0707 reviewed Jan 22, 2026
