test: add unit tests for repeatingDetector and clearCallArguments#180
test: add unit tests for repeatingDetector and clearCallArguments#180mason5052 wants to merge 2 commits intovxcontrol:masterfrom
Conversation
Add comprehensive test coverage for the repeating tool call detection logic that guards against infinite agent chain loops (related to vxcontrol#175). TestRepeatingDetector (9 cases): - nil function call, first/second/third identical calls - threshold triggering at RepeatingToolCallThreshold (3) - funcCalls reset on different call - escalation threshold validation (6 vs 7 consecutive calls) - argument normalization (message field stripping, key ordering) TestRepeatingDetectorEscalationThreshold: - Validates escalation math: abort at len >= threshold + 4 = 7 TestClearCallArguments (3 cases): - message field stripping, key sorting, invalid JSON passthrough Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
There was a problem hiding this comment.
Pull request overview
Adds unit test coverage for the repeating tool-call detection and argument-normalization logic in backend/pkg/providers/helpers.go, which is used as safety logic to identify repeated tool calls.
Changes:
- Add table-driven tests for
repeatingDetector.detectcovering threshold behavior, reset behavior, and argument normalization effects. - Add tests for
repeatingDetector.clearCallArgumentscovering message stripping, key ordering, and invalid JSON passthrough. - Introduce a small test helper (
makeToolCall) to buildllms.ToolCallinputs.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func TestRepeatingDetectorEscalationThreshold(t *testing.T) { | ||
| // This test validates the escalation math used in performer.go: | ||
| // len(detector.funcCalls) >= RepeatingToolCallThreshold + maxSoftDetectionsBeforeAbort | ||
| // With threshold=3 and maxSoftDetections=4, abort triggers at len >= 7 | ||
|
|
||
| detector := &repeatingDetector{} | ||
| tc := makeToolCall("search", `{"query":"test"}`) | ||
|
|
||
| for i := 0; i < 7; i++ { | ||
| detector.detect(tc) | ||
| } | ||
|
|
||
| assert.Equal(t, 7, len(detector.funcCalls)) | ||
| assert.True(t, len(detector.funcCalls) >= RepeatingToolCallThreshold+4, | ||
| "7 calls should reach escalation threshold: %d >= %d+%d", | ||
| len(detector.funcCalls), RepeatingToolCallThreshold, 4) | ||
|
|
||
| // Verify 6 calls is below threshold | ||
| detector2 := &repeatingDetector{} | ||
| for i := 0; i < 6; i++ { | ||
| detector2.detect(tc) | ||
| } | ||
|
|
||
| assert.Equal(t, 6, len(detector2.funcCalls)) | ||
| assert.False(t, len(detector2.funcCalls) >= RepeatingToolCallThreshold+4, | ||
| "6 calls should NOT reach escalation threshold: %d < %d+%d", | ||
| len(detector2.funcCalls), RepeatingToolCallThreshold, 4) |
There was a problem hiding this comment.
TestRepeatingDetectorEscalationThreshold claims to validate escalation logic “used in performer.go”, but the current execToolCall implementation has no escalation/abort threshold (it only checks detector.detect(toolCall) and returns a soft message). As written, the test only asserts len(detector.funcCalls) >= RepeatingToolCallThreshold+4 using a hard-coded 4, so it doesn’t actually validate any production behavior and can drift from the real implementation if/when escalation is added. Consider either removing this test, or rewriting it to assert the real behavior (e.g., calling the performer/exec logic and expecting an abort error after N soft detections), and referencing a named constant instead of +4.
| name: "six identical calls still below escalation threshold", | ||
| calls: func() []llms.ToolCall { | ||
| tc := makeToolCall("search", `{"query":"test"}`) | ||
| return []llms.ToolCall{tc, tc, tc, tc, tc, tc} | ||
| }(), | ||
| expectedDetected: []bool{false, false, true, true, true, true}, | ||
| expectedLen: 6, | ||
| }, | ||
| { | ||
| name: "seven identical calls reaches escalation threshold", | ||
| calls: func() []llms.ToolCall { | ||
| tc := makeToolCall("search", `{"query":"test"}`) | ||
| return []llms.ToolCall{tc, tc, tc, tc, tc, tc, tc} | ||
| }(), | ||
| expectedDetected: []bool{false, false, true, true, true, true, true}, | ||
| expectedLen: 7, | ||
| }, |
There was a problem hiding this comment.
Several table test case names refer to an “escalation threshold” (e.g., the 6/7 call cases), but repeatingDetector.detect only implements the repeating threshold (RepeatingToolCallThreshold) and has no escalation concept. Renaming these cases to describe the behavior under test (continued detection after the threshold is reached) would avoid implying functionality that isn’t present in repeatingDetector.
- Replace hardcoded +4 with testMaxSoftDetectionsBeforeAbort constant with sync comment pointing to performer.go - Add test case for same function name with different non-message args resetting funcCalls (covers the other reset condition in detect()) Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Description of the Change
Problem
The
repeatingDetectorandclearCallArgumentsfunctions inhelpers.gohad no unit test coverage despite being critical safety logic for preventing infinite agent chain loops (#175, #178).Solution
Add 12 test cases covering the full behavior of repeating tool call detection:
TestRepeatingDetector (9 cases): nil function call handling, threshold triggering at
RepeatingToolCallThreshold=3, funcCalls accumulation and reset on different calls, escalation threshold validation (6 vs 7 consecutive calls), argument normalization (message field stripping, JSON key ordering)TestRepeatingDetectorEscalationThreshold: Validates the escalation math used in
performer.go-- abort triggers whenlen(funcCalls) >= RepeatingToolCallThreshold + 4 = 7, confirming 4 soft warnings before abort on the 7th consecutive identical callTestClearCallArguments (3 cases): message field stripping, alphabetical key sorting, invalid JSON passthrough
All tests follow the existing table-driven pattern in
helpers_test.gousingtestify/assert.Related to #175
Adds test coverage for #178
Type of Change
Areas Affected
Testing and Verification
Test Configuration
Test Steps
go test ./pkg/providers/ -run "TestRepeatingDetector|TestClearCallArguments" -v-- all 12 tests passgo test ./pkg/providers/ -v-- all existing tests continue to pass (no regressions)go vet ./pkg/providers/-- no warningsSecurity Considerations
No security impact. Test-only change.
Checklist
go fmtandgo vet