test: add unit tests for repeatingDetector and clearCallArguments #180

Open
mason5052 wants to merge 2 commits into vxcontrol:master from mason5052:test/repeating-detector-and-iteration-cap

@mason5052
Contributor

Description of the Change

Problem

The repeatingDetector and clearCallArguments functions in helpers.go had no unit test coverage despite being critical safety logic for preventing infinite agent chain loops (#175, #178).

Solution

Add 12 test cases covering the full behavior of repeating tool call detection:

  • TestRepeatingDetector (9 cases): nil function call handling, threshold triggering at RepeatingToolCallThreshold=3, funcCalls accumulation and reset on different calls, escalation threshold validation (6 vs 7 consecutive calls), argument normalization (message field stripping, JSON key ordering)

  • TestRepeatingDetectorEscalationThreshold: Validates the escalation math used in performer.go -- abort triggers when len(funcCalls) >= RepeatingToolCallThreshold + 4 = 7, confirming 4 soft warnings before abort on the 7th consecutive identical call

  • TestClearCallArguments (3 cases): message field stripping, alphabetical key sorting, invalid JSON passthrough

All tests follow the existing table-driven pattern in helpers_test.go using testify/assert.
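The threshold and reset behavior described above can be illustrated with a minimal, self-contained sketch. This is not the code in helpers.go: the `call` and `detector` types, the `repeatThreshold` constant, and the exact reset rule are assumptions made for illustration, modeled only on what this description states (detection from the third consecutive identical call, reset on a different call, nil calls ignored).

```go
package main

import "fmt"

// repeatThreshold mirrors RepeatingToolCallThreshold=3 from the description.
// The name is hypothetical.
const repeatThreshold = 3

// call is a hypothetical stand-in for a tool call: a function name plus
// already-normalized JSON arguments.
type call struct {
	Name string
	Args string
}

// detector records the current run of consecutive identical calls.
type detector struct {
	calls []call
}

// detect adds c to the current run, or starts a new run when c differs from
// the previous call, and reports whether the run has reached the threshold.
func (d *detector) detect(c *call) bool {
	if c == nil {
		return false // nothing to compare; state is left untouched
	}
	if n := len(d.calls); n > 0 && d.calls[n-1] != *c {
		d.calls = d.calls[:0] // a different name or arguments resets the run
	}
	d.calls = append(d.calls, *c)
	return len(d.calls) >= repeatThreshold
}

func main() {
	d := &detector{}
	search := &call{Name: "search", Args: `{"query":"test"}`}
	for i := 1; i <= 4; i++ {
		fmt.Printf("call %d: detected=%v len=%d\n", i, d.detect(search), len(d.calls))
	}
	other := &call{Name: "search", Args: `{"query":"other"}`}
	fmt.Printf("different args: detected=%v len=%d\n", d.detect(other), len(d.calls))
}
```

In this sketch the run length keeps growing after the threshold is reached, which is what the 6-call and 7-call table cases rely on when they check `expectedLen`.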

Related to #175
Adds test coverage for #178

Type of Change

  • Tests (adding or updating tests)

Areas Affected

  • Core Services (Backend API)

Testing and Verification

Test Configuration

  • PentAGI Version: master
  • Go Version: 1.24

Test Steps

  1. go test ./pkg/providers/ -run "TestRepeatingDetector|TestClearCallArguments" -v: all 12 tests pass
  2. go test ./pkg/providers/ -v: all existing tests continue to pass (no regressions)
  3. go vet ./pkg/providers/: no warnings

Security Considerations

No security impact. Test-only change.

Checklist

  • My code follows the project's coding standards
  • All new and existing tests pass
  • I have run go fmt and go vet
  • Security implications considered
  • Changes are backward compatible

Add comprehensive test coverage for the repeating tool call detection
logic that guards against infinite agent chain loops (related to vxcontrol#175).

TestRepeatingDetector (9 cases):
- nil function call, first/second/third identical calls
- threshold triggering at RepeatingToolCallThreshold (3)
- funcCalls reset on different call
- escalation threshold validation (6 vs 7 consecutive calls)
- argument normalization (message field stripping, key ordering)

TestRepeatingDetectorEscalationThreshold:
- Validates escalation math: abort at len >= threshold + 4 = 7

TestClearCallArguments (3 cases):
- message field stripping, key sorting, invalid JSON passthrough

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Copilot AI review requested due to automatic review settings March 6, 2026 02:05

Copilot AI left a comment

Pull request overview

Adds unit test coverage for the repeating tool-call detection and argument-normalization logic in backend/pkg/providers/helpers.go, which is used as safety logic to identify repeated tool calls.

Changes:

  • Add table-driven tests for repeatingDetector.detect covering threshold behavior, reset behavior, and argument normalization effects.
  • Add tests for repeatingDetector.clearCallArguments covering message stripping, key ordering, and invalid JSON passthrough.
  • Introduce a small test helper (makeToolCall) to build llms.ToolCall inputs.
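The normalization properties listed above (message stripping, key ordering, invalid JSON passthrough) can be sketched as follows. `normalizeArgs` is a hypothetical helper written for this illustration, not the project's clearCallArguments; it assumes the arguments are a JSON object and relies on the documented behavior of encoding/json, which encodes map keys in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeArgs is a hypothetical helper: it drops the "message" field and
// re-encodes the remaining keys in sorted order, so that calls differing only
// in message text or key order compare as equal. Input that does not decode
// as a JSON object is returned unchanged.
func normalizeArgs(raw string) string {
	var args map[string]any
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return raw // invalid JSON passthrough
	}
	delete(args, "message")
	out, err := json.Marshal(args) // map keys are emitted in sorted order
	if err != nil {
		return raw
	}
	return string(out)
}

func main() {
	for _, raw := range []string{
		`{"query":"test","message":"retrying"}`,
		`{"b":1,"a":2}`,
		`not json`,
	} {
		fmt.Printf("%s -> %s\n", raw, normalizeArgs(raw))
	}
}
```

Comparing normalized strings rather than raw arguments is what lets a detector treat two calls as identical even when the model rewords the message field between attempts.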

Comment on lines +939 to +965
func TestRepeatingDetectorEscalationThreshold(t *testing.T) {
	// This test validates the escalation math used in performer.go:
	// len(detector.funcCalls) >= RepeatingToolCallThreshold + maxSoftDetectionsBeforeAbort
	// With threshold=3 and maxSoftDetections=4, abort triggers at len >= 7

	detector := &repeatingDetector{}
	tc := makeToolCall("search", `{"query":"test"}`)

	for i := 0; i < 7; i++ {
		detector.detect(tc)
	}

	assert.Equal(t, 7, len(detector.funcCalls))
	assert.True(t, len(detector.funcCalls) >= RepeatingToolCallThreshold+4,
		"7 calls should reach escalation threshold: %d >= %d+%d",
		len(detector.funcCalls), RepeatingToolCallThreshold, 4)

	// Verify 6 calls is below threshold
	detector2 := &repeatingDetector{}
	for i := 0; i < 6; i++ {
		detector2.detect(tc)
	}

	assert.Equal(t, 6, len(detector2.funcCalls))
	assert.False(t, len(detector2.funcCalls) >= RepeatingToolCallThreshold+4,
		"6 calls should NOT reach escalation threshold: %d < %d+%d",
		len(detector2.funcCalls), RepeatingToolCallThreshold, 4)

Copilot AI Mar 6, 2026

TestRepeatingDetectorEscalationThreshold claims to validate escalation logic “used in performer.go”, but the current execToolCall implementation has no escalation/abort threshold (it only checks detector.detect(toolCall) and returns a soft message). As written, the test only asserts len(detector.funcCalls) >= RepeatingToolCallThreshold+4 using a hard-coded 4, so it doesn’t actually validate any production behavior and can drift from the real implementation if/when escalation is added. Consider either removing this test, or rewriting it to assert the real behavior (e.g., calling the performer/exec logic and expecting an abort error after N soft detections), and referencing a named constant instead of +4.
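One way to picture the suggestion above is to put the escalation rule behind named constants and a single function that both production code and tests call. The sketch below is hypothetical: `classify`, the `action` type, and both constants are invented for illustration, and it assumes an escalation step exists (or is added) in the performer with the values quoted in the test comment (threshold 3, 4 soft detections before abort).

```go
package main

import "fmt"

const (
	// repeatThreshold stands in for RepeatingToolCallThreshold.
	repeatThreshold = 3
	// maxSoftDetectionsBeforeAbort is an assumed value taken from the
	// comment in the test under review, not from production code.
	maxSoftDetectionsBeforeAbort = 4
)

// action is the hypothetical outcome for a run of identical tool calls.
type action int

const (
	proceed action = iota
	softWarn
	abort
)

// classify maps the number of consecutive identical calls to an outcome.
func classify(consecutive int) action {
	switch {
	case consecutive >= repeatThreshold+maxSoftDetectionsBeforeAbort:
		return abort
	case consecutive >= repeatThreshold:
		return softWarn
	default:
		return proceed
	}
}

func main() {
	names := map[action]string{proceed: "proceed", softWarn: "soft warning", abort: "abort"}
	for n := 1; n <= 7; n++ {
		fmt.Printf("%d consecutive calls: %s\n", n, names[classify(n)])
	}
}
```

With these values, calls 3 through 6 yield four soft warnings and the 7th aborts. A test that asserts on `classify` exercises the same rule the caller uses, so it cannot drift from a copy of the arithmetic kept only in the test file.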

Comment on lines +884 to +900
			name: "six identical calls still below escalation threshold",
			calls: func() []llms.ToolCall {
				tc := makeToolCall("search", `{"query":"test"}`)
				return []llms.ToolCall{tc, tc, tc, tc, tc, tc}
			}(),
			expectedDetected: []bool{false, false, true, true, true, true},
			expectedLen:      6,
		},
		{
			name: "seven identical calls reaches escalation threshold",
			calls: func() []llms.ToolCall {
				tc := makeToolCall("search", `{"query":"test"}`)
				return []llms.ToolCall{tc, tc, tc, tc, tc, tc, tc}
			}(),
			expectedDetected: []bool{false, false, true, true, true, true, true},
			expectedLen:      7,
		},

Copilot AI Mar 6, 2026

Several table test case names refer to an “escalation threshold” (e.g., the 6/7 call cases), but repeatingDetector.detect only implements the repeating threshold (RepeatingToolCallThreshold) and has no escalation concept. Renaming these cases to describe the behavior under test (continued detection after the threshold is reached) would avoid implying functionality that isn’t present in repeatingDetector.

- Replace hardcoded +4 with testMaxSoftDetectionsBeforeAbort constant
  with sync comment pointing to performer.go
- Add test case for same function name with different non-message args
  resetting funcCalls (covers the other reset condition in detect())

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>