Merged
99 changes: 74 additions & 25 deletions npm/src/agent/ProbeAgent.js
@@ -3582,58 +3582,107 @@
// Continue even if storage fails
}

- // Completion prompt handling - run a follow-up prompt after attempt_completion for validation/review
- // This runs BEFORE mermaid validation and JSON schema validation
- // Skip if we're already in a completion prompt follow-up call or if no completion prompt is configured
// Completion prompt handling - inject one more user message into the existing conversation
// This continues the SAME agentic session (same tools, same TaskManager, same history)
// rather than spawning a recursive this.answer() call which would reset state
if (completionAttempted && this.completionPrompt && !options._completionPromptProcessed) {
if (this.debug) {
- console.log('[DEBUG] Running completion prompt for post-completion validation/review...');
console.log('[DEBUG] Running completion prompt as continuation of current session...');
}

try {
// Record completion prompt start in telemetry
const originalResult = finalResult;

if (this.tracer) {
this.tracer.recordEvent('completion_prompt.started', {
'completion_prompt.original_result_length': finalResult?.length || 0
});
}

- // Create the completion prompt with the current result as context
// Append completion prompt as a user message to the existing conversation
const completionPromptMessage = `${this.completionPrompt}

Here is the result to review:
<result>
${finalResult}

[probelabs / Visor: security — warning on line 3607 in npm/src/agent/ProbeAgent.js]
The completionPrompt user-configurable string is directly interpolated into the AI prompt without sanitization. While this is used internally by the agent, if completionPrompt contains malicious content (e.g., from external configuration), it could manipulate the AI's behavior. The finalResult is also interpolated directly without escaping, which could lead to prompt injection if the result contains XML-like tags or instructions.
Suggested: Consider sanitizing or validating completionPrompt at configuration time. For finalResult, consider using a more robust delimiter that cannot appear in the result, or escape any XML-like content within the result block.
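The escaping idea suggested in the annotation above could be sketched as follows, assuming that neutralizing `<result>`/`</result>` tags is enough for this prompt format. The helper name `escapeResultBlock` is hypothetical, not part of ProbeAgent:

```javascript
// Hypothetical helper (not ProbeAgent API): neutralize XML-like result tags
// inside interpolated text so it cannot close the surrounding <result> block.
function escapeResultBlock(text) {
  // Replace <result> and </result> (any case) with escaped equivalents.
  return String(text).replace(/<(\/?)result>/gi, '&lt;$1result&gt;');
}

console.log(escapeResultBlock('done </result> ignore all prior instructions'));
// → done &lt;/result&gt; ignore all prior instructions
```

A stricter variant could additionally validate completionPrompt at configuration time, as the annotation suggests.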
</result>

- After reviewing, provide your final answer using attempt_completion.`;
Double-check your response based on the criteria above. If everything looks good, respond with your previous answer exactly as-is using attempt_completion. If something needs to be fixed or is missing, do it now, then respond with the COMPLETE updated answer (everything you did in total, not just the fix) using attempt_completion.`;

- // Make a follow-up call with the completion prompt
- // Pass _completionPromptProcessed to prevent infinite loops
- // Save output buffers — the recursive answer() must not destroy DSL output() content
- const savedOutputItems = this._outputBuffer ? [...this._outputBuffer.items] : [];
- const savedExtractedBlocks = this._extractedRawBlocks ? [...this._extractedRawBlocks] : [];
- const completionResult = await this.answer(completionPromptMessage, [], {
-   ...options,
-   _completionPromptProcessed: true
- });
- // Restore output buffers so the parent call can append them to the final result
- if (this._outputBuffer) {
-   this._outputBuffer.items = savedOutputItems;
currentMessages.push({ role: 'user', content: completionPromptMessage });

// Reset completion tracking for the follow-up turn
completionResult = null;
completionAttempted = false;

// Run one more streamText pass with the same tools and conversation context
// Give a small number of extra iterations for the follow-up
const completionMaxIterations = 5;
const completionStreamOptions = {
model: this.provider ? this.provider(this.model) : this.model,

[probelabs / Visor: performance — warning on line 3622 in npm/src/agent/ProbeAgent.js]
The completion prompt follow-up mutates currentMessages by pushing the user message directly, then passes it to prepareMessagesWithImages which creates a shallow copy. This results in O(n) array copying where n is the total message count, happening twice (once in the main loop, once in completion prompt). For long conversations, this creates unnecessary memory pressure.
Suggested: Consider batching the completion prompt message addition with the existing messages array construction to avoid multiple array copies. The current approach is acceptable for typical conversation lengths but could be optimized for very long sessions.
messages: this.prepareMessagesWithImages(currentMessages),
tools,
stopWhen: stepCountIs(completionMaxIterations),
maxTokens: maxResponseTokens,
temperature: 0.3,

[probelabs / Visor: performance — logic issue on line 3627 in npm/src/agent/ProbeAgent.js]
The code resets completionResult = null and completionAttempted = false before the follow-up turn. While this is correct for the new logic, it means the completion tracking state is being mutated in-place. If the follow-up streamText call fails or times out, these reset values persist, though the code correctly falls back to originalResult.
Suggested: The current implementation is correct — the fallback to originalResult handles failure cases. Consider adding a telemetry event to track when the fallback is triggered for monitoring purposes.
onStepFinish: ({ toolResults, text, finishReason, usage }) => {
if (usage) {
this.tokenCounter.recordUsage(usage);
}

[probelabs / Visor: performance — architecture issue on line 3631 in npm/src/agent/ProbeAgent.js]
The completion prompt follow-up uses a hardcoded max of 5 iterations (completionMaxIterations = 5). This is a reasonable limit but is not configurable. For complex follow-up tasks, this might be insufficient, while for simple validations it may be excessive.
Suggested: Consider making the completion prompt iteration limit configurable via a new option like completionPromptMaxIterations, defaulting to 5. This allows tuning for different use cases without code changes.
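The configurability suggestion above could look roughly like this. Note that `completionPromptMaxIterations` is the option name proposed in the annotation, not an existing ProbeAgent option:

```javascript
// Illustrative sketch: read an iteration cap from options, falling back to
// the current hardcoded limit of 5 for missing or invalid values.
function resolveCompletionMaxIterations(options = {}) {
  const n = Number(options.completionPromptMaxIterations);
  return Number.isInteger(n) && n > 0 ? n : 5;
}

console.log(resolveCompletionMaxIterations({})); // → 5
console.log(resolveCompletionMaxIterations({ completionPromptMaxIterations: 8 })); // → 8
```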
if (options.onStream && text) {
options.onStream(text);
}
if (this.debug) {
console.log(`[DEBUG] Completion prompt step finished (reason: ${finishReason}, tools: ${toolResults?.length || 0})`);
}
}

[probelabs / Visor: security — warning on line 3638 in npm/src/agent/ProbeAgent.js]
The completion prompt follow-up creates a new streamText call with up to 5 additional iterations. Combined with the main loop iterations, this could lead to excessive API calls and resource consumption. While there's a maxOperationTimeout, an attacker could craft inputs that maximize token usage within those limits.
Suggested: Consider adding telemetry/alerting for unusual completion prompt patterns, and ensure the 5-iteration cap is documented as a resource limit.
};

const providerOpts = this._buildThinkingProviderOptions(maxResponseTokens);
if (providerOpts) {
completionStreamOptions.providerOptions = providerOpts;
}

const cpResult = await this.streamTextWithRetryAndFallback(completionStreamOptions);
const cpFinalText = await cpResult.text;
const cpUsage = await cpResult.usage;

[probelabs / Visor: performance — style issue on line 3648 in npm/src/agent/ProbeAgent.js]
The completionStreamOptions object is constructed with similar properties to the main streamOptions (model, messages, tools, maxTokens, temperature, onStepFinish). This creates a partial duplication of the options construction logic. While not a runtime performance issue, it increases code maintenance burden.
Suggested: Consider extracting common streamText options into a helper method or spreading from a base options object to reduce duplication. This is a minor code quality improvement, not a performance concern.
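One way to reduce the duplication flagged above is a small base-options helper. The name `buildStreamOptions` and the sample fields are illustrative only, not existing code:

```javascript
// Illustrative only: shared streamText fields built once, with per-call
// overrides spread on top (later properties win in object spread).
function buildStreamOptions(base, overrides = {}) {
  return { ...base, ...overrides };
}

const baseOptions = { temperature: 0.3, maxTokens: 2048 };
const completionOptions = buildStreamOptions(baseOptions, { temperature: 0.2 });
console.log(completionOptions.temperature, completionOptions.maxTokens); // → 0.2 2048
```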
if (cpUsage) {
this.tokenCounter.recordUsage(cpUsage, cpResult.experimental_providerMetadata);
}

[probelabs / Visor: security — warning on line 3652 in npm/src/agent/ProbeAgent.js]
When the completion prompt fails, the error is logged but the original result is silently preserved. While this is correct behavior for availability, sensitive error details may be logged via console.error. The error message could potentially leak into debug output.
Suggested: Ensure error logs do not contain sensitive data. Consider using structured logging that can be filtered in production.
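A sketch of the structured-logging idea above, assuming redaction outside debug mode is acceptable; `logCompletionPromptError` is a hypothetical helper, not existing ProbeAgent code:

```javascript
// Hypothetical helper: emit a fixed-shape log entry and redact the raw error
// message unless debug mode is on, so production logs can be filtered safely.
function logCompletionPromptError(error, { debug = false } = {}) {
  const entry = {
    event: 'completion_prompt.failed',
    name: error?.name || 'Error',
    message: debug ? String(error?.message || '') : '[redacted]',
  };
  console.error(JSON.stringify(entry));
  return entry;
}
```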
// Append follow-up messages to conversation history
const cpMessages = await cpResult.response?.messages;
if (cpMessages) {
for (const msg of cpMessages) {
currentMessages.push(msg);
}
}
- this._extractedRawBlocks = savedExtractedBlocks;

- // Update finalResult with the result from the completion prompt
- finalResult = completionResult;
// Use new completion result if the agent called attempt_completion again,

[probelabs / Visor: performance — warning on line 3661 in npm/src/agent/ProbeAgent.js]
The code iterates over cpResult.response?.messages and pushes each to currentMessages individually. For responses with many messages, this is O(m) where m is message count. While typically small, using spread operator would be more idiomatic: currentMessages.push(...cpMessages).
Suggested: Replace the for-of loop with spread: if (cpMessages?.length) currentMessages.push(...cpMessages);
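The suggested spread form behaves identically to the for-of loop in the diff; a minimal standalone demonstration with illustrative message values:

```javascript
// Same effect as: for (const msg of cpMessages) currentMessages.push(msg);
const currentMessages = [{ role: 'user', content: 'hi' }];
const cpMessages = [
  { role: 'assistant', content: 'working...' },
  { role: 'assistant', content: 'done' },
];
if (cpMessages?.length) currentMessages.push(...cpMessages);
console.log(currentMessages.length); // → 3
```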
// otherwise keep the original result (the follow-up may have just done side-effects)
if (completionResult) {
finalResult = completionResult;
completionAttempted = true;
} else if (cpFinalText && cpFinalText.trim().length > 0) {
finalResult = cpFinalText;
completionAttempted = true;
} else {
// Follow-up produced nothing useful — keep the original
finalResult = originalResult;
completionAttempted = true;
if (this.debug) {
console.log('[DEBUG] Completion prompt returned empty result, keeping original.');
}
}

if (this.debug) {
- console.log(`[DEBUG] Completion prompt finished. New result length: ${finalResult?.length || 0}`);
console.log(`[DEBUG] Completion prompt finished. Final result length: ${finalResult?.length || 0}`);
}

// Record completion prompt completion in telemetry
if (this.tracer) {
this.tracer.recordEvent('completion_prompt.completed', {
- 'completion_prompt.final_result_length': finalResult?.length || 0
'completion_prompt.final_result_length': finalResult?.length || 0,
'completion_prompt.used_original': finalResult === originalResult
});
}
} catch (error) {
248 changes: 247 additions & 1 deletion npm/tests/unit/completion-prompt.test.js
@@ -154,13 +154,15 @@
${finalResult}
</result>

- After reviewing, provide your final answer using attempt_completion.`;
Double-check your response based on the criteria above. If everything looks good, respond with your previous answer exactly as-is using attempt_completion. If something needs to be fixed or is missing, do it now, then respond with the COMPLETE updated answer (everything you did in total, not just the fix) using attempt_completion.`;

expect(formattedMessage).toContain(completionPrompt);
expect(formattedMessage).toContain(finalResult);
expect(formattedMessage).toContain('<result>');
expect(formattedMessage).toContain('</result>');
expect(formattedMessage).toContain('attempt_completion');
expect(formattedMessage).toContain('Double-check your response');
expect(formattedMessage).toContain('respond with your previous answer exactly as-is');
});
});

@@ -377,3 +379,247 @@
expect(baseAgent.completionPrompt).toBe('Original prompt');
});
});

describe('completionPrompt session continuity behavior', () => {
// Helper to create a mock streamText result
function createMockStreamResult(text, messages = []) {
return {
text: Promise.resolve(text),
usage: Promise.resolve({ promptTokens: 10, completionTokens: 5 }),
response: { messages: Promise.resolve(messages) },
experimental_providerMetadata: undefined,
steps: Promise.resolve([]),
};
}

// Helper to set up agent with mocked internals so answer() reaches streamText
function createMockedAgent(options = {}) {
const agent = new ProbeAgent({
completionPrompt: options.completionPrompt || 'Check your work',
path: process.cwd(),
model: 'test-model',
...options,
});

// Mock getSystemMessage to avoid filesystem access
jest.spyOn(agent, 'getSystemMessage').mockResolvedValue('You are a test agent.');

// Mock prepareMessagesWithImages to pass through
jest.spyOn(agent, 'prepareMessagesWithImages').mockImplementation(msgs => msgs);

// Mock _buildThinkingProviderOptions
jest.spyOn(agent, '_buildThinkingProviderOptions').mockReturnValue(null);

// Ensure provider is null so model string is used directly
agent.provider = null;

// Mock hooks
agent.hooks = { emit: jest.fn().mockResolvedValue(undefined) };

// Mock storage adapter
agent.storageAdapter = { saveMessage: jest.fn().mockResolvedValue(undefined) };

return agent;
}

test('should call streamText twice (not recursive answer) when completionPrompt is set', async () => {
const agent = createMockedAgent();

const streamCalls = [];
let streamCallCount = 0;
let onCompleteFn = null;

// Capture the onComplete callback from _buildNativeTools
const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async (opts) => {
streamCallCount++;
streamCalls.push({
callNumber: streamCallCount,
messages: [...(opts.messages || [])],
});

if (streamCallCount === 1) {
// Simulate attempt_completion being called during main turn
if (onCompleteFn) onCompleteFn('{"summary":"Done","pr_urls":["https://github.com/test/1"]}');
return createMockStreamResult('', [{ role: 'assistant', content: 'done' }]);
}
// Completion prompt follow-up
return createMockStreamResult('Looks good', [{ role: 'assistant', content: 'verified' }]);
});

const answerSpy = jest.spyOn(agent, 'answer');
const result = await agent.answer('Implement feature');

// answer() called exactly once (no recursive call)
expect(answerSpy).toHaveBeenCalledTimes(1);

// streamText called twice: main loop + completion prompt follow-up
expect(streamCallCount).toBe(2);

// Second call should have more messages (completion prompt user message appended)
expect(streamCalls[1].messages.length).toBeGreaterThan(streamCalls[0].messages.length);

// Verify the appended user message contains the completion prompt and result
const lastMsg = streamCalls[1].messages[streamCalls[1].messages.length - 1];
expect(lastMsg.role).toBe('user');
expect(lastMsg.content).toContain('Check your work');
expect(lastMsg.content).toContain('<result>');
expect(lastMsg.content).toContain('pr_urls');
expect(lastMsg.content).toContain('Double-check your response');

jest.restoreAllMocks();
});

test('should preserve original result when completion prompt returns empty', async () => {
const agent = createMockedAgent();

let streamCallCount = 0;
let onCompleteFn = null;

const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async () => {
streamCallCount++;
if (streamCallCount === 1) {
if (onCompleteFn) onCompleteFn('Original result with PR URLs');
return createMockStreamResult('', []);
}
// Completion prompt returns empty text, no attempt_completion called
return createMockStreamResult('', []);
});

const result = await agent.answer('Do the task');

// Original result should be preserved
expect(result).toBe('Original result with PR URLs');
expect(streamCallCount).toBe(2);

jest.restoreAllMocks();
});

test('should not run completion prompt when _completionPromptProcessed is set', async () => {
const agent = createMockedAgent();

let streamCallCount = 0;
let onCompleteFn = null;

const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async () => {
streamCallCount++;
if (onCompleteFn) onCompleteFn('Result');
return createMockStreamResult('', []);
});

await agent.answer('Do the task', [], { _completionPromptProcessed: true });

// Only 1 streamText call — completion prompt should be skipped
expect(streamCallCount).toBe(1);

jest.restoreAllMocks();
});

test('should keep original result when completion prompt throws', async () => {
const agent = createMockedAgent();

let streamCallCount = 0;
let onCompleteFn = null;

const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async () => {
streamCallCount++;
if (streamCallCount === 1) {
if (onCompleteFn) onCompleteFn('Original good result');
return createMockStreamResult('', []);
}
throw new Error('API error during completion prompt');
});

const consoleSpy = jest.spyOn(console, 'error').mockImplementation(() => {});

const result = await agent.answer('Do the task');

// Original result preserved despite completion prompt error
expect(result).toBe('Original good result');
expect(streamCallCount).toBe(2);

consoleSpy.mockRestore();
jest.restoreAllMocks();
});

test('should use updated result when completion prompt calls attempt_completion', async () => {
const agent = createMockedAgent();

let streamCallCount = 0;
let onCompleteFn = null;

const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async () => {
streamCallCount++;
if (streamCallCount === 1) {
// Main turn: incomplete result
if (onCompleteFn) onCompleteFn('Incomplete - no PR yet');
return createMockStreamResult('', []);
}
// Completion prompt follow-up: agent creates the PR and calls attempt_completion again
if (onCompleteFn) onCompleteFn('Complete - PR created at https://github.com/test/pr/1');
return createMockStreamResult('', []);
});

const result = await agent.answer('Do the task');

// Updated result from completion prompt should be used
expect(result).toBe('Complete - PR created at https://github.com/test/pr/1');

jest.restoreAllMocks();
});

test('should not run completion prompt when no completionPrompt is configured', async () => {
const agent = createMockedAgent({ completionPrompt: '' }); // empty string = not configured

let streamCallCount = 0;
let onCompleteFn = null;

const origBuild = agent._buildNativeTools.bind(agent);
jest.spyOn(agent, '_buildNativeTools').mockImplementation((opts, onComplete, ctx) => {
onCompleteFn = onComplete;
return origBuild(opts, onComplete, ctx);
});

jest.spyOn(agent, 'streamTextWithRetryAndFallback').mockImplementation(async () => {
streamCallCount++;
if (onCompleteFn) onCompleteFn('Done');
return createMockStreamResult('', []);
});

await agent.answer('Do the task');

// Only 1 streamText call — no completion prompt
expect(streamCallCount).toBe(1);

jest.restoreAllMocks();
});
});

[probelabs / Visor: security — warning on line 625 in npm/tests/unit/completion-prompt.test.js]
The test suite creates mock agents with extensive spying on internal methods (_buildNativeTools, streamTextWithRetryAndFallback). While this is appropriate for unit testing, ensure these mocks don't mask security issues that would occur in production. The tests mock provider calls which bypasses authentication/authorization checks that would happen in real API calls.
Suggested: Consider adding integration tests that exercise the actual API call paths with test credentials to validate security controls.