Skip to content

Conversation

@blefo
Copy link
Member

@blefo blefo commented Nov 14, 2025

No description provided.

@blefo
Copy link
Member Author

blefo commented Nov 14, 2025

I’ve run the test many times and it now passes consistently. The earlier intermittent failures were likely caused by reasoning hallucinations from gpt-oss-20b.

@blefo blefo marked this pull request as ready for review November 14, 2025 13:13
@blefo blefo requested a review from Copilot November 26, 2025 09:41
Copilot finished reviewing on behalf of blefo November 26, 2025 09:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes websearch E2E tests to prevent reasoning token loops by making the tests more deterministic and constraining response generation. The changes apply uniform adjustments across four test files covering both responses and chat completions API endpoints.

Key changes:

  • Reduced temperature from 0.2 to 0.01 for more deterministic outputs
  • Added explicit instruction "Answer in 10 words maximum and do not reason" to prevent lengthy reasoning loops
  • Removed max_output_tokens parameter from test_responses_http.py

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/e2e/test_responses_http.py Updated web search test with lower temperature (0.01), added response constraint instructions, and removed max_output_tokens parameter
tests/e2e/test_responses.py Updated web search test with lower temperature (0.01) and added response constraint instructions
tests/e2e/test_chat_completions_http.py Updated web search test with lower temperature (0.01) and added response constraint instructions to system message
tests/e2e/test_chat_completions.py Updated web search test with lower temperature (0.01) and added response constraint instructions to system message

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@blefo blefo requested a review from jcabrero November 26, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants