stop rollouts on incomplete responses (no content or tools)#948
Conversation
…ns list Co-authored-by: will brown <willccbb@users.noreply.github.com>
mikasenghaas
left a comment
Why do we handle this as a stop condition and not a `vf.Error`? (This is what we currently do in `openai_chat_completions_client.py`.)
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```python
judge_response = await judge(prompt, completion, answer, state)
cleaned_completion = [
    {x["role"]: x["content"].split("</think>")[-1] for x in completion}
]
```
Dict comprehension creates wrong message structure for judge
High Severity
The dict comprehension {x["role"]: x["content"].split("</think>")[-1] for x in completion} creates a single dictionary with role names (e.g., "assistant", "tool") as keys and cleaned content as values, wrapped in a list. This produces a structure like [{"assistant": "...", "tool": "..."}] instead of the expected list of message dicts with "role" and "content" keys. When the judge's parse_answer tries to find assistant messages, it looks for a "role" key in each element — which doesn't exist in this dict — so it always returns None, making the judge evaluate against a None response. The brackets likely need to be moved so the list comprehension wraps each message individually.
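A minimal sketch of the difference (the sample message here is invented for illustration, not taken from the PR):

```python
# One assistant message in standard chat format (illustrative example).
completion = [
    {"role": "assistant", "content": "<think>reasoning</think>final answer"},
]

# Buggy version: a single dict comprehension keyed by role, wrapped in a
# list, so the judge sees [{"assistant": "..."}] with no "role" key.
buggy = [{x["role"]: x["content"].split("</think>")[-1] for x in completion}]

# Likely intended version: a list comprehension that cleans each message
# individually while preserving the {"role": ..., "content": ...} shape.
fixed = [
    {"role": x["role"], "content": x["content"].split("</think>")[-1]}
    for x in completion
]

print(buggy)  # [{'assistant': 'final answer'}]
print(fixed)  # [{'role': 'assistant', 'content': 'final answer'}]
```

With the buggy shape, any code that looks up `msg["role"]` on each element fails to find assistant messages; the fixed shape keeps the chat-message contract intact.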
```python
async def judge_reward_func(judge, prompt, completion, answer, state) -> float:
    judge_response = await judge(prompt, completion, answer, state)
    cleaned_completion = [
        {x["role"]: x["content"].split("</think>")[-1] for x in completion}
```
Split on None content causes AttributeError
Medium Severity
AssistantMessage.content is typed as MessageContent | None and defaults to None for tool-call-only messages. The expression x["content"].split("</think>") will raise an AttributeError when content is None. In a multi-turn tool-use environment like wiki_search, assistant messages with only tool_calls and no content are common. The error is silently caught by _call_individual_reward_func, returning a reward of 0.0, which silently corrupts training signal.
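One possible guard, sketched with an invented message list (the `or ""` fallback is an assumption about how the PR might fix this; skipping tool-call-only messages entirely would be another option):

```python
# Illustrative completion including a tool-call-only assistant turn,
# whose "content" is None.
completion = [
    {"role": "assistant", "content": None},              # tool call only
    {"role": "tool", "content": "search results"},
    {"role": "assistant", "content": "<think>hmm</think>done"},
]

# Coalesce None to "" before splitting so .split never runs on None.
cleaned_completion = [
    {"role": x["role"], "content": (x["content"] or "").split("</think>")[-1]}
    for x in completion
]

print(cleaned_completion)
# [{'role': 'assistant', 'content': ''},
#  {'role': 'tool', 'content': 'search results'},
#  {'role': 'assistant', 'content': 'done'}]
```

Without such a guard, the `AttributeError` is swallowed by `_call_individual_reward_func` and surfaces only as a silent 0.0 reward.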


Description
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes
Note
Medium Risk
Changes core rollout termination/truncation behavior in `MultiTurnEnv`, which can alter evaluation/training outcomes when providers emit empty responses.

Overview
Multi-turn rollouts now terminate when the model returns an "incomplete" response (no message content and no tool calls). This is implemented as a new `@vf.stop` condition (`has_incomplete_response`) and by marking such trajectory steps as truncated in `MultiTurnEnv.add_model_response`.

Docs are updated to mention incomplete-response detection as a default stop condition, the `wiki-search` environment strips `<think>` content before LLM judging, and dataset builder fields in `Environment` are explicitly `cast()` for type safety; `.gitignore` also ignores `packages/tasksets` and `packages/harnesses`.

Written by Cursor Bugbot for commit 92a8da8. This will update automatically on new commits.