Default max_tokens=2048 in tailor stage causes truncated JSON for long resumes #12

@tearl42

Description of the bug
The tailor stage sets max_tokens=2048 when calling the LLM. For candidates with extensive work history, the prompt alone can exceed 5,000 tokens, and thinking models like gemini-2.5-flash consume additional tokens for reasoning. This leaves insufficient tokens for the full JSON response, causing truncated output and repeated EXHAUSTED_RETRIES failures.

To Reproduce
Set up a profile with 15+ years of work history and run the tailor stage with gemini-2.5-flash. Every job will hit EXHAUSTED_RETRIES with finishReason: MAX_TOKENS visible in the API logs. The response cuts off mid-JSON after only 82-418 tokens.

{
  "finishReason": "MAX_TOKENS",
  "candidatesTokenCount": 82,
  "promptTokenCount": 5380,
  "thoughtsTokenCount": 7769
}

Expected behavior
The max_tokens limit should be high enough to accommodate long resumes; better yet, it should be a configurable parameter in profile.json or a CLI flag so users can tune it for their situation.
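A minimal sketch of the configurable approach: read an optional override from profile.json and fall back to a generous default. The key name `tailor_max_tokens` is a hypothetical suggestion, not an existing option.

```python
import json

# Fallback when profile.json is absent or has no override (assumption:
# 16384 is enough headroom for long resumes plus thinking tokens).
DEFAULT_MAX_TOKENS = 16384

def load_max_tokens(profile_path="profile.json"):
    """Return the tailor-stage token budget, honoring a hypothetical
    "tailor_max_tokens" key in profile.json if present."""
    try:
        with open(profile_path) as f:
            profile = json.load(f)
    except FileNotFoundError:
        return DEFAULT_MAX_TOKENS
    return int(profile.get("tailor_max_tokens", DEFAULT_MAX_TOKENS))
```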

Fix
In tailor.py around line 403, change:
raw = client.chat(messages, max_tokens=2048, temperature=0.4)
to:
raw = client.chat(messages, max_tokens=16384, temperature=0.4)
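A more defensive variant would retry with a larger budget only when the response fails to parse, rather than always paying for the maximum. This is a sketch, not code from the repo; it assumes the `client.chat(messages, max_tokens=..., temperature=...)` signature shown above and that it returns the raw response text.

```python
import json

def chat_json(client, messages, budgets=(4096, 16384), temperature=0.4):
    """Call client.chat with escalating max_tokens budgets, returning the
    first response that parses as complete JSON (hypothetical helper)."""
    last = None
    for max_tokens in budgets:
        raw = client.chat(messages, max_tokens=max_tokens,
                          temperature=temperature)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            last = raw  # likely truncated mid-JSON; retry with more room
    raise ValueError(f"response still truncated at max_tokens={budgets[-1]}: "
                     f"{last[:200]!r}")
```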

Environment

  • Resume length: 30 years of experience
  • Model: gemini-2.5-flash
  • Observed prompt token count: ~5,700 tokens
