
Gemini 2.5-flash ignores JSON-only output instruction without response_format / responseMimeType enforcement #16

@tearl42

Description of the bug
The tailor prompt instructs the LLM to "Return ONLY valid JSON. No markdown fences. No commentary." However, Gemini 2.5-flash frequently wraps its response in markdown code fences (```` ```json ... ``` ````) and occasionally injects random words mid-JSON (e.g. the word "scrumptious" appearing inside a JSON string), causing parse failures and validation retries.
While extract_json() correctly strips markdown fences, the random word injection causes malformed JSON that cannot be recovered, resulting in EXHAUSTED_RETRIES.
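A minimal sketch of the fence-stripping behavior described above (hypothetical; the real `extract_json()` in ApplyPilot may differ) shows why the injected words are unrecoverable: removing fences is mechanical, but a stray token inside the JSON body still fails `json.loads`.

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Hypothetical sketch of the fence-stripping this issue describes;
    the real extract_json() in ApplyPilot may differ."""
    # Drop a leading ```json (or bare ```) fence and a trailing ``` fence.
    text = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", raw.strip())
    # A random word injected inside the JSON body survives fence removal
    # and still raises json.JSONDecodeError here.
    return json.loads(text)
```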

To Reproduce

  1. Configure ApplyPilot with gemini-2.5-flash
  2. Run the tailor stage
  3. Check AI Studio logs — responses frequently include ```` ```json ```` wrappers despite the prompt instruction
  4. Occasionally a random word is injected mid-JSON, causing a parse failure

Expected behavior
The API call should enforce structured JSON output using the appropriate parameter for the endpoint being used, eliminating both markdown wrapping and random token injection.

Fix
In `llm.py`, add JSON enforcement to both API call paths.

Native Gemini (`_chat_native_gemini`):

```python
"generationConfig": {
    "temperature": temperature,
    "maxOutputTokens": max_tokens,
    "responseMimeType": "application/json",
}
```

OpenAI-compat (`_chat_compat`):

```python
payload = {
    "model": self.model,
    "messages": messages,
    "temperature": temperature,
    "max_tokens": max_tokens,
    "response_format": {"type": "json_object"},
}
```
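For the native path, the enforced config slots into the full `generateContent` request body roughly like this (a sketch; the helper name and default values are illustrative, not ApplyPilot's actual code):

```python
def build_native_body(prompt: str, temperature: float = 0.2,
                      max_tokens: int = 1024) -> dict:
    # Request body for POST .../v1beta/models/gemini-2.5-flash:generateContent.
    # responseMimeType constrains decoding to valid JSON, which is what
    # eliminates both the markdown fences and the injected tokens.
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_tokens,
            "responseMimeType": "application/json",
        },
    }
```

On the OpenAI-compat endpoint the equivalent knob is `response_format={"type": "json_object"}`, as in the payload above.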

Environment

Model: gemini-2.5-flash
Endpoint: OpenAI-compat (https://generativelanguage.googleapis.com/v1beta/openai)
