Description of the bug
The tailor prompt instructs the LLM to "Return ONLY valid JSON. No markdown fences. No commentary." However, gemini-2.5-flash frequently wraps its response in markdown code fences (```json ... ```) and occasionally injects a random word mid-JSON (e.g. the word "scrumptious" appearing inside a JSON string), causing parse failures and validation retries.
While extract_json() correctly strips markdown fences, the random word injection causes malformed JSON that cannot be recovered, resulting in EXHAUSTED_RETRIES.
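To illustrate why one failure mode is recoverable and the other is not, here is a minimal fence-stripping sketch (the real `extract_json()` in ApplyPilot may be implemented differently; `strip_fences` is a hypothetical stand-in):

```python
import json
import re

def strip_fences(text: str) -> str:
    """Strip a surrounding ```json ... ``` fence if present.
    Hypothetical sketch; ApplyPilot's actual extract_json() may differ."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    return match.group(1) if match else text

# Fence wrapping is recoverable: stripping the fence yields valid JSON.
fenced = '```json\n{"role": "engineer"}\n```'
json.loads(strip_fences(fenced))

# Random word injection is not: the payload is malformed JSON at any
# level, so every parse attempt fails and the stage retries until
# EXHAUSTED_RETRIES.
injected = '{"role": scrumptious "engineer"}'
try:
    json.loads(strip_fences(injected))
except json.JSONDecodeError:
    pass  # unrecoverable without re-prompting the model
```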
To Reproduce
- Configure ApplyPilot with `gemini-2.5-flash`
- Run the tailor stage
- Check AI Studio logs: responses frequently arrive wrapped in ```json fences despite the prompt instruction
- Occasionally a random word is injected mid-JSON, causing a parse failure
Expected behavior
The API call should enforce structured JSON output using the endpoint-appropriate parameter (`responseMimeType` for native Gemini, `response_format` for the OpenAI-compat layer), eliminating both markdown wrapping and random token injection.
Fix
In llm.py, add JSON enforcement to both API call paths:
Native Gemini (_chat_native_gemini):
"generationConfig": { "temperature": temperature, "maxOutputTokens": max_tokens, "responseMimeType": "application/json", }
OpenAI-compat (_chat_compat):
```python
payload = {
    "model": self.model,
    "messages": messages,
    "temperature": temperature,
    "max_tokens": max_tokens,
    "response_format": {"type": "json_object"},
}
```
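Even with `response_format` enforced, it may be worth keeping a defensive parse after the call so a provider regression fails loudly rather than propagating a malformed string downstream. A sketch (hypothetical helper, not ApplyPilot's code):

```python
import json

def parse_json_response(raw: str) -> dict:
    """Validate the response body even when response_format is set.
    Hypothetical guard; ApplyPilot's actual post-processing may differ."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"non-JSON response despite response_format: {exc}") from exc
```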
Environment
Model: gemini-2.5-flash
Endpoint: OpenAI-compat (https://generativelanguage.googleapis.com/v1beta/openai)