[AI] Add hybrid inference support in GenerativeModelSession #16043
andrewheard wants to merge 6 commits into main
WIP - not ready for review
Started adding support for hybrid (on-device and cloud) inference. Internally, this is implemented as an array of fallback models: one model session is tried and, if it fails, the next session in the array is used. Publicly, this will be exposed as "prefer cloud" or "prefer on-device", which simply determines the order of the models in the array. The approach could be extended to other fallback strategies in the future if desired (e.g., Vertex AI --> Gemini Dev API, Gemini 3.1 --> Gemini 2.5) to handle cases where backends or models are resource constrained.
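The fallback-chain idea above can be sketched roughly as follows. This is a minimal illustration, not the actual SDK implementation; the `ModelSession` protocol, `HybridModelSession` type, and error case are hypothetical names invented for this sketch.

```swift
// Hypothetical protocol standing in for a single model session
// (on-device or cloud). Not the real SDK API surface.
protocol ModelSession {
  func generateContent(_ prompt: String) async throws -> String
}

enum HybridInferenceError: Error {
  case noSessionsConfigured
}

// Illustrative hybrid session: tries each underlying session in
// preference order and falls through to the next on failure.
struct HybridModelSession {
  // Ordered by preference, e.g. [onDevice, cloud] for
  // "prefer on-device", or [cloud, onDevice] for "prefer cloud".
  let sessions: [ModelSession]

  func generateContent(_ prompt: String) async throws -> String {
    var lastError: Error?
    for session in sessions {
      do {
        return try await session.generateContent(prompt)
      } catch {
        // Record the error and move on to the next preference.
        lastError = error
      }
    }
    throw lastError ?? HybridInferenceError.noSessionsConfigured
  }
}
```

Keeping the preference as nothing more than array order is what makes this generalizable to other fallback strategies (backend-to-backend or model-to-model) later.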
Note: Streaming fallback is not yet implemented; a streaming request currently fails if the first-preference model session fails.
TODOs:
#no-changelog