Replies: 2 comments 1 reply
- Can you add an option to manually specify the endpoint? My LM Studio instance is running on another machine on my network.
- Coming soon 😃
- The desktop app could leverage a locally running Ollama instance as an additional tier in the AI fallback chain (Groq → OpenRouter → Ollama → browser-side T5). When Ollama is detected on localhost:11434, the sidecar would route classification and summarization requests to any compatible model the user has pulled (e.g., Llama 3.1 8B, Mistral, Gemma). This eliminates the cloud API dependency entirely for desktop users: no API keys, no rate limits, no data leaving the machine. The sidecar already runs all API handlers locally, so adding an Ollama adapter is a natural extension: probe the /api/tags endpoint at startup to detect available models, prefer quantized 8B variants for speed, and fall back to the next tier if Ollama isn't running or the request times out. For users with Apple Silicon or modern GPUs, inference latency on 8B models is comparable to cloud round-trips, making this a zero-cost, fully private alternative.
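  The detection-and-fallback step above could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names (`probeOllama`, `pickModel`), the model-preference heuristic, and the timeout value are all assumptions; only the `/api/tags` endpoint and its `{ models: [...] }` response shape come from Ollama's documented API.

  ```typescript
  interface OllamaTag {
    name: string; // e.g. "llama3.1:8b-instruct-q4_K_M"
  }

  // Hypothetical heuristic: prefer a quantized 8B variant for speed,
  // then any 8B model, then whatever is listed first.
  function pickModel(tags: OllamaTag[]): string | null {
    if (tags.length === 0) return null;
    const preferred =
      tags.find((t) => /8b/i.test(t.name) && /q\d/i.test(t.name)) ??
      tags.find((t) => /8b/i.test(t.name));
    return (preferred ?? tags[0]).name;
  }

  // Probe /api/tags at startup. Any network error, non-2xx status, or
  // timeout is treated as "Ollama not available", so the caller can
  // fall through to the next tier in the chain.
  async function probeOllama(
    base = "http://localhost:11434",
    timeoutMs = 1000,
  ): Promise<string | null> {
    try {
      const res = await fetch(`${base}/api/tags`, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (!res.ok) return null;
      const body = (await res.json()) as { models: OllamaTag[] };
      return pickModel(body.models);
    } catch {
      return null; // unreachable or timed out → skip this tier
    }
  }
  ```

  The key design point is that the probe happens once at startup and failure is silent: a `null` result simply means the request router never offers the Ollama tier, exactly as when a cloud key is missing.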