High-performance, lightweight proxy and load balancer for LLM infrastructure, written in Go. Intelligent routing, automatic failover, and unified model discovery across local and remote inference backends.
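A minimal sketch of the failover idea, assuming OpenAI-compatible backends; the backend URLs and the simple in-order retry policy below are illustrative, not the repo's actual routing logic:

```python
import requests

# Hypothetical backend pool; a real deployment would discover these dynamically.
BACKENDS = ["http://localhost:11434/v1", "http://gpu-box:8000/v1"]

def chat(payload: dict, timeout: float = 30.0) -> dict:
    """Try each backend in order, failing over on connection errors or 5xx."""
    last_error = None
    for base in BACKENDS:
        try:
            resp = requests.post(f"{base}/chat/completions", json=payload, timeout=timeout)
            if resp.status_code < 500:
                return resp.json()  # healthy (or client-error) response: stop here
            last_error = RuntimeError(f"{base} returned {resp.status_code}")
        except requests.RequestException as exc:
            last_error = exc  # backend unreachable; try the next one
    raise RuntimeError("all backends failed") from last_error
```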
A private, local RAG (Retrieval-Augmented Generation) system using Flowise, Ollama, and open-source LLMs to chat with your documents securely and offline.
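The retrieve-then-generate loop behind a setup like this can be sketched directly against Ollama's REST API (Flowise normally orchestrates it visually); the model names below are assumptions standing in for whatever is pulled locally:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list:
    # Ollama's embeddings endpoint; "nomic-embed-text" is an assumed local model.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def answer(question: str, chunks: list) -> str:
    # Retrieve the most relevant document chunk, then ground the generation on it.
    q = embed(question)
    context = max(chunks, key=lambda c: cosine(embed(c), q))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "llama3", "prompt": prompt, "stream": False})
    return r.json()["response"]
```

Everything stays on localhost, which is what makes the pipeline private and offline-capable.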
LocalPrompt is an AI-powered tool for refining and optimizing prompts, built to run against locally hosted models such as Mistral-7B for privacy and efficiency. Ideal for developers who want to run LLMs locally without external APIs.
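As a rough illustration of the idea (not LocalPrompt's actual interface), a locally hosted model can be asked to rewrite a draft prompt via Ollama's generate endpoint:

```python
import requests

def refine_prompt(draft: str) -> str:
    """Ask a local model to tighten a draft prompt (generic sketch via Ollama)."""
    instruction = f"Rewrite this prompt to be clearer and more specific:\n\n{draft}"
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": "mistral", "prompt": instruction, "stream": False})
    return r.json()["response"]
```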
Recallium is a local, self-hosted universal AI memory system providing a persistent knowledge layer for developer tools (Copilot, Cursor, Claude Desktop). It eliminates "AI amnesia" by automatically capturing, clustering, and surfacing decisions and patterns across all projects. It uses the Model Context Protocol (MCP) for universal compatibility and ensures privacy by keeping all data local.
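A bare-bones sketch of exposing a memory layer over MCP with the official Python SDK; the tool names, in-memory storage, and naive keyword recall are illustrative, not Recallium's real schema:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("memory")
_notes: list[str] = []  # a real system would persist and embed these

@mcp.tool()
def remember(note: str) -> str:
    """Capture a decision or pattern for later recall."""
    _notes.append(note)
    return f"stored ({len(_notes)} notes)"

@mcp.tool()
def recall(query: str) -> list[str]:
    """Naive keyword match; a real system would cluster and rank by embedding."""
    return [n for n in _notes if query.lower() in n.lower()]

if __name__ == "__main__":
    mcp.run()  # serves over stdio so MCP clients can attach
```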
Web-Based Q&A Tool enables users to extract and query website content using FastAPI, FAISS, and a local TinyLlama-1.1B model, without external APIs. Built with React, it offers a minimal UI for seamless AI-driven search.
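The core FAISS flow (index the scraped chunks, retrieve the nearest one to feed the model's prompt) looks roughly like this; the `embed` stub stands in for whatever embedding model the tool actually uses:

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 384  # must match the embedding model's output dimension

def embed(texts):
    # Stub standing in for the real embedding model (not named in the description).
    rng = np.random.default_rng(42)
    return rng.random((len(texts), DIM), dtype=np.float32)

chunks = ["first scraped passage...", "second scraped passage..."]
index = faiss.IndexFlatL2(DIM)   # exact L2 search over the chunk vectors
index.add(embed(chunks))

_, ids = index.search(embed(["user question"]), 1)
context = chunks[ids[0][0]]      # nearest chunk goes into the TinyLlama prompt
```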
Powers the local RAG pipeline in the BrainDrive Chat w/ Docs plugin.