Brainqub3 Chat is a local-first Next.js workspace that wraps a Brainqub3-themed chat UI, a vLLM OpenAI-compatible orchestrator, and an MCP bridge so models can call local tools. The UI, API routes, and MCP helpers all run on your machine; only the vLLM endpoint needs to be reachable over HTTP, and it can be either local or a remote RunPod deployment.
- Brainqub3 UI: Always-dark layout with cyan/purple glow accents, Geist typography, streaming messages, caret animation, and keyboard shortcuts (`⌘/Ctrl+K` for new chats, `⌘/Ctrl+Enter` to send, `↑` edits the last prompt).
- Session management: Multi-session rail with previews, rename-on-first-answer, delete, and automatic persistence to `localStorage`.
- vLLM orchestrator: `/api/chat` forwards OpenAI-style chat completion requests (with `tool_choice: "auto"`) to a configurable `VLLM_BASE_URL`, loops through tool calls, and streams Server-Sent Events back to the browser.
- MCP bridge (experimental): `/api/mcp/*` endpoints manage stdio or HTTP MCP servers via the official TypeScript SDK, exposing each MCP tool as an OpenAI tool (`mcp:<serverId>:<toolName>`). This pathway is currently untested end-to-end, so expect to troubleshoot transports if you enable it.
- Local controls: Model picker, per-session system prompt pill, estimated token budget bar, and live MCP server status cards.
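To illustrate the `mcp:<serverId>:<toolName>` naming convention above, here is a minimal TypeScript sketch of how an MCP tool could be mapped to an OpenAI-style tool definition and routed back on a tool call. The type shapes and helper names (`toOpenAiTool`, `parseToolName`) are illustrative assumptions, not the app's actual code.

```typescript
// Illustrative shapes only — the app's real types may differ.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

interface OpenAiTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Namespace each tool as mcp:<serverId>:<toolName> so the orchestrator
// can route a model's tool call back to the MCP server that owns it.
function toOpenAiTool(serverId: string, tool: McpTool): OpenAiTool {
  return {
    type: "function",
    function: {
      name: `mcp:${serverId}:${tool.name}`,
      description: tool.description ?? "",
      parameters: tool.inputSchema,
    },
  };
}

// Reverse the mapping when the model emits a tool call.
function parseToolName(
  name: string
): { serverId: string; toolName: string } | null {
  const match = name.match(/^mcp:([^:]+):(.+)$/);
  return match ? { serverId: match[1], toolName: match[2] } : null;
}
```

The namespacing matters because two MCP servers can expose tools with the same name; encoding the server ID into the tool name keeps routing unambiguous.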
- Node.js 18+ – required for the App Router and Node runtime routes.
- vLLM endpoint – any OpenAI-compatible vLLM server. You can:
  - Run it locally (see Running vLLM locally below), or
  - Deploy your own container on RunPod. Start from the RunPod vLLM template, then follow the official RunPod docs to finish configuring the worker and expose the HTTPS endpoint you will paste into `VLLM_BASE_URL`.
- Optional: any MCP servers (stdio binaries or HTTP endpoints) you want the model to call.
```bash
npm install
```

Create `.env.local` in the project root:

```bash
VLLM_BASE_URL=http://localhost:8000                     # or the HTTPS URL from your RunPod deployment
DEFAULT_MODEL=moonshotai/Kimi-K2-Thinking               # server-side default passed to vLLM
NEXT_PUBLIC_DEFAULT_MODEL=moonshotai/Kimi-K2-Thinking
```

The defaults ship with the Kimi K2 model; change both variables if you point at a different checkpoint. `DEFAULT_MODEL` drives the API's fallback choice, while `NEXT_PUBLIC_DEFAULT_MODEL` seeds new sessions on the client.
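The fallback behavior described above can be sketched as a small config resolver. This is an assumption about how the route reads its environment (the function name `resolveVllmConfig` and the trailing-slash handling are hypothetical), not the app's actual implementation:

```typescript
type Env = Record<string, string | undefined>;

// Sketch: resolve the vLLM endpoint and model with the README's defaults
// as fallbacks. The app's real route code may differ.
function resolveVllmConfig(env: Env) {
  return {
    // Strip trailing slashes so joins like `${baseUrl}/v1/chat/completions`
    // don't produce double slashes.
    baseUrl: (env.VLLM_BASE_URL ?? "http://localhost:8000").replace(/\/+$/, ""),
    model: env.DEFAULT_MODEL ?? "moonshotai/Kimi-K2-Thinking",
  };
}
```

In practice you would call it with `process.env`; injecting a plain object instead keeps it easy to unit-test.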
```bash
python -m vllm.entrypoints.openai.api_server \
  --model moonshotai/Kimi-K2-Thinking \
  --host 0.0.0.0 --port 8000
```

If you prefer an on-demand GPU endpoint, deploy the RunPod template linked above, then set `VLLM_BASE_URL` to the provided HTTPS endpoint. The Next.js app treats local and remote URLs the same.
```bash
npm run dev
```

Visit http://localhost:3000, create a chat, and start messaging. Expand MCP Servers to register stdio or HTTP transports; enabled servers automatically expose their tools to the model and show up under the "+Tools" indicator.
- API routes opt into the Node runtime so MCP stdio transports can spawn `child_process` instances.
- Streaming uses Server-Sent Events and `eventsource-parser` to buffer tool-call metadata while emitting text deltas immediately.
- The MCP registry lives in memory; restarting `npm run dev` clears MCP state, but chat sessions remain in the browser thanks to `localStorage`.
- Tailwind centralizes Brainqub3 design tokens (glows, gradients, type scale) so the sidebar, chat pane, and tool cards stay consistent.
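The "emit text immediately, buffer tool calls" behavior in the notes above can be sketched as a small delta handler. The delta shape follows the OpenAI streaming format (`content` for text, indexed `tool_calls` with argument fragments); the handler itself (`createDeltaHandler`) is an illustration, not the app's actual code:

```typescript
interface ToolCallBuffer {
  name: string;
  arguments: string; // JSON arrives as string fragments across many deltas
}

interface StreamDelta {
  content?: string;
  tool_calls?: Array<{
    index: number;
    function?: { name?: string; arguments?: string };
  }>;
}

// Sketch: pass text deltas straight through, but accumulate tool-call
// fragments until the stream ends and the argument JSON is complete.
function createDeltaHandler(onText: (text: string) => void) {
  const toolCalls = new Map<number, ToolCallBuffer>();
  return {
    handle(delta: StreamDelta) {
      if (delta.content) onText(delta.content); // emit text immediately
      for (const tc of delta.tool_calls ?? []) {
        const buf = toolCalls.get(tc.index) ?? { name: "", arguments: "" };
        if (tc.function?.name) buf.name += tc.function.name;
        if (tc.function?.arguments) buf.arguments += tc.function.arguments;
        toolCalls.set(tc.index, buf);
      }
    },
    // Only after the stream finishes are the buffered fragments parseable.
    finish(): Array<{ name: string; args: unknown }> {
      return [...toolCalls.values()].map((b) => ({
        name: b.name,
        args: JSON.parse(b.arguments || "{}"),
      }));
    },
  };
}
```

Buffering is necessary because a tool call's JSON arguments are split across many SSE events and only parse once the stream completes, whereas text deltas are useful the moment they arrive.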
- `npm run dev` – Next.js dev server
- `npm run build` – production build
- `npm run start` – serve the production build
- `npm run lint` – ESLint
Ideas for later: persist sessions to disk, add an MCP prompt/resource library, or integrate token-aware summarization when transcripts approach the context window.