A production-grade Model Context Protocol (MCP) server that augments local LLM inference (via vLLM or Ollama) with real-time external data (e.g., weather, finance, news) to make responses dynamic, accurate, and context-aware.
The system is designed for modularity, scalability, and performance in local or hybrid LLM deployments, with support for concurrent multi-user interactions.
- LLM-Aware Prompt Augmentation
- Real-Time Tool Integration (Weather, Finance, News APIs)
- Modular Architecture (plug-in style tool injection)
- High-performance Go backend
- Support for vLLM/Ollama via HTTP
- React frontend
- Testable and extensible core logic
- Observability + tracing support
- Docker/Kubernetes deployment ready
- Concurrent Multi-User Support: handles simultaneous user queries while maintaining context isolation and data security
```
nekton/
├── nekton-client/                  # React frontend
│   ├── web/                        # Frontend (React)
│   ├── Makefile
│   └── README.md
│
└── nekton-server/                  # Backend (Hexagonal MCP)
    ├── cmd/                        # Entrypoints and wire-up
    │   └── api/
    │       ├── main.go             # Starts the server
    │       └── container.go        # DI container (using dig or fx)
    │
    ├── internal/
    │   ├── domain/                 # Core business logic (independent)
    │   │   ├── contextor/          # Planner, enricher, prompt builder
    │   │   │   ├── planner.go
    │   │   │   ├── enricher.go
    │   │   │   └── prompt_builder.go
    │   │   ├── tool/               # Tool behavior and data contract
    │   │   │   └── tool.go
    │   │   ├── llm/                # LLM inference port interface
    │   │   │   └── llm.go
    │   │   ├── session/            # Session management for multi-user support
    │   │   │   └── session_manager.go  # Handles user session lifecycle
    │   │   └── port/               # Hexagonal ports (interfaces)
    │   │       ├── tool_port.go
    │   │       ├── llm_port.go
    │   │       ├── session_port.go
    │   │       └── audit_port.go
    │   │
    │   ├── adapter/                # External systems (driven adapters)
    │   │   ├── http/               # HTTP server adapter (fasthttp)
    │   │   │   └── handler.go
    │   │   ├── llm/                # LLM backend (vLLM/Ollama)
    │   │   │   └── vllm_client.go
    │   │   ├── tool/
    │   │   │   ├── weather_adapter.go
    │   │   │   ├── finance_adapter.go
    │   │   │   └── news_adapter.go
    │   │   ├── redis/              # Redis for sessions + cache
    │   │   │   └── redis_store.go
    │   │   ├── kafka/              # Kafka producer/consumer
    │   │   │   └── kafka_client.go
    │   │   └── postgres/           # PostgreSQL audit + cold data
    │   │       └── audit_logger.go
    │   │
    │   ├── infra/                  # Logging, config, tracing, shared
    │   │   ├── config/
    │   │   ├── logger/
    │   │   ├── observability/
    │   │   └── errors/
    │   │
    │   └── tests/                  # Unit + integration tests
    │       ├── mocks/
    │       └── integration/
    │
    ├── scripts/                    # Bootstrap & operational scripts
    ├── api/                        # OpenAPI / gRPC definitions
    ├── deployments/                # Docker + K8s manifests
    ├── Makefile
    ├── go.mod
    └── README.md
```
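The `port/` package carries the hexagonal boundaries: the domain depends only on interfaces, and adapters satisfy them. As a rough sketch of what `tool_port.go` and `llm_port.go` might contain (the interface names and signatures here are assumptions, not the repository's actual code):

```go
package port

import "context"

// ToolPort is the driven-side contract every external tool adapter
// (weather, finance, news) must satisfy. The domain layer depends only
// on this interface, never on a concrete HTTP client.
type ToolPort interface {
	// Name identifies the tool to the planner (e.g., "weather").
	Name() string
	// Fetch retrieves raw external data for the given query.
	Fetch(ctx context.Context, query string) (string, error)
}

// LLMPort abstracts the inference backend (vLLM or Ollama).
type LLMPort interface {
	// Complete returns a full completion for the final prompt.
	Complete(ctx context.Context, prompt string) (string, error)
	// Stream emits incremental tokens on the returned channel.
	Stream(ctx context.Context, prompt string) (<-chan string, error)
}
```

Swapping vLLM for Ollama then means writing a new adapter against `LLMPort`; nothing in `domain/` changes.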
```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#BB2528',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#F8B229',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff',
      'lineWidth': 8,
      'fontSize': 30
    }
  }
}%%
flowchart TD
    A[User Input] --> B[API Gateway HTTP/gRPC]
    B --> C[Context Planner]
    C --> D[Tool Orchestrator]
    D --> E[Prompt Builder]
    E --> F[LLM Inference Engine<br/>vLLM / Ollama]
    F --> G[Response Handler]
    G --> H[User Output]
    D --> T[Weather, Finance, News APIs]
    T --> D
    D -->|Feedback| C
    B --> I[Session Manager] --> J[Manage User Context]
    I --> K[Redis Cache]
```
```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#BB2528',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#F8B229',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff',
      'lineWidth': 8,
      'fontSize': 30
    }
  }
}%%
flowchart TD
    UI[Client UI] --> APIGW[REST/gRPC API Gateway]
    subgraph Gateway
        APIGW --> Auth[Auth + Rate Limiting]
        APIGW --> Planner[Context Planner + Tool Router]
        APIGW --> Redis[Session Cache Redis]
    end
    Planner -->|Dispatch| WeatherTool[Weather Tool]
    Planner -->|Dispatch| FinanceTool[Finance Tool]
    Planner -->|Dispatch| NewsTool[News Tool]
    WeatherTool --> WeatherEnricher[Weather Tool Enricher]
    FinanceTool --> FinanceEnricher[Finance Tool Enricher]
    NewsTool --> NewsEnricher[News Tool Enricher]
    WeatherEnricher --> PromptBuilder
    FinanceEnricher --> PromptBuilder
    NewsEnricher --> PromptBuilder
    PromptBuilder[Prompt Builder] --> LLM[vLLM / Ollama]
    LLM --> RespBuilder[Response Builder]
    RespBuilder --> Postgres[PostgreSQL audit + cold DB]
    APIGW --> SessionManager[Session Manager] --> Redis
```
- Session Management: The `SessionManager` component ensures that each user's query is handled in an isolated context. It supports user authentication, maintains state across requests, and stores session data in Redis for fast retrieval.
- Context Isolation: Each user interaction with the LLM is processed independently, ensuring that context for one user does not interfere with another. This is critical for handling concurrent users.
- Rate Limiting: Each user is subject to configurable rate limits to prevent abuse and ensure system stability.
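As a minimal sketch of the session lifecycle, here is an in-memory manager; the real `session_manager.go` persists to Redis behind `session_port.go`, and the type and method names below are hypothetical:

```go
package session

import (
	"context"
	"errors"
	"sync"
	"time"

	"github.com/google/uuid"
)

// Session holds one user's isolated conversation context.
type Session struct {
	ID        string
	UserID    string
	CreatedAt time.Time
	ExpiresAt time.Time
}

var ErrNotFound = errors.New("session not found or expired")

// Manager is an in-memory illustration; the Redis adapter would
// implement the same behavior behind the session port.
type Manager struct {
	mu       sync.RWMutex
	sessions map[string]Session
	ttl      time.Duration
}

func NewManager(ttl time.Duration) *Manager {
	return &Manager{sessions: make(map[string]Session), ttl: ttl}
}

// Create issues a fresh, isolated session for a user.
func (m *Manager) Create(ctx context.Context, userID string) Session {
	s := Session{
		ID:        uuid.NewString(),
		UserID:    userID,
		CreatedAt: time.Now(),
		ExpiresAt: time.Now().Add(m.ttl),
	}
	m.mu.Lock()
	m.sessions[s.ID] = s
	m.mu.Unlock()
	return s
}

// Validate returns the session if it exists and has not expired.
func (m *Manager) Validate(ctx context.Context, id string) (Session, error) {
	m.mu.RLock()
	s, ok := m.sessions[id]
	m.mu.RUnlock()
	if !ok || time.Now().After(s.ExpiresAt) {
		return Session{}, ErrNotFound
	}
	return s, nil
}
```

Keying all per-user state by session ID is what makes concurrent queries safe: two users never share a `Session` value.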
Service entrypoints:
- `api-gateway`: Starts the HTTP API using `fasthttp` for high-performance routing
- `worker`: Background job runner (e.g., cron for refreshes)
- `tools-runner`: Manual tool testing CLI
Versioned REST/gRPC API layer.
- `handlers/`: Input/output translation
- `controllers/`: Business logic
- `middleware/`: Auth, logging, rate limiting
- `schemas/`: Request/response type definitions
The MCP core: it determines which tools to use and how to inject their data.
- `engine.go`: Overall pipeline
- `planner.go`: Chooses relevant tools
- `enricher.go`: Gathers external context
- `prompt_builder.go`: Final prompt construction
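A condensed, hypothetical sketch of how these three stages compose; the actual `engine.go` API may differ:

```go
package contextor

import (
	"context"
	"fmt"
	"strings"
)

// Tool mirrors the domain tool port: it can say whether it applies to a
// query and fetch external context for it.
type Tool interface {
	Name() string
	Matches(query string) bool
	Fetch(ctx context.Context, query string) (string, error)
}

// Engine wires planner -> enricher -> prompt builder into one pipeline.
type Engine struct {
	Tools []Tool
}

// BuildPrompt selects relevant tools, gathers their context, and
// assembles the final augmented prompt.
func (e *Engine) BuildPrompt(ctx context.Context, query string) (string, error) {
	var fragments []string
	for _, t := range e.Tools { // planner: pick relevant tools
		if !t.Matches(query) {
			continue
		}
		data, err := t.Fetch(ctx, query) // enricher: gather external context
		if err != nil {
			return "", fmt.Errorf("tool %s: %w", t.Name(), err)
		}
		fragments = append(fragments, fmt.Sprintf("[TOOL: %s]\n%s", t.Name(), data))
	}
	// prompt builder: final prompt construction
	return strings.Join(fragments, "\n\n") +
		"\n\nAnswer the following query using the context above:\n" + query, nil
}
```

The output format matches the augmented-prompt example shown later in this README.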
LLM client abstraction:
- `client.go`: HTTP interface to vLLM or Ollama
- `models.go`: Model registry/config
- `streamer.go`: Streaming completion support
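For illustration, a minimal non-streaming client against vLLM's OpenAI-compatible `/v1/completions` endpoint; the actual `client.go` may differ, and `streamer.go` would layer SSE handling on top of the same request with `"stream": true`:

```go
package llm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// Client is a minimal sketch of the LLM adapter.
type Client struct {
	BaseURL string // e.g., "http://localhost:8000"
	Model   string
	HTTP    *http.Client
}

// Complete posts the prompt and returns the first completion.
func (c *Client) Complete(ctx context.Context, prompt string) (string, error) {
	if c.HTTP == nil {
		c.HTTP = http.DefaultClient
	}
	body, _ := json.Marshal(map[string]any{
		"model":  c.Model,
		"prompt": prompt,
	})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		c.BaseURL+"/v1/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := c.HTTP.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Decode only the fields we need from the OpenAI-style response.
	var out struct {
		Choices []struct {
			Text string `json:"text"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("empty completion from %s", c.BaseURL)
	}
	return out.Choices[0].Text, nil
}
```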
Modular tool adapters:
- `weather/`: OpenWeatherMap, Tomorrow.io, etc.
- `finance/`: Stock data, crypto, market sentiment
- `news/`: RSS feeds, Google News, etc.
Each tool has:
- `provider.go`: Fetches external data
- `enricher.go`: Formats data into LLM-ready prompt fragments
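A sketch of the provider/enricher split for the weather tool; the types and names here are illustrative, not the actual adapter code:

```go
package weather

import (
	"context"
	"fmt"
)

// Observation is the provider's normalized output.
type Observation struct {
	City      string
	TempC     float64
	Condition string
}

// Provider fetches external data (the provider.go role). The real
// implementation would call OpenWeatherMap or Tomorrow.io.
type Provider interface {
	Current(ctx context.Context, city string) (Observation, error)
}

// Enrich turns raw data into an LLM-ready prompt fragment
// (the enricher.go role).
func Enrich(o Observation) string {
	return fmt.Sprintf("[TOOL: Weather API]\nCurrent weather in %s: %.0f°C, %s.",
		o.City, o.TempC, o.Condition)
}
```

Keeping fetching and formatting separate means a provider can be swapped (OpenWeatherMap for Tomorrow.io) without touching the prompt fragment.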
Shared infrastructure:
- `config/`: Env loading, config structs
- `logger/`: Structured logging (zap or slog)
- `observability/`: Prometheus, tracing
- `errors/`: App-specific error types
React client:
- Live chat interface with streaming response
- Model switcher and tool visualizer
- Unit tests for context planners, prompt builders, tools
- Integration tests for full input → output validation
- Mock tools and inference clients for repeatable tests
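Building on the hypothetical `Engine` sketch above, a unit test can swap in a canned tool so the pipeline runs without any network access:

```go
package contextor

import (
	"context"
	"strings"
	"testing"
)

// mockTool is a canned-response stand-in for a real adapter.
type mockTool struct{ name, data string }

func (m mockTool) Name() string        { return m.name }
func (m mockTool) Matches(string) bool { return true }
func (m mockTool) Fetch(context.Context, string) (string, error) {
	return m.data, nil
}

func TestBuildPromptInjectsToolContext(t *testing.T) {
	e := Engine{Tools: []Tool{mockTool{"weather", "28°C, clear skies"}}}
	prompt, err := e.BuildPrompt(context.Background(), "Weather in Tokyo?")
	if err != nil {
		t.Fatal(err)
	}
	if !strings.Contains(prompt, "28°C") {
		t.Fatalf("tool context missing from prompt: %q", prompt)
	}
}
```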
- Go 1.21+ - Install Go
- vLLM or Ollama - For LLM inference
  - vLLM (default): `pip install vllm` - see the vLLM documentation
  - Ollama (alternative): `curl -fsSL https://ollama.ai/install.sh | sh`
- Optional Services:
- Redis - For session persistence
- Kafka - For tool orchestration
- PostgreSQL - For audit logging
```bash
# Clone the repository
git clone https://github.com/echenim/Nekton-Server.git
cd Nekton-Server

# Install dependencies
make deps

# Setup vLLM (default LLM provider)
pip install vllm

# Start vLLM server in another terminal
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000

# Or setup Ollama (alternative)
# curl -fsSL https://ollama.ai/install.sh | sh
# ollama serve        # Start in another terminal
# ollama pull llama2  # Pull a model

# Copy and configure
cp config.example.yml config.yml
# Edit config.yml as needed

# Build and run
make build
./bin/nekton-server

# Check LLM connectivity
./check_llm.sh

# Test endpoints
./test_endpoints.sh

# Access Swagger UI
open http://localhost:8080/swagger/
```

See the LLM Setup Guide for detailed configuration.
```bash
# Docker
docker-compose -f deployments/docker/docker-compose.yml up --build

# Kubernetes
kubectl apply -f deployments/k8s/
```

Includes:
- MCP service
- LLM inference (vLLM)
- Redis (required for caching and session data)
- Kafka (required for real-time data pipeline)
- PostgreSQL (required for cold data storage and audit logging)
- Prometheus + Grafana (optional)
When the server is running, you can access the interactive API documentation at:
- Swagger UI: http://localhost:8080/swagger/ (or http://localhost:8080/docs/)
- Swagger JSON: http://localhost:8080/swagger/doc.json
- `GET /health` - Health check endpoint (no authentication required)
- `GET /api/health` - Alternative health check endpoint
- `GET /debug/routes` - Debug endpoint to list all registered routes
- `POST /api/v1/infer` - Generate AI response
  - Requires: JSON body with `query` field
  - Optional: `model`, `session_id`, `user_id`
  - Returns: JSON response with AI-generated text
- `POST /api/v1/infer/stream` - Generate streaming AI response (Server-Sent Events)
  - Requires: JSON body with `query` field
  - Optional: `model`, `session_id`, `user_id`
  - Returns: SSE stream with incremental AI responses
- `POST /api/v1/sessions` - Create a test session
  - Requires: JSON body with `user_id` field
  - Returns: Session details including `session_id`, `created_at`, `expires_at`
  - Purpose: For testing multi-user session functionality
- `POST /api/v1/sessions/validate` - Validate an existing session
  - Requires: JSON body with `session_id` field
  - Returns: Session details if valid, 404 if not found or expired
  - Purpose: Check if a session is still valid
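As a usage sketch against the documented request shape, here is a Go client call to `/api/v1/infer`; the response is decoded generically, since its exact schema is not specified above:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Build the request body documented for POST /api/v1/infer.
	payload, _ := json.Marshal(map[string]string{
		"query":   "What's the weather like in Tokyo right now?",
		"model":   "llama3-8b-instruct",
		"user_id": "demo-user", // optional, per the endpoint docs
	})
	resp, err := http.Post("http://localhost:8080/api/v1/infer",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Decode into a generic map since the response schema may evolve.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```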
To regenerate the Swagger documentation after making API changes:

```bash
make swagger
```

To verify the Swagger documentation is up to date:

```bash
make swagger-check
```

Test scripts are provided to verify endpoints:

```bash
# Test all basic endpoints
./test_endpoints.sh

# Test session management functionality
./test_sessions.sh

# Test swagger documentation
./test_swagger.sh
```
"query": "Whatβs the weather like in Tokyo right now?",
"model": "llama3-8b-instruct"
}[TOOL: Weather API]
Current weather in Tokyo (as of 2025-07-25 12:00 JST): 28Β°C, clear skies.
Answer the following query using the context above:
Whatβs the weather like in Tokyo right now?
It's currently 28Β°C with clear skies in Tokyo. A perfect day for a walk!
To add a new tool:
1. Create a new folder under `internal/tools/your_tool/`
2. Implement:
   - `provider.go` (API fetching logic)
   - `enricher.go` (how to turn it into prompt text)
3. Register it in `planner.go` and optionally `config.yaml`
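Concretely, a new tool only has to satisfy the shared tool interface before being handed to the planner. A self-contained, illustrative sketch (the interface and registration mechanics here are assumptions, not the repository's actual API):

```go
package main

import (
	"context"
	"fmt"
)

// Tool mirrors the interface sketched earlier; a new tool just has to
// satisfy it.
type Tool interface {
	Name() string
	Matches(query string) bool
	Fetch(ctx context.Context, query string) (string, error)
}

// airQualityTool is a hypothetical new tool: provider.go logic lives in
// Fetch, enricher.go logic in the returned fragment.
type airQualityTool struct{}

func (airQualityTool) Name() string          { return "air_quality" }
func (airQualityTool) Matches(q string) bool { return true } // keyword matching in practice
func (airQualityTool) Fetch(ctx context.Context, q string) (string, error) {
	// Real code would call an external air-quality API here.
	return "AQI in Tokyo: 42 (good)", nil
}

func main() {
	// Registration: append the tool to the planner's tool set.
	var tools []Tool
	tools = append(tools, airQualityTool{})
	for _, t := range tools {
		out, _ := t.Fetch(context.Background(), "air quality in Tokyo")
		fmt.Printf("[TOOL: %s]\n%s\n", t.Name(), out)
	}
}
```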
- ✅ Tool usage tracing + debugging UI
- ✅ Function calling parser (OpenAI-compatible)
- 💡 Dynamic tool chaining (multi-hop)
- 💡 Local RAG support (knowledge base integration)
- 💡 Authenticated user sessions
- William Echenim (Architect)
- william.echenim@gmail.com
MIT License - see LICENSE file.