# 🚀 Nekton MCP Server

A production-grade Model Context Protocol (MCP) server that augments local LLM inference (via vLLM or Ollama) with real-time external data (e.g., weather, finance, news) to make responses dynamic, accurate, and context-aware.

This system is designed for modularity, scalability, and performance in local or hybrid LLM deployments with support for concurrent multi-user interactions.


## 🧠 Key Features

- ✅ LLM-Aware Prompt Augmentation
- 🔌 Real-Time Tool Integration (Weather, Finance, News APIs)
- 🧱 Modular Architecture (plug-in style tool injection)
- 🚀 High-performance Go backend
- 📦 Support for vLLM/Ollama via HTTP
- 🌐 React frontend
- 🛠️ Testable and extensible core logic
- 📊 Observability + tracing support
- 🐳 Docker/Kubernetes deployment ready
- 👥 Concurrent Multi-User Support: handles simultaneous user queries while maintaining context isolation and data security

## 🗂️ Project Structure

```text
nekton/
├── nekton-client/                        # React frontend
│   ├── web/                              # Frontend (React)
│   ├── Makefile
│   └── README.md
│
└── nekton-server/                        # Backend (Hexagonal MCP)
    ├── cmd/                              # Entrypoints and wire-up
    │   └── api/
    │       ├── main.go                   # Starts the server
    │       └── container.go              # DI container (using dig or fx)
    │
    ├── internal/
    │   ├── domain/                       # Core business logic (independent)
    │   │   ├── contextor/                # Planner, enricher, prompt builder
    │   │   │   ├── planner.go
    │   │   │   ├── enricher.go
    │   │   │   └── prompt_builder.go
    │   │   ├── tool/                     # Tool behavior and data contract
    │   │   │   └── tool.go
    │   │   ├── llm/                      # LLM inference port interface
    │   │   │   └── llm.go
    │   │   ├── session/                  # Session management for multi-user support
    │   │   │   └── session_manager.go    # Handles user session lifecycle
    │   │   └── port/                     # Hexagonal ports (interfaces)
    │   │       ├── tool_port.go
    │   │       ├── llm_port.go
    │   │       ├── session_port.go
    │   │       └── audit_port.go
    │   │
    │   ├── adapter/                      # External systems (driven adapters)
    │   │   ├── http/                     # HTTP server adapter (fasthttp)
    │   │   │   └── handler.go
    │   │   ├── llm/                      # LLM backend (vLLM/Ollama)
    │   │   │   └── vllm_client.go
    │   │   ├── tool/
    │   │   │   ├── weather_adapter.go
    │   │   │   ├── finance_adapter.go
    │   │   │   └── news_adapter.go
    │   │   ├── redis/                    # Redis for sessions + cache
    │   │   │   └── redis_store.go
    │   │   ├── kafka/                    # Kafka producer/consumer
    │   │   │   └── kafka_client.go
    │   │   └── postgres/                 # PostgreSQL audit + cold data
    │   │       └── audit_logger.go
    │   │
    │   ├── infra/                        # Logging, config, tracing, shared
    │   │   ├── config/
    │   │   ├── logger/
    │   │   ├── observability/
    │   │   └── errors/
    │   │
    │   └── tests/                        # Unit + integration tests
    │       ├── mocks/
    │       └── integration/
    │
    ├── scripts/                          # Bootstrap & operational scripts
    ├── api/                              # OpenAPI / gRPC definitions
    ├── deployments/                      # Docker + K8s manifests
    ├── Makefile
    ├── go.mod
    └── README.md
```

## User Flow

```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#BB2528',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#F8B229',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff',
      'lineWidth': 8,
      'fontSize':30
    }
  }
}%%
flowchart TD
    A[User Input] --> B[API Gateway HTTP/gRPC]
    B --> C[Context Planner]
    C --> D[Tool Orchestrator]
    D --> E[Prompt Builder]
    E --> F[LLM Inference Engine<br/>vLLM / Ollama]
    F --> G[Response Handler]
    G --> H[User Output]
    D --> T[Weather, Finance, News APIs]
    T --> D
    D -->|Feedback| C
    B --> I[Session Manager] --> J[Manage User Context]
    I --> K[Redis Cache]
```

## 🧩 Architecture Overview

```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#BB2528',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#7C0000',
      'lineColor': '#F8B229',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff',
      'lineWidth': 8,
      'fontSize':30
    }
  }
}%%
flowchart TD
    UI[Client UI] --> APIGW[REST/gRPC API Gateway]

    subgraph Gateway
        APIGW --> Auth[Auth + Rate Limiting]
        APIGW --> Planner[Context Planner + Tool Router]
        APIGW --> Redis[Session Cache Redis]
    end

    Planner -->|Dispatch| WeatherTool[Weather Tool]
    Planner -->|Dispatch| FinanceTool[Finance Tool]
    Planner -->|Dispatch| NewsTool[News Tool]

    WeatherTool --> WeatherEnricher[Weather Tool Enricher]
    FinanceTool --> FinanceEnricher[Finance Tool Enricher]
    NewsTool --> NewsEnricher[News Tool Enricher]

    WeatherEnricher --> PromptBuilder
    FinanceEnricher --> PromptBuilder
    NewsEnricher --> PromptBuilder

    PromptBuilder[Prompt Builder] --> LLM[vLLM / Ollama]
    LLM --> RespBuilder[Response Builder]
    RespBuilder --> Postgres[PostgreSQL audit + cold DB]
    APIGW --> SessionManager[Session Manager] --> Redis
```

## 🧩 Multi-User Support Details

1. Session Management: The `SessionManager` component ensures that each user's query is handled in an isolated context. It supports user authentication, maintains state across requests, and stores session data in Redis for fast retrieval (a minimal sketch follows this list).

2. Context Isolation: Each user's interaction with the LLM is processed independently, so one user's context never leaks into another's. This is critical for handling concurrent users.

3. Rate Limiting: Each user is subject to configurable rate limits to prevent abuse and ensure system stability.
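As a sketch of the session-management idea, the snippet below stores sessions in Redis keyed by a fresh UUID with a TTL, assuming the `github.com/redis/go-redis/v9` and `github.com/google/uuid` packages; the `session:<id>` key layout and expiry-as-invalidation are assumptions, not the repo's actual scheme:

```go
// A minimal Redis-backed session manager sketch; names are illustrative.
package session

import (
	"context"
	"time"

	"github.com/google/uuid"
	"github.com/redis/go-redis/v9"
)

type Manager struct {
	rdb *redis.Client
	ttl time.Duration
}

func NewManager(rdb *redis.Client, ttl time.Duration) *Manager {
	return &Manager{rdb: rdb, ttl: ttl}
}

// Create allocates an isolated session keyed by a fresh UUID so that
// concurrent users never share context.
func (m *Manager) Create(ctx context.Context, userID string) (string, error) {
	id := uuid.NewString()
	// Store the owning user under the session key with a TTL; expiry
	// doubles as automatic session invalidation.
	if err := m.rdb.Set(ctx, "session:"+id, userID, m.ttl).Err(); err != nil {
		return "", err
	}
	return id, nil
}

// Validate reports whether a session still exists (i.e., has not expired).
func (m *Manager) Validate(ctx context.Context, sessionID string) (bool, error) {
	n, err := m.rdb.Exists(ctx, "session:"+sessionID).Result()
	return n == 1, err
}
```

Letting the Redis TTL do the invalidation keeps the hot path to a single round trip per lookup.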


## 🛠 Core Components

### `cmd/`

Service entrypoints:

- `api-gateway`: Starts the HTTP API using fasthttp for high-performance routing
- `worker`: Background job runner (e.g., cron for refreshes)
- `tools-runner`: Manual tool testing CLI

### `internal/api/`

Versioned REST/gRPC API layer.

- `handlers/`: Input/output translation
- `controllers/`: Business logic
- `middleware/`: Auth, logging, rate limiting
- `schemas/`: Request/response type definitions
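The rate-limiting middleware could wrap handlers along these lines: a minimal sketch that keeps one `golang.org/x/time/rate` limiter per user in memory. The `X-User-ID` header and the limits are assumptions about how callers are identified, not the repo's actual middleware:

```go
// Per-user token-bucket rate limiting for fasthttp; illustrative only.
package middleware

import (
	"sync"

	"github.com/valyala/fasthttp"
	"golang.org/x/time/rate"
)

type RateLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
	r        rate.Limit
	burst    int
}

func NewRateLimiter(r rate.Limit, burst int) *RateLimiter {
	return &RateLimiter{limiters: make(map[string]*rate.Limiter), r: r, burst: burst}
}

// limiterFor lazily creates one token bucket per user.
func (rl *RateLimiter) limiterFor(user string) *rate.Limiter {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	l, ok := rl.limiters[user]
	if !ok {
		l = rate.NewLimiter(rl.r, rl.burst)
		rl.limiters[user] = l
	}
	return l
}

// Wrap rejects requests that exceed the caller's per-user budget.
func (rl *RateLimiter) Wrap(next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		user := string(ctx.Request.Header.Peek("X-User-ID")) // assumed header
		if !rl.limiterFor(user).Allow() {
			ctx.SetStatusCode(fasthttp.StatusTooManyRequests)
			return
		}
		next(ctx)
	}
}
```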

### `internal/contextor/`

The MCP core. Determines what tools to use and how to inject data:

- `engine.go`: Overall pipeline
- `planner.go`: Chooses relevant tools
- `enricher.go`: Gathers external context
- `prompt_builder.go`: Final prompt construction
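A minimal sketch of how those three stages could compose. The `Tool` contract, keyword-based planning, and prompt layout are simplified assumptions; the real `planner.go` may route tools very differently:

```go
// Illustrative plan -> enrich -> build pipeline; not the repo's engine.go.
package contextor

import (
	"context"
	"fmt"
	"strings"
)

// Tool is the minimal contract the planner needs.
type Tool interface {
	Name() string
	Fetch(ctx context.Context, query string) (string, error)
}

// Plan picks tools whose names appear in the query; a real planner
// would use richer routing (keywords, embeddings, function calling).
func Plan(query string, available []Tool) []Tool {
	var chosen []Tool
	for _, t := range available {
		if strings.Contains(strings.ToLower(query), t.Name()) {
			chosen = append(chosen, t)
		}
	}
	return chosen
}

// Enrich fetches external context from each planned tool.
func Enrich(ctx context.Context, query string, tools []Tool) []string {
	var fragments []string
	for _, t := range tools {
		if data, err := t.Fetch(ctx, query); err == nil {
			fragments = append(fragments, fmt.Sprintf("[TOOL: %s]\n%s", t.Name(), data))
		}
	}
	return fragments
}

// BuildPrompt splices tool context above the user query, mirroring the
// usage example later in this README.
func BuildPrompt(query string, fragments []string) string {
	return strings.Join(fragments, "\n") +
		"\n\nAnswer the following query using the context above:\n" + query
}
```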

### `internal/inference/`

LLM client abstraction:

- `client.go`: HTTP interface to vLLM or Ollama
- `models.go`: Model registry/config
- `streamer.go`: Streaming completion support
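A minimal sketch of what streaming against vLLM's OpenAI-compatible `/v1/completions` endpoint looks like; the field names follow the OpenAI wire format, error handling is pared down, and this is an assumption about `streamer.go`, not its actual code:

```go
// SSE streaming client sketch for an OpenAI-compatible completions API.
package inference

import (
	"bufio"
	"bytes"
	"context"
	"encoding/json"
	"net/http"
	"strings"
)

type completionChunk struct {
	Choices []struct {
		Text string `json:"text"`
	} `json:"choices"`
}

// Stream POSTs a prompt with stream=true and invokes onToken for each
// SSE data line until the backend sends [DONE].
func Stream(ctx context.Context, baseURL, model, prompt string, onToken func(string)) error {
	body, _ := json.Marshal(map[string]any{
		"model": model, "prompt": prompt, "stream": true,
	})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		baseURL+"/v1/completions", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := strings.TrimPrefix(sc.Text(), "data: ")
		if line == "" || line == "[DONE]" {
			continue
		}
		var chunk completionChunk
		if err := json.Unmarshal([]byte(line), &chunk); err != nil {
			continue // skip keep-alives and non-JSON lines
		}
		for _, c := range chunk.Choices {
			onToken(c.Text)
		}
	}
	return sc.Err()
}
```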

### `internal/tools/`

Modular tool adapters:

- `weather/`: OpenWeatherMap, Tomorrow.io, etc.
- `finance/`: Stock data, crypto, market sentiment
- `news/`: RSS feeds, Google News, etc.

Each tool has:

- `provider.go`: Fetches external data
- `enricher.go`: Formats data into LLM-ready prompt fragments
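As an illustration of that provider/enricher split, a condensed weather-tool sketch; the endpoint URL and JSON shape below are placeholders, not the real OpenWeatherMap or Tomorrow.io contract:

```go
// Illustrative weather tool showing the provider.go / enricher.go split.
package weather

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Provider fetches raw external data (the provider.go role).
type Provider struct {
	APIKey string
}

type reading struct {
	TempC float64 `json:"temp_c"`
	Sky   string  `json:"sky"`
}

func (p *Provider) Fetch(ctx context.Context, city string) (*reading, error) {
	// Hypothetical endpoint and response shape, for illustration only.
	u := "https://api.example-weather.dev/current?q=" + url.QueryEscape(city) +
		"&key=" + p.APIKey
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var r reading
	return &r, json.NewDecoder(resp.Body).Decode(&r)
}

// Enrich turns the raw reading into an LLM-ready prompt fragment
// (the enricher.go role).
func Enrich(city string, r *reading) string {
	return fmt.Sprintf("[TOOL: Weather API]\nCurrent weather in %s: %.0f°C, %s.",
		city, r.TempC, r.Sky)
}
```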

### `internal/core/`

Shared infrastructure:

- `config/`: Env loading, config structs
- `logger/`: Structured logging (zap or slog)
- `observability/`: Prometheus, tracing
- `errors/`: App-specific error types
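A minimal sketch of env-over-file config loading for `config/`, assuming `gopkg.in/yaml.v3`; the field names and the `NEKTON_LLM_BASE_URL` variable are hypothetical:

```go
// Illustrative config loading: read config.yml, let env vars override.
package config

import (
	"os"

	"gopkg.in/yaml.v3"
)

type Config struct {
	ListenAddr  string `yaml:"listen_addr"`
	LLMBaseURL  string `yaml:"llm_base_url"` // vLLM or Ollama endpoint
	RedisAddr   string `yaml:"redis_addr"`
	PostgresDSN string `yaml:"postgres_dsn"`
}

// Load reads a YAML config file and applies a hypothetical env override,
// a common env-over-file pattern.
func Load(path string) (*Config, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var c Config
	if err := yaml.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	if v := os.Getenv("NEKTON_LLM_BASE_URL"); v != "" {
		c.LLMBaseURL = v
	}
	return &c, nil
}
```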

### `web/`

React client:

- Live chat interface with streaming responses
- Model switcher and tool visualizer

## 🧪 Testing Strategy

- Unit tests for context planners, prompt builders, tools
- Integration tests for full input → output validation
- Mock tools and inference clients for repeatable tests
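For example, a repeatable unit test might stub a tool with canned data and assert that the built prompt contains it; the `fakeTool` type and prompt layout here are illustrative:

```go
// Illustrative unit test: a mocked tool feeds the prompt layout used
// in the usage example below, with no live API calls.
package contextor_test

import (
	"context"
	"strings"
	"testing"
)

// fakeTool satisfies a minimal tool contract with canned data.
type fakeTool struct{ name, data string }

func (f fakeTool) Name() string { return f.name }
func (f fakeTool) Fetch(ctx context.Context, q string) (string, error) {
	return f.data, nil
}

func TestPromptIncludesToolContext(t *testing.T) {
	tool := fakeTool{name: "weather", data: "Tokyo: 28°C, clear skies"}
	data, err := tool.Fetch(context.Background(), "weather in Tokyo")
	if err != nil {
		t.Fatal(err)
	}
	prompt := "[TOOL: weather]\n" + data +
		"\n\nAnswer the following query using the context above:\nweather in Tokyo"
	if !strings.Contains(prompt, "28°C") {
		t.Errorf("prompt missing tool context: %q", prompt)
	}
}
```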

## 🚀 Quick Start

### Prerequisites

1. Go 1.21+ - Install Go
2. vLLM or Ollama - for LLM inference
   - vLLM (default): `pip install vllm` - see the vLLM documentation
   - Ollama (alternative): `curl -fsSL https://ollama.ai/install.sh | sh`
3. Optional services:
   - Redis - for session persistence
   - Kafka - for tool orchestration
   - PostgreSQL - for audit logging

### Basic Setup

```bash
# Clone the repository
git clone https://github.com/echenim/Nekton-Server.git
cd Nekton-Server

# Install dependencies
make deps

# Setup vLLM (default LLM provider)
pip install vllm
# Start vLLM server in another terminal
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000

# Or setup Ollama (alternative)
# curl -fsSL https://ollama.ai/install.sh | sh
# ollama serve  # Start in another terminal
# ollama pull llama2  # Pull a model

# Copy and configure
cp config.example.yml config.yml
# Edit config.yml as needed

# Build and run
make build
./bin/nekton-server
```

### Verify Installation

```bash
# Check LLM connectivity
./check_llm.sh

# Test endpoints
./test_endpoints.sh

# Access Swagger UI
open http://localhost:8080/swagger/
```

See LLM Setup Guide for detailed configuration.


## ⚙️ Deployment

### Docker Compose (Dev)

```bash
docker-compose -f deployments/docker/docker-compose.yml up --build
```

### Kubernetes (Prod)

```bash
kubectl apply -f deployments/k8s/
```

Includes:

- MCP service
- LLM inference (vLLM)
- Redis (required for caching and session data)
- Kafka (required for the real-time data pipeline)
- PostgreSQL (required for cold data storage and audit logging)
- Prometheus + Grafana (optional)

## 📚 API Documentation

### Swagger/OpenAPI Documentation

When the server is running, you can access the interactive API documentation at `http://localhost:8080/swagger/`.

### API Endpoints

#### System Endpoints

- `GET /health` - Health check endpoint (no authentication required)
- `GET /api/health` - Alternative health check endpoint
- `GET /debug/routes` - Debug endpoint to list all registered routes

#### AI Inference Endpoints

- `POST /api/v1/infer` - Generate AI response (a client sketch follows these lists)
  - Requires: JSON body with `query` field
  - Optional: `model`, `session_id`, `user_id`
  - Returns: JSON response with AI-generated text
- `POST /api/v1/infer/stream` - Generate streaming AI response (Server-Sent Events)
  - Requires: JSON body with `query` field
  - Optional: `model`, `session_id`, `user_id`
  - Returns: SSE stream with incremental AI responses

#### Testing/Session Management Endpoints

- `POST /api/v1/sessions` - Create a test session
  - Requires: JSON body with `user_id` field
  - Returns: Session details including `session_id`, `created_at`, `expires_at`
  - Purpose: For testing multi-user session functionality
- `POST /api/v1/sessions/validate` - Validate an existing session
  - Requires: JSON body with `session_id` field
  - Returns: Session details if valid, 404 if not found or expired
  - Purpose: Check if a session is still valid
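A minimal Go sketch of calling the inference endpoint; the request fields follow the endpoint description above, while the response is decoded generically because its exact schema is not documented here:

```go
// Illustrative client for POST /api/v1/infer.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]string{
		"query":      "What's the weather like in Tokyo right now?",
		"model":      "llama3-8b-instruct",
		"session_id": "demo-session", // optional
	})
	resp, err := http.Post("http://localhost:8080/api/v1/infer",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode generically and inspect the AI-generated text.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```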

### Development Commands

To regenerate the Swagger documentation after making API changes:

```bash
make swagger
```

To verify the Swagger documentation is up to date:

```bash
make swagger-check
```

### Testing Endpoints

Test scripts are provided to verify endpoints:

```bash
# Test all basic endpoints
./test_endpoints.sh

# Test session management functionality
./test_sessions.sh

# Test Swagger documentation
./test_swagger.sh
```

## 📚 Usage Example

### Input

```json
{
  "query": "What's the weather like in Tokyo right now?",
  "model": "llama3-8b-instruct"
}
```

### MCP-Generated Prompt

```text
[TOOL: Weather API]
Current weather in Tokyo (as of 2025-07-25 12:00 JST): 28°C, clear skies.

Answer the following query using the context above:
What's the weather like in Tokyo right now?
```

### Output

```text
It's currently 28°C with clear skies in Tokyo. A perfect day for a walk!
```


## 🧱 Extending the Protocol

To add a new tool:

1. Create a new folder under `internal/tools/your_tool/`
2. Implement:
   - `provider.go` (API fetching logic)
   - `enricher.go` (turns the fetched data into prompt text)
3. Register it in `planner.go` and optionally `config.yml` (a registration sketch follows)
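A minimal sketch of step 3, registering a hypothetical `crypto` tool; the registry API below is an assumption about what `planner.go` might expose, not its actual shape:

```go
// Illustrative tool registration; the registry and Tool contract are
// assumptions, not the repository's actual planner API.
package tools

import "context"

// Tool mirrors the provider/enricher contract described above.
type Tool interface {
	Name() string
	Fetch(ctx context.Context, query string) (string, error)
}

var registry = map[string]Tool{}

// Register makes a tool visible to the planner at startup.
func Register(t Tool) { registry[t.Name()] = t }

type cryptoTool struct{}

func (cryptoTool) Name() string { return "crypto" }

func (cryptoTool) Fetch(ctx context.Context, query string) (string, error) {
	// provider.go logic would live here; canned data keeps the sketch short.
	return "BTC/USD: fetched from your provider of choice", nil
}

// init registers the tool when the package is imported.
func init() { Register(cryptoTool{}) }
```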

## 📬 Future Enhancements

- ✅ Tool usage tracing + debugging UI
- ✅ Function calling parser (OpenAI-compatible)
- 🟡 Dynamic tool chaining (multi-hop)
- 🟡 Local RAG support (knowledge base integration)
- 🟡 Authenticated user sessions

## 👨‍💻 Maintainers


## 📄 License

MIT License; see the `LICENSE` file.
