A production-ready RAG (Retrieval Augmented Generation) system for chatting with your documents
Built with privacy in mind • Extensible via MCP • Multi-provider AI support
Features • Quick Start • Architecture • MCP Integration • Configuration • API Reference
- Overview
- Features
- Architecture
- Quick Start
- MCP Integration
- Configuration
- Usage
- Development
- Troubleshooting
- FAQ
- License
Xantus is a privacy-first RAG system that lets you chat with your documents using AI. Unlike cloud-only solutions, Xantus can run completely locally or use cloud providers - your choice.
- Privacy-First: All data stays on your system with local AI
- Extensible: MCP (Model Context Protocol) integration for external tools
- Multiple UIs: Streamlit interface + OpenAI-compatible API
- Multi-Provider: Supports Ollama, OpenAI, Anthropic, and more
- Modular: Swap LLMs, embeddings, vector stores easily
- Production-Ready: Dependency injection, proper error handling, logging
- Document Chat: Upload PDFs, DOCX, TXT, Markdown and chat with them
- Semantic Search: RAG-powered retrieval with ChromaDB or Qdrant
- Multiple Interfaces:
- Clean Streamlit UI for end users
- RESTful API for integration
- Python SDK for developers
- Flexible AI Backends:
- Local: Ollama (privacy-first)
- Cloud: OpenAI, Anthropic
- Hybrid: Cloud LLM + local embeddings
- MCP Integration: Connect external tools (calculator, file system, databases)
- Configurable: YAML + environment variables
- Multiple Vector Stores: ChromaDB, Qdrant
- RAG Tuning: Adjust chunk size, overlap, top-k retrieval
- Secure: API key management via environment variables
- Scalable: Async API with proper dependency injection
Xantus is built on a modern, modular architecture:
┌─────────────────────────────────────────────────────────┐
│                          User                           │
└────────────┬────────────────────────────┬───────────────┘
             │                            │
    ┌────────▼────────┐          ┌────────▼─────────┐
    │  Streamlit UI   │          │   API Clients    │
    │  (Port 8501)    │          │   (curl, SDK)    │
    └────────┬────────┘          └────────┬─────────┘
             │                            │
             └────────────┬───────────────┘
                          │
                 ┌────────▼─────────┐
                 │  FastAPI Server  │
                 │   (Port 8000)    │
                 └────────┬─────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
   ┌──────▼──────┐ ┌──────▼───────┐  ┌────▼─────┐
   │ Chat Service│ │Ingest Service│  │   MCP    │
   └──────┬──────┘ └──────┬───────┘  │ Service  │
          │               │          └────┬─────┘
          │               │               │
   ┌──────▼───────────────▼───────────────▼─────┐
   │       Dependency Injection Container       │
   │  (LLM • Embeddings • Vector Store • MCP)   │
   └──────────────────────┬─────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
     ┌────▼────┐    ┌─────▼─────┐    ┌───▼────┐
     │   LLM   │    │ Embeddings│    │ Vector │
     │Provider │    │ Provider  │    │ Store  │
     ├─────────┤    ├───────────┤    ├────────┤
     │Ollama   │    │HuggingFace│    │Chroma  │
     │OpenAI   │    │ Ollama    │    │Qdrant  │
     │Anthropic│    │ OpenAI    │    └────────┘
     └─────────┘    └───────────┘

                                      ┌──────────┐
                                      │MCP Server│
                                      │TypeScript│
                                      ├──────────┤
                                      │Calculator│
                                      │FileSystem│
                                      │TextProc  │
                                      └──────────┘
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Python 3.10+ | High-performance async API |
| RAG Framework | LlamaIndex | Document indexing & retrieval |
| UI | Streamlit | User-friendly chat interface |
| Configuration | Pydantic + YAML | Type-safe settings |
| DI | Injector | Clean dependency injection |
| Vector DB | ChromaDB / Qdrant | Semantic search |
| MCP | Model Context Protocol | External tool integration |
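
To make the "swap any component" claim concrete, here is a minimal sketch of how factories can sit behind an `injector` container. It is illustrative only: the `AppModule`, `Settings`, and `LLMClient` names are assumptions for this example, not the actual contents of `xantus/container.py`.

```python
# Illustrative sketch only; names are hypothetical, not the real container.py.
from injector import Injector, Module, provider, singleton


class Settings:
    """Stand-in for the Pydantic settings object (config.yaml + env vars)."""
    llm_provider: str = "ollama"


class LLMClient:
    """Stand-in for the LLM instance built by a provider factory."""
    def __init__(self, provider: str) -> None:
        self.provider = provider


class AppModule(Module):
    def __init__(self, settings: Settings) -> None:
        self._settings = settings

    @singleton
    @provider
    def provide_settings(self) -> Settings:
        return self._settings

    @singleton
    @provider
    def provide_llm(self, settings: Settings) -> LLMClient:
        # A real factory would dispatch on settings.llm_provider
        # (ollama / openai / anthropic) and return the matching client.
        return LLMClient(settings.llm_provider)


injector = Injector([AppModule(Settings())])
llm = injector.get(LLMClient)  # services ask the container, not the factory
print(llm.provider)            # -> "ollama"
```

Because services only ask the container for an `LLMClient` (or embeddings, or a vector store), swapping Ollama for Anthropic or Chroma for Qdrant only touches the factory behind the container.
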
xantus/
├── .env.example # Environment variable template
├── .gitignore # Git ignore patterns
├── config.yaml # Main configuration file
├── requirements.txt # Python dependencies
├── setup_mcp.sh # MCP setup automation
├── start_api.sh # API server startup script
├── start_ui.sh # UI startup script
│
├── xantus/ # Main application package
│ ├── __init__.py
│ ├── main.py # FastAPI application entry
│ ├── container.py # Dependency injection setup
│ │
│ ├── api/ # API endpoints
│ │ ├── chat_router.py # /v1/chat/completions
│ │ ├── ingest_router.py # /v1/ingest/*
│ │ └── embeddings_router.py # /v1/embeddings
│ │
│ ├── services/ # Business logic
│ │ ├── chat_service.py # RAG-powered chat
│ │ ├── ingest_service.py # Document processing
│ │ └── mcp_service.py # MCP tool orchestration
│ │
│ ├── components/ # Component factories
│ │ ├── llm/
│ │ │ └── llm_factory.py # LLM provider factory
│ │ ├── embeddings/
│ │ │ └── embedding_factory.py
│ │ └── vector_store/
│ │ └── vector_store_factory.py
│ │
│ ├── models/ # Data models
│ │ └── schemas.py # Pydantic request/response models
│ │
│ └── config/ # Configuration
│ └── settings.py # Settings management with Pydantic
│
├── ui/ # User interface
│ └── streamlit_app.py # Streamlit chat application
│
├── mcp-servers/ # MCP integration (git submodules)
│ └── mcp-starter-template-ts/ # TypeScript MCP server
│ ├── dist/ # Compiled JavaScript
│ │ └── start.js # Entry point
│ └── src/ # TypeScript source
│ └── tools/ # Tool implementations
│
├── data/ # Data directory (gitignored)
│ └── vector_store/ # Persisted vector embeddings
│
└── docs/ # Documentation
├── MCP_INTEGRATION.md # MCP technical guide
├── README_MCP.md # MCP quick start
└── SETUP_COMPLETE.md # Setup summary
- Python 3.10+ (check: python --version)
- Node.js 18+ (for MCP integration; check: node --version)
- Git (for cloning submodules)
- (Optional) Ollama (for local AI)
# Clone with MCP submodules
git clone --recurse-submodules https://github.com/onamfc/rag-chat
cd xantus
# OR if you already cloned without submodules:
git submodule update --init --recursive

# Create virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # Linux/Mac
# OR
venv\Scripts\activate     # Windows

pip install -r requirements.txt

# This will:
# - Initialize MCP submodules
# - Install npm dependencies
# - Build TypeScript MCP server
./setup_mcp.sh

# Copy the example file
cp .env.example .env
# Edit .env and add your API keys (if using cloud providers)
# For Anthropic:
XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here
# For OpenAI:
# XANTUS_LLM__API_KEY=sk-your-openai-key-here

Edit config.yaml to choose your providers:
Option A: Completely Local (Privacy-First)
llm:
  provider: ollama
  model: llama3.2

embedding:
  provider: huggingface
  model: BAAI/bge-small-en-v1.5

mcp:
  enabled: true  # Enable MCP tools

Option B: Cloud-Powered (Anthropic)
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
  api_key: null  # Read from .env

embedding:
  provider: huggingface  # Keep embeddings local
  model: BAAI/bge-small-en-v1.5

mcp:
  enabled: true

Option C: OpenAI
llm:
  provider: openai
  model: gpt-4
  api_key: null  # Read from .env

embedding:
  provider: openai
  model: text-embedding-3-small
  api_key: null

# Option 1: Using the startup script
./start_api.sh
# Option 2: Manual start
python -m xantus.main
# The API will be available at http://localhost:8000
# API docs at http://localhost:8000/docs

You should see:
INFO - Starting Xantus application...
INFO - Loaded settings with LLM provider: anthropic
INFO - Dependency injection container initialized
INFO - Starting server on 127.0.0.1:8000
With MCP enabled, you'll also see:
INFO - Starting MCP server 'mcp-starter-template': node mcp-servers/...
INFO - Loaded 4 tools from 'mcp-starter-template': ['calculate', 'filesystem', 'text-processing', 'weather']
# Activate venv again
source venv/bin/activate
# Start Streamlit
streamlit run ui/streamlit_app.py
# The UI will open in your browser at http://localhost:8501

- Click "Upload Document" in the sidebar
- Select a PDF, TXT, DOCX, or Markdown file
- Wait for processing (you'll see the progress)
- Ask questions about your document!
Example Questions:
- "What is the main topic of this document?"
- "Summarize the key points"
- "Calculate the total revenue mentioned in section 3" (uses MCP calculator)
- "Compare this with the file in ../reports/2023.pdf" (uses MCP filesystem)
MCP (Model Context Protocol) allows Claude to use external tools while answering questions.
Your TypeScript MCP server (in mcp-servers/mcp-starter-template-ts/) provides:
| Tool | Function | Example Use |
|---|---|---|
| Calculator | Mathematical operations | "Calculate the sum of Q1-Q4 revenues" |
| File System | Read/write/list files | "Compare with last year's report in ../reports/" |
| Text Processing | Word count, sentiment, case conversion | "Analyze sentiment of customer feedback" |
| Weather | Weather data (mock) | "Check weather for event planning" |
User Question
↓
Xantus retrieves document context (RAG)
↓
Sends to Claude with available MCP tools
↓
Claude decides to use a tool (e.g., calculator)
↓
Xantus forwards tool call to MCP server (TypeScript)
↓
MCP server executes tool and returns result
↓
Claude incorporates result into answer
↓
User gets comprehensive response
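
For the Anthropic backend, that loop can be expressed roughly as below. This is a hedged sketch of the general Anthropic tool-use pattern, not Xantus's actual mcp_service.py; the `mcp_call_tool` bridge and the way MCP tools are translated into Anthropic tool schemas are assumptions.

```python
# Illustrative tool-use loop; not the project's real implementation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def mcp_call_tool(name: str, arguments: dict) -> str:
    """Hypothetical bridge that forwards a tool call to the TypeScript MCP server."""
    raise NotImplementedError


def answer(question: str, context: str, tools: list[dict]) -> str:
    messages = [{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,  # MCP tools translated to Anthropic tool schemas
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # No more tool calls: return the final text answer.
            return "".join(b.text for b in response.content if b.type == "text")

        # Forward each requested tool call to the MCP server, feed results back.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = mcp_call_tool(block.name, block.input)
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": result}
                )
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
```
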
In config.yaml:
mcp:
  enabled: true  # Set to false to disable MCP
  servers:
    - name: "mcp-starter-template"
      command: "node"
      args: ["mcp-servers/mcp-starter-template-ts/dist/start.js"]

You can connect multiple MCP servers:
mcp:
  enabled: true
  servers:
    # Your custom tools
    - name: "my-tools"
      command: "node"
      args: ["mcp-servers/mcp-starter-template-ts/dist/start.js"]

    # Database access
    - name: "postgres"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]

    # Web search
    - name: "brave-search"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-brave-search"]

For complete MCP setup and customization:
- Quick Start: README_MCP.md
Create a .env file in the project root:
# ===== LLM API Keys =====
# For Anthropic (double underscore for nested config!)
XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here
# For OpenAI
# XANTUS_LLM__API_KEY=sk-your-openai-key-here
# ===== Embedding API Keys (optional) =====
# XANTUS_EMBEDDING__API_KEY=sk-your-key-here
# ===== Override Other Settings =====
# Format: XANTUS_<SECTION>__<KEY>=value
# Examples:
# XANTUS_LLM__TEMPERATURE=0.5
# XANTUS_RAG__SIMILARITY_TOP_K=10
# XANTUS_SERVER__PORT=8001

Important: Use double underscore (__) for nested configuration!
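
The double underscore is the nested-field delimiter used by pydantic-settings, which is why XANTUS_LLM__API_KEY lands on settings.llm.api_key. A minimal sketch of the mechanism (field names here are illustrative; the real models live in xantus/config/settings.py and may differ):

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class LLMConfig(BaseModel):
    provider: str = "ollama"
    api_key: str | None = None


class Settings(BaseSettings):
    # XANTUS_ prefix, "__" to address nested fields, values also read from .env
    model_config = SettingsConfigDict(
        env_prefix="XANTUS_", env_nested_delimiter="__", env_file=".env"
    )
    llm: LLMConfig = LLMConfig()


# With XANTUS_LLM__API_KEY=sk-... set in the environment or .env:
settings = Settings()
print(settings.llm.api_key)  # -> "sk-..."
```
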
1. Install Ollama: https://ollama.com/download

2. Start Ollama:

   ollama serve

3. Pull Models:

   ollama pull llama3.2          # For chat
   ollama pull nomic-embed-text  # For embeddings

4. Configure config.yaml:

   llm:
     provider: ollama
     model: llama3.2
     api_base: http://localhost:11434  # Default

   embedding:
     provider: ollama
     model: nomic-embed-text
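
To confirm Ollama is reachable and the models are pulled, you can query its local REST API; the /api/tags endpoint lists installed models. A small check using requests (the expected model names are just examples):

```python
import requests

# Ollama's default local endpoint; /api/tags lists locally installed models.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

models = [m["name"] for m in resp.json().get("models", [])]
print(models)  # e.g. ['llama3.2:latest', 'nomic-embed-text:latest']
```
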
1. Get API Key: https://console.anthropic.com/

2. Add to .env:

   XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here

3. Configure config.yaml:

   llm:
     provider: anthropic
     model: claude-sonnet-4-20250514
     api_key: null  # Read from environment
     temperature: 0.7
     max_tokens: 4096

   embedding:
     provider: huggingface  # Use local for cost savings
     model: BAAI/bge-small-en-v1.5
1. Get API Key: https://platform.openai.com/api-keys

2. Add to .env:

   XANTUS_LLM__API_KEY=sk-your-openai-key-here

3. Configure config.yaml:

   llm:
     provider: openai
     model: gpt-4-turbo-preview
     api_key: null

   embedding:
     provider: openai
     model: text-embedding-3-small
     api_key: null
Fine-tune retrieval in config.yaml:
rag:
  # Number of relevant chunks to retrieve
  similarity_top_k: 5

  # Size of text chunks (characters)
  chunk_size: 1024

  # Overlap between chunks (prevents context loss)
  chunk_overlap: 200

  # Enable advanced reranking (requires additional setup)
  enable_reranking: false

Tuning Guidelines (see the sketch after this list for how these map to LlamaIndex):
- Larger chunks (1024-2048): Better for long-form content
- Smaller chunks (512-1024): Better for specific facts
- Higher top_k (8-10): More context but slower
- Lower top_k (3-5): Faster but may miss context
- Overlap: 15-20% of chunk_size is recommended
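
As referenced above, these knobs correspond to LlamaIndex's chunking and retrieval parameters. A minimal sketch of the mapping (the data/docs path and direct LlamaIndex calls are illustrative; Xantus wires this internally from config.yaml, and an embedding model must be configured for the index to build):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# chunk_size / chunk_overlap control how documents are split into nodes
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

documents = SimpleDirectoryReader("data/docs").load_data()
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

# similarity_top_k controls how many chunks are retrieved per query
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What are the key findings?"))
```
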
vector_store:
  provider: chroma  # or qdrant

  # Path to persist vector data
  persist_path: ./data/vector_store

  # Collection name
  collection_name: xantus_documents

server:
  host: 127.0.0.1  # Change to 0.0.0.0 for network access
  port: 8000

  # CORS settings
  cors_enabled: true
  cors_origins:
    - "*"  # Be more restrictive in production!

The easiest way to use Xantus:
1. Start the API (terminal 1):

   ./start_api.sh

2. Start the UI (terminal 2):

   ./start_ui.sh
   # OR
   streamlit run ui/streamlit_app.py

3. Navigate to http://localhost:8501

4. Upload documents via the sidebar

5. Chat with your documents!
Features:
- ✅ Document upload with progress
- ✅ Document management (list/delete)
- ✅ Chat history
- ✅ Context toggle (use RAG or not)
- ✅ Health monitoring
curl http://localhost:8000/health

Response:
{
"status": "healthy",
"version": "0.1.0",
"components": {
"llm": "anthropic",
"embedding": "huggingface",
"vector_store": "chroma"
}
}

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What are the main findings in the report?"}
],
"use_context": true,
"stream": false
}'

Response:
{
"id": "chat-123abc",
"object": "chat.completion",
"created": 1730000000,
"model": "claude-sonnet-4-20250514",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Based on the documents, the main findings are..."
},
"finish_reason": "stop"
}]
}

curl -X POST http://localhost:8000/v1/ingest/file \
-F "file=@/path/to/document.pdf"Response:
{
"status": "success",
"document_id": "doc_abc123",
"chunks_created": 42
}

# List ingested documents
curl http://localhost:8000/v1/ingest/documents

# Delete a document
curl -X DELETE http://localhost:8000/v1/ingest/documents/doc_abc123

# Create embeddings
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "Text to embed", "model": "default"}'import requests
# Start a session
session = requests.Session()
api_url = "http://localhost:8000"
# Upload a document
with open("document.pdf", "rb") as f:
response = session.post(
f"{api_url}/v1/ingest/file",
files={"file": f}
)
print(f"Uploaded: {response.json()}")
# Chat with RAG
response = session.post(
f"{api_url}/v1/chat/completions",
json={
"messages": [
{"role": "user", "content": "Summarize the key points"}
],
"use_context": True,
"stream": False
}
)
result = response.json()
print(result["choices"][0]["message"]["content"])

- Privacy First: Default to local, support cloud
- Modularity: Easy to swap any component
- Simplicity: Minimal abstractions
- Type Safety: Pydantic everywhere
- Production Ready: Proper DI, error handling, logging
1. Add to settings (xantus/config/settings.py):

   provider: Literal["ollama", "openai", "anthropic", "your-provider"]

2. Implement factory (xantus/components/llm/llm_factory.py):

   def _create_your_provider_llm(config: LLMConfig) -> LLM:
       return YourProviderLLM(
           model=config.model,
           api_key=config.api_key,
           temperature=config.temperature,
       )

3. Update factory dispatch:

   elif config.provider == "your-provider":
       return _create_your_provider_llm(config)
A similar process applies in xantus/components/vector_store/vector_store_factory.py (see the sketch below).
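
As a rough illustration of what such a factory can look like with LlamaIndex's store integrations (function name and dispatch logic are a sketch, not the project's actual code, and both integration packages must be installed):

```python
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient


def create_vector_store(provider: str, persist_path: str, collection_name: str):
    """Illustrative dispatch on the vector_store.provider setting."""
    if provider == "chroma":
        client = chromadb.PersistentClient(path=persist_path)
        collection = client.get_or_create_collection(collection_name)
        return ChromaVectorStore(chroma_collection=collection)
    if provider == "qdrant":
        return QdrantVectorStore(
            client=QdrantClient(path=persist_path),
            collection_name=collection_name,
        )
    raise ValueError(f"Unsupported vector store provider: {provider}")
```
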
# Format code
black xantus/
# Lint
ruff check xantus/
# Type check
mypy xantus/

# Install test dependencies
pip install pytest pytest-asyncio
# Run tests
pytest tests/
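
A minimal example of what such a test could look like, assuming the FastAPI instance is exposed as xantus.main:app (an assumption) and the /health response shape shown earlier:

```python
# tests/test_health.py -- illustrative; assumes xantus.main exposes `app`
from fastapi.testclient import TestClient

from xantus.main import app

client = TestClient(app)


def test_health_reports_components():
    resp = client.get("/health")
    assert resp.status_code == 200
    body = resp.json()
    assert body["status"] == "healthy"
    assert "components" in body
```
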
Solution: Ensure Ollama is running

ollama serve

Solution: Check your .env file:
# Correct (double underscore!):
XANTUS_LLM__API_KEY=sk-ant-...
# Wrong (single underscore):
XANTUS_LLM_API_KEY=sk-ant-...

Solution: Ensure you're in the right directory
cd xantus
python -c "import xantus; print('OK')"

Solution: Build the MCP server
./setup_mcp.sh
# OR manually:
cd mcp-servers/mcp-starter-template-ts
npm install
npm run build

Solution: Kill existing processes or change port
# Kill existing
pkill -f "python.*xantus"
# OR change port in config.yaml:
server:
  port: 8001

Solution: Clear and recreate
rm -rf data/vector_store
mkdir -p data/vector_store
# Restart server, re-upload documents

Enable verbose logging:
# In xantus/main.py
import logging
logging.basicConfig(level=logging.DEBUG)

Q: Does my data leave my machine? A: Only if you use cloud providers (OpenAI/Anthropic). With Ollama + HuggingFace, everything stays local.
Q: Which is faster - local or cloud? A: Cloud (OpenAI/Anthropic) is usually faster. Local (Ollama) depends on your hardware.
Q: Can I use multiple documents? A: Yes! Upload as many as you want. They're all indexed in the vector store.
Q: What's the maximum document size? A: No hard limit, but larger documents take longer to process.
Q: Can I delete documents?
A: Yes, via the API /v1/ingest/documents/{doc_id} or Streamlit UI.
Q: Is streaming supported?
A: Yes! Set "stream": true in chat completion requests.
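
For example, with Python requests and stream=True you can read the response incrementally (a sketch; the exact chunk framing, e.g. SSE "data:" lines, depends on the server implementation):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize the key points"}],
        "use_context": True,
        "stream": True,
    },
    stream=True,  # tell requests not to buffer the whole body
)

for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))  # each line is one streamed chunk
```
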
Q: What LLM is best? A:
- Best quality: Claude Sonnet 4, GPT-4
- Best local: Llama 3.2, Mistral
- Best balance: Claude Haiku, GPT-3.5-turbo
Q: How do I add authentication?
A: Add FastAPI middleware in xantus/main.py for API key or OAuth.
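
A minimal sketch of that idea using FastAPI's security helpers, shown here as a global dependency rather than middleware; the X-API-Key header name and XANTUS_API_KEY variable are examples, not existing settings:

```python
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)


async def require_api_key(key: str | None = Depends(api_key_header)) -> None:
    expected = os.environ.get("XANTUS_API_KEY")  # hypothetical env var
    if not expected or key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")


# Applied to every route; in Xantus this would go on the app in xantus/main.py.
app = FastAPI(dependencies=[Depends(require_api_key)])
```
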
- MCP Quick Start: README_MCP.md
- API Documentation: http://localhost:8000/docs (when running)
- LlamaIndex Docs: https://docs.llamaindex.ai/
- FastAPI Docs: https://fastapi.tiangolo.com/
- Streamlit Docs: https://docs.streamlit.io/
Contributions are welcome! This project is designed to be:
- Easy to understand
- Simple to extend
- Well-documented
Feel free to:
- Add new providers
- Improve the UI
- Enhance MCP tools
- Fix bugs
- Improve documentation
This project is provided as-is for educational and research purposes.
Built with:
- FastAPI - Modern async web framework
- LlamaIndex - RAG framework
- Streamlit - Data apps framework
- ChromaDB - Vector database
- Ollama - Local LLM runtime
- Model Context Protocol - Tool integration
Made with ❤️ for the open source community
