A Retrieval-Augmented Generation (RAG) MCP server for markdown documentation with semantic search capabilities.
- Document Indexing: Process markdown files with YAML frontmatter support, automatic chunking, and metadata extraction
- Semantic Search: Find relevant content using natural language queries with configurable similarity thresholds
- Incremental Updates: Change detection and indexing for large document collections
- Real-time Monitoring: Automatic file system monitoring with live index updates
- Advanced Embeddings: HuggingFace sentence-transformers with local model execution
- Vector Storage: High-performance Milvus vector database with Docker Compose setup
- CLI Interface: Beautiful command-line tools with progress tracking and interactive demos
The system runs as an MCP server and exposes a semantic search tool over the MCP protocol.
For the full system architecture and components overview, check the Architecture Guide.
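To make the similarity-threshold idea concrete, here is a minimal sketch of threshold-filtered semantic search in plain Python. It is not the project's implementation: in the real system the embeddings come from a sentence-transformers model and the vector comparison runs inside Milvus. The function names, the toy two-dimensional vectors, and the choice of cosine similarity are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    threshold: float = 0.7,  # mirrors MARKDOWN_RAG_MCP_SIMILARITY_THRESHOLD
    limit: int = 5,          # mirrors the CLI's --limit option
) -> list[tuple[float, str]]:
    """Return up to `limit` (score, text) pairs scoring at least `threshold`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    hits = [hit for hit in scored if hit[0] >= threshold]
    hits.sort(key=lambda hit: hit[0], reverse=True)  # best match first
    return hits[:limit]


if __name__ == "__main__":
    # Hypothetical hand-made "embeddings"; real ones would have 384 dimensions.
    toy_chunks = [
        ("auth setup", [1.0, 0.0]),
        ("billing", [0.0, 1.0]),
        ("login flow", [0.6, 0.8]),
    ]
    for score, text in search([1.0, 0.0], toy_chunks, threshold=0.5):
        print(f"{score:.2f}  {text}")
```

Raising the threshold trades recall for precision: fewer chunks are returned, but those that are returned sit closer to the query.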
- Python 3.12+
- Docker and Docker Compose
- Clone and set up:
  git clone <repository-url>
  cd markdown-rag-mcp
- Start the Milvus database:
  docker-compose -f docker/docker-compose.yml up -d
- Install dependencies using uv:
  uv sync
- Install the package:
  pip install -e .
# Index documents (with optional monitoring)
markdown-rag-mcp index ./documents --recursive --watch
# Semantic search with confidence scoring
markdown-rag-mcp search "authentication setup" --limit 5
# System health monitoring
markdown-rag-mcp status
For the full overview of the CLI interface, check the CLI Guide.
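The `--watch` flag, together with the `MARKDOWN_RAG_MCP_WATCH_DEBOUNCE_SECONDS` setting, suggests that bursts of file events are coalesced before re-indexing. The sketch below shows one way such debouncing could work on a recorded list of events. It is an assumption about the behaviour, not the project's monitoring code, and `debounce_events` is a hypothetical name.

```python
from collections.abc import Iterable


def debounce_events(
    events: Iterable[tuple[float, str]],
    window: float = 2.0,  # mirrors MARKDOWN_RAG_MCP_WATCH_DEBOUNCE_SECONDS
) -> list[tuple[float, str]]:
    """Collapse bursts of (timestamp, path) events.

    A path "fires" once `window` seconds have passed since its most recent
    event, so several quick saves of one file trigger a single re-index.
    Returns (fire_time, path) pairs sorted by fire time.
    """
    last_seen: dict[str, float] = {}
    fired: list[tuple[float, str]] = []
    for timestamp, path in sorted(events):
        previous = last_seen.get(path)
        # A quiet gap longer than the window closes the previous burst.
        if previous is not None and timestamp - previous > window:
            fired.append((previous + window, path))
        last_seen[path] = timestamp
    # Every path still has one open burst that fires after its last event.
    fired.extend((timestamp + window, path) for path, timestamp in last_seen.items())
    return sorted(fired)


if __name__ == "__main__":
    recorded = [(0.0, "a.md"), (0.5, "a.md"), (1.0, "a.md"), (1.5, "b.md"), (5.0, "a.md")]
    for fire_time, path in debounce_events(recorded):
        print(f"re-index {path} at t={fire_time}")
```

A live watcher would typically use timers instead of a recorded list, but the coalescing rule is the same.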
# Experience incremental indexing with performance metrics
python examples/incremental_indexing_demo.py --setup --runs 5
# Complete RAG pipeline demonstration
python examples/milvus_embeddings_demo.py
For the full list of demo scripts, check the Examples Guide.
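Incremental indexing depends on knowing which files changed since the last run. A minimal sketch of one common approach, comparing SHA-256 content hashes against a stored manifest, is shown below. Whether the project uses hashes, modification times, or something else is not stated here, so treat the technique and the `detect_changes` helper as assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(
    root: Path, previous: dict[str, str]
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Compare markdown files under `root` with a previous {path: digest} manifest.

    Returns the new manifest plus sorted lists of added, modified, and
    removed paths (relative to `root`, using forward slashes).
    """
    current = {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*.md")
        if path.is_file()
    }
    changes = {
        "added": sorted(p for p in current if p not in previous),
        "modified": sorted(
            p for p in current if p in previous and current[p] != previous[p]
        ),
        "removed": sorted(p for p in previous if p not in current),
    }
    return current, changes


if __name__ == "__main__":
    # First run: an empty manifest means every file counts as "added".
    manifest, changes = detect_changes(Path("./documents"), previous={})
    print(changes)
```

Only the files listed under `added` and `modified` need to be re-chunked and re-embedded; entries under `removed` can be deleted from the vector store.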
Configure the system via environment variables or a .env file; .env.example provides some defaults:
# Vector Database Configuration
MARKDOWN_RAG_MCP_MILVUS_HOST=localhost
MARKDOWN_RAG_MCP_MILVUS_PORT=19530
MARKDOWN_RAG_MCP_COLLECTION_NAME=markdown_docs
# Embedding Model Settings
MARKDOWN_RAG_MCP_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
MARKDOWN_RAG_MCP_EMBEDDING_DEVICE=auto  # cpu, cuda, mps, auto
MARKDOWN_RAG_MCP_EMBEDDING_DIMENSIONS=384
# Search and Processing
MARKDOWN_RAG_MCP_SIMILARITY_THRESHOLD=0.7
MARKDOWN_RAG_MCP_CHUNK_SIZE_LIMIT=1000
MARKDOWN_RAG_MCP_CHUNK_OVERLAP=200
MARKDOWN_RAG_MCP_MAX_CONCURRENT_INDEXING=2
# File Monitoring
MARKDOWN_RAG_MCP_WATCH_DEBOUNCE_SECONDS=2
MARKDOWN_RAG_MCP_WATCH_PATTERNS="**/*.md,**/*.markdown"
markdown-rag-mcp/
├── src/markdown_rag_mcp/         # Core library implementation
│   ├── cli/                      # Command-line interface
│   ├── config/                   # Configuration management
│   ├── core/                     # RAG engine and interfaces
│   ├── embeddings/               # Embedding providers
│   ├── indexing/                 # Document processing pipeline
│   ├── models/                   # Data models and schemas
│   ├── monitoring/               # File system monitoring
│   ├── parsers/                  # Markdown and frontmatter parsing
│   ├── search/                   # Query processing and search
│   └── storage/                  # Vector database integration
├── tests/                        # Comprehensive test suite
├── examples/                     # Demo scripts
├── docker/                       # Docker Compose configuration
├── specs/                        # Technical specifications
└── documents/                    # Markdown documents for indexing and searching
To run the test suite, use the following commands:
# Run complete test suite
uv sync --all-extras
pytest
# Run specific component tests
pytest tests/indexing/ -v
pytest tests/search/ -v
pytest tests/embeddings/ -v
- Architecture Guide: Detailed system architecture and components overview
- CLI Guide: Command-line interface guide
- Examples Guide: Demo scripts
MIT License - see LICENSE file for details.
Built with ❤️ for developers who need intelligent, markdown-based document search capabilities