Docsray is the easiest way to give your AI agent (an MCP client such as Claude Pro, Cursor, Claude Agent, or n8n) x-ray vision into documents, spreadsheets, presentations, images, and more. :) Magic? No, it's just regular old AI superpowers, built by humans for humans.

Anant/docsray-mcp

🔍 Docsray MCP Server

PyPI License: Apache 2.0 Python 3.9+ MCP Status Netlify Status

Docsray is a powerful Model Context Protocol (MCP) server that gives AI assistants like Claude advanced document perception capabilities. Extract text, navigate pages, analyze structure, and understand any document with ease.

✅ Status: Published to PyPI and TestPyPI. Working in Cursor, Claude Desktop, and other MCP clients.

✨ Features

🎯 Seven Powerful Tools

  1. docsray_peek - Quick document overview with format detection and provider capabilities
  2. docsray_map - Generate comprehensive document structure maps with caching
  3. docsray_xray - AI-powered deep analysis extracting entities, relationships, and insights
  4. docsray_extract - Extract content in multiple formats (markdown, text, JSON, tables)
  5. docsray_seek - Navigate to specific pages, sections, or search for content
  6. docsray_fetch - Unified document retrieval from web URLs or filesystem with caching
  7. docsray_search - Intelligent filesystem search using coarse-to-fine methodology

🔌 Multi-Provider Architecture

  • PyMuPDF4LLM - Lightning-fast PDF processing (✅ Implemented)

    • Fast markdown extraction
    • Basic table detection
    • Multi-page support
    • Always enabled as fallback
  • LlamaParse - Deep document understanding with LLMs (✅ Implemented)

    • AI-powered entity extraction
    • Custom analysis instructions
    • Comprehensive caching in .docsray directories
    • Rich format preservation (markdown, images, tables)
  • IBM.Docling - Advanced document understanding (✅ Implemented)

    • Best-in-class layout understanding
    • Visual Language Model integration
    • Advanced table and figure detection
    • Multi-format support (PDF, DOCX, HTML, images)
    • Reading order preservation
    • Structured extraction capabilities
  • MIMIC.DocsRay - Coarse-to-fine search methodology (✅ Implemented)

    • Semantic search with RAG capabilities
    • Hybrid OCR engine (AI + traditional)
    • Document chunking and embedding
    • Multimodal analysis
    • Filesystem search optimization
    • Context-aware analysis
  • PyTesseract - OCR for scanned documents (🔄 Planned)

  • Mistral OCR - AI-powered OCR and analysis (🔄 Planned)
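The README does not spell out how automatic provider selection works. The sketch below shows one plausible approach, assuming each enabled provider advertises a set of capability strings and that PyMuPDF4LLM is the always-enabled fallback. The capability names, the preference order, and `select_provider` are hypothetical, not Docsray's internal API.

```python
# Hypothetical capability table; names are illustrative, not Docsray internals.
PROVIDER_CAPABILITIES: dict[str, set[str]] = {
    "llama-parse": {"text", "tables", "entities", "custom-instructions"},
    "ibm-docling": {"text", "tables", "figures", "layout", "ocr"},
    "mimic-docsray": {"text", "semantic-search", "ocr"},
    "pymupdf4llm": {"text", "tables"},
}

# Assumed order of preference when several enabled providers qualify.
PREFERENCE = ["llama-parse", "ibm-docling", "mimic-docsray", "pymupdf4llm"]
FALLBACK = "pymupdf4llm"  # documented as always enabled


def select_provider(required: set[str], enabled: set[str], requested: str = "auto") -> str:
    """Pick a provider for a task; an explicit request wins if it is enabled."""
    if requested != "auto":
        if requested not in enabled:
            raise ValueError(f"provider {requested!r} is not enabled")
        return requested
    for name in PREFERENCE:
        # The first enabled provider that covers every required capability wins.
        if name in enabled and required <= PROVIDER_CAPABILITIES[name]:
            return name
    # Nothing covers all requirements: degrade to the fast PDF extractor.
    return FALLBACK
```

For example, `select_provider({"entities"}, {"pymupdf4llm"})` returns `"pymupdf4llm"`: no enabled provider can extract entities, so the call degrades to the fallback instead of failing.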

🚀 Key Benefits

  • Universal Input Support - Local files (./path, ../path, /absolute) and URLs (https://)
  • Intelligent Provider Selection - Automatically chooses the best tool for each task
  • Smart Caching - LlamaParse results cached in .docsray directories for instant access
  • Dynamic Discovery - Tools report actual capabilities based on what's enabled
  • Production Ready - Comprehensive error handling, logging, and 56 tests
  • Self-Documenting - Built-in resources for discovery by MCP clients
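Universal input support means a single `document_url` or `source` string may be either a web URL or a filesystem path. A minimal sketch of how such a string could be classified with the standard library; `classify_source` is a hypothetical helper, and treating every non-http(s) string as a local path is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> tuple[str, str]:
    """Return ("url", source) for http(s) sources, else ("local", absolute path)."""
    if urlparse(source).scheme.lower() in ("http", "https"):
        return "url", source
    # Relative (./, ../), home (~) and absolute paths are resolved against the CWD.
    return "local", str(Path(source).expanduser().resolve())


for item in ["https://arxiv.org/pdf/2301.00234.pdf", "./document.pdf", "../reports/q3.pdf"]:
    print(classify_source(item))
```

Resolving local paths to absolute form up front gives a stable key for caching, since `./a.pdf` and `../dir/a.pdf` can name the same file.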

📦 Installation

Quick Start with uvx (Recommended)

# Run directly without installation
uvx docsray-mcp start

# Or install globally
uv tool install docsray-mcp
# Then run with:
docsray start
# or
docsray-mcp start

Alternative: Install with pip

# Basic installation (PyMuPDF4LLM only)
pip install docsray-mcp

# With LlamaParse for AI analysis
pip install "docsray-mcp[ai]"

# Development installation
pip install -e ".[dev]"

🐳 Docker Installation

Option 1: Docker Hub (Recommended)

# Pull from Docker Hub
docker pull xingh/docsray-mcp:latest

# Run in stdio mode
docker run -it --rm xingh/docsray-mcp:latest

# Run in HTTP mode
docker run -it --rm -p 3000:3000 -e DOCSRAY_TRANSPORT=http xingh/docsray-mcp:latest

Option 2: GitHub Container Registry

# Pull from GHCR
docker pull ghcr.io/xingh/docsray-mcp:latest

# Run (same commands as above, just different image)
docker run -it --rm ghcr.io/xingh/docsray-mcp:latest

Available Tags:

  • latest - Latest stable release
  • 0.6.0 - Specific version
  • dev - Development builds from main branch

Development with VS Code DevContainer:

  1. Install the "Dev Containers" extension
  2. Open project in VS Code
  3. Click "Reopen in Container"
  4. Includes Claude Desktop pre-configured!

See Docker Guide for complete documentation.

🚀 Quick Start

1. Set up API Keys (Optional but Recommended)

Create a .env file in your project:

# For AI-powered analysis with LlamaParse
# Either use the Docsray-specific env var (preferred):
DOCSRAY_LLAMAPARSE_API_KEY=llx-your-key-here

# Or use the standard LlamaParse env var (also supported):
# LLAMAPARSE_API_KEY=llx-your-key-here

# Note: DOCSRAY_LLAMAPARSE_API_KEY takes precedence if both are set

# Or export them directly in your shell
export DOCSRAY_LLAMAPARSE_API_KEY=llx-your-key-here
# export LLAMAPARSE_API_KEY=llx-your-key-here  # Alternative

Get your free LlamaParse API key at cloud.llamaindex.ai
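The precedence rule above (`DOCSRAY_LLAMAPARSE_API_KEY` wins over `LLAMAPARSE_API_KEY`) fits in a few lines. This is a sketch of the documented behavior, not Docsray's actual code; `resolve_llamaparse_key` is a hypothetical name, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_llamaparse_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the LlamaParse key, preferring the Docsray-specific variable."""
    for name in ("DOCSRAY_LLAMAPARSE_API_KEY", "LLAMAPARSE_API_KEY"):
        value = env.get(name, "").strip()
        if value:  # assumption: an empty string counts as "not set"
            return value
    return None  # without a key, LlamaParse stays unavailable
```

Passing the environment as a mapping keeps the rule easy to test without touching the real process environment.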

2. Configure with Your MCP Client

For Cursor

Add to your Cursor MCP configuration (~/.cursor/mcp.json for all projects, or .cursor/mcp.json inside a project):

{
  "mcpServers": {
    "docsray": {
      "command": "uvx",
      "args": ["docsray-mcp"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here"
      }
    }
  }
}

Note: You can use either LLAMAPARSE_API_KEY (shown above) or DOCSRAY_LLAMAPARSE_API_KEY in the MCP client configuration.

For Claude Desktop

On macOS, add to ~/Library/Application Support/Claude/claude_desktop_config.json (on Windows, %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "docsray": {
      "command": "uvx",
      "args": ["docsray-mcp"],
      "env": {
        "LLAMAPARSE_API_KEY": "llx-your-key-here"
      }
    }
  }
}

Note: You can use either LLAMAPARSE_API_KEY (shown above) or DOCSRAY_LLAMAPARSE_API_KEY in the MCP client configuration.
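If you prefer to script client setup, the sketch below merges a `docsray` entry into an existing MCP configuration dictionary, producing the same JSON shape shown above while leaving other servers untouched. `add_docsray_server` is a hypothetical helper; reading and writing the actual config file is left to you.

```python
import copy
import json
from typing import Any, Optional


def add_docsray_server(config: dict[str, Any], api_key: Optional[str] = None) -> dict[str, Any]:
    """Return a copy of an MCP client config with a "docsray" server entry added."""
    updated = copy.deepcopy(config)  # never mutate the caller's config
    entry: dict[str, Any] = {"command": "uvx", "args": ["docsray-mcp"]}
    if api_key:
        entry["env"] = {"LLAMAPARSE_API_KEY": api_key}
    updated.setdefault("mcpServers", {})["docsray"] = entry
    return updated


existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_docsray_server(existing, "llx-your-key-here"), indent=2))
```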

📚 Usage Examples

Basic Document Overview

Peek at ./document.pdf to see its structure and available formats

Extract Entities from Contracts

Xray ./contract.pdf and extract all parties, dates, payment terms, and obligations

Navigate Documents

Map the complete structure of ./manual.pdf including all sections and subsections

Extract Specific Content

Extract pages 10-20 from ./report.pdf as markdown

Analyze Web Documents

Analyze https://arxiv.org/pdf/2301.00234.pdf for methodology and key findings

Fetch Documents from Web or Filesystem

Fetch https://example.com/document.pdf with processed format
Fetch ./local/document.pdf with metadata-only format

Search Documents Intelligently

Search for "machine learning" in ./research/ with coarse-to-fine strategy
Find documents about "contracts" in /legal/ using semantic search

Compare Providers

Extract text from document.pdf with provider pymupdf4llm (fast)
Xray document.pdf with provider llama-parse (AI analysis)
Analyze document.pdf with provider ibm-docling (advanced layout)
Search documents with provider mimic-docsray (semantic)

🛠️ Advanced Configuration

Environment Variables

# Provider Configuration
DOCSRAY_PYMUPDF4LLM_ENABLED=true  # Always true by default
DOCSRAY_LLAMAPARSE_ENABLED=true
LLAMAPARSE_API_KEY=llx-your-key

# IBM.Docling Provider
DOCSRAY_IBM_DOCLING_ENABLED=false
DOCSRAY_IBM_DOCLING_USE_VLM=true
DOCSRAY_IBM_DOCLING_USE_ASR=false
DOCSRAY_IBM_DOCLING_OCR_ENABLED=true
DOCSRAY_IBM_DOCLING_TABLE_DETECTION=true
DOCSRAY_IBM_DOCLING_FIGURE_DETECTION=true
DOCSRAY_IBM_DOCLING_DEVICE=cpu  # or cuda

# MIMIC.DocsRay Provider
DOCSRAY_MIMIC_ENABLED=false
DOCSRAY_MIMIC_RAG_ENABLED=true
DOCSRAY_MIMIC_SEMANTIC_RANKING=true
DOCSRAY_MIMIC_MULTIMODAL=true
DOCSRAY_MIMIC_HYBRID_OCR=true
DOCSRAY_MIMIC_COARSE_TO_FINE=true
DOCSRAY_MIMIC_CHUNK_SIZE=1000
DOCSRAY_MIMIC_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Performance Tuning
DOCSRAY_CACHE_ENABLED=true
DOCSRAY_CACHE_TTL=3600
DOCSRAY_MAX_CONCURRENT_REQUESTS=5
DOCSRAY_TIMEOUT_SECONDS=30

# Logging
DOCSRAY_LOG_LEVEL=INFO
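Environment variables arrive as strings. The sketch below reads the Performance Tuning and Logging variables into typed settings, assuming the values shown above are the defaults and that `DOCSRAY_CACHE_TTL` is in seconds. `PerformanceSettings`, `env_bool`, `env_int`, and the accepted truthy spellings are hypothetical, not Docsray's configuration loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    raw = env.get(name)
    return default if raw is None else int(raw)


@dataclass(frozen=True)
class PerformanceSettings:
    cache_enabled: bool
    cache_ttl: int  # assumed to be seconds
    max_concurrent_requests: int
    timeout_seconds: int
    log_level: str


def load_performance_settings(env: Mapping[str, str] = os.environ) -> PerformanceSettings:
    """Build typed settings, falling back to the example values above."""
    return PerformanceSettings(
        cache_enabled=env_bool(env, "DOCSRAY_CACHE_ENABLED", True),
        cache_ttl=env_int(env, "DOCSRAY_CACHE_TTL", 3600),
        max_concurrent_requests=env_int(env, "DOCSRAY_MAX_CONCURRENT_REQUESTS", 5),
        timeout_seconds=env_int(env, "DOCSRAY_TIMEOUT_SECONDS", 30),
        log_level=env.get("DOCSRAY_LOG_LEVEL", "INFO").upper(),
    )
```

Parsing once into a frozen dataclass surfaces a malformed value (for example `DOCSRAY_CACHE_TTL=abc`) at startup rather than deep inside a request.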

Provider Capabilities

PyMuPDF4LLM (Always Available)

  • ✅ Fast text extraction
  • ✅ Markdown formatting
  • ✅ Basic table detection
  • ✅ Multi-page support
  • ❌ No AI analysis
  • ❌ No OCR

LlamaParse (When API Key Configured)

  • ✅ AI-powered analysis
  • ✅ Entity extraction
  • ✅ Custom instructions
  • ✅ Table extraction
  • ✅ Image extraction
  • ✅ Layout preservation
  • ✅ Relationship mapping
  • ✅ Result caching

IBM.Docling (When Enabled)

  • ✅ Advanced layout understanding
  • ✅ Visual Language Model integration
  • ✅ Best-in-class table detection
  • ✅ Figure classification and understanding
  • ✅ Multi-format support (PDF, DOCX, HTML, images)
  • ✅ Reading order preservation
  • ✅ Structured information extraction
  • ✅ Document classification
  • ✅ OCR with layout understanding
  • ✅ Form field detection
  • ✅ Multi-language support

MIMIC.DocsRay (When Enabled)

  • ✅ Coarse-to-fine search methodology
  • ✅ Semantic search with RAG
  • ✅ Document chunking and embedding
  • ✅ Hybrid OCR (AI + traditional)
  • ✅ Multimodal analysis
  • ✅ Context-aware analysis
  • ✅ Filesystem search optimization
  • ✅ Semantic ranking
  • ✅ Entity extraction
  • ✅ Relationship mapping
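Document chunking is the step that turns a long text into pieces small enough to embed. The sketch below shows fixed-size windows with overlap, with `chunk_size` mirroring `DOCSRAY_MIMIC_CHUNK_SIZE=1000`. The character unit and the 100-character overlap are assumptions; the real provider may split on tokens or sentence boundaries, and the embedding step itself is not shown.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical helper: chunk_size mirrors DOCSRAY_MIMIC_CHUNK_SIZE, but the
    character unit and the overlap are assumptions for illustration.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 900, and 1,800, so text near a boundary appears in two neighboring chunks and is less likely to be cut off from its context.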

🧪 Testing

# Run all tests
pytest tests/

# Run only unit tests (no API calls)
pytest tests/unit/

# Run integration tests
pytest tests/integration/

# Run with coverage
pytest tests/ --cov=src/docsray --cov-report=html

Current test coverage: 52 tests passing with comprehensive coverage across all components

📖 API Reference

Tool: docsray_peek

Get quick document overview and metadata.

{
  "document_url": "path/to/document.pdf",
  "depth": "structure",
  "provider": "auto"
}

Allowed values: depth is metadata, structure, or preview; provider is auto, pymupdf4llm, or llama-parse.
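MCP clients such as Cursor and Claude Desktop wrap these argument objects in a JSON-RPC 2.0 `tools/call` request for you. The sketch below builds that envelope by hand, which can help when debugging the server over stdio. `tools_call` is a hypothetical helper, and a real session must first complete the MCP `initialize` handshake before any tool call is accepted.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # JSON-RPC ids must not repeat within a session


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, which matters because
    # the stdio transport separates messages with newlines.
    return json.dumps(request)


print(tools_call("docsray_peek", {
    "document_url": "path/to/document.pdf",
    "depth": "structure",
    "provider": "auto",
}))
```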

Tool: docsray_map

Generate comprehensive document structure map.

{
  "document_url": "path/to/document.pdf",
  "include_content": false,
  "analysis_depth": "deep",
  "provider": "auto"
}

Allowed values: analysis_depth is basic, deep, or comprehensive.

Tool: docsray_xray

Deep AI-powered document analysis.

{
  "document_url": "path/to/document.pdf",
  "analysis_type": ["entities", "key-points"],
  "custom_instructions": "Extract all dates and amounts",
  "provider": "llama-parse"
}

Tool: docsray_extract

Extract content in various formats.

{
  "document_url": "path/to/document.pdf",
  "extraction_targets": ["text", "tables"],
  "output_format": "markdown",
  "pages": [1, 2, 3],
  "provider": "auto"
}

Allowed values: output_format is markdown, text, or json. pages is optional and limits extraction to the listed pages.
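Prompts such as "Extract pages 10-20" have to become the `pages` list shown above. A minimal sketch of that conversion, assuming page numbers start at 1 (as the `[1, 2, 3]` example suggests); `parse_pages` is a hypothetical helper, not part of Docsray.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec like "1-3,7,10-12" into a sorted list of unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers are assumed to start at 1")
    return sorted(pages)


args = {
    "document_url": "./report.pdf",
    "extraction_targets": ["text"],
    "output_format": "markdown",
    "pages": parse_pages("10-20"),
    "provider": "auto",
}
```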

Tool: docsray_seek

Navigate to specific document locations.

{
  "document_url": "path/to/document.pdf",
  "target": {"page": 5},
  "extract_content": true,
  "provider": "auto"
}

target can also be {"section": "Introduction"} or {"query": "search text"}.
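The `target` object takes one of three shapes. The sketch below builds it client-side and rejects ambiguous input, assuming from the "or" in the reference that exactly one key should be set; the server may accept other combinations. `seek_target` is a hypothetical helper.

```python
from typing import Any, Optional


def seek_target(page: Optional[int] = None, section: Optional[str] = None,
                query: Optional[str] = None) -> dict[str, Any]:
    """Build a docsray_seek target holding exactly one of page, section, or query."""
    candidates = {"page": page, "section": section, "query": query}
    given = {key: value for key, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError("specify exactly one of page, section, or query")
    # type() rather than isinstance() so that True/False are not accepted as pages.
    if page is not None and (type(page) is not int or page < 1):
        raise ValueError("page must be a positive integer")
    return given


request = {
    "document_url": "path/to/document.pdf",
    "target": seek_target(section="Introduction"),
    "extract_content": True,
    "provider": "auto",
}
```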

Tool: docsray_fetch

Unified document retrieval from web URLs or filesystem.

{
  "source": "https://example.com/doc.pdf",
  "fetch_options": {"timeout": 30000, "headers": {}},
  "cache_strategy": "use-cache",
  "return_format": "processed",
  "provider": "auto"
}

source can also be a local path such as "./local/path.pdf". Allowed values: cache_strategy is use-cache, bypass-cache, or refresh-cache; return_format is raw, processed, or metadata-only.
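The three `cache_strategy` values suggest the semantics sketched below: use the cache when fresh, skip it entirely, or force a re-download that updates it. This is an in-memory illustration of those assumed semantics with a TTL defaulting to 3600 seconds (mirroring `DOCSRAY_CACHE_TTL`), not Docsray's cache implementation; `fetch_with_cache` and the injected `download` callable are hypothetical.

```python
import time
from typing import Callable

# source -> (timestamp when stored, document bytes); process-local, hypothetical.
_cache: dict[str, tuple[float, bytes]] = {}


def fetch_with_cache(
    source: str,
    download: Callable[[str], bytes],
    cache_strategy: str = "use-cache",
    ttl: float = 3600.0,
    now: Callable[[], float] = time.monotonic,
) -> bytes:
    """Return document bytes, honoring use-cache / bypass-cache / refresh-cache."""
    if cache_strategy == "bypass-cache":
        return download(source)  # assumed: neither read nor write the cache
    if cache_strategy == "use-cache":
        hit = _cache.get(source)
        if hit is not None and now() - hit[0] < ttl:
            return hit[1]  # fresh entry, no network or disk access
    elif cache_strategy != "refresh-cache":
        raise ValueError(f"unknown cache_strategy: {cache_strategy!r}")
    # Cache miss, stale entry, or refresh-cache: download and store.
    data = download(source)
    _cache[source] = (now(), data)
    return data
```

Injecting `download` and `now` keeps the sketch independent of any HTTP library and lets the TTL logic be tested with a fake clock.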

Tool: docsray_search

Intelligent filesystem search with coarse-to-fine methodology.

{
  "query": "machine learning algorithms",
  "searchPath": "./research/",
  "searchStrategy": "coarse-to-fine",
  "fileTypes": ["pdf", "docx", "md"],
  "maxResults": 10,
  "provider": "mimic-docsray"
}

Allowed values: searchStrategy is coarse-to-fine, semantic, keyword, or hybrid.
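As a rough illustration of the coarse-to-fine idea, the sketch below runs a cheap first pass (file type plus at least one shared query term) to discard most files, then spends more effort scoring only the survivors. It is a lexical stand-in: MIMIC.DocsRay's fine stage uses semantic search and embeddings, which are not reproduced here. Documents are passed as an in-memory `{path: text}` mapping instead of being read from `searchPath`, and `coarse_to_fine_search` and its scoring formula are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePath
from typing import Iterable


def _terms(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def coarse_to_fine_search(
    query: str,
    documents: dict[str, str],
    file_types: Iterable[str] = ("pdf", "docx", "md"),
    max_results: int = 10,
) -> list[tuple[str, float]]:
    """Rank documents for a query: cheap filtering first, finer scoring second."""
    wanted = {ft.lower().lstrip(".") for ft in file_types}
    query_terms = set(_terms(query))
    # Coarse pass: keep files of the requested types that mention any query term.
    candidates = {
        path: text
        for path, text in documents.items()
        if PurePath(path).suffix.lower().lstrip(".") in wanted
        and query_terms & set(_terms(text))
    }
    # Fine pass: query-term density plus a bonus when the exact phrase appears.
    scored = []
    for path, text in candidates.items():
        words = _terms(text)  # non-empty: the coarse pass guarantees a match
        counts = Counter(words)
        density = sum(counts[t] for t in query_terms) / len(words)
        phrase_bonus = 1.0 if query.lower() in text.lower() else 0.0
        scored.append((path, density + phrase_bonus))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]
```

The design point is cost: the coarse pass is a set intersection per file, so the expensive scoring (embeddings, in the real provider) only runs on a short candidate list.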

🏗️ Architecture

docsray-mcp/
├── src/docsray/
│   ├── server.py           # FastMCP server with discovery resources
│   ├── providers/          # Provider implementations
│   │   ├── base.py        # Provider interface
│   │   ├── pymupdf4llm.py # Fast PDF extraction
│   │   └── llamaparse.py  # AI-powered analysis
│   ├── tools/             # MCP tool implementations
│   │   ├── peek.py        # Document overview
│   │   ├── map.py         # Structure mapping
│   │   ├── xray.py        # Deep analysis
│   │   ├── extract.py     # Content extraction
│   │   └── seek.py        # Navigation
│   └── utils/             # Utilities
│       ├── cache.py       # Document caching
│       └── llamaparse_cache.py  # LlamaParse .docsray cache
├── tests/
│   ├── unit/              # Fast isolated tests
│   ├── integration/       # Component interaction tests
│   └── manual/            # Debugging scripts
└── PROMPTS.md            # Example prompts for all use cases

🀝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/docsray/docsray-mcp.git
cd docsray-mcp

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run linting
ruff check src/

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

📬 Support


Made with ❤️ for the MCP ecosystem

Languages

  • Python 95.3%
  • Shell 2.6%
  • Makefile 1.2%
  • Dockerfile 0.9%