English | 中文
An intelligent document content extraction service built with the official Rust MCP SDK. It supports multiple document formats, integrates with LLM APIs for smart content extraction, and runs as a standard MCP server.
- 🚀 High-Performance Async Architecture - Built on Tokio async runtime
- 🧠 Smart Content Extraction - Integrated with multiple LLM APIs
- 📄 Multi-Format Support - TXT, MD, JSON, YAML, TOML, XML, CSV
- 🔧 MCP Server - Standard Model Context Protocol server
- ⚙️ Flexible Configuration - Support for config files and environment variables
- 🐳 Container Support - Docker deployment ready
- 🧪 Complete Testing - Unit and integration test coverage
- Rust 1.75+
- LLM API key (OpenAI, Claude, Alibaba Cloud, etc.)
```bash
git clone https://github.com/yourusername/mcp-smart-fetch.git
cd mcp-smart-fetch
cargo build --release
```

Copy the environment variable example:

```bash
cp .env.example .env
```

Edit the `.env` file with your API key:

```env
LLM_API_KEY="your-api-key-here"
LLM_MODEL="gpt-4"
LLM_API_ENDPOINT="https://api.openai.com/v1/chat/completions"
```

Extract content from a file:

```bash
cargo run -- extract input.txt
cargo run -- extract --input document.pdf --prompt "Summarize key points"
cargo run -- extract -i data.json -o result.txt
```

Extract content from raw text:

```bash
cargo run -- extract-text --text "This is text that needs analysis..."
cargo run -- extract-text -t "text content" -p "Extract key information"
```

Run as an MCP server:

```bash
# Start MCP server (stdio mode)
cargo run -- serve

# View detailed configuration
cargo run --verbose serve
```

mcp-smart-fetch can run as a standard MCP server, providing the following tools:
- `extract_from_file` - Intelligently extract content from a file
- `extract_from_text` - Intelligently extract content from raw text
- `get_config` - Get server configuration information
- `list_supported_formats` - List supported document formats
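
As a rough sketch of what invoking one of these tools looks like over the MCP stdio transport, here is a standard JSON-RPC `tools/call` request. The argument names (`text`, `prompt`) are assumptions inferred from the CLI flags, not taken from the server's published tool schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "extract_from_text",
    "arguments": {
      "text": "This is text that needs analysis...",
      "prompt": "Extract key information"
    }
  }
}
```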
For Claude Desktop integration, add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "smart-fetch": {
      "command": "cargo",
      "args": ["run", "--", "serve"],
      "env": {
        "LLM_API_KEY": "your-api-key"
      }
    }
  }
}
```

Run the server with Docker:

```bash
# Use environment variables
docker run --env-file .env -v $(pwd)/templates:/app/templates mcp-smart-fetch serve

# Use docker-compose
docker-compose up mcp-server
```

The project can be fully configured through environment variables, which take precedence over values from the configuration file.
LLM settings:

- `LLM_API_KEY` - LLM API key (required)
- `LLM_API_ENDPOINT` - API endpoint URL
- `LLM_MODEL` - Model name to use
- `LLM_MAX_TOKENS` - Maximum tokens (u32)
- `LLM_TEMPERATURE` - Temperature parameter (f64, 0.0-2.0)
- `LLM_TIMEOUT_SECONDS` - Request timeout (u64, seconds)

Server settings:

- `SERVER_HOST` - Server listen address
- `SERVER_PORT` - Server port (u16)
- `SERVER_MAX_CONNECTIONS` - Maximum connections (u32)
- `SERVER_REQUEST_TIMEOUT_SECONDS` - Request timeout (u64, seconds)

Processing settings:

- `TEMPLATES_DIR` - Template directory path
- `DEFAULT_TEMPLATE` - Default template name
- `MAX_DOCUMENT_SIZE_MB` - Maximum document size (f64, MB)
- `CHUNK_SIZE` - Chunk size (usize)
- `ENABLE_PREPROCESSING` - Enable preprocessing (bool)

Cleaning settings:

- `ENABLE_CLEANING` - Enable cleaning functionality (bool)
- `REMOVE_BASE64_IMAGES` - Remove base64 images (bool)
- `REMOVE_BINARY_DATA` - Remove binary data (bool)
- `REMOVE_HTML_TAGS` - Remove HTML tags (bool)
- `NORMALIZE_WHITESPACE` - Normalize whitespace (bool)
- `MAX_STRING_LENGTH` - Maximum string length (usize)
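
A minimal sketch of the precedence rule, using only the standard library; `resolve_model` is an illustrative helper, not the actual API of `src/config.rs`:

```rust
use std::env;

// Illustrative helper, not the actual config.rs API: an environment
// variable, when set, overrides the value loaded from config/config.toml.
fn resolve_model(file_value: &str) -> String {
    env::var("LLM_MODEL").unwrap_or_else(|_| file_value.to_string())
}

fn main() {
    // Prints "gpt-4" when LLM_MODEL is unset;
    // `LLM_MODEL=gpt-4o cargo run` prints "gpt-4o" instead.
    println!("{}", resolve_model("gpt-4"));
}
```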
The configuration file is located at `config/config.toml` and supports layered configuration:
```toml
[llm]
api_endpoint = "https://api.openai.com/v1/chat/completions"
model = "gpt-4"
max_tokens = 32768
temperature = 0.7

[server]
host = "127.0.0.1"
port = 8080

[processing]
max_document_size_mb = 10.0
chunk_size = 4000
supported_formats = ["txt", "md", "json", "yaml", "yml", "toml", "xml", "csv"]
```

```bash
# View all supported environment variables
cargo run -- env-vars

# View detailed configuration
cargo run --verbose extract-text --text "test"
```
```bash
# Run all tests
cargo test

# Run unit tests
cargo test --lib

# Run integration tests (MCP server)
cargo test --test mcp_server_test

# Run specific tests
cargo test test_extract_from_text
```

Test coverage includes:

- Unit tests: independent functionality of each module
- MCP server tests: complete MCP protocol testing
- Integration tests: end-to-end content extraction workflow
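
As a rough sketch of the unit-test style, here is a hypothetical test for whitespace normalization (one of the cleaning options above); `normalize_whitespace` is an illustrative helper, not necessarily the name used in `src/cleaner.rs`:

```rust
// Illustrative helper mirroring the NORMALIZE_WHITESPACE cleaning option;
// the real implementation lives in src/cleaner.rs and may differ.
fn normalize_whitespace(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

#[test]
fn test_normalize_whitespace_collapses_runs() {
    // Runs of spaces, tabs, and newlines collapse to single spaces.
    assert_eq!(normalize_whitespace("a \n\t b"), "a b");
}
```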
Build the Docker image:

```bash
docker build -t mcp-smart-fetch .
```

```bash
# Use environment variables
docker run --env-file .env -v $(pwd)/templates:/app/templates mcp-smart-fetch

# Run as MCP server
docker run --env-file .env mcp-smart-fetch serve
```

Example `docker-compose.yml`:

```yaml
version: '3.8'
services:
  mcp-server:
    build: .
    command: ["serve"]
    environment:
      - LLM_API_KEY=${LLM_API_KEY}
      - LLM_MODEL=${LLM_MODEL}
    volumes:
      - ./templates:/app/templates
```

Project structure:
```text
mcp-smart-fetch/
├── src/
│   ├── main.rs              # Main program entry
│   ├── lib.rs               # Library entry
│   ├── config.rs            # Configuration management
│   ├── mcp_server.rs        # MCP server implementation
│   ├── llm_client.rs        # LLM client
│   ├── document.rs          # Document processing
│   ├── prompt_template.rs   # Prompt template system
│   ├── cleaner.rs           # Content cleaning
│   ├── progress.rs          # Progress display
│   └── error.rs             # Error handling
├── tests/
│   ├── unit_test.rs         # Unit tests
│   ├── integration_test.rs  # Integration tests
│   ├── cleaning_test.rs     # Cleaning tests
│   └── mcp_server_test.rs   # MCP server tests
├── config/
│   └── config.toml          # Configuration file
├── templates/               # Template directory
├── examples/                # Example files
├── .env.example             # Environment variable example
├── docker-compose.yml       # Docker Compose config
├── Dockerfile               # Docker image config
└── README.md                # Project documentation
```
Development setup:

```bash
# Clone project
git clone https://github.com/yourusername/mcp-smart-fetch.git
cd mcp-smart-fetch

# Install Rust toolchain
rustup install stable
rustup component add clippy rustfmt

# Install pre-commit hooks (optional)
cargo install pre-commit
pre-commit install
```

Common development commands:

```bash
# Build project
cargo build

# Run development version
cargo run

# Run tests
cargo test

# Format code
cargo fmt

# Check code
cargo clippy

# Generate documentation
cargo doc --open
```

To add support for a new LLM provider (a sketch follows this list):

- Add a new request format in `src/llm_client.rs`
- Add a new configuration example in `config/config.toml`
- Update the environment variable documentation
- Add corresponding test cases
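
A minimal sketch of what such a request format might look like, assuming `serde` with the `derive` feature; the struct and field names mirror an OpenAI-style chat completions API and are illustrative, not the crate's actual types:

```rust
use serde::Serialize;

// Hypothetical request body for a new provider in src/llm_client.rs.
#[derive(Serialize)]
struct ChatMessage {
    role: String,    // "system", "user", or "assistant"
    content: String, // message text
}

#[derive(Serialize)]
struct NewProviderRequest {
    model: String,              // e.g. the LLM_MODEL setting
    messages: Vec<ChatMessage>, // conversation sent to the provider
    max_tokens: u32,            // LLM_MAX_TOKENS
    temperature: f64,           // LLM_TEMPERATURE
}
```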
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- 📧 Email: your-email@example.com
- 🐛 Bug Reports: GitHub Issues
- 📖 Documentation: Wiki
- Model Context Protocol - MCP Protocol
- rmcp - Rust MCP SDK
- Tokio - Async Runtime
- Handlebars - Template Engine
⭐ If this project helps you, please give it a star!