DeepFetch is a semantic web search MCP server for Claude Desktop, Gemini CLI, Codex CLI, and other Model Context Protocol clients. It combines Kagi discovery, Scrapfly extraction, local ONNX reranking, and PDF-aware retrieval so agents get evidence-rich snippets instead of raw link lists.
Search terms: MCP search server, semantic web search, Model Context Protocol, Claude Desktop search tool, Gemini CLI MCP server, Codex CLI search, PDF search, Kagi, Scrapfly, ONNX reranking.
- Return evidence, not just URLs. `internet_search` reranks extracted page content and returns the strongest snippets from unique domains.
- Stay compatible with the MCP clients people already use. The default path is local stdio, which works well for Docker-based local servers.
- Handle PDFs as first-class sources. Search results that resolve to PDFs are routed through the PDF pipeline automatically, and `pdf_extract_text` is available when you already know the document URL.
- Keep the deployment path simple. End users only need Docker plus `KAGI_API_KEY` and `SCRAPFLY_API_KEY`.
- Preserve a clean upgrade path. The same FastMCP server can also run over `streamable-http` for managed deployments.
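The "strongest snippets from unique domains" idea above can be sketched in plain Python. This is a simplified model, not DeepFetch's actual reranker: the function names, the `(text, domain, vector)` snippet shape, and the use of cosine similarity over precomputed embeddings are all illustrative assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors; 0.0 for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query_vec, snippets):
    # snippets: iterable of (text, domain, vector) tuples (assumed shape).
    # Keep only the best-scoring snippet per domain, strongest first.
    best = {}
    for text, domain, vec in snippets:
        score = cosine(query_vec, vec)
        if domain not in best or score > best[domain][0]:
            best[domain] = (score, text)
    return [text for score, text in sorted(best.values(), reverse=True)]
```

A local ONNX embedder would supply the vectors; the dedup-by-domain step is what keeps results from being dominated by a single site.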
Run the server locally in Docker:
```shell
docker run --rm -i \
  -e KAGI_API_KEY=your_kagi_key \
  -e SCRAPFLY_API_KEY=your_scrapfly_key \
  ghcr.io/vinay9986/deepfetch:latest
```

Then point your MCP client at the containerized server using one of the config examples in `examples/clients`.
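For orientation, a Claude Desktop-style `mcpServers` entry for this container might look like the sketch below. This is an assumption about the shape of the config, not a copy of the files in `examples/clients`; use those files for the authoritative versions.

```json
{
  "mcpServers": {
    "deepfetch": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "KAGI_API_KEY=your_kagi_key",
        "-e", "SCRAPFLY_API_KEY=your_scrapfly_key",
        "ghcr.io/vinay9986/deepfetch:latest"
      ]
    }
  }
}
```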
For maintainers, .github/workflows/publish-image.yml publishes the multi-arch image to ghcr.io/vinay9986/deepfetch from GitHub Actions on the default branch and release tags.
If you want to test the repo before publishing an image, build it locally and use the direct MCP smoke client from docs/getting-started.md.
| Tool | Purpose | Best fit |
|---|---|---|
| `internet_search` | Discover, fetch, and rerank current public-web content. | Time-sensitive facts, current events, source-backed lookup, and public web research. |
| `pdf_extract_text` | Extract text and page-numbered matches from a known PDF. | Reports, filings, papers, manuals, and PDF verification workflows. |
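For a rough sense of how these tools are invoked, here are hypothetical `call_tool` payloads. The argument names (`query`, `max_results`, `url`, `search_terms`) are illustrative guesses, not the server's documented schema; the Tool Examples doc has the concrete payloads.

```python
# Hypothetical payload shapes (argument names are assumptions).
search_call = {
    "name": "internet_search",
    "arguments": {"query": "latest EU AI Act enforcement timeline", "max_results": 5},
}
pdf_call = {
    "name": "pdf_extract_text",
    "arguments": {"url": "https://example.com/report.pdf", "search_terms": ["revenue"]},
}
```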
```mermaid
flowchart LR
    Client[Claude Desktop / Gemini CLI / Codex CLI / other MCP client]
    Client --> Transport[FastMCP transport<br/>stdio or streamable-http]
    Transport --> Search[internet_search]
    Transport --> PDF[pdf_extract_text]
    Search --> Kagi[Kagi search API]
    Search --> Scrapfly[Scrapfly extraction]
    Search --> ONNX[Local ONNX embedder]
    Search --> PDF
    PDF --> PyPDF[pypdf page extraction]
    PDF --> ONNX
```
At runtime, DeepFetch deduplicates hosts before scraping, uses bounded parallel fetches, keeps short-lived in-process caches for Kagi and Scrapfly responses, and falls back to keyword snippets when semantic assets are unavailable.
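The host deduplication, bounded parallelism, and short-lived caching described above can be sketched in plain Python. This is a simplified model under stated assumptions, not DeepFetch's actual implementation: the helper names, the concurrency limit, and the TTL are all illustrative.

```python
import asyncio
import time
from urllib.parse import urlparse

def dedupe_hosts(urls):
    # Keep the first URL seen for each host, preserving order.
    seen, kept = set(), []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept

class TTLCache:
    # Short-lived in-process cache; entries expire after `ttl` seconds.
    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl, clock, {}

    def get(self, key):
        hit = self._store.get(key)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]
        self._store.pop(key, None)  # expired or missing
        return None

    def put(self, key, value):
        self._store[key] = (self.clock(), value)

async def fetch_all(urls, fetch, limit=4):
    # Bounded parallel fetches: at most `limit` requests in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in dedupe_hosts(urls)))
```

The injectable `clock` makes the cache's expiry behavior easy to test without sleeping, and the semaphore caps concurrent load on Scrapfly without serializing everything.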
- Getting Started: Docker, local smoke testing, source installs, and test commands.
- Architecture: transport model, request flow, module layout, and deployment choices.
- Configuration: environment variables, transport knobs, semantic asset paths, and client configs.
- Tool Examples: concrete `call_tool` payloads for both exposed tools.
- ADR 0001: local-first, multi-transport rationale.
DeepFetch currently exposes two MCP tools from src/deepfetch/server.py:
- `internet_search`
- `pdf_extract_text`
The primary distribution model is a Docker image with ONNX assets baked in. Local stdio is the default mode, and `DEEPFETCH_TRANSPORT=streamable-http` enables the managed deployment path.
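That transport switch can be sketched as follows, assuming a FastMCP-style entry point. The env-var parsing and the `choose_transport` helper are illustrative; the real wiring lives in `src/deepfetch/server.py`.

```python
import os

def choose_transport(environ=None) -> str:
    # Default to stdio; DEEPFETCH_TRANSPORT=streamable-http opts into HTTP.
    environ = os.environ if environ is None else environ
    value = environ.get("DEEPFETCH_TRANSPORT", "stdio").strip().lower()
    if value not in {"stdio", "streamable-http"}:
        raise ValueError(f"unsupported transport: {value!r}")
    return value

if __name__ == "__main__":
    # Hypothetical wiring; import path and server name are assumptions.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("deepfetch")
    mcp.run(transport=choose_transport())
```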