This page shows concrete MCP tool-call examples for the two tools DeepFetch exposes.
All examples below are the arguments payloads you would pass to an MCP call_tool request.
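For orientation, here is a minimal sketch of how one of these arguments payloads sits inside a call_tool request on the wire. It assumes the standard MCP JSON-RPC 2.0 envelope with the `tools/call` method; the helper name `build_call_tool_request` is hypothetical, and an MCP client library would normally build this envelope for you.

```python
import json


def build_call_tool_request(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap a tool's arguments payload in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # only needs to be unique per connection
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_call_tool_request(
    "internet_search",
    {"query": "OpenAI API pricing March 2026", "extraction_model": "article"},
)
print(json.dumps(request, indent=2))
```

Only the inner `arguments` object changes between the examples on this page.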
Use examples/direct_mcp_client.py when you want to test the Dockerized MCP server directly.
Build the image:
docker build -t deepfetch:test .

Install the local client dependencies:
python -m pip install -e '.[dev]'

Export keys for internet_search:
export KAGI_API_KEY=your_kagi_key
export SCRAPFLY_API_KEY=your_scrapfly_key

List tools:
python examples/direct_mcp_client.py list-tools --image deepfetch:test

Run a search:
python examples/direct_mcp_client.py search \
  --image deepfetch:test \
  --query "Model Context Protocol official specification"

Run a PDF extraction:
python examples/direct_mcp_client.py pdf \
  --image deepfetch:test \
  --url "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" \
  --query "dummy"

Run the full smoke sequence:
python examples/direct_mcp_client.py smoke --image deepfetch:test

Use internet_search when you need current public-web information and want DeepFetch to discover, fetch, and rerank results for you.
Use explicit time context when it matters.
{
  "query": "OpenAI API pricing March 2026",
  "extraction_model": "article"
}

Why this works:
- It names the entity directly
- It includes the specific subject
- It resolves the time context instead of using a vague word like "latest"
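The last point can be automated on the client side. The sketch below swaps vague time words for an explicit month and year before the query is sent. The function name `add_time_context` and the word list are assumptions for illustration and are not part of DeepFetch.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Assumed list of vague time words; extend it to suit your own queries.
VAGUE_TIME_WORDS = {"latest", "current", "now", "recent", "today"}


def add_time_context(query: str, today: date) -> str:
    """Drop vague time words and append an explicit month and year."""
    kept = [word for word in query.split() if word.lower() not in VAGUE_TIME_WORDS]
    return " ".join(kept + [MONTHS[today.month - 1], str(today.year)])


print(add_time_context("latest OpenAI API pricing", date(2026, 3, 15)))
# prints: OpenAI API pricing March 2026
```

The month names are hard-coded so the output does not depend on the system locale.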
Use a domain filter when the source itself matters.
{
  "query": "site:fda.gov semaglutide shortage status",
  "extraction_model": "article"
}

Why this works:
- It narrows the search to an authoritative source
- It uses domain language the source is likely to use
DeepFetch will automatically handle PDFs found during internet_search.
{
  "query": "cybersecurity report filetype:pdf after:2024",
  "extraction_model": "article"
}

Why this works:
- filetype:pdf biases discovery toward documents
- after:2024 helps reduce stale sources
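The search operators used in the last two examples (site:, filetype:, after:) can be composed programmatically. This is a minimal sketch: `build_query` is a hypothetical helper, and its operator placement simply mirrors those two examples rather than any documented requirement.

```python
from typing import Optional


def build_query(
    terms: str,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    after_year: Optional[int] = None,
) -> str:
    """Compose a search query string from plain terms plus optional operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")  # restrict discovery to one domain
    parts.append(terms.strip())
    if filetype:
        parts.append(f"filetype:{filetype}")  # bias discovery toward a document type
    if after_year is not None:
        parts.append(f"after:{after_year}")  # filter out older sources
    return " ".join(parts)


print(build_query("semaglutide shortage status", site="fda.gov"))
# prints: site:fda.gov semaglutide shortage status
print(build_query("cybersecurity report", filetype="pdf", after_year=2024))
# prints: cybersecurity report filetype:pdf after:2024
```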
Choose a non-default extraction model when the page type is obvious.
{
  "query": "Sony WH-1000XM6 specifications site:sony.com",
  "extraction_model": "product"
}

When to use a different extraction_model:
- article for news, blog posts, docs, and general pages
- product for product pages
- stock for market pages
- organization for company profile pages
- event for event pages
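A client can check extraction_model before sending a request. The sketch below assumes the five values listed above are the complete set and that article is the default. Neither assumption is confirmed here, and `build_search_args` is a hypothetical helper.

```python
# Values taken from the list above; treating them as exhaustive is an assumption.
EXTRACTION_MODELS = {
    "article": "news, blog posts, docs, and general pages",
    "product": "product pages",
    "stock": "market pages",
    "organization": "company profile pages",
    "event": "event pages",
}


def build_search_args(query: str, extraction_model: str = "article") -> dict:
    """Return an internet_search arguments payload, rejecting unknown models."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if extraction_model not in EXTRACTION_MODELS:
        allowed = ", ".join(sorted(EXTRACTION_MODELS))
        raise ValueError(
            f"unknown extraction_model {extraction_model!r}; expected one of: {allowed}"
        )
    return {"query": query.strip(), "extraction_model": extraction_model}


print(build_search_args("Sony WH-1000XM6 specifications site:sony.com", "product"))
```

Failing fast on the client gives a clearer error than waiting for the server to reject the call.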
Example internet_search result:

[
  {
    "url": "https://modelcontextprotocol.io/specification/2025-11-25",
    "title": "Specification - Model Context Protocol",
    "target_status_code": 200,
    "snippet": "Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools.",
    "score": 0.7008,
    "source": "semantic_ai"
  }
]

Use pdf_extract_text when you already know the PDF you want to inspect.
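The pdf_extract_text payloads in this section can also be assembled in code. This is a minimal sketch, not DeepFetch's own client: `build_pdf_args` is a hypothetical helper and the parameter names come from the examples in this section. It makes three assumptions: url and pdf_base64 are mutually exclusive, pdf_base64 expects standard (not URL-safe) base64, and keyword, semantic, and auto are the only search modes.

```python
import base64
from typing import Optional

SEARCH_MODES = {"keyword", "semantic", "auto"}
OPTIONAL_KEYS = {"start_page", "max_pages", "max_matches", "context_chars", "min_similarity"}


def build_pdf_args(
    *,
    url: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    query: Optional[str] = None,
    search_mode: Optional[str] = None,
    **options,
) -> dict:
    """Assemble a pdf_extract_text arguments payload from a URL or raw bytes."""
    if (url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of url or pdf_bytes")
    if search_mode is not None and search_mode not in SEARCH_MODES:
        raise ValueError(f"unknown search_mode {search_mode!r}")
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    args: dict = {}
    if url is not None:
        args["url"] = url
    else:
        # Assumption: the server expects standard base64 of the raw file bytes.
        args["pdf_base64"] = base64.b64encode(pdf_bytes).decode("ascii")
    if query is not None:
        args["query"] = query
    if search_mode is not None:
        args["search_mode"] = search_mode
    args.update(options)
    return args


print(build_pdf_args(url="https://example.com/brief.pdf", start_page=1, max_pages=3))
```

To send a local file, read it in binary mode and pass the bytes as pdf_bytes.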
Use keyword mode for exact terms, names, and numbers.
{
  "url": "https://example.com/financial-report.pdf",
  "query": "revenue guidance",
  "search_mode": "keyword",
  "max_matches": 5,
  "context_chars": 600
}

Use semantic when you care about meaning more than exact phrasing.
{
  "url": "https://example.com/research-paper.pdf",
  "query": "retrieval-augmented generation",
  "search_mode": "semantic",
  "max_matches": 5,
  "context_chars": 800,
  "min_similarity": 0.25
}

Use page limits when the relevant section is near the front or in a known range.
{
  "url": "https://example.com/10-k.pdf",
  "query": "risk factors",
  "search_mode": "auto",
  "start_page": 1,
  "max_pages": 25,
  "max_matches": 8
}

Omit query when you just want text extraction metadata for a subset of pages.
{
  "url": "https://example.com/brief.pdf",
  "start_page": 1,
  "max_pages": 3
}

Use pdf_base64 when the client already has the file bytes.
{
  "pdf_base64": "<base64-pdf-bytes>",
  "query": "incident response playbook",
  "search_mode": "auto"
}

Example pdf_extract_text result:

{
  "source_type": "url",
  "source_url": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "final_url": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "content_type": "application/pdf; qs=0.001",
  "total_pages": 1,
  "pages_processed": 1,
  "search_mode_requested": "auto",
  "search_mode_used": "semantic",
  "matches": [
    {
      "page": 1,
      "query": "dummy",
      "snippet": "Dummy PDF file",
      "score": 0.3462
    }
  ],
  "notes": ""
}

For internet_search, good queries are usually:
- Specific
- Short
- Rich in proper nouns and source-like terminology
- Explicit about time, location, product, or organization when those change the answer
Good:
OpenAI API pricing March 2026
site:fda.gov semaglutide shortage status
cybersecurity report filetype:pdf after:2024
Bad:
latest pricing current openai api all models now
drug shortage government website maybe semaglutide current
cybersecurity lots of reports pdf security recent
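The contrast between the good and bad queries above can be turned into a rough pre-flight check. The sketch below flags filler words and overly long queries. The word list and the length threshold are assumptions drawn from these examples rather than rules DeepFetch enforces, and `query_warnings` is a hypothetical helper.

```python
from typing import List

# Filler words taken from the bad examples above (an assumed, incomplete list).
VAGUE_WORDS = {"latest", "current", "now", "recent", "maybe", "lots"}
MAX_WORDS = 8  # assumed cutoff for "short"


def query_warnings(query: str) -> List[str]:
    """Return human-readable warnings for a query; an empty list means it looks fine."""
    words = query.lower().split()
    warnings = []
    vague = [word for word in words if word in VAGUE_WORDS]
    if vague:
        warnings.append("vague words: " + ", ".join(vague))
    if len(words) > MAX_WORDS:
        warnings.append(f"too long: {len(words)} words")
    return warnings


print(query_warnings("OpenAI API pricing March 2026"))
# prints: []
print(query_warnings("latest pricing current openai api all models now"))
# prints: ['vague words: latest, current, now']
```

A check like this only catches surface problems. It cannot tell whether a query names the right entity or source.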