
DeepFetch Tool Examples

This page shows concrete MCP tool-call examples for the two tools DeepFetch exposes: internet_search and pdf_extract_text.

Each JSON example below is the arguments payload you would pass in an MCP call_tool request.
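For orientation, here is a minimal sketch of where such a payload sits on the wire. Per the MCP specification, a tool call is a JSON-RPC 2.0 tools/call request whose params carry the tool name and its arguments. The helper name make_tools_call and the request id are illustrative only.

```python
import json
from typing import Any


def make_tools_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Wrap a tool's arguments payload in an MCP tools/call JSON-RPC request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The arguments dict is exactly what the examples on this page show.
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


payload = {"query": "OpenAI API pricing March 2026", "extraction_model": "article"}
print(make_tools_call(1, "internet_search", payload))
```

If you use the official MCP Python SDK, ClientSession.call_tool(name, arguments) builds this envelope for you; you only supply the arguments dict.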

Quick Test Without an LLM

Use examples/direct_mcp_client.py when you want to test the Dockerized MCP server directly.

Build the image:

docker build -t deepfetch:test .

Install the local client dependencies:

python -m pip install -e '.[dev]'

Export keys for internet_search:

export KAGI_API_KEY=your_kagi_key
export SCRAPFLY_API_KEY=your_scrapfly_key

List tools:

python examples/direct_mcp_client.py list-tools --image deepfetch:test

Run a search:

python examples/direct_mcp_client.py search \
  --image deepfetch:test \
  --query "Model Context Protocol official specification"

Run a PDF extraction:

python examples/direct_mcp_client.py pdf \
  --image deepfetch:test \
  --url "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" \
  --query "dummy"

Run the full smoke sequence:

python examples/direct_mcp_client.py smoke --image deepfetch:test
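If you want to wire the container into your own MCP client instead of direct_mcp_client.py, a stdio MCP server has to be started with stdin kept open. The sketch below assumes the image's default entrypoint serves MCP over stdio, which this page does not confirm. It checks that the two keys are exported and builds a docker run -i --rm command that forwards them by name; REQUIRED_KEYS, missing_keys, and docker_stdio_command are hypothetical names.

```python
import os
from typing import List, Mapping

# Keys that internet_search needs, per the export step above.
REQUIRED_KEYS = ("KAGI_API_KEY", "SCRAPFLY_API_KEY")


def missing_keys(environ: Mapping[str, str]) -> List[str]:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not environ.get(key)]


def docker_stdio_command(image: str) -> List[str]:
    """Build a docker run argv for a stdio MCP server (entrypoint is assumed)."""
    cmd = ["docker", "run", "-i", "--rm"]  # -i keeps stdin open for stdio
    for key in REQUIRED_KEYS:
        # "-e NAME" with no value copies NAME from the calling environment.
        cmd += ["-e", key]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    gaps = missing_keys(os.environ)
    if gaps:
        print("missing:", ", ".join(gaps))
    print(" ".join(docker_stdio_command("deepfetch:test")))
```

The resulting argv is what you would hand to your MCP client's stdio transport as the server command.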

internet_search

Use internet_search when you need current public-web information and want DeepFetch to discover, fetch, and rerank results for you.

Example 1: Current factual lookup

Use explicit time context when it matters.

{
  "query": "OpenAI API pricing March 2026",
  "extraction_model": "article"
}

Why this works:

  • It names the entity directly (OpenAI API)
  • It includes the specific subject (pricing)
  • It resolves the time context (March 2026) instead of using a vague word like "latest"
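If you generate queries programmatically, you can resolve the time context yourself before calling the tool. A minimal sketch; with_time_context is a hypothetical helper, not part of DeepFetch.

```python
from datetime import date


def with_time_context(query: str, when: date) -> str:
    """Append an explicit month and year, e.g. 'March 2026', to a query."""
    # %B is the full month name; Python uses the C locale (English) unless
    # the program calls locale.setlocale to change it.
    return f"{query} {when.strftime('%B %Y')}"


print(with_time_context("OpenAI API pricing", date(2026, 3, 1)))
# prints: OpenAI API pricing March 2026
```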

Example 2: Source-constrained search

Use a domain filter when the source itself matters.

{
  "query": "site:fda.gov semaglutide shortage status",
  "extraction_model": "article"
}

Why this works:

  • It narrows the search to an authoritative source
  • It uses terminology the source itself is likely to use

Example 3: PDF-oriented discovery

DeepFetch automatically handles PDFs found during internet_search, so you can steer discovery toward documents.

{
  "query": "cybersecurity report filetype:pdf after:2024",
  "extraction_model": "article"
}

Why this works:

  • filetype:pdf biases discovery toward documents
  • after:2024 helps reduce stale sources
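Examples 2 and 3 combine plain terms with search operators. If you assemble such queries in code, a small builder keeps the operators consistent. This is a hypothetical helper that only emits the site:, filetype:, and after: operators already used on this page; whether each operator is honored depends on the underlying search backend.

```python
from typing import Optional


def build_query(terms: str, site: Optional[str] = None,
                filetype: Optional[str] = None, after: Optional[int] = None) -> str:
    """Compose a search query from plain terms plus optional operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")  # restrict to one domain
    parts.append(terms)
    if filetype:
        parts.append(f"filetype:{filetype}")  # bias toward a document type
    if after is not None:
        parts.append(f"after:{after}")  # drop older sources
    return " ".join(parts)


print(build_query("semaglutide shortage status", site="fda.gov"))
print(build_query("cybersecurity report", filetype="pdf", after=2024))
```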

Example 4: Product page extraction

Choose a non-default extraction model when the page type is obvious.

{
  "query": "Sony WH-1000XM6 specifications site:sony.com",
  "extraction_model": "product"
}

Which extraction_model to use:

  • article for news, blog posts, docs, and general pages
  • product for product pages
  • stock for market pages
  • organization for company profile pages
  • event for event pages
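When an agent or script picks extraction_model automatically, validating the value against the five options listed above catches typos before the call. A minimal sketch; search_args is a hypothetical helper, and its own default of article is a choice made here, not documented server behavior.

```python
# The five extraction_model values listed on this page.
EXTRACTION_MODELS = {
    "article": "news, blog posts, docs, and general pages",
    "product": "product pages",
    "stock": "market pages",
    "organization": "company profile pages",
    "event": "event pages",
}


def search_args(query: str, extraction_model: str = "article") -> dict[str, str]:
    """Build internet_search arguments, rejecting unknown extraction models."""
    if extraction_model not in EXTRACTION_MODELS:
        allowed = ", ".join(sorted(EXTRACTION_MODELS))
        raise ValueError(
            f"unknown extraction_model {extraction_model!r}; expected one of: {allowed}"
        )
    return {"query": query, "extraction_model": extraction_model}


print(search_args("Sony WH-1000XM6 specifications site:sony.com", "product"))
```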

Representative response

[
  {
    "url": "https://modelcontextprotocol.io/specification/2025-11-25",
    "title": "Specification - Model Context Protocol",
    "target_status_code": 200,
    "snippet": "Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools.",
    "score": 0.7008,
    "source": "semantic_ai"
  }
]
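The response is a list of result objects. The client-side sketch below keeps only results whose target page returned HTTP 200 and orders them by score. Treating a higher score as more relevant is an assumption based on the field name, and top_results is a hypothetical helper.

```python
import json

# The representative response shown above, as a JSON string.
raw = """[
  {
    "url": "https://modelcontextprotocol.io/specification/2025-11-25",
    "title": "Specification - Model Context Protocol",
    "target_status_code": 200,
    "snippet": "Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools.",
    "score": 0.7008,
    "source": "semantic_ai"
  }
]"""


def top_results(results: list[dict], limit: int = 3) -> list[dict]:
    """Drop results whose fetch was not HTTP 200; return the highest scores first."""
    ok = [r for r in results if r.get("target_status_code") == 200]
    return sorted(ok, key=lambda r: r.get("score", 0.0), reverse=True)[:limit]


for r in top_results(json.loads(raw)):
    print(f"{r['score']:.3f}  {r['title']}  {r['url']}")
```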

pdf_extract_text

Use pdf_extract_text when you already know the PDF you want to inspect.

Example 1: Exact keyword search in a PDF

Use keyword mode for exact terms, names, and numbers.

{
  "url": "https://example.com/financial-report.pdf",
  "query": "revenue guidance",
  "search_mode": "keyword",
  "max_matches": 5,
  "context_chars": 600
}
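To make max_matches and context_chars concrete, here is a rough client-side illustration of keyword matching with a context window. It is not DeepFetch's implementation. In particular, whether context_chars counts characters on each side of a match or in total is an assumption; this sketch splits it evenly around the match.

```python
import re


def keyword_matches(text: str, query: str, max_matches: int = 5,
                    context_chars: int = 600) -> list[str]:
    """Return up to max_matches snippets around case-insensitive exact hits."""
    half = context_chars // 2
    snippets = []
    for m in re.finditer(re.escape(query), text, flags=re.IGNORECASE):
        if len(snippets) >= max_matches:
            break
        # Clamp the context window to the bounds of the text.
        start = max(0, m.start() - half)
        end = min(len(text), m.end() + half)
        snippets.append(text[start:end])
    return snippets


page = "Intro. Revenue guidance for 2025 is flat. Later, revenue guidance was raised."
for snippet in keyword_matches(page, "revenue guidance", context_chars=20):
    print(repr(snippet))
```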

Example 2: Semantic concept search

Use semantic mode when you care about meaning more than exact phrasing.

{
  "url": "https://example.com/research-paper.pdf",
  "query": "retrieval-augmented generation",
  "search_mode": "semantic",
  "max_matches": 5,
  "context_chars": 800,
  "min_similarity": 0.25
}
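Semantic mode ranks passages by meaning, and min_similarity acts as a floor on the match score. The sketch below illustrates that filtering step with cosine similarity over toy, hand-written vectors. DeepFetch's actual embedding model, passage chunking, and scoring are not documented here, so every name and number is an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_matches(query_vec, passages, max_matches=5, min_similarity=0.25):
    """Score (page, vector) passages, drop those below the floor, keep the best."""
    scored = [(page, cosine(query_vec, vec)) for page, vec in passages]
    kept = [(page, s) for page, s in scored if s >= min_similarity]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_matches]


# Toy 3-dimensional "embeddings"; a real system would get these from a model.
query = [1.0, 0.0, 0.0]
passages = [(1, [0.9, 0.1, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [0.5, 0.5, 0.0])]
print(semantic_matches(query, passages))  # page 2 is orthogonal, so it is dropped
```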

Example 3: Scan only part of a large PDF

Use page limits when the relevant section is near the front or in a known range.

{
  "url": "https://example.com/10-k.pdf",
  "query": "risk factors",
  "search_mode": "auto",
  "start_page": 1,
  "max_pages": 25,
  "max_matches": 8
}
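A minimal sketch of how start_page and max_pages bound the scan. It assumes 1-based page numbers (as the start_page: 1 examples suggest) and that the window is clamped to the document length; the clamping is an assumption about the server, and page_window is a hypothetical helper.

```python
def page_window(start_page: int, max_pages: int, total_pages: int) -> range:
    """Return the 1-based pages to scan, clamped to the document length."""
    if start_page < 1 or max_pages < 1:
        raise ValueError("start_page and max_pages must be >= 1")
    last = min(total_pages, start_page + max_pages - 1)
    return range(start_page, last + 1)


# A hypothetical 180-page 10-K scanned with start_page=1, max_pages=25.
pages = page_window(1, 25, 180)
print(pages[0], pages[-1], len(pages))  # 1 25 25
```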

Example 4: Extract without a focused query

Omit query when you just want text extraction metadata for a subset of pages.

{
  "url": "https://example.com/brief.pdf",
  "start_page": 1,
  "max_pages": 3
}

Example 5: Base64 PDF input

Use pdf_base64 when the client already has the file bytes.

{
  "pdf_base64": "<base64-pdf-bytes>",
  "query": "incident response playbook",
  "search_mode": "auto"
}
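When the PDF is on local disk, the client can produce pdf_base64 with the standard library. A minimal sketch; pdf_args_from_bytes and pdf_args_from_file are hypothetical helpers, and this assumes the tool expects standard (not URL-safe) base64 with no data: URI prefix.

```python
import base64
from pathlib import Path


def pdf_args_from_bytes(pdf_bytes: bytes, query: str, search_mode: str = "auto") -> dict:
    """Build pdf_extract_text arguments from raw PDF bytes."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {"pdf_base64": encoded, "query": query, "search_mode": search_mode}


def pdf_args_from_file(path: str, query: str) -> dict:
    """Read a local PDF and encode it for the pdf_base64 field."""
    return pdf_args_from_bytes(Path(path).read_bytes(), query)
```

For example, pdf_args_from_file("playbook.pdf", "incident response playbook") would produce a payload shaped like Example 5 (the file name is illustrative).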

Representative response

{
  "source_type": "url",
  "source_url": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "final_url": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "content_type": "application/pdf; qs=0.001",
  "total_pages": 1,
  "pages_processed": 1,
  "search_mode_requested": "auto",
  "search_mode_used": "semantic",
  "matches": [
    {
      "page": 1,
      "query": "dummy",
      "snippet": "Dummy PDF file",
      "score": 0.3462
    }
  ],
  "notes": ""
}
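Note that search_mode_requested and search_mode_used can differ: in this response, auto resolved to semantic. A small sketch that reports the mode actually used and formats each match; summarize_pdf_result is a hypothetical helper, and the sample below is the response above trimmed to the fields it reads.

```python
def summarize_pdf_result(result: dict) -> list[str]:
    """Format the requested/used search mode, page coverage, and each match."""
    lines = [
        f"mode: {result['search_mode_requested']} -> {result['search_mode_used']}",
        f"pages: {result['pages_processed']}/{result['total_pages']}",
    ]
    for m in result.get("matches", []):
        lines.append(f"p.{m['page']} ({m['score']:.2f}): {m['snippet']}")
    return lines


response = {
    "total_pages": 1,
    "pages_processed": 1,
    "search_mode_requested": "auto",
    "search_mode_used": "semantic",
    "matches": [{"page": 1, "query": "dummy", "snippet": "Dummy PDF file", "score": 0.3462}],
}
print("\n".join(summarize_pdf_result(response)))
# mode: auto -> semantic
# pages: 1/1
# p.1 (0.35): Dummy PDF file
```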

Query Writing Tips

For internet_search, good queries are usually:

  • Specific
  • Short
  • Rich in proper nouns and source-like terminology
  • Explicit about time, location, product, or organization when those change the answer

Good:

OpenAI API pricing March 2026
site:fda.gov semaglutide shortage status
cybersecurity report filetype:pdf after:2024

Bad:

latest pricing current openai api all models now
drug shortage government website maybe semaglutide current
cybersecurity lots of reports pdf security recent
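The good/bad contrast above can be turned into a rough pre-flight check. This is a hypothetical heuristic, not something DeepFetch runs: it flags vague time words and filler such as latest, current, now, recent, maybe, and lots, plus queries longer than an arbitrary eight-word budget.

```python
VAGUE_WORDS = {"latest", "current", "now", "recent", "maybe", "lots"}
MAX_WORDS = 8  # arbitrary budget; tune for your use


def lint_query(query: str) -> list[str]:
    """Return human-readable warnings for a search query."""
    words = query.lower().split()
    warnings = []
    vague = sorted(set(words) & VAGUE_WORDS)
    if vague:
        warnings.append("vague terms: " + ", ".join(vague))
    if len(words) > MAX_WORDS:
        warnings.append(f"{len(words)} words; try {MAX_WORDS} or fewer")
    return warnings


print(lint_query("OpenAI API pricing March 2026"))                    # []
print(lint_query("latest pricing current openai api all models now"))
# ['vague terms: current, latest, now']
```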