feat(tools): add OlostepTool for web scraping, search, answers, batch…#130

Closed
ZeeshanAdilButt wants to merge 1 commit into CelestoAI:main from ZeeshanAdilButt:feat/olostep-tool
@ZeeshanAdilButt ZeeshanAdilButt commented Feb 20, 2026

…, crawl, and site mapping

Adds a new OlostepTool that integrates Olostep's REST API directly via httpx (no SDK dependency required beyond the existing httpx transitive dep).

Capabilities:

  • scrape: extract content from a URL as markdown, html, json, or text
  • search_web: structured Google Search results via @olostep/google-search parser
  • answers: AI-powered answers with citations from live web data
  • batch_scrape: scrape up to 10,000 URLs in parallel
  • crawl: autonomously discover and scrape an entire website
  • map_website: discover all URLs on a site, optionally ranked by query relevance

Every capability returns str and is decorated with @capability so it is automatically exposed as an OpenAI function and MCP tool.

Live-tested against the Olostep API — all 6 capabilities verified working.

Closes #olostep-integration

Greptile Summary

Adds OlostepTool with 6 capabilities (scrape, search_web, answers, batch_scrape, crawl, map_website) for web scraping and research via Olostep's REST API using httpx. Integration follows the standard BaseTool pattern with @capability decorators.

Critical Issues:

  • _retrieve method has a logic bug where the loop overwrites params["formats"] on each iteration, only keeping the last format
  • search_web constructs Google Search URLs without URL-encoding the query parameter, breaking searches with special characters
  • README references agentor[olostep] extra but it's not defined in pyproject.toml
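
The first issue above can be illustrated with a minimal sketch. The PR's actual `_retrieve` code is not shown in this thread, so both functions below are hypothetical reconstructions, and the parameter names `retrieve_id` and `formats` as well as the repeated-key query style are assumptions rather than confirmed Olostep API behavior. The point is only that assigning to the same dict key inside a loop keeps the last value, while a list of pairs keeps them all.

```python
from urllib.parse import urlencode


def build_params_buggy(retrieve_id: str, formats: list[str]) -> dict[str, str]:
    """Hypothetical reconstruction of the reported bug."""
    params = {"retrieve_id": retrieve_id}
    for fmt in formats:
        # Each pass replaces the previous value, so only the last format survives.
        params["formats"] = fmt
    return params


def build_params_fixed(retrieve_id: str, formats: list[str]) -> list[tuple[str, str]]:
    """One possible fix: emit one (key, value) pair per requested format."""
    params = [("retrieve_id", retrieve_id)]
    params.extend(("formats", fmt) for fmt in formats)
    return params


print(urlencode(build_params_buggy("abc", ["markdown", "html"])))
# retrieve_id=abc&formats=html
print(urlencode(build_params_fixed("abc", ["markdown", "html"])))
# retrieve_id=abc&formats=markdown&formats=html
```

Whether the API expects repeated keys or a single comma-separated value should be checked against the Olostep documentation; either way the parameter has to be built once, outside the loop.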

Additional Notes:

  • All 6 capabilities return str and handle errors properly with try/except blocks
  • Polling logic for async operations (batch_scrape, crawl) uses fixed 120 iterations × 5s = 10 min max timeout
  • Example file demonstrates 4 of the 6 capabilities with clear use cases
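
The fixed polling budget noted above (120 attempts × 5 s = 600 s) could be sketched roughly as follows. This is not the PR's code: `poll_until_done`, the `fetch_status` callback, and the status strings `"completed"` and `"failed"` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    max_attempts: int = 120,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until a terminal state or until the attempt budget runs out."""
    for _ in range(max_attempts):
        status = fetch_status()
        # Terminal status names are assumptions, not confirmed API values.
        if status in ("completed", "failed"):
            return status
        sleep(interval_s)
    # Budget exhausted: max_attempts * interval_s seconds of waiting in total.
    return "timeout"


# Usage with a fake status source and a no-op sleep, so nothing actually waits.
statuses = iter(["in_progress", "in_progress", "completed"])
print(poll_until_done(lambda: next(statuses), sleep=lambda s: None))
# completed
```

Passing `sleep` as a parameter keeps the loop testable without real delays; exposing `max_attempts` and `interval_s` would also let callers adjust the 10 minute ceiling for large batches or crawls.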

Confidence Score: 2/5

  • This PR has critical bugs that will cause runtime failures in production
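  • Two critical logic errors were found: the _retrieve bug keeps only the last requested format, so the API receives incorrect parameters, and the missing URL encoding in search_web breaks searches that contain spaces or special characters. Because the olostep extra is not defined in pyproject.toml, the install command documented in the README will not work as written.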
  • Pay close attention to src/agentor/tools/olostep.py for the logic bugs and verify pyproject.toml has the olostep extra configured
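
For the search_web issue, a minimal sketch of the encoding step using only the standard library is shown below. The helper name `build_search_url` and the base URL are assumptions, since the PR's URL construction is not quoted in this review.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL; the exact URL the PR builds is not shown in the review.
GOOGLE_SEARCH_BASE = "https://www.google.com/search"


def build_search_url(query: str) -> str:
    """Percent-encode the query so spaces, '&', and '+' cannot corrupt the URL."""
    return f"{GOOGLE_SEARCH_BASE}?{urlencode({'q': query})}"


url = build_search_url("c++ & rust tips")
print(url)
# https://www.google.com/search?q=c%2B%2B+%26+rust+tips

# Round trip: decoding the query string recovers the original text.
print(parse_qs(urlsplit(url).query)["q"])
# ['c++ & rust tips']
```

Without encoding, the `&` in this query would start a new parameter and each `+` would be decoded as a space, so the search would silently run on different text.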

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agentor/tools/olostep.py | New OlostepTool with 6 capabilities - found critical bugs in _retrieve (overwrites formats param) and search_web (missing URL encoding) |
| examples/tools/olostep.py | Clean example showing OlostepTool usage with 4 demo scenarios - no issues found |
| examples/tools/README.md | Added olostep to examples list and install command - missing pyproject.toml configuration for olostep extra |
| src/agentor/tools/__init__.py | Added OlostepTool import and export - standard integration, no issues |

Sequence Diagram

sequenceDiagram
    participant User
    participant Agentor
    participant OlostepTool
    participant OlostepAPI

    User->>Agentor: agent.arun("scrape example.com")
    Agentor->>OlostepTool: scrape(url, output_format)
    OlostepTool->>OlostepTool: _headers() with Bearer token
    OlostepTool->>OlostepAPI: POST /v1/scrapes
    OlostepAPI-->>OlostepTool: JSON response with content
    OlostepTool->>OlostepTool: extract content by format key
    OlostepTool-->>Agentor: return formatted content string
    Agentor-->>User: final_output with scraped content

    User->>Agentor: agent.arun("batch scrape URLs")
    Agentor->>OlostepTool: batch_scrape(urls, format)
    OlostepTool->>OlostepAPI: POST /v1/batches
    OlostepAPI-->>OlostepTool: batch_id
    loop Poll every 5s (max 10min)
        OlostepTool->>OlostepAPI: GET /v1/batches/{id}
        OlostepAPI-->>OlostepTool: status
    end
    OlostepTool->>OlostepAPI: GET /v1/batches/{id}/items
    OlostepAPI-->>OlostepTool: batch items with retrieve_ids
    loop For each item
        OlostepTool->>OlostepTool: _retrieve(retrieve_id, formats)
        OlostepTool->>OlostepAPI: GET /v1/retrieve
        OlostepAPI-->>OlostepTool: content data
    end
    OlostepTool-->>Agentor: JSON array of results
    Agentor-->>User: final_output with all scraped content
Last reviewed commit: 4611d75

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 2 comments
