feat(tools): add OlostepTool for web scraping, search, answers, batch… #130
Closed
ZeeshanAdilButt wants to merge 1 commit into CelestoAI:main
…, crawl, and site mapping

Adds a new OlostepTool that integrates Olostep's REST API directly via httpx (no SDK dependency required beyond the existing httpx transitive dep).

Capabilities:
- scrape: extract content from a URL as markdown, html, json, or text
- search_web: structured Google Search results via @olostep/google-search parser
- answers: AI-powered answers with citations from live web data
- batch_scrape: scrape up to 10,000 URLs in parallel
- crawl: autonomously discover and scrape an entire website
- map_website: discover all URLs on a site, optionally ranked by query relevance

Every capability returns str and is decorated with @capability so it is automatically exposed as an OpenAI function and MCP tool.

Live-tested against the Olostep API; all 6 capabilities verified working.

Closes #olostep-integration
4611d75 to 1abe6dc
Greptile Summary
Adds `OlostepTool` with 6 capabilities (scrape, search_web, answers, batch_scrape, crawl, map_website) for web scraping and research via Olostep's REST API using httpx. Integration follows the standard BaseTool pattern with `@capability` decorators.

Critical Issues:
- `_retrieve` method has a logic bug where the loop overwrites `params["formats"]` on each iteration, only keeping the last format
- `search_web` constructs Google Search URLs without URL-encoding the query parameter, breaking searches with special characters
- … `agentor[olostep]` extra but it's not defined in `pyproject.toml`

Additional Notes:
Confidence Score: 2/5
- `_retrieve` bug will cause incorrect API behavior, and missing URL encoding in `search_web` will break searches with spaces or special chars. Missing `pyproject.toml` configuration means installation will fail.
- … `src/agentor/tools/olostep.py` for the logic bugs and verify `pyproject.toml` has the olostep extra configured

Important Files Changed
… `_retrieve` (overwrites formats param) and `search_web` (missing URL encoding)

Sequence Diagram
sequenceDiagram
    participant User
    participant Agentor
    participant OlostepTool
    participant OlostepAPI
    User->>Agentor: agent.arun("scrape example.com")
    Agentor->>OlostepTool: scrape(url, output_format)
    OlostepTool->>OlostepTool: _headers() with Bearer token
    OlostepTool->>OlostepAPI: POST /v1/scrapes
    OlostepAPI-->>OlostepTool: JSON response with content
    OlostepTool->>OlostepTool: extract content by format key
    OlostepTool-->>Agentor: return formatted content string
    Agentor-->>User: final_output with scraped content
    User->>Agentor: agent.arun("batch scrape URLs")
    Agentor->>OlostepTool: batch_scrape(urls, format)
    OlostepTool->>OlostepAPI: POST /v1/batches
    OlostepAPI-->>OlostepTool: batch_id
    loop Poll every 5s (max 10min)
        OlostepTool->>OlostepAPI: GET /v1/batches/{id}
        OlostepAPI-->>OlostepTool: status
    end
    OlostepTool->>OlostepAPI: GET /v1/batches/{id}/items
    OlostepAPI-->>OlostepTool: batch items with retrieve_ids
    loop For each item
        OlostepTool->>OlostepTool: _retrieve(retrieve_id, formats)
        OlostepTool->>OlostepAPI: GET /v1/retrieve
        OlostepAPI-->>OlostepTool: content data
    end
    OlostepTool-->>Agentor: JSON array of results
    Agentor-->>User: final_output with all scraped content

Last reviewed commit: 4611d75
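The "Poll every 5s (max 10min)" loop in the diagram can be illustrated with a small helper. This is only a sketch and not the PR's code: the name `poll_until_done`, the injectable `sleep` and `clock` parameters, and the terminal status strings `"completed"` and `"failed"` are all assumptions made for the example.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal state or time runs out.

    fetch_status stands in for a GET /v1/batches/{id} request that returns
    the batch status as a string (hypothetical status values, see below).
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        # Assumed terminal states; the real API may use different names.
        if status in ("completed", "failed"):
            return status
        # Give up if the next wait would run past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"batch still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` in as parameters keeps the loop testable without real waiting; a caller that does not care can leave the defaults in place.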
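The first two critical issues in the Greptile summary (the overwritten `formats` parameter and the unencoded search query) can be sketched with the standard library alone. The PR's actual code is not shown on this page, so `buggy_retrieve_params` is a hypothetical reconstruction of the bug as described, and the other function names are illustrative. Sending `formats` as a repeated query key is an assumption about what the Olostep API accepts, as is the Google Search URL shape.

```python
from typing import Dict, List, Sequence, Tuple
from urllib.parse import quote_plus, urlencode


def buggy_retrieve_params(retrieve_id: str, formats: Sequence[str]) -> Dict[str, str]:
    """Hypothetical reconstruction of the reported bug."""
    params = {"retrieve_id": retrieve_id}
    for fmt in formats:
        params["formats"] = fmt  # overwritten each iteration; only the last survives
    return params


def build_retrieve_params(retrieve_id: str, formats: Sequence[str]) -> List[Tuple[str, str]]:
    """Keep every format by emitting one (key, value) pair per format.

    Assumption: the API accepts a repeated 'formats' query key.
    """
    params = [("retrieve_id", retrieve_id)]
    params.extend(("formats", fmt) for fmt in formats)
    return params


def google_search_url(query: str) -> str:
    """URL-encode the query so spaces, '+', and '&' survive the round trip."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    print(buggy_retrieve_params("abc", ["markdown", "html"]))
    print(urlencode(build_retrieve_params("abc", ["markdown", "html"])))
    print(google_search_url("best c++ books & tips"))
```

A list of pairs is used instead of a dict because a dict cannot hold the same key twice; httpx also accepts a list of two-item tuples for its `params` argument, so the same structure could be passed to a request unchanged.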