Feature Request: API support for runtime start URL override when executing a robot
Summary
Allow overriding the robot's configured start URL at runtime via the API when calling the POST /robots/{id}/run endpoint.
Currently, the API always uses the start URL defined in the robot's workflow configuration. This prevents reusing a single robot across multiple pages that share the same structure but have different URLs.
Use Case
In many real-world scraping scenarios, multiple pages share the exact same HTML structure and extraction logic, but differ only by URL.
Example: all pages in the set share the same layout and selectors; only the URL differs.
Instead of creating one robot per URL, it would be significantly more efficient to:
- Create one robot with a fixed extraction structure
- Override the start URL dynamically at execution time through the API
This enables:
- Reuse of a single robot
- Avoiding duplication of nearly identical robots
- Simpler orchestration logic
- Better scalability when processing large URL lists (e.g., from a sitemap)
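To illustrate the orchestration pattern, here is a hedged client-side sketch that fans a URL list out over one robot by building one run request per URL. The endpoint path and `overrideUrl` field come from this proposal; the base URL, robot id, and `buildRunRequests` helper are illustrative placeholders, not part of Maxun's API.

```typescript
// Hypothetical client-side sketch: build one run request per URL so a
// single robot can process a whole URL list (e.g., from a sitemap).
// The endpoint and "overrideUrl" field are from this proposal; the base
// URL and robot id below are placeholders.
interface RunRequest {
  url: string; // full endpoint URL
  body: { overrideUrl: string; input: Record<string, unknown> };
}

function buildRunRequests(
  baseUrl: string,
  robotId: string,
  urls: string[],
  input: Record<string, unknown> = {}
): RunRequest[] {
  return urls.map((overrideUrl) => ({
    url: `${baseUrl}/robots/${robotId}/run`,
    body: { overrideUrl, input },
  }));
}

// Each request could then be sent with
// fetch(req.url, { method: "POST", body: JSON.stringify(req.body), ... }).
const requests = buildRunRequests("https://maxun.example", "r1", [
  "https://example.com/product/abc",
  "https://example.com/product/def",
]);
console.log(requests.length); // 2
console.log(requests[0].body.overrideUrl); // "https://example.com/product/abc"
```

The robot id and extraction structure stay fixed; only the payload's `overrideUrl` varies per request.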
Proposed API Change
Extend the existing run endpoint:
POST /robots/{id}/run
to accept an optional parameter:
```json
{
  "overrideUrl": "https://example.com/product/abc",
  "input": { ... }
}
```
Behavior
- If overrideUrl is provided via the API:
  - The robot execution uses this URL instead of the configured start URL.
  - The override applies only to this execution (no persistence in the workflow).
- If overrideUrl is not provided:
  - Current behavior remains unchanged.
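The two branches above amount to a simple per-execution fallback. A minimal sketch, assuming a single resolver function (the name `resolveStartUrl` is hypothetical, not Maxun's actual internals):

```typescript
// Minimal sketch of the fallback described above: use overrideUrl when
// present, otherwise the start URL stored in the workflow.
// Name and signature are hypothetical, not Maxun's actual internals.
function resolveStartUrl(configuredUrl: string, overrideUrl?: string): string {
  // The override applies to this execution only; nothing is persisted.
  return overrideUrl ?? configuredUrl;
}

console.log(resolveStartUrl("https://example.com/start"));
// -> "https://example.com/start"
console.log(resolveStartUrl("https://example.com/start", "https://example.com/product/abc"));
// -> "https://example.com/product/abc"
```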
Technical Considerations
- The override should be applied at runtime only, based on the API payload.
- The stored workflow definition must remain untouched.
- The override should modify the navigation step in-memory before execution.
- Basic URL validation should be performed.
- The change should be fully backward compatible.
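The considerations above could be sketched as follows. The workflow shape (a `goto` navigation step with a `url` field) is an assumption for illustration only, not Maxun's real schema:

```typescript
// Sketch of applying the override in-memory before execution.
// ASSUMPTION: the workflow shape below ("goto" step with a "url" field)
// is illustrative; Maxun's real workflow schema may differ.
interface WorkflowStep { action: string; url?: string }
interface Workflow { steps: WorkflowStep[] }

function applyOverrideUrl(workflow: Workflow, overrideUrl?: string): Workflow {
  if (overrideUrl === undefined) return workflow; // backward compatible: no change

  // Basic validation: reject anything that is not an absolute http(s) URL.
  const parsed = new URL(overrideUrl); // throws on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  // Deep-copy so the stored workflow definition stays untouched, then
  // rewrite the navigation step's URL in the copy only.
  const copy: Workflow = JSON.parse(JSON.stringify(workflow));
  const nav = copy.steps.find((s) => s.action === "goto");
  if (nav) nav.url = overrideUrl;
  return copy;
}
```

Because the override is applied to a copy, the persisted definition is never mutated, and omitting `overrideUrl` returns the workflow as-is, which keeps existing callers unaffected.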
Why This Is Important
This API feature enables a clean separation between:
- Workflow definition (structure of extraction)
- Execution context (which URL to process)
It makes Maxun significantly more flexible for:
- Processing URL lists from sitemaps
- Batch scraping
- Microservice orchestration
- High-volume page processing without robot duplication
Alternatives Considered
- Creating one robot per URL → not scalable
- Modifying robots before each execution → error-prone
- Forking and maintaining a custom build → undesirable for long-term maintainability
Compatibility
This would be a non-breaking additive change to the API.