
[Feature Request]: API support for runtime start URL override when executing a robot #977

@Fastidio96

Description


Summary

Allow overriding the robot's configured start URL at runtime via the API when calling the POST /robots/{id}/run endpoint.

Currently, the API always uses the start URL defined in the robot's workflow configuration. This prevents reusing a single robot across multiple pages that share the same structure but have different URLs.

Use Case

In many real-world scraping scenarios, multiple pages share the exact same HTML structure and extraction logic, but differ only by URL.

Example: a set of pages that all share the same layout and selectors, differing only in URL.

Instead of creating one robot per URL, it would be significantly more efficient to:

  • Create one robot with a fixed extraction structure
  • Override the start URL dynamically at execution time through the API

This enables:

  • Reuse of a single robot
  • Avoiding duplication of nearly identical robots
  • Simpler orchestration logic
  • Better scalability when processing large URL lists (e.g., from a sitemap)

Proposed API Change

Extend the existing run endpoint:

POST /robots/{id}/run

to accept an optional parameter:

{
  "overrideUrl": "https://example.com/product/abc",
  "input": { ... }
}
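A client call under this proposal could look like the sketch below. The `overrideUrl` field is the proposed addition; the host name, API-key handling, and the `RunRequest` shape are placeholders, not Maxun's actual types.

```typescript
// Sketch: build and send a run request with a per-execution start URL.
// `overrideUrl` is the proposed field; the rest follows the existing
// POST /robots/{id}/run payload shape assumed in this request.
interface RunRequest {
  overrideUrl?: string;               // proposed: one-off start URL
  input?: Record<string, unknown>;    // existing input payload
}

function buildRunRequest(
  url?: string,
  input?: Record<string, unknown>
): RunRequest {
  const body: RunRequest = {};
  if (url !== undefined) body.overrideUrl = url;
  if (input !== undefined) body.input = input;
  return body;
}

// Hypothetical usage (endpoint path from the proposal; host is a placeholder):
async function runRobot(robotId: string, url: string): Promise<unknown> {
  const res = await fetch(`https://maxun.example.com/robots/${robotId}/run`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRunRequest(url)),
  });
  return res.json();
}
```

Omitting `overrideUrl` entirely (rather than sending `null`) keeps the payload identical to today's requests.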

Behavior

  • If overrideUrl is provided via API:
    • The robot execution should use this URL instead of the configured start URL.
    • The override should apply only to this execution (no persistence in the workflow).
  • If overrideUrl is not provided:
    • Current behavior remains unchanged.
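The fallback rule above reduces to a single expression; this is a minimal sketch of the intended semantics, not Maxun's internal code:

```typescript
// Sketch of the proposed rule: use overrideUrl for this run only;
// otherwise fall back to the workflow's configured start URL.
function resolveStartUrl(configuredUrl: string, overrideUrl?: string): string {
  return overrideUrl ?? configuredUrl;
}
```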

Technical Considerations

  • The override should be applied at runtime only, based on the API payload.
  • The stored workflow definition must remain untouched.
  • The override should modify the navigation step in-memory before execution.
  • Basic URL validation should be performed.
  • The change should be fully backward compatible.
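The considerations above can be sketched together: validate the URL, then rewrite the navigation step on an in-memory copy so the stored workflow is never mutated. The `Workflow`/`GotoStep` shapes here are illustrative assumptions, not Maxun's actual workflow schema.

```typescript
// Sketch: basic URL validation plus a non-mutating override of the
// navigation step. The step/workflow types are hypothetical.
interface GotoStep { action: "goto"; url: string }
type Step = GotoStep | { action: string };

interface Workflow { steps: Step[] }

function validateUrl(raw: string): string {
  const parsed = new URL(raw);                       // throws on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.toString();
}

function applyOverride(workflow: Workflow, overrideUrl?: string): Workflow {
  if (overrideUrl === undefined) return workflow;    // current behavior, untouched
  const url = validateUrl(overrideUrl);
  return {
    // New object: the stored definition is never modified.
    steps: workflow.steps.map((step) =>
      step.action === "goto" ? { ...step, url } : step
    ),
  };
}
```

Because `applyOverride` returns a fresh object only when an override is present, the no-override path is byte-for-byte the existing behavior, which keeps the change backward compatible.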

Why This Is Important

This API feature enables a clean separation between:

  • Workflow definition (structure of extraction)
  • Execution context (which URL to process)

It makes Maxun significantly more flexible for:

  • Processing URL lists from sitemaps
  • Batch scraping
  • Microservice orchestration
  • High-volume page processing without robot duplication
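As a concrete example of the sitemap use case, the loop below drives one robot over every URL in a sitemap, assuming the proposed `overrideUrl` parameter. The `<loc>` extraction uses a simple regex for brevity; a real implementation would use an XML parser, and the host name is a placeholder.

```typescript
// Sketch: one robot, many executions — no per-URL robot duplication.
function extractSitemapUrls(xml: string): string[] {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1].trim());
}

async function runForAllUrls(robotId: string, sitemapXml: string): Promise<void> {
  for (const url of extractSitemapUrls(sitemapXml)) {
    await fetch(`https://maxun.example.com/robots/${robotId}/run`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ overrideUrl: url }),   // proposed parameter
    });
  }
}
```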

Alternatives Considered

  • Creating one robot per URL → not scalable
  • Modifying robots before each execution → error-prone
  • Forking and maintaining a custom build → undesirable for long-term maintainability

Compatibility

This would be a non-breaking additive change to the API.
