
[Feature Request]: Add filtering options for selectors (startWith, contains, endsWith, etc.) #978

@Fastidio96

Description



Summary

Add the ability to filter the elements selected during extraction, through both the UI and the API, for single elements as well as capture lists.

Currently, selectors in Maxun allow selecting elements by CSS or XPath, but there is no built-in way to filter them based on content patterns (e.g., only links starting with a certain prefix, containing a substring, or ending with a specific suffix).

Use Case

In many scraping scenarios, not all selected elements are relevant. For example:

  • A page contains multiple links, but we only want links starting with https://example.com/product-.
  • A capture list contains items with unrelated extra text, and we want to keep only the items that end with a given suffix, contain a given substring, or do not contain it.

Having native filtering would avoid:

  • Post-processing JSON to remove unwanted entries
  • Writing custom scripts for trivial pattern-based filtering
  • Creating multiple robots for minor variations

Proposed Solution

Extend the selector functionality to support optional filters:

  • Filter types: startWith, contains, endsWith, notContains, regex
  • Filters apply to:
    • Single element selection
    • Capture lists
  • Filters available:
    • In UI: a new optional field in the selector step
    • In API: an optional parameter when defining or running a selector step
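A minimal sketch of how the five proposed filter types could be evaluated against an element's extracted text (for example its text content or `href`). The `SelectorFilter` interface and the `matchesFilter` function are hypothetical names used only for illustration; they are not part of Maxun's current API. Only the filter type strings come from the list above.

```typescript
// Hypothetical shape of the proposed filter option. The type strings mirror
// the proposal: startWith, contains, endsWith, notContains, regex.
type FilterType = "startWith" | "contains" | "endsWith" | "notContains" | "regex";

interface SelectorFilter {
  type: FilterType;
  value: string;
}

// Returns true when `text` (e.g. an element's text or href) passes the filter.
function matchesFilter(text: string, filter: SelectorFilter): boolean {
  switch (filter.type) {
    case "startWith":
      return text.startsWith(filter.value);
    case "contains":
      return text.includes(filter.value);
    case "endsWith":
      return text.endsWith(filter.value);
    case "notContains":
      return !text.includes(filter.value);
    case "regex":
      // Compiled on every call for simplicity; a real implementation
      // would probably compile the pattern once per step.
      return new RegExp(filter.value).test(text);
    default: {
      // Unreachable for well-typed input; guards against untyped JSON.
      const unknownType: never = filter.type;
      throw new Error(`Unknown filter type: ${unknownType}`);
    }
  }
}
```

For a capture list, the same predicate would be applied once per item, keeping only the items for which it returns `true`; for a single-element selection, a `false` result would mean the element is skipped.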

Example API payload:

{
  "selector": "a.product-link",
  "filter": {
    "type": "startWith",
    "value": "https://example.com/product-"
  }
}
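Because the filter would arrive through the API, malformed filters should ideally be rejected before a robot runs. The sketch below assumes the payload shape shown in the example (`selector`, `filter.type`, `filter.value`) and fails early on an unknown filter type or an invalid regex pattern. `parseSelectorStep` and `SelectorStep` are hypothetical names, and Maxun's actual request handling may look quite different.

```typescript
// Hypothetical validation of the proposed payload. Field names follow the
// example payload; the validation rules themselves are assumptions.
const FILTER_TYPES = ["startWith", "contains", "endsWith", "notContains", "regex"] as const;
type FilterType = (typeof FILTER_TYPES)[number];

interface SelectorStep {
  selector: string;
  filter?: { type: FilterType; value: string };
}

function isFilterType(x: unknown): x is FilterType {
  return typeof x === "string" && (FILTER_TYPES as readonly string[]).includes(x);
}

function parseSelectorStep(json: string): SelectorStep {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be a JSON object");
  }
  const { selector, filter } = raw as { selector?: unknown; filter?: unknown };
  if (typeof selector !== "string" || selector.length === 0) {
    throw new Error("selector must be a non-empty string");
  }
  // No filter given: the step behaves exactly as selectors do today.
  if (filter === undefined) {
    return { selector };
  }
  if (typeof filter !== "object" || filter === null) {
    throw new Error("filter must be an object");
  }
  const { type, value } = filter as { type?: unknown; value?: unknown };
  if (!isFilterType(type) || typeof value !== "string") {
    throw new Error("filter needs a supported type and a string value");
  }
  if (type === "regex") {
    new RegExp(value); // throws SyntaxError now rather than mid-run
  }
  return { selector, filter: { type, value } };
}
```

As written, the sketch accepts only the spelling `startWith` used in this proposal, so a payload using `startsWith` (the JavaScript method name) is rejected as an unsupported type.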

Benefits

  • Cleaner extraction workflow
  • Less manual post-processing
  • Robots that can be reused across similar pages
  • Usable in both the UI and the API

Compatibility

Fully backward compatible: if no filter is provided, behavior remains unchanged.
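The compatibility rule can be made explicit in code: when no filter is configured, the extracted items pass through untouched. In this hypothetical sketch, `applyFilter` and the `textOf` accessor are illustrative names; the proposal does not say which field of a capture-list row a filter should target, so the caller chooses it here.

```typescript
// Hypothetical helper; filter type names follow the proposal.
type FilterType = "startWith" | "contains" | "endsWith" | "notContains" | "regex";

interface SelectorFilter {
  type: FilterType;
  value: string;
}

// One predicate per filter type: (text under test, filter value) => keep?
const predicates: Record<FilterType, (text: string, value: string) => boolean> = {
  startWith: (t, v) => t.startsWith(v),
  contains: (t, v) => t.includes(v),
  endsWith: (t, v) => t.endsWith(v),
  notContains: (t, v) => !t.includes(v),
  regex: (t, v) => new RegExp(v).test(t),
};

// `textOf` picks the string to test: an href for a single element, or one
// field of a row for a capture list.
function applyFilter<T>(items: T[], textOf: (item: T) => string, filter?: SelectorFilter): T[] {
  if (filter === undefined) {
    return items; // no filter: same result as today
  }
  const test = predicates[filter.type];
  const value = filter.value;
  return items.filter((item) => test(textOf(item), value));
}
```

For example, capture-list rows shaped like `{ title, url }` could be filtered with `applyFilter(rows, (row) => row.url, { type: "startWith", value: "https://example.com/product-" })`, while omitting the last argument returns `rows` unchanged.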

Metadata


Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
