Feature Request: Add filtering options for selectors (startWith, contains, endsWith, etc.)
Summary
Add the ability to apply filters to elements selected during extraction, both via UI and API, for single elements or capture lists.
Currently, selectors in Maxun allow selecting elements by CSS or XPath, but there is no built-in way to filter them based on content patterns (e.g., only links starting with a certain prefix, containing a substring, or ending with a specific suffix).
Use Case
In many scraping scenarios, not all selected elements are relevant. For example:
- A page contains multiple links, but we only want links starting with https://example.com/product-.
- Or a capture list contains items with extra unrelated text, and we want to filter by suffix, contains, or not contains.
Having native filtering would avoid:
- Post-processing JSON to remove unwanted entries
- Writing custom scripts for trivial pattern-based filtering
- Creating multiple robots for minor variations
Proposed Solution
Extend the selector functionality to support optional filters:
- Filter types: startWith, contains, endsWith, notContains, regex
- Filters apply to:
  - Single element selection
  - Capture lists
- Filters available:
  - In UI: a new optional field in the selector step
  - In API: an optional parameter when defining or running a selector step
Example API payload:
{
  "selector": "a.product-link",
  "filter": {
    "type": "startWith",
    "value": "https://example.com/product-"
  }
}
Benefits
- Cleaner extraction workflow
- Less manual post-processing
- Reusable robots across similar pages
- Works in both the UI and the API
Compatibility
Fully backward compatible: if no filter is provided, behavior remains unchanged.
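To make the intended semantics concrete, here is a minimal sketch of how the proposed filter types could be evaluated against extracted values. It is not taken from the Maxun codebase: the names SelectorFilter, matchesFilter, and applyFilter are hypothetical, and the filter shape simply mirrors the example payload above. It also assumes that a regex filter's value is a JavaScript regular expression source string.

```typescript
// Hypothetical types for illustration only; they mirror the payload in this
// issue and are not an existing Maxun API.
type FilterType = "startWith" | "contains" | "endsWith" | "notContains" | "regex";

interface SelectorFilter {
  type: FilterType;
  value: string;
}

// Returns true when a single extracted value passes the filter.
function matchesFilter(text: string, filter: SelectorFilter): boolean {
  switch (filter.type) {
    case "startWith":
      return text.startsWith(filter.value);
    case "contains":
      return text.includes(filter.value);
    case "endsWith":
      return text.endsWith(filter.value);
    case "notContains":
      return !text.includes(filter.value);
    case "regex":
      // Assumption: value holds a regular expression source string.
      return new RegExp(filter.value).test(text);
  }
}

// Applies an optional filter to extracted values.
// With no filter, the input is returned unchanged (backward compatible).
function applyFilter(values: string[], filter?: SelectorFilter): string[] {
  if (!filter) return values;
  return values.filter((v) => matchesFilter(v, filter));
}

const links = [
  "https://example.com/product-1",
  "https://example.com/about",
  "https://example.com/product-2",
];

// Keeps only the two product links.
const kept = applyFilter(links, {
  type: "startWith",
  value: "https://example.com/product-",
});

// No filter supplied: all three links are returned.
const unfiltered = applyFilter(links);
```

A single element selection could reuse the same check by calling matchesFilter on the one extracted value, so both targets listed in the proposal would share one code path.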