Fix excluded tags lookup to use correct key type by seanstory · Pull Request #417 · elastic/crawler

seanstory · 2026-02-06T16:08:42Z

Closes #416

The `exclude_tags` configuration was not being applied correctly. The config stores `exclude_tags` keyed by domain URL strings (e.g., `"https://example.com"`), but the lookup in `get_body_tag` was using the URL object directly as the hash key instead of `url.site`.

This fix changes the lookup to use `url.site` (which returns the scheme + host as a string) to match how the config stores the keys.
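The mismatch described above can be reproduced with a minimal sketch. This is not the crawler's actual code: `SketchURL`, `EXCLUDE_TAGS`, `excluded_tags_buggy`, and `excluded_tags_fixed` are hypothetical names made up for illustration, and `site` is assumed to return only scheme + host, as the description states.

```ruby
require 'uri'

# Hypothetical stand-in for the crawler's URL object; not the real class.
class SketchURL
  def initialize(raw)
    @uri = URI.parse(raw)
  end

  # Assumed behaviour: scheme + host as a string, e.g. "https://example.com".
  def site
    "#{@uri.scheme}://#{@uri.host}"
  end
end

# Config shape described in the PR: exclude_tags keyed by domain URL strings.
EXCLUDE_TAGS = { 'https://example.com' => %w[header footer] }.freeze

# Buggy lookup: a URL object never equals a String key, so this returns nil.
def excluded_tags_buggy(url)
  EXCLUDE_TAGS[url]
end

# Fixed lookup: use the string form that matches the config keys.
def excluded_tags_fixed(url)
  EXCLUDE_TAGS[url.site]
end

url = SketchURL.new('https://example.com/blog/post')
p excluded_tags_buggy(url)
p excluded_tags_fixed(url)
```

Because Ruby hashes compare keys with `hash` and `eql?`, an object that does not define equality with a `String` misses silently rather than raising, which would explain why the configuration appeared to be ignored instead of failing loudly.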

Checklists

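Pre-Review Checklist

- [x] This PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check `crawler.yml.example` and `elasticsearch.yml.example`)
- [x] This PR has a meaningful title
- [x] This PR links to all relevant GitHub issues that it fixes or partially addresses
  - Fixes #416
- [x] This PR has a thorough description
- [x] Covered the changes with automated tests
- [ ] Tested the changes locally
- [x] Added a label for each target release version (example: `v0.1.0`)
- [x] Considered corresponding documentation changes
  - N/A - this is a bug fix, no documentation changes needed
- [x] Contributed any configuration settings changes to the configuration reference
  - N/A - no configuration changes
- [x] Ran `make notice` if any dependencies have been added
  - N/A - no dependencies added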

Changes Requiring Extra Attention

N/A - This is a straightforward bug fix with no security implications or new dependencies.

Release Note

Fixed `exclude_tags` domain configuration not being applied during crawl. Tags specified in `exclude_tags` for a domain are now correctly excluded from the document body.

seanstory · 2026-02-06T19:47:19Z

Customer tested the changes and confirmed the fix. I think we're good to merge, after an approval.

github-actions · 2026-02-06T20:48:50Z

💚 Backport PR(s) successfully created

Status	Branch	Result
✅	0.4	#418

This backport PR will be merged automatically after passing CI.

Backports the following commits to 0.4:
- Fix excluded tags lookup to use correct key type (#417)

Co-authored-by: Sean Story <sean.story@elastic.co>

seanstory added 2 commits February 6, 2026 11:05

Fix excluded tags lookup to use right type for key

72dc46f

Fix excluded tags lookup to use right type for key

8fa3ec5

seanstory requested a review from a team as a code owner February 6, 2026 16:08

seanstory added auto-backport v0.5.0 v0.4.3 labels Feb 6, 2026

seanstory mentioned this pull request Feb 6, 2026

exclude_tags not working? #416

Closed

seanstory enabled auto-merge (squash) February 6, 2026 19:46

mattnowzari approved these changes Feb 6, 2026

seanstory merged commit e22528b into main Feb 6, 2026
5 checks passed

seanstory deleted the seanstory/416-fix-exclude-tags-lookup branch February 6, 2026 20:48

github-actions bot mentioned this pull request Feb 6, 2026

[0.4] Fix excluded tags lookup to use correct key type (#417) #418

Merged
