feat: content blocking detection for headless browsers #73

Merged
avifenesh merged 9 commits into main from feature/auth-wall-detection-38 on Feb 26, 2026

@avifenesh (Collaborator)

Summary

  • Add a detectContentBlocked() function that detects when sites serve a page but block its content from headless browsers (e.g., X.com's empty timeline)
  • Enhance browser stealth with anti-bot evasions (window.chrome, navigator.plugins, navigator.languages, WebGL vendor/renderer, and permissions.query spoofing)
  • Integrate a content-blocked warning into the goto action, with a --no-content-block-detect flag to skip detection
  • Add an X.com provider config with content selectors and blocking indicators

Test Plan

  • 541/541 tests passing (33 new tests added)
  • Content blocking detection covers: provider-specific selectors, text patterns, empty content, generic patterns, persistent spinners
  • X.com-specific test scenarios: empty feed, error state, no false positive with real content
  • Edge cases: empty contentSelectors array, page query errors, threshold boundaries
  • Integration tests verify goto action wiring

Related Issues

Closes #38

Add contentSelectors and contentBlockedIndicators fields to the X
(Twitter) provider entry. These define the DOM selectors and text
patterns used to detect when X.com blocks headless browsers from
viewing feed content. Update the notes to document the blocking behavior.

Add a new detectContentBlocked() function that detects when sites serve
pages but block the actual content from headless browsers. It uses five ordered
heuristics (OR logic): provider blocked selectors, provider blocked text
patterns, empty content areas, generic error text with a short body, and
persistent loading indicators. Exports CONTENT_BLOCKED_TEXT_PATTERNS.
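The five ordered heuristics above can be sketched as follows. This is a minimal illustration, not the PR's code: the generic error patterns, the loading-indicator selectors, the 500-character "short body" cutoff, the selectors/textPatterns shape of contentBlockedIndicators, and the name detectContentBlockedSketch are all assumptions. Only the ordering, the OR logic (first match wins), and the default emptyContentThreshold of 200 (taken here to mean characters) come from the description. The page argument is anything with Playwright-style $() and textContent() methods, so the sketch can be exercised against a mock.

```javascript
// Sketch only: these lists and cutoffs are assumptions, not the PR's actual
// CONTENT_BLOCKED_TEXT_PATTERNS or LOADING_INDICATOR_SELECTORS values.
const GENERIC_BLOCKED_PATTERNS = [/something went wrong/i, /try reloading/i, /access denied/i];
const LOADING_SELECTORS = ['[role="progressbar"]', '.spinner', '.loading'];
const SHORT_BODY_LIMIT = 500; // assumed cutoff for a "short" page body

// Helpers that swallow page errors so one failing query cannot abort detection.
async function safeQuery(page, selector) {
  try {
    return await page.$(selector);
  } catch {
    return null;
  }
}

async function safeText(handle) {
  try {
    return (await handle.textContent()) ?? '';
  } catch {
    return '';
  }
}

async function safeVisible(handle) {
  try {
    return await handle.isVisible();
  } catch {
    return false;
  }
}

async function detectContentBlockedSketch(page, provider = {}, { emptyContentThreshold = 200 } = {}) {
  const indicators = provider.contentBlockedIndicators ?? {};
  let bodyText = '';
  try {
    bodyText = (await page.textContent('body')) ?? '';
  } catch {
    // Leave bodyText empty if the body cannot be read.
  }

  // 1. Provider-specific "blocked" elements.
  for (const sel of indicators.selectors ?? []) {
    if (await safeQuery(page, sel)) return { blocked: true, reason: `blocked selector ${sel}` };
  }

  // 2. Provider-specific blocked text (case-insensitive substring match).
  const lowerBody = bodyText.toLowerCase();
  for (const text of indicators.textPatterns ?? []) {
    if (lowerBody.includes(text.toLowerCase())) return { blocked: true, reason: `blocked text "${text}"` };
  }

  // 3. Content areas whose combined text is under the threshold (missing areas count as empty).
  //    An empty contentSelectors array skips this check entirely.
  const contentSelectors = provider.contentSelectors ?? [];
  if (contentSelectors.length > 0) {
    let chars = 0;
    for (const sel of contentSelectors) {
      const el = await safeQuery(page, sel);
      if (el) chars += (await safeText(el)).trim().length;
    }
    if (chars < emptyContentThreshold) return { blocked: true, reason: `content area has ${chars} chars` };
  }

  // 4. Generic error wording on a short page.
  const trimmed = bodyText.trim();
  if (trimmed.length < SHORT_BODY_LIMIT && GENERIC_BLOCKED_PATTERNS.some((re) => re.test(trimmed))) {
    return { blocked: true, reason: 'generic error text on short page' };
  }

  // 5. Loading indicators still visible after the page reported itself as loaded.
  for (const sel of LOADING_SELECTORS) {
    const el = await safeQuery(page, sel);
    if (el && (await safeVisible(el))) return { blocked: true, reason: `loading indicator ${sel}` };
  }

  return { blocked: false };
}
```

Returning on the first match mirrors the OR logic: cheap, provider-specific signals run before the generic fallbacks, so a known X.com error element is reported by name rather than as a vague "short page".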

Expand the addInitScript block with additional stealth measures:
- Spoof window.chrome object (present in real Chrome, missing in headless)
- Spoof navigator.plugins with non-empty PluginArray-like object
- Set navigator.languages to ['en-US', 'en']
- Override WebGL vendor/renderer to Intel Inc. / Intel Iris OpenGL Engine
- Override permissions.query for 'notifications' to return denied state
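A minimal sketch of an init script covering the five measures listed above, assuming it is registered through Playwright's page.addInitScript(), which accepts a function. The function name applyStealth and the plugin entries are illustrative; the PR's exact override code is not shown. The WebGL constants 0x9245 and 0x9246 are UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL from the WEBGL_debug_renderer_info extension.

```javascript
// Sketch of a stealth init script. Plugin entries are illustrative, not the PR's values.
// `win` defaults to the page's window when the function is called without an argument.
function applyStealth(win = window) {
  const nav = win.navigator;

  // window.chrome exists in desktop Chrome but is missing in headless mode.
  if (!win.chrome) win.chrome = { runtime: {} };

  // Non-empty plugin list; a plain array stands in for a real PluginArray.
  const plugins = [
    { name: 'PDF Viewer', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
    { name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
  ];
  Object.defineProperty(nav, 'plugins', { get: () => plugins, configurable: true });

  // Fixed language list.
  Object.defineProperty(nav, 'languages', { get: () => ['en-US', 'en'], configurable: true });

  // WEBGL_debug_renderer_info constants for the unmasked vendor/renderer strings.
  const UNMASKED_VENDOR_WEBGL = 0x9245;
  const UNMASKED_RENDERER_WEBGL = 0x9246;
  for (const name of ['WebGLRenderingContext', 'WebGL2RenderingContext']) {
    const ctor = win[name];
    if (!ctor || !ctor.prototype) continue;
    const original = ctor.prototype.getParameter;
    ctor.prototype.getParameter = function getParameter(param) {
      if (param === UNMASKED_VENDOR_WEBGL) return 'Intel Inc.';
      if (param === UNMASKED_RENDERER_WEBGL) return 'Intel Iris OpenGL Engine';
      return original.call(this, param); // all other parameters pass through
    };
  }

  // Report 'denied' for notification permission queries; defer everything else.
  const perms = nav.permissions;
  if (perms && typeof perms.query === 'function') {
    const originalQuery = perms.query.bind(perms);
    perms.query = (descriptor) =>
      descriptor && descriptor.name === 'notifications'
        ? Promise.resolve({ state: 'denied' })
        : originalQuery(descriptor);
  }
}
```

In a browser this would be registered before navigation, e.g. await page.addInitScript(applyStealth). Taking the window as a defaulted parameter also lets the same function run against a mock object in unit tests.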

Import detectContentBlocked and add a matchProviderByDomain helper backed by a
lazily loaded Map for O(1) provider lookup. After goto navigation and
waitForLoaded, run content-blocking detection using the provider-specific config
from providers.json. When blocking is detected, add contentBlocked, warning, reason,
and suggestion fields to the result. Add a --no-content-block-detect flag
to skip detection.
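One way to get the lazily built Map lookup and the result fields described above, sketched under assumptions: providers are supplied through a loadProviders callback rather than read from providers.json directly, each provider lists its hosts in a hypothetical domains array, a leading "www." is stripped, and the warning and suggestion strings are placeholders. Only the field names contentBlocked, warning, reason, and suggestion come from the commit message.

```javascript
// Hypothetical provider shape: the `domains` field name is an assumption about providers.json.
let providerIndex = null; // built lazily on the first lookup

function buildProviderIndex(providers) {
  const index = new Map();
  for (const provider of providers) {
    for (const domain of provider.domains ?? []) index.set(domain.toLowerCase(), provider);
  }
  return index;
}

function matchProviderByDomain(url, loadProviders) {
  if (providerIndex === null) providerIndex = buildProviderIndex(loadProviders());
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return null; // not a parseable URL
  }
  if (host.startsWith('www.')) host = host.slice(4);
  return providerIndex.get(host) ?? null; // single Map lookup per call
}

// Merge a detection outcome into a goto result; message strings are placeholders.
function withContentBlockedFields(result, detection) {
  if (!detection || !detection.blocked) return result;
  return {
    ...result,
    contentBlocked: true,
    warning: 'Page loaded, but its content appears to be blocked for this browser.',
    reason: detection.reason,
    suggestion: 'Try an authenticated session or a non-headless browser.',
  };
}
```

Deferring the Map build to the first lookup means commands that never navigate do not pay for indexing the provider list, while every later lookup is a single Map.get() on the hostname.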

Add 19 tests covering all detection heuristics:
- Provider-specific blocked selectors and text patterns
- Empty content area detection with threshold
- Generic error text with short body
- Persistent loading indicators (visible vs invisible)
- Error handling for page.$() and textContent() failures
- Default emptyContentThreshold of 200
- X.com-specific: empty feed, error state, no false positives
- CONTENT_BLOCKED_TEXT_PATTERNS export validation

…xports

Add two tests to the existing auth-wall-detect test suite confirming
that the new detectContentBlocked function and CONTENT_BLOCKED_TEXT_PATTERNS
constant are properly exported from the module.

…e case tests

- Cache bodyText fetch in detectContentBlocked to avoid redundant DOM query
- Export LOADING_INDICATOR_SELECTORS for testability
- Add empty contentSelectors array edge case test
- Add LOADING_INDICATOR_SELECTORS validation tests
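The bodyText caching can be illustrated with a memoized promise. This sketch assumes a Playwright-style page.textContent('body'); memoizeAsync and makeBodyTextGetter are hypothetical helpers, and the PR may instead simply store the text in a local variable.

```javascript
// Wrap an async fetch so the underlying query runs at most once; later calls
// (including concurrent ones) share the same promise.
function memoizeAsync(fetchOnce) {
  let pending = null;
  return () => {
    if (pending === null) pending = fetchOnce();
    return pending;
  };
}

// Hypothetical helper: every heuristic can call the getter, but
// page.textContent('body') is only issued on the first call.
function makeBodyTextGetter(page) {
  return memoizeAsync(async () => {
    try {
      return (await page.textContent('body')) ?? '';
    } catch {
      return ''; // treat an unreadable body as empty rather than caching a rejection
    }
  });
}
```

Catching inside the memoized function matters: a cached rejected promise would otherwise make every later heuristic fail too.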
avifenesh merged commit f02e6a9 into main on Feb 26, 2026
2 of 3 checks passed
avifenesh deleted the feature/auth-wall-detection-38 branch on February 26, 2026, 01:32

Linked issue: X.com feed blocks headless browsers - use as test case for auth wall detection
