Context
X.com (Twitter) is one of the most aggressive sites at detecting and blocking automated/headless browser access. This makes it an ideal test case for building robust auth wall and content-blocking detection (related to #34).
Observed Behavior
- `goto https://x.com` - loads page with status 200 but empty snapshot
- `checkpoint` (headed mode) - user is logged in, sidebar renders, URL changes to `/home`
- Back in headless mode - page shows infinite loading spinner
- Screenshot confirms: sidebar with nav items loads initially, but feed content never renders
- `evaluate "document.querySelectorAll('article').length"` returns 0
- `document.body.innerText.length` returns 0
- After a few attempts, even the sidebar disappears - just a blank page with spinner
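This is a minimal sketch of how the measurements above could drive the content-blocking check. It assumes web-ctl can collect the two `evaluate` values shown above, plus a boolean for whether a loading spinner is visible. How to detect the spinner on X is left open. `PageProbe` and `detect_content_block` are hypothetical names. Treating zero `article` elements as "no feed" is an X-specific assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageProbe:
    """Page measurements, e.g. gathered with web-ctl `evaluate` (hypothetical structure)."""
    article_count: int      # document.querySelectorAll('article').length
    body_text_length: int   # document.body.innerText.length
    spinner_visible: bool   # is a loading spinner on screen? (detection is site-specific)


def detect_content_block(probe: PageProbe) -> Optional[dict]:
    """Return a content_blocked warning when the feed never rendered, else None."""
    if probe.article_count > 0:
        return None  # feed items rendered, so the page is not blocked
    if probe.spinner_visible or probe.body_text_length == 0:
        # Page chrome (or nothing at all) loaded, but the main content never filled in.
        return {"warning": "content_blocked"}
    return None  # text without articles: probably a non-feed page, not treated as blocked
```

Run after navigation settles, a check like this would turn the silent empty snapshot into an explicit warning.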
Why This Matters
X.com represents the hardest category of sites to automate. If web-ctl can handle X, it should cope with most other sites. This issue tracks making X.com a first-class test case for:
- Content blocking detection - Page chrome loads but the main content area stays a loading spinner. web-ctl should detect this pattern and surface `"warning": "content_blocked"` rather than silently returning empty snapshots.
- Headless detection evasion - Research which signals X uses to detect headless browsers (`navigator.webdriver`, CDP detection, canvas fingerprinting, etc.) and whether Playwright stealth techniques can help.
- Session persistence - Cookies from the headed checkpoint don't seem to carry sufficient auth state back to headless mode. X may be checking for properties that only exist in headed contexts.
- Feed API approach - Investigate whether intercepting X's internal API calls (via network capture) could be a more reliable path than DOM scraping.
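This is a hedged sketch of the filtering step for the feed API approach, once responses have been captured. It assumes each captured response is available as a URL, a status code and body text. It also assumes X's home feed is served from a GraphQL endpoint whose URL contains `/i/api/graphql/` and `HomeTimeline`. The marker strings and the `CapturedResponse` / `extract_timeline_payloads` names are assumptions to verify against a real capture.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class CapturedResponse:
    """One captured network response (hypothetical shape)."""
    url: str
    status: int
    body: str  # raw response text


# Assumption: substrings expected in the URL of X's home-timeline API call.
TIMELINE_MARKERS = ("/i/api/graphql/", "HomeTimeline")


def extract_timeline_payloads(responses: Iterable[CapturedResponse]) -> List[Any]:
    """Keep successful, parseable JSON responses whose URL contains every marker."""
    payloads = []
    for resp in responses:
        if resp.status != 200:
            continue  # e.g. rate-limited or auth failures
        if not all(marker in resp.url for marker in TIMELINE_MARKERS):
            continue  # unrelated request
        try:
            payloads.append(json.loads(resp.body))
        except json.JSONDecodeError:
            continue  # truncated or non-JSON body; skip rather than fail the capture
    return payloads
```

The filter matches marker substrings rather than a full hard-coded URL, so it can tolerate query strings and path segments that vary between requests.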
Acceptance Criteria
- web-ctl can navigate to x.com/home and read feed content in headless mode
- If blocked, web-ctl surfaces a clear warning within 10 seconds (instead of an infinite spinner)
- Document the techniques that work/don't work for reference
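The 10-second criterion amounts to polling with a deadline instead of waiting indefinitely. This is a minimal sketch that assumes the caller supplies a `has_content` callable. One example would be a callable that runs the `article` count check from Observed Behavior. `wait_for_content` and the `waited_s` field are hypothetical. The clock and sleep functions are injectable so the timeout logic can be tested without a browser.

```python
import time
from typing import Callable, Optional


def wait_for_content(
    has_content: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until content appears; return a warning dict if the deadline passes first."""
    start = clock()
    deadline = start + timeout_s
    while True:
        if has_content():
            return None  # content rendered in time
        remaining = deadline - clock()
        if remaining <= 0:
            # Hypothetical warning shape: give up instead of waiting on a spinner forever.
            return {"warning": "content_blocked", "waited_s": clock() - start}
        sleep(min(interval_s, remaining))  # never sleep past the deadline
```

Capping each sleep at the remaining time keeps the worst case close to `timeout_s`, plus the duration of one final probe.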
Labels
This is a research/learning issue - treat it as a special use case for hardening web-ctl against aggressive anti-bot sites.