Merged
10 changes: 10 additions & 0 deletions .claude/web-ctl/workflow-status.json
@@ -0,0 +1,10 @@
{
  "taskId": "34",
  "title": "Auto-detect auth walls after successful authentication",
  "phase": "implementation",
  "currentStep": 5,
  "totalSteps": 5,
  "implementationComplete": true,
  "filesModified": ["scripts/auth-wall-detect.js", "scripts/web-ctl.js", "tests/auth-wall-detect.test.js", "tests/web-ctl-actions.test.js"],
  "lastActivityAt": "2026-02-25T00:03:00Z"
}
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## [Unreleased]

### Added
- Auto-detect authentication walls after `goto` navigation - uses three-heuristic detection (domain cookies, URL auth patterns, DOM login elements) and automatically opens a headed checkpoint. Disable with the `--no-auth-wall-detect` flag
- Smart default snapshot scoping - snapshots automatically scope to `<main>` element (then `[role="main"]`, fallback to `<body>`), reducing output size by excluding navigation, headers, and footers. Use `--snapshot-full` to capture full page body when needed
- `--snapshot-compact` flag for token-efficient LLM consumption - applies four transforms: link collapsing (merges link + /url child into `link "Title" -> /path`), heading inlining (merges heading with single link child), decorative image removal (strips img nodes with empty or single-char alt text), and duplicate URL dedup (removes second occurrence at same depth scope). Applied after `--snapshot-depth` and before `--snapshot-collapse` in the pipeline
- `--snapshot-max-lines <N>` flag to truncate snapshot output to a maximum number of lines, with a `... (K more lines)` marker when lines are omitted
3 changes: 2 additions & 1 deletion README.md
@@ -94,7 +94,7 @@ web-ctl session end github

| Action | Usage | Returns |
|--------|-------|---------|
| `goto` | `run <s> goto <url>` | `{ url, status, snapshot }` |
| `goto` | `run <s> goto <url> [--no-auth-wall-detect]` | `{ url, status, authWallDetected, checkpointCompleted, snapshot }` |
| `snapshot` | `run <s> snapshot` | `{ url, snapshot }` |
| `click` | `run <s> click <sel> [--wait-stable]` | `{ url, clicked, snapshot }` |
| `click-wait` | `run <s> click-wait <sel> [--timeout]` | `{ url, clicked, settled, snapshot }` |
@@ -172,6 +172,7 @@ This eliminates the common click-snapshot-check loop that wastes agent turns on
| `--filter <pattern>` | `network` | Filter captured requests by URL pattern |
| `--path <file>` | `screenshot` | Custom screenshot path (within session dir) |
| `--allow-evaluate` | `evaluate` | Required safety flag for JS execution |
| `--no-auth-wall-detect` | `goto` | Disable automatic auth wall detection and checkpoint opening |
| `--snapshot-depth <N>` | Any action with snapshot | Limit ARIA tree depth (e.g. 3 for top 3 levels) |
| `--snapshot-selector <sel>` | Any action with snapshot | Scope snapshot to a DOM subtree |
| `--snapshot-max-lines <N>` | Any action with snapshot | Truncate snapshot to N lines |
151 changes: 151 additions & 0 deletions scripts/auth-wall-detect.js
@@ -0,0 +1,151 @@
'use strict';

/**
 * Auth wall detection module.
 *
 * Detects whether a page is showing an authentication wall after navigation.
 * Uses three heuristics (ALL must pass - AND logic):
 *   1. Domain cookies exist for the target URL
 *   2. Current page URL matches a known auth URL pattern
 *   3. Page DOM contains login-related elements or text
 *
 * Short-circuits: if cookie check fails, skips URL and DOM checks.
 */

const AUTH_URL_PATTERNS = [
  'login',
  'signin',
  'sign_in',
  'sign-in',
  'oauth',
  'accounts',
  'auth/realms'
];

const AUTH_DOM_SELECTORS = [
  'input[type="password"]',
  'form[action*="login"]',
  'form[action*="signin"]',
  'form[action*="authenticate"]',
  'input[name="username"]',
  'input[name="email"][type="email"]'
];

const AUTH_TEXT_PATTERNS = [
  'sign in',
  'log in',
  'enter your password',
  'choose an account',
  'pick an account',
  'select an account'
];

/**
 * Extract the domain from a URL string.
 * Returns null if the URL is invalid.
 */
function extractDomain(url) {
  try {
    return new URL(url).hostname;
  } catch {
    return null;
  }
}

/**
 * Check whether a cookie domain matches the target domain.
 * Supports parent domain matching (e.g., cookie for `.github.com` matches `github.com`).
 */
function cookieDomainMatches(cookieDomain, targetDomain) {
  const bare = cookieDomain.replace(/^\./, '');
  if (bare === targetDomain) return true;
  if (targetDomain.endsWith('.' + bare)) return true;
  return false;
}

/**
 * Detect whether the current page is an auth wall.
 *
 * @param {import('playwright').Page} page
 * @param {import('playwright').BrowserContext} context
 * @param {string} targetUrl - The URL we navigated to
 * @returns {Promise<{ detected: boolean, reason: string, details?: object }>}
 */
async function detectAuthWall(page, context, targetUrl) {
  const targetDomain = extractDomain(targetUrl);
  if (!targetDomain) {
    return { detected: false, reason: 'invalid_target_url' };
  }

  // Heuristic 1: Domain cookies exist
  let cookies;
  try {
    cookies = await context.cookies();
  } catch {
    return { detected: false, reason: 'cookie_read_error' };
  }

  const hasDomainCookies = cookies.some(c => cookieDomainMatches(c.domain, targetDomain));
  if (!hasDomainCookies) {
    return { detected: false, reason: 'no_domain_cookies' };
  }

  // Heuristic 2: URL matches auth pattern
  const currentUrl = page.url().toLowerCase();
  const authUrlPattern = AUTH_URL_PATTERNS.find(pattern => currentUrl.includes(pattern));
  if (!authUrlPattern) {
    return { detected: false, reason: 'url_not_auth_pattern' };
  }

  // Heuristic 3: DOM contains auth elements
  // 3a: Check selectors (parallel for performance)
  let matchedSelector = null;
  try {
    const results = await Promise.allSettled(
      AUTH_DOM_SELECTORS.map(async (sel) => ({ sel, el: await page.$(sel) }))
    );
    for (const r of results) {
      if (r.status === 'fulfilled' && r.value.el) {
        matchedSelector = r.value.sel;
        break;
      }
    }
  } catch {
    // Selector lookup failed; fall through to the text check below.
  }

  if (matchedSelector) {
    return {
      detected: true,
      reason: 'auth_wall',
      details: {
        hasDomainCookies: true,
        authUrlPattern,
        domElement: matchedSelector
      }
    };
  }

  // 3b: Check text patterns (first 5000 characters of body text, case-insensitive)
  let matchedText = null;
  try {
    const bodyText = (await page.textContent('body') || '').slice(0, 5000).toLowerCase();
    matchedText = AUTH_TEXT_PATTERNS.find(pattern => bodyText.includes(pattern));
  } catch {
    // Body text could not be read; treat as no text match.
  }

  if (matchedText) {
    return {
      detected: true,
      reason: 'auth_wall',
      details: {
        hasDomainCookies: true,
        authUrlPattern,
        domElement: matchedText
      }
    };
  }

  return { detected: false, reason: 'no_auth_elements' };
}

module.exports = { detectAuthWall, AUTH_URL_PATTERNS, AUTH_DOM_SELECTORS, AUTH_TEXT_PATTERNS };
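
The parent-domain matching in `cookieDomainMatches` can be checked in isolation. The helper is not exported from the module, so this sketch copies its logic; the cookie/target pairs are hypothetical inputs chosen for illustration, not fixtures from the PR's test suite.

```javascript
'use strict';

// Copy of the unexported helper from scripts/auth-wall-detect.js, for illustration only.
function cookieDomainMatches(cookieDomain, targetDomain) {
  const bare = cookieDomain.replace(/^\./, '');
  if (bare === targetDomain) return true;
  return targetDomain.endsWith('.' + bare);
}

// Hypothetical [cookieDomain, targetDomain] pairs.
const cases = [
  ['.github.com', 'github.com'],     // leading dot is stripped, exact match
  ['github.com', 'api.github.com'],  // parent-domain cookie applies to a subdomain
  ['hub.com', 'github.com'],         // suffix without a dot boundary must not match
  ['api.github.com', 'github.com'],  // subdomain cookie does not apply to the parent
];

const results = cases.map(([cookie, target]) => cookieDomainMatches(cookie, target));
console.log(results);
```

The dot prepended in `'.' + bare` is what keeps `hub.com` from matching `github.com`; a plain suffix check would accept it.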
32 changes: 30 additions & 2 deletions scripts/web-ctl.js
@@ -2,7 +2,8 @@
'use strict';

const sessionStore = require('./session-store');
const { launchBrowser, closeBrowser, randomDelay, waitForStable } = require('./browser-launcher');
const { launchBrowser, closeBrowser, randomDelay, waitForStable, canLaunchHeaded } = require('./browser-launcher');
const { detectAuthWall } = require('./auth-wall-detect');
const { runAuthFlow } = require('./auth-flow');
const { checkAuthSuccess } = require('./auth-check');
const { sanitizeWebContent, wrapOutput } = require('./redact');
@@ -20,7 +21,7 @@ const BOOLEAN_FLAGS = new Set([
'--allow-evaluate', '--no-snapshot', '--wait-stable', '--vnc',
'--exact', '--accept', '--submit', '--dismiss', '--auto',
'--snapshot-collapse', '--snapshot-text-only', '--snapshot-compact',
'--snapshot-full',
'--snapshot-full', '--no-auth-wall-detect',
]);

function validateSessionName(name) {
@@ -919,6 +920,33 @@ async function runAction(sessionName, action, actionArgs, opts) {
if (!url) throw new Error('URL required: run <session> goto <url>');
validateUrl(url);
const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
if (!opts.noAuthWallDetect) {
const detection = await detectAuthWall(page, context, url);
if (detection.detected) {
console.warn('[WARN] Auth wall detected for ' + new URL(url).hostname);
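const headed = await canLaunchHeaded();
if (headed) {
// Close the headless browser only when a headed relaunch is possible;
// the fallback branch below still needs the live page for its snapshot.
await closeBrowser(sessionName, context);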
const headedBrowser = await launchBrowser(sessionName, { headless: false });
context = headedBrowser.context;
page = headedBrowser.page;
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
const ckTimeout = Math.min(opts.timeout ? parseInt(opts.timeout, 10) : 120, 3600) * 1000;
console.warn('[WARN] Checkpoint open for ' + (ckTimeout / 1000) + 's');
await new Promise(resolve => setTimeout(resolve, ckTimeout));
const snapshot = await getSnapshot(page, opts);
result = { url: page.url(), authWallDetected: true, checkpointCompleted: true,
...(snapshot != null && { snapshot }) };
break;
} else {
const snapshot = await getSnapshot(page, opts);
result = { url: page.url(), authWallDetected: true, checkpointCompleted: false,
message: 'Auth wall detected but no display for headed checkpoint.',
...(snapshot != null && { snapshot }) };
break;
}
}
}
const snapshot = await getSnapshot(page, opts);
result = { url: page.url(), status: response ? response.status() : null, ...(snapshot != null && { snapshot }) };
break;
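
The checkpoint duration in the `goto` branch comes from a single inline expression. The sketch below wraps that expression in a hypothetical helper, `checkpointTimeoutMs` (a name that does not appear in the PR), to show how it resolves for a few inputs. Whether `opts.timeout` is validated before it reaches this point is not visible in this diff, so the non-numeric case is shown only as what the expression itself would produce.

```javascript
'use strict';

// Hypothetical helper wrapping the inline expression from the goto branch.
function checkpointTimeoutMs(timeoutOpt) {
  return Math.min(timeoutOpt ? parseInt(timeoutOpt, 10) : 120, 3600) * 1000;
}

// Illustrative inputs: unset, a normal value, a value above the cap, and a non-numeric string.
const inputs = [undefined, '30', '7200', 'abc'];
const resolved = inputs.map(checkpointTimeoutMs);
console.log(resolved);
```

An unset timeout falls back to 120 seconds and values above 3600 seconds are capped. A non-numeric string makes `parseInt` return `NaN`, which passes through `Math.min` unchanged, so a guard on the parsed value may be worth considering if upstream validation does not already cover it.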
13 changes: 11 additions & 2 deletions skills/web-browse/SKILL.md
@@ -46,10 +46,19 @@ Safe practice: always double-quote URL arguments.
### goto - Navigate to URL

```bash
node ${PLUGIN_ROOT}/scripts/web-ctl.js run <session> goto <url>
node ${PLUGIN_ROOT}/scripts/web-ctl.js run <session> goto <url> [--no-auth-wall-detect]
```

Returns: `{ url, status, snapshot }`
Navigates to a URL and automatically detects authentication walls using three heuristics, all of which must match:
1. Domain cookies (the session already holds at least one cookie for the target domain or a parent domain)
2. URL auth patterns (the current URL contains a substring such as `login`, `signin`, `sign-in`, `oauth`, or `accounts`)
3. DOM login elements (the page contains a password input, a login form, or text such as "sign in")

When an authentication wall is detected, the tool reopens the URL in a headed checkpoint so the user can complete authentication. The checkpoint stays open for 120 seconds by default (capped at 3600 seconds). If no display is available for a headed browser, the result reports `checkpointCompleted: false`.

Use `--no-auth-wall-detect` to disable this automatic detection and skip the checkpoint, navigating headlessly without waiting for user interaction.
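
The short-circuit order of the three checks can be sketched as a pure function. This is a simplified stand-in, not the module's `detectAuthWall`: the `classify` function and its input shape are hypothetical, it takes plain data instead of a Playwright page, and it covers only the text half of the DOM check. The pattern lists and reason strings are copied from `scripts/auth-wall-detect.js`.

```javascript
'use strict';

// Pattern lists copied from scripts/auth-wall-detect.js.
const AUTH_URL_PATTERNS = ['login', 'signin', 'sign_in', 'sign-in', 'oauth', 'accounts', 'auth/realms'];
const AUTH_TEXT_PATTERNS = ['sign in', 'log in', 'enter your password', 'choose an account', 'pick an account', 'select an account'];

// Hypothetical stand-in: returns the reason string the first failing check would produce.
function classify({ cookieDomains, targetUrl, currentUrl, bodyText }) {
  const target = new URL(targetUrl).hostname;

  // Check 1: a cookie exists for the target domain or a parent domain.
  const hasCookies = cookieDomains.some((d) => {
    const bare = d.replace(/^\./, '');
    return bare === target || target.endsWith('.' + bare);
  });
  if (!hasCookies) return 'no_domain_cookies';

  // Check 2: the current URL contains an auth-related substring.
  const url = currentUrl.toLowerCase();
  if (!AUTH_URL_PATTERNS.some((p) => url.includes(p))) return 'url_not_auth_pattern';

  // Check 3 (text half only): the first 5000 characters of body text mention signing in.
  const text = bodyText.slice(0, 5000).toLowerCase();
  if (!AUTH_TEXT_PATTERNS.some((p) => text.includes(p))) return 'no_auth_elements';

  return 'auth_wall';
}

// Hypothetical scenario: a saved session is redirected to a login page.
const base = {
  cookieDomains: ['.example.com'],
  targetUrl: 'https://app.example.com/dashboard',
  currentUrl: 'https://app.example.com/login?next=/dashboard',
  bodyText: 'Sign in to continue',
};

const outcomes = [
  classify(base),
  classify({ ...base, cookieDomains: [] }),
  classify({ ...base, currentUrl: 'https://app.example.com/dashboard' }),
  classify({ ...base, bodyText: 'Welcome back' }),
];
console.log(outcomes);
```

Because the cookie check runs first, a session that has never authenticated on the domain is never treated as an auth wall, even on a login page.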

Returns: `{ url, status, snapshot }` when no auth wall is detected, or `{ url, authWallDetected, checkpointCompleted, snapshot }` when one is

### snapshot - Get Accessibility Tree
