E2E tests use Playwright (@playwright/test) and run against the full application stack
(API, Web UI, Kratos auth, Retrack, Postgres) served via Docker Compose at http://localhost:7171.
Tests live in e2e/tests/ and are named *.spec.ts.

```shell
# Start the full e2e stack (all services in Docker)
make e2e-up # add BUILD=1 to rebuild images

# Run all tests
make e2e-test

# Run a specific test file
make e2e-test ARGS="tests/registration.spec.ts"

# Run in headed mode (opens a browser)
make e2e-test ARGS="--headed"

# Open the Playwright UI runner
make e2e-test ARGS="--ui"

# View the HTML report after a run
make e2e-report

# Tear down the stack
make e2e-down
```

To check whether a test is reliably passing, run it in a loop. Both `e2e-test` and
`docs-screenshots` have loop variants that accept the same `ARGS` plus a `RUNS` count
(default 10):

```shell
# Run a specific e2e test 20 times
make e2e-test-loop ARGS="tests/registration.spec.ts" RUNS=20

# Run docs screenshot tests 20 times
make docs-screenshots-loop ARGS="docs/csp.spec.ts" RUNS=20
```

Each run streams `PASS` / `FAIL` to the terminal. On failure, the full Playwright log is
written to `/tmp/e2e-loop-results/run-N.log` and any failure screenshots / traces are
copied to `/tmp/e2e-loop-results/artifacts-run-N/`.
When the agent cannot run tests directly (e.g. due to sandbox restrictions on the
Playwright browser binary), ask the user to run the loop command and share the results.
The agent can then read the log files and failure screenshots directly from
/tmp/e2e-loop-results/ to diagnose failures:

```shell
# User runs this, then the agent reads /tmp/e2e-loop-results/ to debug
make docs-screenshots-loop ARGS="docs/csp.spec.ts" RUNS=20
```

Choose locators in this order of preference. Prefer semantic, user-visible locators that mirror how a real user perceives the page.

- `getByRole` - first choice for buttons, headings, links, and other ARIA-role elements. Use `name` to disambiguate and `exact: true` when the label is a substring of another element's label.

  ```typescript
  page.getByRole('button', { name: 'Sign up', exact: true });
  page.getByRole('heading', { name: 'Welcome', level: 2 });
  ```

- `getByPlaceholder` - preferred for form inputs that carry placeholder text. Use `exact: true` when a shorter placeholder is contained in a longer one (e.g. "Password" vs "Repeat password"), because the default match is a case-insensitive substring match.

  ```typescript
  page.getByPlaceholder('Email');
  page.getByPlaceholder('Password', { exact: true });
  ```

- `getByText` - fallback for elements that have visible text but no clear ARIA role or placeholder (e.g. menu items, labels).

  ```typescript
  page.getByText('Sign out');
  ```

- Raw `locator` (CSS/XPath) - last resort for cases where no semantic locator is practical, such as checking that any form control is present without caring about exact text.

  ```typescript
  page.locator('input[name="password"], input[name="identifier"], form');
  ```

The Docker-based stack (Kratos auth + API + Web UI) can be slow to respond, especially on the first load. Use explicit timeouts on visibility and URL assertions:

- `toBeVisible({ timeout: 15000 })` - for elements that depend on the page fully rendering or an auth redirect completing.
- `toHaveURL(pattern, { timeout: 30000 })` - for navigations that involve server-side processing (registration, login).

Do not add timeouts to assertions that follow an already-awaited element on the same page.

- Group related tests with `test.describe`.
- Use `test.beforeEach` for setup that must run before every test in the group (e.g., cleaning up test users via the API).
- Keep one test file per feature area (e.g. `registration.spec.ts`, `app.spec.ts`).

Tests that create server-side state (users, resources) must clean up in `beforeEach` so each
run starts from a known state. Use `request` (Playwright's built-in API context) to call
internal API endpoints directly:

```typescript
import { test } from '@playwright/test';

// EMAIL and OPERATOR_TOKEN are module-level constants defined at the top of the file.
test.beforeEach(async ({ request }) => {
  await request.post('/api/users/remove', {
    headers: { Authorization: `Bearer ${OPERATOR_TOKEN}` },
    data: { email: EMAIL },
  });
});
```

Define credentials and operator tokens as module-level constants at the top of the file.
When UI assertions alone are not enough, make API calls within the test to verify
server-side state through `page.request`:

```typescript
const stateResponse = await page.request.get('/api/ui/state');
expect(stateResponse.ok()).toBeTruthy();
const state = await stateResponse.json();
expect(state.user.email).toBe(EMAIL);
```

| What to check | Assertion |
|---|---|
| Element is on screen | `await expect(el).toBeVisible({ timeout: … })` |
| Current URL | `await expect(page).toHaveURL(/pattern/)` |
| Page title | `await expect(page).toHaveTitle(/pattern/)` |
| API response OK | `expect(response.ok()).toBeTruthy()` |
| JSON field exists | `expect(body).toHaveProperty('key')` |
| JSON field value | `expect(body.field).toBe(value)` |
The e2e project uses ESLint + Prettier with these key rules:
- Max line length: 120 characters (strings and template literals exempt).
- Use `type` imports (`import type { … }`) where possible - enforced by `consistent-type-imports`.
- Import order: builtins, externals, internals, then parent/sibling/index - alphabetized, separated by blank lines.
- No unused variables or expressions.
Docs screenshot tests generate screenshots used in the documentation site
(`components/secutils-docs/`). Each test file in `e2e/docs/` corresponds to a guide topic
(e.g. `csp.spec.ts`, `webhooks.spec.ts`, `digital_certificates.spec.ts`,
`web_scraping.spec.ts`). Screenshots are saved directly into
`components/secutils-docs/static/img/docs/guides/<topic>/`.

```shell
# Run all docs screenshot tests
make docs-screenshots

# Run a specific file
make docs-screenshots ARGS="docs/csp.spec.ts"

# Run a single test by name
make docs-screenshots ARGS="docs/csp.spec.ts -g 'test a content security policy'"
```

All docs tests import from `helpers.ts`. Key exports:

| Helper | Purpose |
|---|---|
| `ensureUserAndLogin(request, page)` | Remove the existing user, register a fresh one, and log in. |
| `goto(page, url)` | Navigate, inject stability CSS, and patch `page.screenshot()` for determinism. |
| `highlightOn(locator)` | Add a red dashed outline around an element for visual emphasis. |
| `highlightOff(locator)` | Remove the highlight outline. |
| `dismissAllToasts(page)` | Dismiss every visible toast notification (iterate all, not just one). |
| `pinEntityTimestamps(json)` | Replace `createdAt`/`updatedAt` with `FIXED_ENTITY_TIMESTAMP` in a JSON value. |
| `fixEntityTimestamps(page, pattern)` | Set up a route handler that pins timestamps in GET JSON responses matching `pattern`. |
| `fixResponderRequestFields(page)` | Intercept the responder request history API and pin `createdAt`/`clientAddress` to fixed values. |
| `fixCertificateTemplateValidityDates(page)` | Pin `notValidBefore`/`notValidAfter` to fixed dates while preserving their duration. |
| `fixTrackerResourceRevisions(page)` | Stabilize tracker revision history: strip URL query strings, normalize webhook subdomains, compute deterministic sizes, fix timestamps. |
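
The timestamp helpers above come down to small JSON transforms. The sketch below is a minimal, hypothetical version for illustration, not the code in `helpers.ts`: it assumes timestamps are Unix epoch seconds, walks the JSON recursively, and returns a copy instead of mutating in place. `pinValidityDates` is a hypothetical name that reuses `FIXED_ENTITY_TIMESTAMP` as the fixed start; the real `fixCertificateTemplateValidityDates` may pin different dates.

```typescript
// Assumption: timestamps are Unix epoch seconds (this guide quotes epoch 1740000000).
const FIXED_ENTITY_TIMESTAMP = 1740000000;

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns a copy of `json` in which every createdAt/updatedAt field, at any depth, is pinned.
function pinEntityTimestamps(json: Json): Json {
  if (Array.isArray(json)) {
    return json.map((item) => pinEntityTimestamps(item));
  }
  if (json !== null && typeof json === 'object') {
    const result: { [key: string]: Json } = {};
    for (const key of Object.keys(json)) {
      const isTimestamp = key === 'createdAt' || key === 'updatedAt';
      result[key] = isTimestamp ? FIXED_ENTITY_TIMESTAMP : pinEntityTimestamps(json[key]);
    }
    return result;
  }
  return json;
}

interface ValidityPeriod {
  notValidBefore: number;
  notValidAfter: number;
}

// Shifts the validity window so it starts at the fixed timestamp; its length is unchanged.
function pinValidityDates(period: ValidityPeriod): ValidityPeriod {
  const duration = period.notValidAfter - period.notValidBefore;
  return {
    ...period,
    notValidBefore: FIXED_ENTITY_TIMESTAMP,
    notValidAfter: FIXED_ENTITY_TIMESTAMP + duration,
  };
}
```

Epoch `1740000000` is `2025-02-19T21:20:00Z`, which is why it renders as February 19, 2025 in UTC.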
Screenshots must be byte-identical across runs. The stability system has multiple layers
that work together automatically when using goto():
These apply to every screenshot without any test-level code:

- CSS injection - `goto()` injects a `<style>` tag after navigation that:
  - Disables all CSS animations and transitions (`animation-duration: 0s; transition-duration: 0s`).
  - Forces greyscale anti-aliasing (`-webkit-font-smoothing: antialiased; text-rendering: geometricPrecision`) - reduces font rendering variance from ±8 to ±1.
  - Forces icon buttons and toggle switches into GPU compositing layers (`.euiButtonIcon, .euiSwitch__body { will-change: transform }`) - reduces SVG/toggle rendering variance from ±24 to ±1.
  - Hides Monaco editor non-deterministic elements (cursor layer, minimap, decorations overview ruler, scroll decoration).
  - Hides the system text caret (`caret-color: transparent`).
  - Hides scrollbars (`::-webkit-scrollbar { width: 0; height: 0 }`).
- Pre-screenshot stabilization - `waitForStableUiBeforeScreenshot()` runs before every `page.screenshot()` call and:
  - Waits for `domcontentloaded` and `networkidle` (with a 5 s timeout).
  - Waits for all EUI icons to finish loading (`.euiIcon[data-is-loading="true"]`).
  - Waits for all web fonts to reach `loaded` status (`document.fonts.status`).
  - Normalizes webhook URLs in the DOM - replaces user-specific UUIDs in `/api/webhooks/u/<uuid>/` with `/api/webhooks/u/preview/` in links, input values, code blocks, and data grid popovers.
  - Waits three animation frames for layout/paint/composite to settle.
- Sticky-pixel screenshot stabilization - `stabilizeScreenshot()` runs around every `page.screenshot()` call. Before capture, the existing file on disk (if any) is read into a byte buffer. After capture, both the reference and the new PNG are decoded to raw RGBA pixels with `pngjs` (`PNG.sync.read`). If every channel value in the new image is within ±1 of the reference (`MAX_CHANNEL_DIFF`), the image has not meaningfully changed, so the original reference bytes are written back verbatim, producing zero diff. This absorbs non-deterministic sub-pixel anti-aliasing jitter from Chromium's GPU compositor between browser sessions. When any pixel genuinely differs (channel diff > 1) or the dimensions have changed, the newly captured file is kept as-is and becomes the baseline for future runs.
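
The sticky-pixel decision can be expressed as a pure function over decoded pixels. This is a minimal sketch, assuming both images are already decoded to RGBA (the objects returned by `pngjs`'s `PNG.sync.read` carry `width`, `height`, and `data`). `withinStickyTolerance` and `selectOutputBytes` are hypothetical names, and file I/O is left out.

```typescript
// The ±1 per-channel tolerance described above.
const MAX_CHANNEL_DIFF = 1;

// Decoded image: RGBA bytes, 4 per pixel, stored row by row.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// True when the dimensions match and every channel of `next` is within maxDiff of `reference`.
function withinStickyTolerance(reference: RawImage, next: RawImage, maxDiff = MAX_CHANNEL_DIFF): boolean {
  if (reference.width !== next.width || reference.height !== next.height) {
    return false;
  }
  for (let i = 0; i < reference.data.length; i++) {
    if (Math.abs(reference.data[i] - next.data[i]) > maxDiff) {
      return false;
    }
  }
  return true;
}

// Picks the bytes to leave on disk: the old reference when only jitter changed, otherwise the new capture.
function selectOutputBytes(
  referenceBytes: Uint8Array | undefined,
  referenceImage: RawImage | undefined,
  newBytes: Uint8Array,
  newImage: RawImage,
): Uint8Array {
  const hasReference = referenceBytes !== undefined && referenceImage !== undefined;
  if (hasReference && withinStickyTolerance(referenceImage, newImage)) {
    // Written back verbatim, so the file is byte-identical to the previous run.
    return referenceBytes;
  }
  // Genuine change, size change, or no reference yet: this becomes the new baseline.
  return newBytes;
}
```

Comparing per channel rather than per pixel keeps the rule strict: a single channel that moves by 2 is enough to accept the new capture as a genuine change.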
Each source of dynamic data needs explicit stabilization in the test code:

- Timestamps / dates: Intercept the API response with `page.route()` and replace dynamic timestamps with `FIXED_ENTITY_TIMESTAMP` (epoch `1740000000`, renders as "February 19, 2025" - deliberately more than 3 days old so the UI shows an absolute date instead of a relative string like "a few seconds ago").
- Client addresses: Pin to a fixed value like `172.18.0.1:12345`.
- CSP nonces: Intercept responses and replace rotating nonces with a fixed value (e.g. `nonce-m0ck`).
- URL query strings: Strip random cache-buster parameters from resource URLs.
- Webhook subdomains: Normalize user-specific subdomains to a fixed value (e.g. `preview.webhooks.secutils.dev`).
- Cryptographic output (JWK values, key exports): Replace dynamic fields via `element.evaluate()` after the UI renders them.
- Home page summary: Intercept `/api/ui/home/summary` and call `pinEntityTimestamps()` on `recentItems` to avoid relative time strings.
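
Several of these normalizations are plain string rewrites. The functions below are a hypothetical sketch, not the helpers the suite ships. They assume nonces use base64-style characters, that a user-specific webhook host is a single DNS label in front of `webhooks.secutils.dev`, and that webhook paths embed a standard hyphenated UUID. The path rewrite mirrors the DOM normalization done in `waitForStableUiBeforeScreenshot()`.

```typescript
// Fixed values taken from the examples in this guide.
const FIXED_NONCE = 'm0ck';
const FIXED_WEBHOOK_HOST = 'preview.webhooks.secutils.dev';

// Drops the query string (e.g. a random cache-buster) but keeps any #fragment.
function stripQueryString(rawUrl: string): string {
  const hashStart = rawUrl.indexOf('#');
  const beforeHash = hashStart === -1 ? rawUrl : rawUrl.slice(0, hashStart);
  const fragment = hashStart === -1 ? '' : rawUrl.slice(hashStart);
  const queryStart = beforeHash.indexOf('?');
  return (queryStart === -1 ? beforeHash : beforeHash.slice(0, queryStart)) + fragment;
}

// Assumes CSP source expressions such as 'nonce-AbC123=='.
function pinNonces(text: string): string {
  return text.replace(/nonce-[A-Za-z0-9+\/_=-]+/g, `nonce-${FIXED_NONCE}`);
}

// Assumes the user-specific part is one DNS label in front of webhooks.secutils.dev.
function normalizeWebhookHost(text: string): string {
  return text.replace(/\b[a-z0-9-]+\.webhooks\.secutils\.dev\b/gi, FIXED_WEBHOOK_HOST);
}

// Replaces the user UUID in /api/webhooks/u/<uuid>/ with "preview".
function normalizeWebhookPath(text: string): string {
  return text.replace(
    /\/api\/webhooks\/u\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\//gi,
    '/api/webhooks/u/preview/',
  );
}
```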
General pattern for stabilization - intercept with `page.route()`, call `route.fetch()`
to get the real response, mutate the JSON, then `route.fulfill({ response, json })`:

```typescript
await page.route('**/api/some/endpoint', async (route) => {
  // Fetch the real response from the server.
  const response = await route.fetch();
  const json = await response.json();
  // Replace the non-deterministic value before the page receives it.
  json.dynamicField = 'fixed-value';
  // Reuse the real response (status, headers) with the patched JSON body.
  await route.fulfill({ response, json });
});
```

When a `page.route()` handler may receive non-array responses (e.g. POST refresh vs GET
list), always guard with `if (!Array.isArray(json))` before iterating.
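
A sketch of such a guard, assuming list items carry a `createdAt` field. `pinListTimestamps` is a hypothetical helper whose result would be passed as `json` to `route.fulfill()`; anything that is not an array, such as a single object returned by a POST refresh, passes through untouched.

```typescript
type JsonObject = { [key: string]: unknown };

// Pins createdAt on every list item that has one; non-array values are returned unchanged.
// Usage inside a page.route() handler:
//   await route.fulfill({ response, json: pinListTimestamps(json) });
function pinListTimestamps(json: unknown, fixedTimestamp = 1740000000): unknown {
  if (!Array.isArray(json)) {
    // e.g. a POST refresh returning a single object: nothing to iterate.
    return json;
  }
  return json.map((item: unknown) => {
    if (item !== null && typeof item === 'object' && 'createdAt' in item) {
      return { ...(item as JsonObject), createdAt: fixedTimestamp };
    }
    return item;
  });
}
```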
When screenshots are clipped to a bounding box (e.g. tooltip + section), round coordinates to whole pixels and use generous padding to absorb sub-pixel layout jitter:

```typescript
// sectionBox and tooltipBox are results of locator.boundingBox(), assumed non-null here.
const PAD = 16;
// Round the union of both boxes outward to whole pixels, then pad it.
const x = Math.floor(Math.min(sectionBox.x, tooltipBox.x)) - PAD;
const y = Math.floor(Math.min(sectionBox.y, tooltipBox.y)) - PAD;
const right = Math.ceil(Math.max(sectionBox.x + sectionBox.width, tooltipBox.x + tooltipBox.width)) + PAD;
const bottom = Math.ceil(Math.max(sectionBox.y + sectionBox.height, tooltipBox.y + tooltipBox.height)) + PAD;
await page.screenshot({ path, clip: { x, y, width: right - x, height: bottom - y } });
```

When screenshots differ between runs, use the comparison tooling to diagnose:

```shell
# Run docs screenshots twice and diff all PNGs (pixel + byte level)
make docs-screenshots-diff

# Or for a single spec file:
make docs-screenshots-diff ARGS="docs/csp.spec.ts"

# Analyze diffs with detailed per-file report (pixel counts, regions, categories)
make docs-screenshots-analyze
```

The tools output to `/tmp/screenshot-diff/`:

| Path | Contents |
|---|---|
| `run-a/`, `run-b/` | PNG snapshots from each run |
| `diffs/` | ImageMagick visual diff images (red = changed pixels) |
| `analysis/` | Python-annotated diff images with bounding boxes |
| `report.txt` | Summary with per-file pixel diff counts and byte sizes |
| `analysis-report.json` | Detailed JSON: pixel counts, bounding boxes, diff categories |
| `run-a.log`, `run-b.log` | Full Playwright output from each run |
Workflow for diagnosing instability:

1. Run `make docs-screenshots-diff` to produce two runs of screenshots.
2. Run `make docs-screenshots-analyze` to get a detailed report.
3. Check the report categories:
   - `Byte-identical` - no action needed.
   - `Byte-diff only (0 pixel diffs)` - DEFLATE compression non-determinism (should be resolved by `reEncodePngDeterministic`; if it re-appears, check for PNG chunk changes).
   - Files with pixel diffs > 0 - need investigation (see below).
4. For files with pixel diffs, run a deep pixel analysis to locate the exact element:

   ```python
   # In Python (or inline via shell):
   from PIL import Image

   a = Image.open('/tmp/screenshot-diff/run-a/<file>').convert('RGBA')
   b = Image.open('/tmp/screenshot-diff/run-b/<file>').convert('RGBA')
   width = a.size[0]
   for i, (pa, pb) in enumerate(zip(a.tobytes(), b.tobytes())):
       if pa != pb:
           # Each pixel is 4 bytes (R, G, B, A), stored row by row.
           px = (i // 4) % width
           py = (i // 4) // width
           print(f'({px},{py}) {"RGBA"[i % 4]}: {pa}->{pb} delta={pb - pa}')
   ```

5. Crop the diff region (`Image.crop()`) and view it to identify the UI element.
6. Apply the appropriate fix from the troubleshooting table.
7. Use the loop command to verify a fix is stable:

   ```shell
   make docs-screenshots-loop ARGS="docs/csp.spec.ts" RUNS=10
   ```
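
Cropping the diff region first requires its coordinates. As an alternative to reading the per-byte output line by line, a hypothetical TypeScript helper can reduce all differing bytes to one bounding box. It assumes both images are already decoded to RGBA buffers of the same size (for example via `pngjs`). A box `{ x, y, width, height }` maps to PIL's `Image.crop((x, y, x + width, y + height))`.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest rectangle containing every pixel with at least one differing RGBA byte; null if identical.
function diffBoundingBox(a: Uint8Array, b: Uint8Array, width: number): Box | null {
  if (a.length !== b.length) {
    throw new Error('Images must have the same dimensions');
  }
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -1;
  let maxY = -1;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) {
      continue;
    }
    const pixel = Math.floor(i / 4); // 4 bytes per pixel, stored row by row
    const x = pixel % width;
    const y = Math.floor(pixel / width);
    minX = Math.min(minX, x);
    minY = Math.min(minY, y);
    maxX = Math.max(maxX, x);
    maxY = Math.max(maxY, y);
  }
  if (maxX === -1) {
    return null;
  }
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```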
Expected residual instability: With sticky-pixel stabilization, all 159 screenshots should be byte-identical across runs. If new screenshots are added without an existing reference file on disk, the first run establishes the baseline; subsequent runs converge.
Common instability patterns and their solutions:
| Symptom | Likely Cause | Fix |
|---|---|---|
| Byte-diff but no pixel diff | PNG DEFLATE non-determinism or ±1 AA jitter | `stabilizeScreenshot()` (automatic - restores reference file) |
| Text changes between runs | Relative timestamps ("a few seconds ago") | `fixEntityTimestamps()` or `pinEntityTimestamps()` |
| URL segments differ | User-specific webhook UUIDs | Automatic DOM normalization in `waitForStableUiBeforeScreenshot()` |
| ±1 diffs at icon/text edges | Sub-pixel anti-aliasing between browser runs | Handled by sticky-pixel stabilization (automatic) |
| Thin line diffs at edges | Scrollbar visibility | Hidden by stability CSS (`::-webkit-scrollbar`) |
| Monaco editor differences | Cursor, minimap, decorations | Hidden by stability CSS |
| Clipped region shifts | Tooltip/bounding box sub-pixel jitter | Use `Math.floor`/`Math.ceil` + generous padding |
| Animation artifacts | CSS transitions captured in screenshot | `addStyleTag` after `goto()` disables transitions before screenshots |
Important: Do NOT use addInitScript to inject stability CSS. Injecting
transition-duration: 0s before the React app renders prevents transitionend events from
firing during EUI component initialization, causing the page to never finish loading. The CSS
must be injected AFTER navigation via addStyleTag so initial transitions complete normally.
Each test follows a consistent step-based pattern:
- Step 1: Navigate to the relevant page, highlight the primary action button (e.g. "Create responder", "Track page"), and take a screenshot of the empty/initial state.
- Create entity - either via the UI form (for simple fields like Name, Path, Body textarea) or via API (for complex inputs like Monaco editor scripts). When using the API, reload the page afterward and open the Edit flyout to screenshot the pre-filled form.
- Subsequent steps: Show the created entity in the grid, expand rows, click action buttons, and screenshot each meaningful state.
Screenshot naming convention: `{section}_step{N}_{description}.png`, e.g.
`html_step2_form.png`, `detect_resources_step7_responders_created.png`.
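
A hypothetical helper that builds names following this convention, plus the destination path under the docs image directory used by these tests. Both function names are illustrative, and the directory is assumed to be relative to the repository root.

```typescript
// Docs image directory from this guide, assumed relative to the repository root.
const GUIDES_IMG_DIR = 'components/secutils-docs/static/img/docs/guides';

// Builds "{section}_step{N}_{description}.png", e.g. html_step2_form.png.
function screenshotName(section: string, step: number, description: string): string {
  if (!Number.isInteger(step) || step < 1) {
    throw new Error(`Step must be a positive integer, got ${step}`);
  }
  return `${section}_step${step}_${description}.png`;
}

// Full destination path for a screenshot of a given guide topic.
function screenshotPath(topic: string, fileName: string): string {
  return `${GUIDES_IMG_DIR}/${topic}/${fileName}`;
}
```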
The Monaco code editor (used for Script and Content extractor fields) cannot be reliably
filled via Playwright's .fill() or .pressSequentially() - it times out or produces
syntax errors. Instead:
- Create the entity through the API with `page.request.post()`, passing the script in the request body.
- Reload the page, find the row, and click Edit.
- Scroll to the script section with `flyout.getByText('...').scrollIntoViewIfNeeded()`.
- Screenshot the pre-filled form.

Other form and UI interactions:

- Name / Path / Body textarea: Use `locator.fill(value)`.
- Body textarea scroll: After filling, reset scroll with `bodyTextarea.evaluate((el) => (el.scrollTop = 0))` so the screenshot shows the top.
- Headers combo box: Remove the default header first with `flyout.getByRole('button', { name: /Remove Content-Type/ }).click()`, then fill the combo box and press Enter.
- Combo boxes with substring labels: Use `{ exact: true }` when one label is contained in another (e.g. "Key usage" vs "Extended key usage", "Export passphrase" vs "Repeat export passphrase").
- Flyout close: After screenshotting a form, close it with `flyout.getByRole('button', { name: 'Close' }).click()` and assert `await expect(flyout).not.toBeVisible()`.
- Toast dismissal: Call `dismissAllToasts(page)` after any save operation that triggers a success toast, before taking the next screenshot.
- EUI actions column: When a grid row has a collapsed actions menu, click `row.getByRole('button', { name: 'All actions, row' })` first, then select the action from the context menu scoped to the dialog/popover.
Each guide section in the .mdx files uses the <Steps> component with <CodeBlock> for
configuration tables. The pattern is:

```mdx
import Steps from '@site/src/components/Steps';
import CodeBlock from '@theme/CodeBlock';

<Steps steps={[
  {
    img: '../../img/docs/guides/<topic>/<screenshot>.png',
    caption: <>Navigate to ... and click <b>Action</b>.</>,
    alt: 'Description for accessibility.',
  },
  {
    img: '../../img/docs/guides/<topic>/<screenshot>.png',
    caption: <>Fill in the form and click <b>Save</b>.<br/><br/>
      <table className="su-table">
        <tbody>
          <tr><td><b>Name</b></td><td><CodeBlock>value</CodeBlock></td></tr>
          <tr><td><b>Body</b></td><td><CodeBlock language="html">{`<html>...</html>`}</CodeBlock></td></tr>
        </tbody>
      </table></>,
    alt: 'Fill in the form.',
  },
]} />
```

Key rules for MDX:

- Image paths in `<Steps>` use relative paths from the `.mdx` file to the `img/` directory (e.g. `../../img/docs/guides/...`).
- Inline markdown images (`![alt](path)`) use absolute paths from the `static/` directory - Docusaurus resolves them differently.
- Escape template literal backticks and `${}` expressions inside `<CodeBlock>` JSX strings (e.g. `` \`...\` ``, `\${...}`).
- Do not nest JSX components like `<CodeBlock>` inside markdown numbered lists - the MDX parser cannot handle it. Use `<Steps>` or bold-text step numbers instead.
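
The escaping rule can be applied mechanically when pasting code into a `<CodeBlock>` template literal. This is a minimal sketch with a hypothetical name. Besides backticks and `${`, it also doubles backslashes, since a template literal would otherwise treat them as the start of an escape sequence.

```typescript
// Escapes source code for use inside a JSX template literal, e.g. <CodeBlock>{`...`}</CodeBlock>.
function escapeForTemplateLiteral(code: string): string {
  return (
    code
      // Backslashes first, so the escapes added below are not doubled again.
      .replace(/\\/g, '\\\\')
      // A bare backtick would end the template literal early.
      .replace(/`/g, '\\`')
      // "${" would start an interpolation; a replacer function avoids "$" patterns in the replacement.
      .replace(/\$\{/g, () => '\\${')
  );
}
```

Feeding the result back through a real template literal (for example with `new Function`) is a quick way to confirm it round-trips to the original text.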