📊 Current CI/CD Pipeline Status
This repository has a mature and well-structured CI/CD pipeline with 56 active workflows across build, test, security, and agentic layers. The pipeline covers a broad range of quality gates from linting to end-to-end smoke tests with live AI engines.
Recent run health (last 30 runs):
- ✅ Success: 12 (40%)
- 🟡 Skipped: 10 (33%), mostly CI Doctor on non-qualifying triggers
- ⚠️ Action Required: 7 (23%), agentic workflows awaiting approval
- ❌ Failure: 1 (3%), Smoke Gemini
✅ Existing Quality Gates
On Every PR (pull_request trigger)
| Workflow | What it checks |
|---|---|
| Build Verification (build.yml) | TypeScript build + lint on Node 20 & 22 matrix |
| Lint (lint.yml) | ESLint rules |
| TypeScript Type Check (test-integration.yml) | tsc --noEmit strict type checking |
| Test Coverage (test-coverage.yml) | Jest unit tests + coverage comparison vs. base branch |
| PR Title Check (pr-title.yml) | Conventional Commits format enforcement |
| CodeQL (codeql.yml) | SAST for JS/TS and GitHub Actions |
| Dependency Vulnerability Audit (dependency-audit.yml) | npm audit --audit-level=high |
| Examples Test (test-examples.yml) | End-to-end examples with real Docker containers |
| Test Setup Action (test-action.yml) | GitHub Action installation smoke tests |
| Chroot Integration Tests (test-chroot.yml) | Multi-language chroot integration (4 parallel jobs) |
| Security Guard (security-guard.lock.yml) | Claude-based AI security review on every PR |
| Smoke Tests (Claude/Copilot/Codex/Gemini/Chroot) | Full end-to-end with real AI engines |
| Build Tests (Node/Go/Rust/Java/.NET/Bun/Deno/C++) | Language-specific AWF agentic build tests |
Path-Filtered (runs only when relevant files change)
| Workflow | Trigger condition |
|---|---|
| Container Security Scan (container-scan.yml) | Only on containers/** changes; Trivy scans for CVEs |
| Smoke Chroot | Only on src/**, containers/**, or package.json changes |
Scheduled Quality Checks
- Daily: Security review, dependency security monitor, CI/CD gaps assessment, secret diggers (3 engines)
- Weekly: CLI flag consistency checker, test coverage improver, dependency audit
- Hourly: Issue Monster, Secret Digger (Copilot)
🔍 Identified Gaps
🔴 High Priority
1. Coverage thresholds are critically low
Current thresholds: 38% statements, 30% branches, 35% functions, 38% lines. Two major components are severely under-tested:
- cli.ts: 0% coverage (entry point, argument parsing, signal handling)
- docker-manager.ts: 18% statement / 4% function coverage (core container lifecycle logic)
The coverage gate is essentially decorative at 38% — it doesn't protect against major regressions in core functionality.
2. No shell script linting (ShellCheck)
The repository contains critical bash scripts in containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, containers/squid/entrypoint.sh, and multiple scripts in scripts/ci/. These set up iptables rules and security boundaries. Without static analysis, bugs in these scripts can fail silently at runtime. There is no shellcheck step in any PR workflow.
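As one illustration of a silent failure that ShellCheck reports at warning level, the sketch below shows a command's exit status being masked by a combined `local` declaration (reported as SC2155). The function names are hypothetical and are not taken from the repository's scripts.

```shell
#!/usr/bin/env bash
# Stand-in for any command that fails (hypothetical).
failing_cmd() { return 1; }

# Combined declare-and-assign: `local` itself succeeds, so $? is 0 and the
# failure of failing_cmd goes unnoticed. ShellCheck reports this as SC2155.
masked_status() {
  local out=$(failing_cmd)
  echo "$?"
}

# Declaring first and assigning separately preserves the command's status.
preserved_status() {
  local out
  out=$(failing_cmd)
  echo "$?"
}

echo "masked: $(masked_status), preserved: $(preserved_status)"
# prints "masked: 0, preserved: 1"
```

A script that checks `$?` after the first form would proceed as if the command had succeeded, which is the kind of bug a lint step can catch before review.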
3. Container Security Scan is path-filtered and misses base image updates
The Trivy scan only runs when containers/** changes. If a CVE is introduced through a base image (ubuntu/squid:latest, ubuntu:22.04) after an unrelated PR, it won't be caught in that PR's checks. The weekly schedule is the only backstop.
4. Missing integration tests for critical AWF features
Several documented features lack dedicated integration test coverage in CI:
- --allow-domains localhost keyword and host.docker.internal mapping
- API proxy sidecar configuration (--api-proxy-* flags)
- DNS server restriction (--dns-servers)
- --env-all passthrough behavior
- Exit code propagation from agent container
🟡 Medium Priority
5. Smoke Gemini is failing and has no auto-recovery path
The most recent workflow_dispatch run of Smoke Gemini failed. Smoke tests for other engines (Claude, Copilot, Codex) also require manual reaction triggers (❤️, 🎉, 👀) on PRs, which means they only run when a maintainer reacts. This creates inconsistent coverage — a PR might merge without any smoke test having run.
6. No Dockerfile linting (Hadolint)
containers/agent/Dockerfile and containers/squid/Dockerfile are not linted in CI. Hadolint catches common Dockerfile anti-patterns (e.g., apt-get without --no-install-recommends, improper COPY usage, missing HEALTHCHECK).
7. No binary/artifact size monitoring
There is no tracking of the compiled output size (dist/) or Docker image sizes across PRs. A PR that accidentally bundles large dependencies would not be caught. This is particularly relevant since the project produces release binaries (4 platform targets: linux-x64, linux-arm64, darwin-x64, darwin-arm64).
8. Missing SBOM (Software Bill of Materials) generation
No SBOM is generated or attested during CI or release. This is increasingly a baseline expectation for security-sensitive tools, particularly one that runs as a firewall for AI agents.
9. The test-integration.yml workflow name is misleading
The workflow is registered as "TypeScript Type Check", which accurately describes what it does. The file name test-integration.yml, however, suggests integration tests to developers browsing the file system, creating confusion about what "integration tests" means in this repo.
10. No required status checks documented or enforced
With 15+ workflows running on PRs, it's unclear which checks are required by branch protection. If branch protection is misconfigured, a PR could theoretically merge with only PR Title Check passing. This gap is in the repo configuration, not the workflow code itself.
🟢 Low Priority
11. Duplicate lint execution
Both build.yml and lint.yml run npm run lint on every PR, consuming duplicate CI time (~2 minutes × 2 runners).
12. No Prettier/code formatting check
ESLint is enforced, but there is no automated code formatting check (e.g., Prettier). Formatting inconsistencies are addressed manually.
13. No documentation link validation
The docs site (docs-site/) is deployed via deploy-docs.yml but there is no broken-link checker running on PRs that modify documentation.
14. Test examples skip github-copilot.sh
The Examples Test workflow explicitly skips github-copilot.sh because it requires a Copilot token. This leaves the primary use-case example untested in automated CI.
📋 Actionable Recommendations
Gap 1: Raise Coverage Thresholds Incrementally
Solution: Increase thresholds in jest.config.js by 5 percentage points per sprint while adding tests for cli.ts and docker-manager.ts. Target 60% within 2 months, 80% within 6 months.
Complexity: Low (threshold change) / High (writing the tests)
Impact: Prevents silent regressions in the core container lifecycle and CLI entry point
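To sanity-check the timeline, the sketch below computes how many increments are needed to reach each target from the current 38% threshold. It assumes the 5% step means 5 percentage points per sprint; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of fixed-size increments needed to move a
# threshold from `current` to `target` (ceiling division on integers).
sprints_needed() {
  local current=$1 target=$2 step=$3
  echo $(( (target - current + step - 1) / step ))
}

echo "to 60%: $(sprints_needed 38 60 5) sprints"
echo "to 80%: $(sprints_needed 38 80 5) sprints"
# prints "to 60%: 5 sprints" and "to 80%: 9 sprints"
```

The sprint length is not stated here. With two-week sprints (an assumption), five sprints is about ten weeks and nine sprints is about eighteen weeks, so the 60% target would land slightly past two months while the 80% target fits within six.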
Gap 2: Add ShellCheck to PR Workflow
Solution: Add a step to build.yml or a new shellcheck.yml:

    - name: Run ShellCheck
      uses: ludeeus/action-shellcheck@master
      with:
        scandir: './containers'
        severity: warning

Complexity: Low
Impact: Catches bash bugs in security-critical iptables setup scripts before they reach production
Gap 3: Run Container Scan on All PRs (Not Just Container Changes)
Solution: Remove the paths: filter from container-scan.yml and add a weekly base-image-only scan. Alternatively, add a lightweight scan step in build.yml that runs only on PRs.
Complexity: Low
Impact: Catches CVEs introduced by base image updates on every PR
Gap 4: Add Integration Tests for Core Features
Solution: Expand tests/integration/ with test files for localhost keyword, DNS filtering, and exit code propagation. These can run in test-chroot.yml or a new test-integration-features.yml.
Complexity: High
Impact: Validates correctness of the firewall's most-used features
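For the exit code propagation case, the core of such a test is an assertion on the wrapped command's exit status. The sketch below shows only that assertion shape. The helper name is hypothetical, and `sh -c 'exit 3'` is a stand-in: the real firewall invocation and its flags are deliberately not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and compare its exit code to the
# expected value. Returns non-zero on mismatch.
assert_exit_code() {
  local expected=$1
  shift
  "$@"
  local actual=$?
  if [ "$actual" -ne "$expected" ]; then
    echo "FAIL: expected exit $expected, got $actual: $*" >&2
    return 1
  fi
  echo "ok: exit $actual"
}

# Stand-in command; a real test would wrap the firewall CLI running an
# agent command that exits with a known non-zero code.
assert_exit_code 3 sh -c 'exit 3'
```

Using a distinctive non-zero code (rather than 1) helps distinguish a propagated exit code from a generic failure of the wrapper itself.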
Gap 5: Make Key Smoke Tests Automatic (Not Reaction-Triggered)
Solution: Change the primary smoke test (e.g., Smoke Copilot) to run automatically on all PRs touching src/** or containers/** without requiring a reaction trigger.
Complexity: Low (configuration change)
Impact: Ensures end-to-end validation runs on every meaningful code change
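GitHub Actions supports this kind of filtering natively through the `paths:` key of the `pull_request` trigger. For reasoning about which PRs would qualify, the same decision can be sketched locally as below; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads changed file paths on stdin (one per line) and
# succeeds if any of them falls under src/ or containers/.
touches_runtime_code() {
  local path
  while IFS= read -r path; do
    case "$path" in
      src/*|containers/*) return 0 ;;
    esac
  done
  return 1
}

# Made-up change list: one docs file and one source file.
if printf '%s\n' 'docs/index.md' 'src/cli.ts' | touches_runtime_code; then
  echo "smoke test needed"
fi
```

Against a real checkout, the input could come from `git diff --name-only origin/main...HEAD`, assuming the base branch is named main.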
Gap 6: Add Hadolint to Container Builds
Solution: Add a hadolint step before docker build in container-scan.yml and build.yml:

    - name: Lint Dockerfiles
      uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf
      with:
        dockerfile: containers/agent/Dockerfile

Complexity: Low
Impact: Enforces Dockerfile best practices, reduces image size and security surface
Gap 7: Add Build Artifact Size Check
Solution: Add a step in build.yml that reports dist/ size and fails if it grows by >10% from baseline:

    - name: Check dist size
      run: |
        SIZE=$(du -sk dist/ | cut -f1)
        echo "dist size: ${SIZE}KB"
        # Optional: compare against known baseline

Complexity: Low
Impact: Prevents accidental bundle bloat in release binaries
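The step above only reports the size. A minimal sketch of the baseline comparison, with the 10% budget from the recommendation, might look like the following. The helper name and the sample numbers are hypothetical, and where the baseline value is stored is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when current_kb is at most max_pct percent
# above baseline_kb (integer arithmetic, sizes in KB).
within_size_budget() {
  local baseline_kb=$1 current_kb=$2 max_pct=${3:-10}
  local allowed_kb=$(( baseline_kb * (100 + max_pct) / 100 ))
  [ "$current_kb" -le "$allowed_kb" ]
}

# Made-up numbers: baseline 2000 KB, current 2150 KB (7.5% growth).
if within_size_budget 2000 2150 10; then
  echo "dist size within budget"
else
  echo "dist size grew by more than 10%" >&2
fi
# prints "dist size within budget"
```

In CI, the current value would come from `du -sk dist/ | cut -f1` as in the step above, and a failing check would exit non-zero to block the PR.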
Gap 8: Generate SBOM in Release Workflow
Solution: Add anchore/sbom-action to release.yml to generate and attest an SBOM for each release.
Complexity: Low
Impact: Aligns with supply chain security best practices for a security-critical tool
📈 Metrics Summary
| Metric | Value |
|---|---|
| Total active workflows | 56 |
| Workflows on PRs | ~15 automatic + up to 5 reaction-triggered smoke tests |
| Unit test count | 135 passing |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| cli.ts coverage | 0% |
| docker-manager.ts coverage | 18% |
| logger.ts / squid-config.ts coverage | 100% |
| Recent run success rate (last 30) | 40% success, 33% skipped, 23% action-required, 3% failure |
| Scheduled security scans | Daily (3 secret diggers + security review + dep monitor) |
| SARIF security reports | CodeQL + Trivy (containers) + npm audit → GitHub Security tab |
The pipeline is broad and security-conscious but has a coverage depth problem — the most important runtime logic (docker-manager.ts, cli.ts) is nearly untested at the unit level, relying primarily on end-to-end smoke tests for validation. Strengthening the unit test layer would significantly improve confidence in PRs before they reach the smoke test stage.
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Generated by CI/CD Pipelines and Integration Tests Gap Assessment
- expires on Feb 26, 2026, 10:21 PM UTC