[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #979

## 📊 Current CI/CD Pipeline Status

This repository has a **mature and well-structured CI/CD pipeline** with 56 active workflows across build, test, security, and agentic layers. The pipeline covers a broad range of quality gates from linting to end-to-end smoke tests with live AI engines.

**Recent run health** (last 30 runs):
- ✅ Success: 12 (40%)
- 🟡 Skipped: 10 (33%) — mostly CI Doctor on non-qualifying triggers
- ⚠️ Action Required: 7 (23%) — agentic workflows awaiting approval
- ❌ Failure: 1 (3%) — Smoke Gemini

---

## ✅ Existing Quality Gates

### On Every PR (`pull_request` trigger)
| Workflow | What it checks |
|---|---|
| **Build Verification** (`build.yml`) | TypeScript build + lint on Node 20 & 22 matrix |
| **Lint** (`lint.yml`) | ESLint rules |
| **TypeScript Type Check** (`test-integration.yml`) | `tsc --noEmit` strict type checking |
| **Test Coverage** (`test-coverage.yml`) | Jest unit tests + coverage comparison vs. base branch |
| **PR Title Check** (`pr-title.yml`) | Conventional Commits format enforcement |
| **CodeQL** (`codeql.yml`) | SAST for JS/TS and GitHub Actions |
| **Dependency Vulnerability Audit** (`dependency-audit.yml`) | `npm audit --audit-level=high` |
| **Examples Test** (`test-examples.yml`) | End-to-end examples with real Docker containers |
| **Test Setup Action** (`test-action.yml`) | GitHub Action installation smoke tests |
| **Chroot Integration Tests** (`test-chroot.yml`) | Multi-language chroot integration (4 parallel jobs) |
| **Security Guard** (`security-guard.lock.yml`) | Claude-based AI security review on every PR |
| **Smoke Tests** (Claude/Copilot/Codex/Gemini/Chroot) | Full end-to-end with real AI engines; mostly started by maintainer reactions rather than automatically (see gap 5) |
| **Build Tests** (Node/Go/Rust/Java/.NET/Bun/Deno/C++) | Language-specific AWF agentic build tests |

### Path-Filtered (runs only when relevant files change)
| Workflow | Trigger condition |
|---|---|
| **Container Security Scan** (`container-scan.yml`) | Only on `containers/**` changes — Trivy scans for CVEs |
| **Smoke Chroot** | Only on `src/**`, `containers/**`, or `package.json` changes |

### Scheduled Quality Checks
- **Daily**: Security review, dependency security monitor, CI/CD gaps assessment, secret diggers (3 engines)
- **Weekly**: CLI flag consistency checker, test coverage improver, dependency audit
- **Hourly**: Issue Monster, Secret Digger (Copilot)

---

## 🔍 Identified Gaps

### 🔴 High Priority

**1. Coverage thresholds are critically low**
Current thresholds: 38% statements, 30% branches, 35% functions, 38% lines. Two major components are severely under-tested:
- `cli.ts`: **0% coverage** (entry point, argument parsing, signal handling)
- `docker-manager.ts`: **18% statement / 4% function coverage** (core container lifecycle logic)

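The coverage gate is essentially decorative. The 38% statement threshold sits just below actual coverage (38.39%), so it only trips on an aggregate drop. Because `cli.ts` and `docker-manager.ts` are barely exercised by tests, a regression in core functionality would hardly register in that number.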

**2. No shell script linting (ShellCheck)**
The repository contains security-critical bash scripts: `containers/agent/setup-iptables.sh`, `containers/agent/entrypoint.sh`, `containers/squid/entrypoint.sh`, and several scripts in `scripts/ci/`. These set up iptables rules and security boundaries, so a bug in them can fail silently, and nothing catches it before merge: no PR workflow runs `shellcheck`.

**3. Container Security Scan is path-filtered and misses base image updates**
The Trivy scan only runs when `containers/**` changes. When a base image (`ubuntu/squid:latest`, `ubuntu:22.04`) picks up a new CVE, PRs that don't touch `containers/**` never scan it, so the weekly schedule is the only backstop.

**4. Missing integration tests for critical AWF features**
Several documented features lack dedicated integration test coverage in CI:
- `--allow-domains localhost` keyword and `host.docker.internal` mapping
- API proxy sidecar configuration (`--api-proxy-*` flags)
- DNS server restriction (`--dns-servers`)
- `--env-all` passthrough behavior
- Exit code propagation from agent container

---

### 🟡 Medium Priority

**5. Smoke Gemini is failing and has no auto-recovery path**
The most recent `workflow_dispatch` run of Smoke Gemini failed. Smoke tests for other engines (Claude, Copilot, Codex) also require manual reaction triggers (❤️, 🎉, 👀) on PRs, which means they only run when a maintainer reacts. This creates inconsistent coverage — a PR might merge without any smoke test having run.

**6. No Dockerfile linting (Hadolint)**
`containers/agent/Dockerfile` and `containers/squid/Dockerfile` are not linted in CI. Hadolint catches common Dockerfile anti-patterns (e.g., `apt-get` without `--no-install-recommends`, improper `COPY` usage, missing `HEALTHCHECK`).

**7. No binary/artifact size monitoring**
There is no tracking of the compiled output size (`dist/`) or Docker image sizes across PRs. A PR that accidentally bundles large dependencies would not be caught. This is particularly relevant since the project produces release binaries (4 platform targets: linux-x64, linux-arm64, darwin-x64, darwin-arm64).

**8. Missing SBOM (Software Bill of Materials) generation**
No SBOM is generated or attested during CI or release. This is increasingly a baseline expectation for security-sensitive tools, particularly one that runs as a firewall for AI agents.

**9. The `test-integration.yml` file name is misleading**
The workflow's display name, "TypeScript Type Check", accurately describes what it does. The file name `test-integration.yml`, however, suggests integration tests to developers browsing `.github/workflows/`, which muddies what "integration tests" means in this repo.

**10. No required status checks documented or enforced**
With 15+ workflows running on PRs, it's unclear which checks are required by branch protection. If branch protection is misconfigured, a PR could in principle merge with only PR Title Check passing. This gap lives in the repository settings, not in the workflow code.

---

### 🟢 Low Priority

**11. Duplicate lint execution**
Both `build.yml` and `lint.yml` run `npm run lint` on every PR, consuming duplicate CI time (~2 minutes × 2 runners).

**12. No Prettier/code formatting check**
ESLint is enforced, but there is no automated code formatting check (e.g., Prettier). Formatting inconsistencies are addressed manually.
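Once Prettier is a dev dependency, a formatting gate can be a single step. This is a sketch that assumes `prettier` gets added to `package.json` (nothing here says it is present today) and that generated output such as `dist/` is ignored.

````yaml
- name: Check formatting
  # Exits non-zero if any file is not Prettier-formatted; never rewrites files
  run: npx prettier --check .
````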

**13. No documentation link validation**
The docs site (`docs-site/`) is deployed via `deploy-docs.yml` but there is no broken-link checker running on PRs that modify documentation.
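A docs-only link check could look roughly like the workflow below, using lychee. The action version and the `docs-site/**/*.md` glob are assumptions (the docs sources might be `.mdx`, for instance), and checks against external sites can be flaky, so some exclusions may be needed in practice.

````yaml
name: Docs Link Check

on:
  pull_request:
    # Only run when documentation changes
    paths:
      - 'docs-site/**'

jobs:
  links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check links in docs
        uses: lycheeverse/lychee-action@v2
        with:
          # Hypothetical glob; quoted so lychee expands it itself
          args: --no-progress 'docs-site/**/*.md'
          fail: true
````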

**14. Test examples skip `github-copilot.sh`**
The Examples Test workflow explicitly skips `github-copilot.sh` because it requires a Copilot token. This leaves the primary use-case example untested in automated CI.
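One way to cover the example without breaking PRs that lack the token (including fork PRs, which do not receive secrets) is to run it only when the secret exists. The secret name `COPILOT_TOKEN`, the variable the script reads, and the `examples/github-copilot.sh` path are all hypothetical. Secrets cannot be referenced directly in a step's `if:`, so the check goes through a job-level `env` value.

````yaml
# Fragment of the examples job (secret, variable, and path names are hypothetical)
env:
  # Evaluates to the string 'true' or 'false'
  HAS_COPILOT_TOKEN: ${{ secrets.COPILOT_TOKEN != '' }}
steps:
  - uses: actions/checkout@v4
  - name: Run github-copilot.sh example
    # Skipped cleanly when the secret is unavailable
    if: env.HAS_COPILOT_TOKEN == 'true'
    env:
      COPILOT_TOKEN: ${{ secrets.COPILOT_TOKEN }}
    run: bash examples/github-copilot.sh
````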

---

## 📋 Actionable Recommendations

### Gap 1: Raise Coverage Thresholds Incrementally
**Solution**: Raise the thresholds in `jest.config.js` by 5 percentage points per sprint while adding tests for `cli.ts` and `docker-manager.ts`. Target 60% within 2 months and 80% within 6 months.
**Complexity**: Low (threshold change) / High (writing the tests)
**Impact**: Prevents silent regressions in the core container lifecycle and CLI entry point
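The global threshold can keep rising while `cli.ts` stays at 0%, so it may help to pair it with per-file floors for the two weakest files. Below is a hedged sketch of a step for `test-coverage.yml` that reads Jest's `json-summary` report with `jq` (available on GitHub-hosted Ubuntu runners). The `src/` paths, the floor values, and the default `coverage/` output directory are assumptions. Jest's `coverageThreshold` also accepts per-path keys in `jest.config.js`, which is simpler if the config file is the preferred home for these numbers.

````yaml
- name: Enforce per-file coverage floors
  run: |
    # Emit the json-summary reporter (writes coverage/coverage-summary.json)
    npx jest --coverage --coverageReporters=json-summary
    # Hypothetical floors as file:percent pairs; raise them each sprint
    for entry in "src/cli.ts:10" "src/docker-manager.ts:25"; do
      file="${entry%%:*}"
      floor="${entry##*:}"
      # Summary keys are absolute paths, so match on the path suffix
      if ! jq -e --arg f "$file" --argjson floor "$floor" \
          'to_entries[] | select(.key | endswith($f)) | .value.statements.pct >= $floor' \
          coverage/coverage-summary.json > /dev/null; then
        echo "::error::$file statement coverage is below the ${floor} percent floor (or missing)"
        exit 1
      fi
    done
````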

### Gap 2: Add ShellCheck to PR Workflow
**Solution**: Add a step to `build.yml` or a new `shellcheck.yml`:
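````yaml
- uses: actions/checkout@v4
- name: Run ShellCheck
  # Consider pinning to a release tag or commit SHA rather than @master
  uses: ludeeus/action-shellcheck@master
  with:
    # Scan the whole repo so scripts/ci/ is covered, not just containers/
    scandir: '.'
    severity: warning
````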
**Complexity**: Low
**Impact**: Catches bash bugs in security-critical iptables setup scripts before they reach production
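ShellCheck is included in the GitHub-hosted Ubuntu runner images, so a dependency-free alternative to a third-party action is a plain `run:` step over every tracked `*.sh` file. This sketch assumes the relevant scripts use the `.sh` extension; extensionless scripts would have to be listed explicitly.

````yaml
- uses: actions/checkout@v4
- name: Run ShellCheck on all tracked shell scripts
  run: |
    # Lint every tracked *.sh file; -S warning suppresses info/style findings
    git ls-files -z '*.sh' | xargs -0 --no-run-if-empty shellcheck -S warning
````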

### Gap 3: Run Container Scan on All PRs (Not Just Container Changes)
**Solution**: Remove the `paths:` filter from `container-scan.yml` and add a weekly base-image-only scan. Or add a lightweight scan step in `build.yml` that runs only on PRs.
**Complexity**: Low
**Impact**: Catches CVEs introduced by base image updates on every PR
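Dropping the filter amounts to a trigger block like the one below for `container-scan.yml`. This is a sketch: the cron expression is a placeholder for whatever weekly schedule the workflow already uses, and any other existing triggers would stay as they are.

````yaml
on:
  # No paths: filter, so the Trivy scan runs on every PR
  pull_request:
  # Weekly backstop (placeholder cron; keep the workflow's existing schedule)
  schedule:
    - cron: '0 6 * * 1'
````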

### Gap 4: Add Integration Tests for Core Features
**Solution**: Expand `tests/integration/` with test files for localhost keyword, DNS filtering, and exit code propagation. These can run in `test-chroot.yml` or a new `test-integration-features.yml`.
**Complexity**: High
**Impact**: Validates correctness of the firewall's most-used features
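A hypothetical `test-integration-features.yml` could fan the new suites out as matrix jobs so that one failing feature does not hide the others. The suite names, the `npm run build` script, and the assumption that each suite is a Jest file under `tests/integration/` are illustrative only; real suites exercising iptables and Docker may also need elevated privileges.

````yaml
name: Integration Feature Tests

on:
  pull_request:

jobs:
  feature-tests:
    runs-on: ubuntu-latest
    strategy:
      # Report every feature's result even if one suite fails
      fail-fast: false
      matrix:
        # Hypothetical suite names, one per feature listed in gap 4
        suite: [localhost-keyword, api-proxy, dns-servers, env-all, exit-code-propagation]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npm run build
      - name: Run ${{ matrix.suite }} suite
        run: npx jest tests/integration/${{ matrix.suite }}.test.ts
````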

### Gap 5: Make Key Smoke Tests Automatic (Not Reaction-Triggered)
**Solution**: Change the primary smoke test (e.g., Smoke Copilot) to run automatically on all PRs touching `src/**` or `containers/**` without requiring a reaction trigger.
**Complexity**: Low (configuration change)
**Impact**: Ensures end-to-end validation runs on every meaningful code change
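In standard GitHub Actions syntax, the automatic trigger would look roughly like this. It is only a sketch: if the smoke workflows are compiled from a separate source (as the `.lock.yml` naming of other agentic workflows suggests), the change belongs in that source, and the trigger keys there may differ.

````yaml
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - 'src/**'
      - 'containers/**'
  # Keep manual runs available for debugging
  workflow_dispatch:
````

Because workflows triggered by PRs from forks do not receive repository secrets, an engine-backed smoke job would still need a guard such as `if: github.event.pull_request.head.repo.full_name == github.repository` to skip cleanly for external contributors.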

### Gap 6: Add Hadolint to Container Builds
**Solution**: Add a `hadolint` step before `docker build` in `container-scan.yml` and `build.yml`:
````yaml
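- name: Lint agent Dockerfile
  uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf
  with:
    dockerfile: containers/agent/Dockerfile
- name: Lint squid Dockerfile
  uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf
  with:
    dockerfile: containers/squid/Dockerfile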
````
**Complexity**: Low
**Impact**: Enforces Dockerfile best practices, reduces image size and security surface
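hadolint looks for a `.hadolint.yaml` in its working directory (the repository root in a typical Actions step), which keeps rule choices in one place for both workflows. The rule selection below is an assumption: DL3008 (version pinning in `apt-get install`) is shown as ignored only to illustrate deferring a noisy rule while the rest are enforced.

````yaml
# .hadolint.yaml at the repository root
# Fail on warnings and errors; report info/style findings without failing
failure-threshold: warning
ignored:
  # Example of a deferred rule: apt-get version pinning
  - DL3008
````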

### Gap 7: Add Build Artifact Size Check
**Solution**: Add a step in `build.yml` that reports `dist/` size and fails if it grows by >10% from baseline:
````yaml
- name: Check dist size
  run: |
    SIZE=$(du -sk dist/ | cut -f1)
    echo "dist size: ${SIZE}KB"
    # Optional: compare against known baseline
````
**Complexity**: Low
**Impact**: Prevents accidental bundle bloat in release binaries
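To enforce the 10% rule without maintaining a hard-coded baseline, the step can build the PR's base branch in a separate worktree and compare the two `dist/` sizes. This sketch assumes `dist/` was already built by an earlier step, that `npm ci && npm run build` reproduces it, and that the job runs on `pull_request` events. Building twice roughly doubles the job's build time.

````yaml
- name: Compare dist size against base branch
  if: github.event_name == 'pull_request'
  env:
    BASE_REF: ${{ github.base_ref }}
  run: |
    # Size of the PR build (dist/ built by an earlier step)
    PR_SIZE=$(du -sk dist/ | cut -f1)
    # Build the base branch in a separate worktree to get a baseline
    git fetch --depth=1 origin "$BASE_REF"
    git worktree add ../base FETCH_HEAD
    (cd ../base && npm ci && npm run build)
    BASE_SIZE=$(du -sk ../base/dist/ | cut -f1)
    echo "dist size: base ${BASE_SIZE}KB, PR ${PR_SIZE}KB"
    # Fail if the PR build is more than 10 percent larger than the base build
    if [ "$PR_SIZE" -gt $(( BASE_SIZE * 110 / 100 )) ]; then
      echo "::error::dist/ grew by more than 10 percent (${BASE_SIZE}KB -> ${PR_SIZE}KB)"
      exit 1
    fi
````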

### Gap 8: Generate SBOM in Release Workflow
**Solution**: Add `anchore/sbom-action` to `release.yml` to generate an SBOM for each release, and attest it in a follow-up step (for example with GitHub's `actions/attest-sbom`).
**Complexity**: Low
**Impact**: Aligns with supply chain security best practices for a security-critical tool
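A release-job fragment could generate the SBOM and attest it against the release binaries. This is a sketch: the `release/awf-*` path is hypothetical, the existing checkout and build steps are elided, and the permissions shown are the ones attestations require.

````yaml
# Fragment of the release job (binary paths are hypothetical)
permissions:
  contents: write       # upload release assets
  id-token: write       # sign the attestation
  attestations: write   # store the attestation
steps:
  # ... existing checkout and build steps ...
  - name: Generate SBOM
    uses: anchore/sbom-action@v0
    with:
      path: .
      format: spdx-json
      output-file: sbom.spdx.json
  - name: Attest SBOM for release binaries
    uses: actions/attest-sbom@v1
    with:
      subject-path: 'release/awf-*'
      sbom-path: sbom.spdx.json
````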

---

## 📈 Metrics Summary

| Metric | Value |
|---|---|
| Total active workflows | 56 |
| Workflows on PRs | ~15 automatic + up to 5 reaction-triggered smoke tests |
| Unit test count | 135 passing |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| `cli.ts` coverage | **0%** |
| `docker-manager.ts` coverage | **18%** |
| `logger.ts` / `squid-config.ts` coverage | **100%** |
| Recent run success rate (last 30) | 40% success, 33% skipped, 23% action-required, 3% failure |
| Scheduled security scans | Daily (3 secret diggers + security review + dep monitor) |
| SARIF security reports | CodeQL + Trivy (containers) + npm audit → GitHub Security tab |

The pipeline is **broad and security-conscious** but has a **coverage depth problem** — the most important runtime logic (`docker-manager.ts`, `cli.ts`) is nearly untested at the unit level, relying primarily on end-to-end smoke tests for validation. Strengthening the unit test layer would significantly improve confidence in PRs before they reach the smoke test stage.

---

> **Note:** This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
>
> **Tip:** Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.




> Generated by [CI/CD Pipelines and Integration Tests Gap Assessment](https://github.com/github/gh-aw-firewall/actions/runs/22202365985)
> - [x] expires on Feb 26, 2026, 10:21 PM UTC
