[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment

## 📊 Current CI/CD Pipeline Status

The repository has a **mature and layered CI/CD setup** combining standard GitHub Actions workflows with agentic (AI-driven) workflows. Overall, the pipeline is healthy: all standard PR checks are passing at 100%, and most agentic build-test workflows succeed.

**Workflow inventory (43 total):**

| Category | Count | Trigger |
|---|---|---|
| Standard PR quality checks | 10 | `pull_request` |
| Agentic build-test integrations | 8 | `pull_request` |
| Agentic smoke tests (AI engines) | 5 | `pull_request` + schedule |
| Security scans | 3 | `pull_request` + schedule |
| Documentation / release / maintenance | 17 | push / schedule / dispatch |

**Notable status from recent runs:**
- All standard PR workflows: ✅ 100% pass rate
- Smoke Gemini: ❌ 0% (1/1 failed) on recent PRs
- Smoke Chroot: ❌ 0% (1/1 failed) on recent PRs
- Smoke Codex: ⚠️ 50% pass on recent PRs
- CI Doctor (scheduled): ❌ 0% (0/6 runs passing)

---

## ✅ Existing Quality Gates

The following checks run on every PR:

| Check | Workflow | What It Validates |
|---|---|---|
| PR Title Format | `pr-title.yml` | Conventional commits (type, lowercase, allowed scopes) |
| Build Verification | `build.yml` | TypeScript compiles on Node 20 & 22 (matrix) |
| Linting | `lint.yml` | ESLint rules |
| TypeScript Type Check | `test-integration.yml`¹ | `tsc --noEmit` strict check |
| Test Coverage | `test-coverage.yml` | Unit tests + coverage regression gate vs base branch |
| CodeQL | `codeql.yml` | Static analysis for JS/TS and Actions |
| Dependency Audit | `dependency-audit.yml` | `npm audit --audit-level=high` on main + docs packages |
| Examples Test | `test-examples.yml` | Runs 4 real `awf` shell examples end-to-end |
| Action Self-Test | `test-action.yml` | Tests the setup action (4 scenarios incl. invalid version) |
| Chroot Integration | `test-chroot.yml` | 4 parallel chroot test suites (languages, pkg managers, procfs, edge cases) |
| Agentic Security Guard | `security-guard.lock.yml` | AI-powered PR review for security-weakening changes (Claude) |
| Agentic Build Tests | 8 `build-test-*.lock.yml` | Real build+test of downstream projects using each runtime through the firewall |
| Smoke Tests | 5 `smoke-*.lock.yml` | Full AI agent runs (Claude, Codex, Copilot, Gemini, Chroot) |
| Container Scan | `container-scan.yml`² | Trivy CRITICAL/HIGH CVE scan of agent + squid images |

¹ File is named `test-integration.yml` but only contains a TypeScript type check — misleading name.  
² Only runs when `containers/**` files change.

---

## 🔍 Identified Gaps

### 🔴 High Priority

#### 1. Integration test suite is not triggered on PRs (only chroot tests are)
The repository has 26 integration test files covering critical functionality: blocked domains, network security, credential hiding, API proxy, DNS, IPv6, exit codes, volume mounts, and more. **None of these run on PRs except the chroot subset.** A PR that breaks the behavior covered by `blocked-domains.test.ts` or `network-security.test.ts` could be merged without detection.

**Affected files:**
- `tests/integration/blocked-domains.test.ts`
- `tests/integration/network-security.test.ts`
- `tests/integration/credential-hiding.test.ts`
- `tests/integration/api-proxy.test.ts`
- `tests/integration/dns-servers.test.ts`
- `tests/integration/ipv6.test.ts`
- …and 20 more

#### 2. Unit test coverage is critically low for the most important files
Coverage thresholds are set far below acceptable levels for a security tool:

| File | Lines | Branches | Functions | Risk |
|---|---|---|---|---|
| `docker-manager.ts` (250 lines — core logic) | **18%** | **22%** | **4%** | 🔴 Critical |
| `cli.ts` (69 lines — entry point) | **0%** | **0%** | **0%** | 🔴 Critical |
| `domain-patterns.ts` | Unknown | Unknown | Unknown | ⚠️ |

Thresholds are 38% lines, 30% branches, and 35% functions, which is far too low for a firewall. With only 18% of the lines in `docker-manager.ts` executed by unit tests, a security bypass introduced in the remaining 82% cannot be caught by any unit test.
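A global threshold can pass while individual critical files stay almost untested. The following is a minimal sketch of a per-file floor check, not the repository's actual tooling. The `FileCoverage` shape and the `findBelowFloor` helper are hypothetical, and the 50/40/50 floors are illustrative values borrowed from the recommendation table rather than agreed per-file targets.

```typescript
// Hypothetical shape: one percentage per metric, per file.
interface FileCoverage {
  lines: number;
  branches: number;
  functions: number;
}

type Metric = keyof FileCoverage;
const METRICS: Metric[] = ["lines", "branches", "functions"];

// Returns one message for every metric that is below its per-file floor.
function findBelowFloor(
  coverage: Record<string, FileCoverage>,
  floors: Record<string, FileCoverage>,
): string[] {
  const failures: string[] = [];
  for (const [file, floor] of Object.entries(floors)) {
    const actual = coverage[file];
    if (actual === undefined) {
      failures.push(`${file}: no coverage data`);
      continue;
    }
    for (const metric of METRICS) {
      if (actual[metric] < floor[metric]) {
        failures.push(`${file}: ${metric} ${actual[metric]}% < ${floor[metric]}%`);
      }
    }
  }
  return failures;
}

// Measured figures come from the table above; the floors are assumptions.
const measured: Record<string, FileCoverage> = {
  "docker-manager.ts": { lines: 18, branches: 22, functions: 4 },
  "cli.ts": { lines: 0, branches: 0, functions: 0 },
};
const floors: Record<string, FileCoverage> = {
  "docker-manager.ts": { lines: 50, branches: 40, functions: 50 },
  "cli.ts": { lines: 50, branches: 40, functions: 50 },
};

console.log(findBelowFloor(measured, floors));
```

A CI step could exit non-zero whenever the returned list is non-empty, which would track `docker-manager.ts` and `cli.ts` separately from the global numbers.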

#### 3. `security-guard.md` and `build-test-java.md` are not compiled
Both show `compiled: No` in `gh aw status`. An uncompiled agentic workflow means the `.lock.yml` is stale and may not reflect the `.md` source. Any change to `security-guard.md` will not take effect until recompiled, creating a silent gap in the security review gate.
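One way to catch stale lock files automatically is to recompile in CI and fail if any `.lock.yml` differs from what is committed. The sketch below assumes the job has already run the compile step and captured the output of `git status --porcelain`; it only parses that output. The `staleLockFiles` helper and the sample paths are hypothetical, and quoted paths (those containing spaces) are not handled.

```typescript
// Given `git status --porcelain` output captured after recompiling,
// return every changed or untracked `.lock.yml` path.
function staleLockFiles(porcelainOutput: string): string[] {
  return porcelainOutput
    .split("\n")
    // Each entry is two status characters, a space, then the path.
    .filter((line) => line.length > 3)
    .map((line) => line.slice(3))
    .filter((path) => path.endsWith(".lock.yml"));
}

// Illustrative output; the directory layout is an assumption.
const sample = [
  " M .github/workflows/security-guard.lock.yml",
  " M README.md",
  "?? .github/workflows/build-test-java.lock.yml",
].join("\n");

console.log(staleLockFiles(sample));
```

If the returned list is non-empty, the committed lock files no longer match their `.md` sources and the check should fail.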

#### 4. CI Doctor workflow is consistently failing (0% over 6 runs)
The `ci-doctor` workflow — which monitors for failures across all other workflows — has a 0% success rate in recent runs. This means the automated failure-detection watchdog is itself broken, leaving CI issues undetected and unresolved.

---

### 🟡 Medium Priority

#### 5. Container security scan is path-gated — misses logic changes
`container-scan.yml` only triggers when `containers/**` files change. A change to `src/squid-config.ts` or `src/docker-manager.ts` that modifies the generated container configuration will not trigger a fresh Trivy scan on that PR. The scan runs weekly on a schedule, but not in PR context for source changes.
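The effect of the path gate can be illustrated with a small matcher. This is a simplified approximation of path filtering, assuming only `*` (any characters except `/`) and `**` (any characters including `/`) are used; it is not GitHub's implementation, and `globToRegExp` and `wouldTrigger` are hypothetical names.

```typescript
// Convert a simple glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters except `*`, which is handled below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // protect `**` while single `*` is replaced
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// True if at least one changed file matches at least one path pattern.
function wouldTrigger(changedFiles: string[], paths: string[]): boolean {
  const matchers = paths.map(globToRegExp);
  return changedFiles.some((file) => matchers.some((re) => re.test(file)));
}

const changed = ["src/docker-manager.ts"];
console.log(wouldTrigger(changed, ["containers/**"])); // false
console.log(wouldTrigger(changed, ["containers/**", "src/**"])); // true
```

With only `containers/**` configured, a change confined to `src/` never reaches the scan; adding `src/**` closes that gap at the cost of more frequent scans.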

#### 6. No shell script linting (shellcheck)
The repository contains critical shell scripts (`containers/agent/setup-iptables.sh`, `containers/agent/entrypoint.sh`, `containers/squid/entrypoint.sh`, all `scripts/ci/*.sh`) that implement the core iptables rules and security hardening. None are validated by `shellcheck` in CI. Shell script bugs in iptables setup could silently break the firewall.

#### 7. Coverage thresholds need significant upward revision
Current thresholds (38% lines, 30% branches) are appropriate for a bootstrap stage, but given the security-critical nature of this codebase they should be progressively raised. The per-PR regression gate compares coverage against the base branch, but nothing raises the configured thresholds as coverage improves, so the floor stays at bootstrap levels.
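A ratchet can be sketched as a pure function that never lowers a threshold and raises it to the measured value rounded down. This is an illustration under assumptions: the `ratchet` helper and its optional `margin` parameter are hypothetical, and how the result is written back to the test configuration is left out.

```typescript
type Coverage = Record<string, number>;

// Raise each threshold to floor(measured - margin), but never lower it.
function ratchet(thresholds: Coverage, actual: Coverage, margin = 0): Coverage {
  const next: Coverage = {};
  for (const [metric, current] of Object.entries(thresholds)) {
    const measured = actual[metric];
    const candidate =
      measured === undefined ? current : Math.floor(measured - margin);
    next[metric] = Math.max(current, candidate);
  }
  return next;
}

// Thresholds and measured values taken from the metrics in this assessment.
const next = ratchet(
  { statements: 38, branches: 30 },
  { statements: 38.39, branches: 31.78 },
);
console.log(next); // { statements: 38, branches: 31 }
```

A non-zero `margin` would leave headroom so that small, legitimate fluctuations do not fail unrelated PRs.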

#### 8. Smoke tests have significant reliability issues
Three of five smoke test workflows show failures or instability on recent PRs:
- **Smoke Gemini**: 0% pass rate (1/1 recent run failed)
- **Smoke Chroot**: 0% pass rate (1/1 recent run failed)
- **Smoke Codex**: 50% pass rate

Flaky smoke tests train developers to ignore CI failures ("it's probably just flaky"), which reduces the signal value of the entire CI system.
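Explicit flakiness tracking starts with separating workflows that always fail from those that fail intermittently, since the two need different fixes. The sketch below classifies workflows from a list of run records. The `Run` shape, the `classify` helper, and the sample run counts are assumptions for illustration; only the pass rates themselves come from this assessment.

```typescript
interface Run {
  workflow: string;
  passed: boolean;
}

type Health = "stable" | "flaky" | "broken";

// stable: every run passed; broken: no run passed; flaky: mixed results.
function classify(runs: Run[]): Map<string, Health> {
  const tally = new Map<string, { passed: number; total: number }>();
  for (const run of runs) {
    const entry = tally.get(run.workflow) ?? { passed: 0, total: 0 };
    entry.total += 1;
    if (run.passed) entry.passed += 1;
    tally.set(run.workflow, entry);
  }
  const health = new Map<string, Health>();
  tally.forEach(({ passed, total }, workflow) => {
    health.set(
      workflow,
      passed === total ? "stable" : passed === 0 ? "broken" : "flaky",
    );
  });
  return health;
}

// Illustrative records: the number of runs per workflow is assumed.
const health = classify([
  { workflow: "Smoke Gemini", passed: false },
  { workflow: "Smoke Codex", passed: true },
  { workflow: "Smoke Codex", passed: false },
  { workflow: "Smoke Claude", passed: true },
]);
console.log(health.get("Smoke Gemini"), health.get("Smoke Codex"));
```

Under this split, a "broken" workflow calls for a root-cause fix, while a "flaky" one is the candidate for retries or quarantine.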

#### 9. No dist/ artifact size monitoring
There is no check on the size of the compiled `dist/` output or packaged binaries. A dependency accidentally bundled or a large file added to `dist/` would pass all current checks. For a security tool distributed as pre-built binaries (4 platform targets), artifact size is a meaningful integrity signal.
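A size gate can be as simple as summing file sizes and comparing against a budget. The sketch below keeps the check pure by taking a list of files that a build step would collect from `dist/`. The `Artifact` shape, the `checkSizeBudget` helper, the file names, and the byte budget are all hypothetical.

```typescript
interface Artifact {
  path: string;
  bytes: number;
}

interface BudgetReport {
  totalBytes: number;
  withinBudget: boolean;
  largest: Artifact | undefined;
}

// Sum artifact sizes, compare against the budget, and report the largest file.
function checkSizeBudget(artifacts: Artifact[], maxTotalBytes: number): BudgetReport {
  let totalBytes = 0;
  let largest: Artifact | undefined;
  for (const artifact of artifacts) {
    totalBytes += artifact.bytes;
    if (largest === undefined || artifact.bytes > largest.bytes) {
      largest = artifact;
    }
  }
  return { totalBytes, withinBudget: totalBytes <= maxTotalBytes, largest };
}

// Illustrative input: file names and sizes are made up.
const report = checkSizeBudget(
  [
    { path: "dist/cli.js", bytes: 40000 },
    { path: "dist/vendor.js", bytes: 900000 },
  ],
  500000,
);
console.log(report.withinBudget, report.largest?.path);
```

Reporting the largest file alongside the verdict makes it easier to spot an accidentally bundled dependency when the check fails.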

---

### 🟢 Low Priority

#### 10. No mutation testing
The test suite validates behavior, but there is no mutation testing (e.g., Stryker) to verify that tests would actually catch regressions. This is especially relevant given the low branch coverage in `docker-manager.ts` — tests may pass even when logic is wrong.

#### 11. No code complexity / maintainability gate
There is no cyclomatic complexity check or maintainability index. `docker-manager.ts`, at 250 lines with 4% function coverage and complex branching logic, is a candidate for this.

#### 12. Coverage badge not published
The `test-coverage.yml` uploads reports to artifacts and posts PR comments, but no coverage badge is published to the README or an external service (Codecov, Coveralls). This reduces visibility of coverage health for contributors.

#### 13. `test-integration.yml` is misnamed
The file named `test-integration.yml` actually runs only a TypeScript type check — the same check that runs in `build.yml` and elsewhere. This creates confusion about what "integration" means in this repository's CI and makes it harder for contributors to understand the test matrix.

---

## 📋 Actionable Recommendations

| Gap | Recommended Solution | Complexity | Impact |
|---|---|---|---|
| **#1** Integration tests not on PRs | Add `test-integration-core.yml` running `blocked-domains`, `network-security`, `credential-hiding`, `exit-code-propagation`, `dns-servers` on PRs | Medium | 🔴 High |
| **#2** Low coverage on critical files | Set a 90-day coverage improvement plan: raise thresholds to 50/40/50 within 90 days; track `docker-manager.ts` & `cli.ts` specifically | Low | 🔴 High |
| **#3** Uncompiled workflows | Run `gh aw compile security-guard build-test-java && npx tsx scripts/ci/postprocess-smoke-workflows.ts` and add a CI check that verifies all `.lock.yml` files are up-to-date | Low | 🔴 High |
| **#4** CI Doctor always failing | Investigate and fix the `ci-doctor` workflow; it is the monitoring system for all other workflows | Medium | 🔴 High |
| **#5** Path-gated container scan | Add `src/**` to the `container-scan.yml` path triggers so source changes also trigger a fresh Trivy scan | Low | 🟡 Medium |
| **#6** No shellcheck | Add a `shellcheck` step (using `ludeeus/action-shellcheck`) in `build.yml` or a dedicated `shell-lint.yml` covering `containers/**/*.sh` and `scripts/**/*.sh` | Low | 🟡 Medium |
| **#7** Low coverage thresholds | Add a coverage ratchet script that reads the current coverage and updates thresholds upward automatically | Medium | 🟡 Medium |
| **#8** Flaky smoke tests | Add retry logic or mark smoke tests as `continue-on-error: true` with explicit flakiness tracking; investigate root causes for Gemini/Chroot failures | Medium | 🟡 Medium |
| **#9** No artifact size check | Add a step in `build.yml` that fails if `dist/` exceeds a configured size threshold | Low | 🟢 Low |
| **#13** Misnamed workflow | Rename `test-integration.yml` to `type-check.yml` and update all references | Low | 🟢 Low |

---

## 📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflows | 43 (29 agentic `.md` + 14 standard `.yml`) |
| Workflows running on PRs | ~25 |
| Standard PR workflow pass rate (recent) | 100% |
| Agentic smoke test pass rate (recent PRs) | 2/5 workflows reliable (Gemini 0%, Chroot 0%, Codex 50%) |
| CI Doctor health | 0% (broken watchdog) |
| Unit test coverage — statements | 38.39% (threshold: 38%) |
| Unit test coverage — branches | 31.78% (threshold: 30%) |
| Unit test coverage — `docker-manager.ts` | **18%** |
| Unit test coverage — `cli.ts` | **0%** |
| Integration test files | 26 total; ~4 suites on PRs (chroot only) |
| Shell scripts without linting | ~12 files |
| Uncompiled agentic workflows | 2 (`security-guard`, `build-test-java`) |

The repository has an excellent foundation with diverse quality gates, real end-to-end smoke tests, AI-powered security review, and per-PR coverage comparison. The primary gaps are: running the broader integration test suite on PRs, improving unit test coverage for the critical `docker-manager.ts` and `cli.ts` files, fixing the broken CI Doctor, and compiling the two stale agentic workflows.

---

> **Note:** This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
>
> **Tip:** Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.




> Generated by [CI/CD Pipelines and Integration Tests Gap Assessment](https://github.com/github/gh-aw-firewall/actions/runs/22243246837)
> - [x] expires on Feb 27, 2026, 10:19 PM UTC
