
📊 Shadow Score Spec


A framework-agnostic metric for measuring AI code generation quality.


The Problem

AI coding tools write code and tests together. The tests always pass, but they only validate what the AI thought to check, not what the specification actually requires. There's no independent quality signal.

The Solution

Shadow Score = (sealed_failures / sealed_total) × 100

Generate acceptance tests from the spec before code exists. Hide them from the AI. Build the code. Then run both test suites. The delta is your Shadow Score: an adversarial, quantitative measure of implementation quality.
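The formula is a single division; a minimal sketch in Python (the function name is illustrative, not from the spec):

```python
def shadow_score(sealed_failures: int, sealed_total: int) -> float:
    """Shadow Score = (sealed_failures / sealed_total) x 100."""
    if sealed_total <= 0:
        raise ValueError("sealed_total must be positive")
    return sealed_failures / sealed_total * 100

# 2 failures out of 18 sealed tests:
print(round(shadow_score(2, 18), 1))  # 11.1
```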

Interpretation Scale

| Score | Level | Meaning |
|-------|-------|---------|
| 0% | ✅ Perfect | Implementation tests covered everything |
| 1–15% | 🟢 Minor | Small blind spots: edge cases |
| 16–30% | 🟡 Moderate | Meaningful gaps in coverage |
| 31–50% | 🟠 Significant | Major gaps; review approach |
| >50% | 🔴 Critical | Fundamental quality issues |
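In code, the scale reduces to a threshold lookup; a minimal sketch (the lowercase level names follow the `level` field shown in the JSON report example in this README):

```python
def classify(score: float) -> str:
    """Map a Shadow Score percentage to its interpretation level."""
    if score == 0:
        return "perfect"
    if score <= 15:
        return "minor"
    if score <= 30:
        return "moderate"
    if score <= 50:
        return "significant"
    return "critical"
```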

Why "Shadow Score"?

Shadow Score measures what your AI missed when it couldn't see the tests. The name sticks because it maps to something developers already know: shadow testing, the practice of running hidden checks alongside production to catch what the main path misses. That's exactly what sealed-envelope testing does. The AI builds the code. Shadow tests, written before the code existed and hidden from the builder, judge it. Whatever fails is what the AI couldn't see.

0% = no shadows. The implementation covered everything the spec required.
60% = flying blind. The AI missed more than half the acceptance criteria it never knew about.

Quick Start: Add Shadow Score in 5 Minutes

Option A: Use the reference validators

```sh
# Python
python validators/shadow-score.py --sealed results-sealed.json --open results-open.json

# Go
cd validators && go build -o shadow-score-go . && cd ..
./validators/shadow-score-go --sealed results-sealed.json --open results-open.json

# Shell (minimal deps)
./validators/shadow-score.sh results-sealed.txt results-open.txt
```

Option B: Compute it yourself

  1. Write sealed tests from your spec (requirements → test cases, before code exists)
  2. Build the code; the implementer never sees the sealed tests
  3. Run both suites: sealed tests and the implementer's own tests
  4. Compute: failed_sealed / total_sealed × 100
  5. Report: use the JSON schema or markdown format

That's it. Framework, language, and tooling don't matter; Shadow Score works anywhere.
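Step 4 drops straight into CI as a pass/fail gate; a minimal sketch (the 15% threshold is an assumed policy choice, not part of the spec):

```python
THRESHOLD = 15.0  # assumed policy: fail the build beyond the "minor" band

def gate(failed_sealed: int, total_sealed: int) -> int:
    """Return a process exit code: 0 if the Shadow Score is acceptable."""
    score = failed_sealed / total_sealed * 100
    print(f"Shadow Score: {score:.1f}%")
    return 0 if score <= THRESHOLD else 1
```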

The Sealed-Envelope Protocol

Shadow Score is computed using the Sealed-Envelope Protocol, a 4-phase testing methodology:

```
SPEC ──► SEAL GENERATION ──► IMPLEMENTATION ──► VALIDATION ──► HARDENING
          (tests from spec,    (code + own        (run both      (fix from
           hidden from          tests, never       suites,        failure msgs
           implementer)         sees sealed)       compute gap)   only; no
                                                                  test code)
```

The critical rule: The implementer never sees the sealed tests. During hardening, they receive only failure messages (test name, expected, actual), never the test source code. This forces root-cause fixes, not test-targeting hacks.
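A hardening loop can enforce the rule mechanically by redacting failures before they reach the implementer; a sketch under assumptions (the field names mirror the `failures` entries in the report format; the `source` key is illustrative):

```python
def redact_failure(raw: dict) -> dict:
    """Keep only the fields the implementer may see during hardening:
    test name, expected, actual, and message -- never test source code."""
    allowed = ("test_name", "expected", "actual", "message")
    return {k: raw[k] for k in allowed if k in raw}

feedback = redact_failure({
    "test_name": "test_rejects_gpl_dependency",
    "expected": "exit code 2",
    "actual": "exit code 0",
    "message": "GPL dependency not blocked",
    "source": "def test_rejects_gpl_dependency(): ...",  # withheld
})
```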

Full protocol details: SPEC.md §4

Conformance Levels

| Level | What's Required | Use Case |
|-------|-----------------|----------|
| L1 – Shadow Score | Compute + report Shadow Score | Retrofitting onto existing test suites |
| L2 – Sealed Envelope | L1 + test isolation + tamper hash | AI agent pipelines |
| L3 – Full Protocol | L2 + hardening loop + velocity tracking | Production autonomous builds |
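One way to satisfy L2's tamper-hash requirement is a deterministic digest of the sealed suite, recorded at seal time and re-checked before validation; a sketch under assumptions (the directory layout, `.py` glob, and function name are illustrative, not from the spec):

```python
import hashlib
from pathlib import Path

def seal_hash(test_dir: str) -> str:
    """Hash every sealed test file (sorted for determinism) so any
    change between seal time and validation is detectable."""
    digest = hashlib.sha256()
    for path in sorted(Path(test_dir).rglob("*.py")):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()
```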

Reference Implementation

The reference Level 3 implementation is Dark Factory, an autonomous agentic build system for the GitHub Copilot CLI with sealed-envelope testing.

Worked Examples

| Example | Shadow Score | What Happened |
|---------|--------------|---------------|
| 01 – Perfect Score | 0% ✅ | All sealed tests passed |
| 02 – Minor Gaps | 11.1% 🟢 | 2 edge cases missed |
| 03 – Critical Gaps | 60% 🔴 | Only happy path tested |

Reporting Format

Shadow Reports can be produced in JSON (machine-readable) or Markdown (human-readable).

JSON schema: validators/shadow-report-schema.json

```json
{
  "shadow_score_spec_version": "1.0.0",
  "report": {
    "shadow_score": 11.1,
    "level": "minor"
  },
  "sealed_tests": { "total": 18, "passed": 16, "failed": 2 },
  "failures": [
    {
      "test_name": "test_rejects_gpl_dependency",
      "category": "security",
      "expected": "exit code 2",
      "actual": "exit code 0",
      "message": "GPL dependency not blocked"
    }
  ]
}
```

Full schema: SPEC.md §5
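A report producer can derive `shadow_score` and `level` from the sealed counts rather than writing them by hand; a sketch (field names follow the JSON example above; the threshold bands follow the interpretation scale):

```python
import json

def build_report(total: int, failed: int, failures: list) -> dict:
    """Assemble a Shadow Report dict; score and level are derived."""
    score = round(failed / total * 100, 1)
    bands = [(0, "perfect"), (15, "minor"), (30, "moderate"), (50, "significant")]
    level = next((name for bound, name in bands if score <= bound), "critical")
    return {
        "shadow_score_spec_version": "1.0.0",
        "report": {"shadow_score": score, "level": level},
        "sealed_tests": {"total": total, "passed": total - failed, "failed": failed},
        "failures": failures,
    }

print(json.dumps(build_report(18, 2, []), indent=2))
```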

Adopters

| Project | Conformance | Description |
|---------|-------------|-------------|
| Dark Factory | Level 3 | Reference implementation: autonomous agentic build system |

Using Shadow Score? Open a PR to add your project.

Full Specification

📄 Read the full spec →

Covers: definitions, formula, sealed-envelope protocol, reporting format, conformance levels, worked examples, and FAQ.

Contributing

Shadow Score is an open specification. Contributions welcome:

  • Spec changes: Open an issue to discuss before submitting a PR
  • New validators: PRs for additional language validators (Go, Rust, TypeScript) are welcome
  • Adopter listings: Add your project to the Adopters table

License

MIT © 2026 DUBSOpenHub
