schemalock

LLM output contract testing CLI. Catch prompt regressions before they reach production.

When you update a prompt or switch models, your downstream code can break silently. schemalock gives you a test suite for LLM outputs - define what your pipeline must return, run it against any model, and get a clear pass/fail with cost tracking.

Quick Start

# 1. Install globally
npm install -g schemalock

# 2. Set your API key (stored in ~/.schemalock/.env)
schemalock config set ANTHROPIC_API_KEY sk-ant-...

# 3. Define a contract
schemalock define invoice-extractor \
  --prompt prompts/invoice-extractor.txt \
  --must-contain "total_amount,currency,date" \
  --cases cases/invoice-cases.json

# 4. Run tests
schemalock test invoice-extractor --model claude-sonnet-4-6

# 5. Compare models before switching
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor

Commands

`schemalock define <name>`

Create or update a contract.

schemalock define sentiment-classifier \
  --prompt prompts/sentiment.txt \
  --format json \
  --must-contain "sentiment,confidence,reasoning" \
  --cases cases/sentiment-cases.json \
  --description "Classifies customer review sentiment"

Flag	Description
`--prompt <file>`	System prompt file
`--format`	`json` (default), `text`, or `markdown`
`--must-contain <fields>`	Comma-separated required JSON fields
`--must-not-contain <phrases>`	Comma-separated banned phrases (for text output)
`--schema <file>`	JSON Schema file for strict validation
`--cases <file>`	JSON file with test cases
`--description <text>`	Human-readable description
`--overwrite`	Replace an existing contract

`schemalock test <name>`

Run the contract against a model. Exits 0 on pass, 1 on fail (CI-friendly).

schemalock test invoice-extractor --model claude-sonnet-4-6
schemalock test invoice-extractor --model gpt-4o --threshold 0.9
schemalock test invoice-extractor --model llama-3.3-70b-versatile  # Groq
schemalock test invoice-extractor --model ollama/llama3.2           # local
schemalock test invoice-extractor --output json                      # machine-readable

Flag	Default	Description
`--model`	`claude-sonnet-4-6`	Model to test
`--threshold`	`0.8`	Min pass rate (0.0-1.0) for exit code 0
`--max-tokens`	`1024`	Max output tokens per call
`--output`	`console`	`console` or `json`
`--cases <file>`	-	Override test cases (JSON file)
`--prompt <file>`	-	Override system prompt file
`--base-url <url>`	-	Custom OpenAI-compatible endpoint (Ollama, Groq, LM Studio)
`--api-key <key>`	-	API key (defaults to env var for the chosen model)
`--delay <ms>`	`0`	Delay between API calls in ms (avoids rate limits)

`schemalock diff <model1> <model2>`

Find regressions before switching models. Runs the full test suite on both models and shows where they disagree.

schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
schemalock diff claude-sonnet-4-6 llama-3.3-70b-versatile --contract invoice-extractor

Output:

  Comparison
                 claude-sonnet-4-6        gpt-4o
  ──────────────────────────────────────────────────────
  Pass Rate      100%                     80%
  Avg Latency    1230ms                   890ms
  Total Cost     $0.0041                  $0.0028

  Disagreements (1/5 cases differ):
    european-invoice          claude-sonnet-4-6=PASS  gpt-4o=FAIL

  claude-sonnet-4-6 leads by 20 percentage points. Consider regression risk before switching to gpt-4o.

Flag	Default	Description
`--contract <name>`	required	Contract to test against
`--cases <file>`	-	Override test cases
`--prompt <file>`	-	Override system prompt
`--max-tokens`	`1024`	Max output tokens per call
`--base-url <url>`	-	Custom OpenAI-compatible endpoint
`--api-key <key>`	-	API key override
`--delay <ms>`	`0`	Delay between API calls in ms

`schemalock list`

List all your contracts with last run status.

schemalock list
schemalock list --models    # show available models + pricing

`schemalock report <name>`

View test run history for a contract.

schemalock report invoice-extractor             # last 5 runs
schemalock report invoice-extractor --last 20   # last 20 runs
schemalock report invoice-extractor --run 7     # full case detail for run #7

`schemalock delete <name>`

Delete a contract and optionally its test history.

schemalock delete invoice-extractor             # prompts for confirmation
schemalock delete invoice-extractor --yes       # skip prompt (CI/scripts)
schemalock delete invoice-extractor --keep-history  # remove contract, keep run data

Flag	Description
`--yes`	Skip confirmation prompt (safe for CI/scripts)
`--keep-history`	Keep test run history in the database

`schemalock config`

Manage API keys and settings.

schemalock config set ANTHROPIC_API_KEY sk-ant-...
schemalock config set OPENAI_API_KEY sk-...
schemalock config set GROQ_API_KEY gsk_...
schemalock config get ANTHROPIC_API_KEY
schemalock config delete ANTHROPIC_API_KEY
schemalock config list-keys      # show all stored key names (values masked)
schemalock config update-pricing # write ~/.schemalock/models.json pricing template
schemalock config env            # show active paths and env var overrides

Keys are stored in ~/.schemalock/.env - they persist across projects and terminals.

Test Cases Format

[
  {
    "id": "simple-invoice",
    "input": "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
    "expected": {
      "total_amount": 100,
      "currency": "USD",
      "vendor_name": "Acme Corp"
    }
  }
]

id - unique identifier (shown in test output)
input - the user message sent to the LLM
expected - optional key/value pairs that must match the parsed output

Supported Models

Anthropic

Model	Input $/1M	Output $/1M
`claude-sonnet-4-6`	$3.00	$15.00
`claude-opus-4-6`	$5.00	$25.00
`claude-haiku-4-5`	$1.00	$5.00

OpenAI

GPT-5 (latest)

Model	Input $/1M	Output $/1M
`gpt-5`	$1.25	$10.00
`gpt-5-mini`	$0.25	$2.00
`gpt-5-nano`	$0.05	$0.40

GPT-4.1

Model	Input $/1M	Output $/1M
`gpt-4.1`	$2.00	$8.00
`gpt-4.1-mini`	$0.40	$1.60
`gpt-4.1-nano`	$0.10	$0.40

GPT-4o (previous generation)

Model	Input $/1M	Output $/1M
`gpt-4o`	$2.50	$10.00
`gpt-4o-mini`	$0.15	$0.60

o-series reasoning

Model	Input $/1M	Output $/1M
`o3`	$2.00	$8.00
`o4-mini`	$1.10	$4.40
`o3-mini`	$1.10	$4.40
`o1`	$15.00	$60.00

Groq (fast inference)

Model	Input $/1M	Output $/1M
`meta-llama/llama-4-scout-17b-16e-instruct`	$0.11	$0.34
`llama-3.3-70b-versatile`	$0.59	$0.79
`llama-3.1-8b-instant`	$0.05	$0.08
`mixtral-8x7b-32768`	$0.24	$0.24
`gemma2-9b-it`	$0.20	$0.20

Mistral

Model	Input $/1M	Output $/1M
`mistral-large-latest`	$0.50	$1.50
`mistral-medium-latest`	$0.40	$2.00
`codestral-latest`	$0.30	$0.90
`mistral-small-latest`	$0.10	$0.30

Google Gemini

Model	Input $/1M	Output $/1M
`gemini-2.5-pro`	$1.25	$10.00
`gemini-2.5-flash`	$0.30	$2.50
`gemini-2.0-flash`	$0.10	$0.40
`gemini-2.0-flash-lite`	$0.075	$0.30

Requires GOOGLE_API_KEY from Google AI Studio.

Ollama (local, free)

Model	Notes
`ollama/llama4`	Requires `ollama serve` running locally
`ollama/llama3.3`	Requires `ollama serve` running locally
`ollama/llama3.2`	Requires `ollama serve` running locally
`ollama/mistral`	Requires `ollama serve` running locally
`ollama/phi4`	Requires `ollama serve` running locally
`ollama/qwen2.5`	Requires `ollama serve` running locally

Any model served by Ollama works with --model ollama/<model-name>.

Custom / Self-Hosted

Any OpenAI-compatible endpoint (LM Studio, Together AI, Fireworks AI, vLLM, etc.):

schemalock test my-contract \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --base-url https://api.together.xyz/v1 \
  --api-key $TOGETHER_API_KEY

Run schemalock list --models to see all built-in models with current pricing.

CI/CD Integration

# .github/workflows/test-prompts.yml
- name: Run schemalock contract tests
  run: |
    npx schemalock test invoice-extractor \
      --model claude-sonnet-4-6 \
      --threshold 0.9 \
      --yes \
      --output json > schemalock-results.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

The --output json flag produces machine-readable output:

{
  "runId": 12,
  "contract": "invoice-extractor",
  "model": "claude-sonnet-4-6",
  "passRate": 0.95,
  "passedCount": 19,
  "total": 20,
  "passed": true,
  "totalCostUsd": 0.0041,
  "avgLatencyMs": 1230,
  "cases": [...]
}

Environment Variables

Variable	Purpose
`ANTHROPIC_API_KEY`	Anthropic API key
`OPENAI_API_KEY`	OpenAI API key
`GROQ_API_KEY`	Groq API key
`MISTRAL_API_KEY`	Mistral API key
`TOGETHER_API_KEY`	Together AI API key
`FIREWORKS_API_KEY`	Fireworks AI API key
`SCHEMALOCK_DB`	Override SQLite DB path (useful for per-project isolation in CI)

Keys can also be stored persistently with schemalock config set <KEY> <value>.

Data Storage

All data stored locally in ~/.schemalock/:

contracts/ - YAML contract definitions
results.db - SQLite database of all test runs and case results
.env - API keys set via schemalock config set
models.json - optional pricing overrides (created by schemalock config update-pricing)

Architecture

src/
  cli.js                  # Commander entry point
  commands/
    define.js             # schemalock define
    test.js               # schemalock test
    diff.js               # schemalock diff
    list.js               # schemalock list
    report.js             # schemalock report
    delete.js             # schemalock delete
    config.js             # schemalock config
  core/
    runner.js             # Anthropic + OpenAI API calls, timeout, client cache
    validator.js          # JSON Schema + field validation + expected value checks
    store.js              # SQLite persistence (WAL mode, busy_timeout 30s)
    contracts.js          # YAML contract load/save/delete
  utils/
    config.js             # ~/.schemalock/ directory + DB path management
    models.js             # Model registry, pricing, pricing overrides
    cases.js              # Case ID sanitization, count guards, structure validation

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github		.github
bin		bin
examples		examples
src		src
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

schemalock

Quick Start

Commands

`schemalock define <name>`

`schemalock test <name>`

`schemalock diff <model1> <model2>`

`schemalock list`

`schemalock report <name>`

`schemalock delete <name>`

`schemalock config`

Test Cases Format

Supported Models

Anthropic

OpenAI

Groq (fast inference)

Mistral

Google Gemini

Ollama (local, free)

Custom / Self-Hosted

CI/CD Integration

Environment Variables

Data Storage

Architecture

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

schemalock

Quick Start

Commands

schemalock define <name>

schemalock test <name>

schemalock diff <model1> <model2>

schemalock list

schemalock report <name>

schemalock delete <name>

schemalock config

Test Cases Format

Supported Models

Anthropic

OpenAI

Groq (fast inference)

Mistral

Google Gemini

Ollama (local, free)

Custom / Self-Hosted

CI/CD Integration

Environment Variables

Data Storage

Architecture

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`schemalock define <name>`

`schemalock test <name>`

`schemalock diff <model1> <model2>`

`schemalock list`

`schemalock report <name>`

`schemalock delete <name>`

`schemalock config`

Packages