LLM output contract testing CLI. Catch prompt regressions before they reach production.
When you update a prompt or switch models, your downstream code can break silently. schemalock gives you a test suite for LLM outputs: define what your pipeline must return, run it against any model, and get a clear pass/fail result with cost tracking.
```bash
# 1. Install globally
npm install -g schemalock

# 2. Set your API key (stored in ~/.schemalock/.env)
schemalock config set ANTHROPIC_API_KEY sk-ant-...

# 3. Define a contract
schemalock define invoice-extractor \
  --prompt prompts/invoice-extractor.txt \
  --must-contain "total_amount,currency,date" \
  --cases cases/invoice-cases.json

# 4. Run tests
schemalock test invoice-extractor --model claude-sonnet-4-6

# 5. Compare models before switching
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
```

`schemalock define` creates or updates a contract.
```bash
schemalock define sentiment-classifier \
  --prompt prompts/sentiment.txt \
  --format json \
  --must-contain "sentiment,confidence,reasoning" \
  --cases cases/sentiment-cases.json \
  --description "Classifies customer review sentiment"
```

| Flag | Description |
|---|---|
| `--prompt <file>` | System prompt file |
| `--format` | `json` (default), `text`, or `markdown` |
| `--must-contain <fields>` | Comma-separated required JSON fields |
| `--must-not-contain <phrases>` | Comma-separated banned phrases (for text output) |
| `--schema <file>` | JSON Schema file for strict validation |
| `--cases <file>` | JSON file with test cases |
| `--description <text>` | Human-readable description |
| `--overwrite` | Replace an existing contract |
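As a rough mental model of what `--must-contain` and `--must-not-contain` check, here is a minimal sketch. It assumes JSON output is parsed with `JSON.parse`, that a required field counts as present when it is a top-level key, and that banned phrases are matched case-insensitively; `checkContract` is a hypothetical name, and schemalock's actual `validator.js` may behave differently (for example, with nested fields).

```javascript
// Hypothetical sketch of --must-contain / --must-not-contain checks.
// Assumptions: required fields are top-level JSON keys; banned phrases are
// matched case-insensitively against the raw output. Not schemalock's code.
function checkContract(rawOutput, { format = "json", mustContain = [], mustNotContain = [] } = {}) {
  const errors = [];

  if (format === "json") {
    let parsed;
    try {
      parsed = JSON.parse(rawOutput);
    } catch (err) {
      return { pass: false, errors: [`invalid JSON: ${err.message}`] };
    }
    // Only a plain object has named fields to check.
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return { pass: false, errors: ["expected a JSON object"] };
    }
    for (const field of mustContain) {
      if (!Object.prototype.hasOwnProperty.call(parsed, field)) {
        errors.push(`missing field: ${field}`);
      }
    }
  }

  const lower = rawOutput.toLowerCase();
  for (const phrase of mustNotContain) {
    if (lower.includes(phrase.toLowerCase())) {
      errors.push(`banned phrase: ${phrase}`);
    }
  }

  return { pass: errors.length === 0, errors };
}

// The CLI takes comma-separated lists, e.g. --must-contain "total_amount,currency,date"
const fields = "total_amount,currency,date".split(",").map((s) => s.trim());
console.log(checkContract('{"total_amount":100,"currency":"USD"}', { mustContain: fields }));
// → { pass: false, errors: [ 'missing field: date' ] }
```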
`schemalock test` runs a contract against a model. It exits 0 when the pass rate meets `--threshold` and 1 otherwise, so it can gate CI jobs directly.

```bash
schemalock test invoice-extractor --model claude-sonnet-4-6
schemalock test invoice-extractor --model gpt-4o --threshold 0.9
schemalock test invoice-extractor --model llama-3.3-70b-versatile   # Groq
schemalock test invoice-extractor --model ollama/llama3.2           # local
schemalock test invoice-extractor --output json                     # machine-readable
```

| Flag | Default | Description |
|---|---|---|
| `--model` | `claude-sonnet-4-6` | Model to test |
| `--threshold` | `0.8` | Minimum pass rate (0.0-1.0) for exit code 0 |
| `--max-tokens` | `1024` | Max output tokens per call |
| `--output` | `console` | `console` or `json` |
| `--cases <file>` | - | Override test cases (JSON file) |
| `--prompt <file>` | - | Override system prompt file |
| `--base-url <url>` | - | Custom OpenAI-compatible endpoint (Ollama, Groq, LM Studio) |
| `--api-key <key>` | - | API key (defaults to the env var for the chosen model) |
| `--delay <ms>` | `0` | Delay between API calls in ms (avoids rate limits) |
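The exit-code rule in the `--threshold` row can be sketched as follows. This assumes the pass rate is `passedCount / total` and that a pass rate exactly equal to the threshold passes (the table calls it a *minimum*); `exitCodeFor` and the case IDs are hypothetical.

```javascript
// Hypothetical sketch: turn per-case results into a pass rate and exit code.
// Assumes passRate = passed / total and that passRate >= threshold passes.
function exitCodeFor(caseResults, threshold = 0.8) {
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  const total = caseResults.length;
  const passedCount = caseResults.filter((c) => c.passed).length;
  // An empty suite gets a pass rate of 0 (an assumption) instead of dividing by zero.
  const passRate = total === 0 ? 0 : passedCount / total;
  return { passRate, passedCount, total, exitCode: passRate >= threshold ? 0 : 1 };
}

// 4 of 5 hypothetical cases pass: 0.8 clears the default threshold but not 0.9.
const results = [true, false, true, true, true].map((passed, i) => ({ id: `case-${i + 1}`, passed }));
console.log(exitCodeFor(results).exitCode);      // 0
console.log(exitCodeFor(results, 0.9).exitCode); // 1
// A CLI would then finish with process.exit(exitCode).
```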
`schemalock diff` finds regressions before you switch models: it runs the full test suite on both models and shows the cases where they disagree.

```bash
schemalock diff claude-sonnet-4-6 gpt-4o --contract invoice-extractor
schemalock diff claude-sonnet-4-6 llama-3.3-70b-versatile --contract invoice-extractor
```

Output:

```
Comparison
                  claude-sonnet-4-6    gpt-4o
──────────────────────────────────────────────────────
Pass Rate         100%                 80%
Avg Latency       1230ms               890ms
Total Cost        $0.0041              $0.0028

Disagreements (1/5 cases differ):
  european-invoice   claude-sonnet-4-6=PASS   gpt-4o=FAIL

claude-sonnet-4-6 leads by 20 percentage points. Consider regression risk before switching to gpt-4o.
```

| Flag | Default | Description |
|---|---|---|
| `--contract <name>` | required | Contract to test against |
| `--cases <file>` | - | Override test cases |
| `--prompt <file>` | - | Override system prompt |
| `--max-tokens` | `1024` | Max output tokens per call |
| `--base-url <url>` | - | Custom OpenAI-compatible endpoint |
| `--api-key <key>` | - | API key override |
| `--delay <ms>` | `0` | Delay between API calls in ms |
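The numbers in the sample comparison can be derived from two per-case result lists. A minimal sketch, assuming both runs use the same case IDs and each result carries a boolean `passed`; `diffRuns` is a hypothetical name, and every case ID except `simple-invoice` and `european-invoice` is made up for the example.

```javascript
// Hypothetical sketch of the diff summary: pass rates, disagreeing cases,
// and the gap in percentage points. Assumes both runs share case IDs.
function diffRuns(modelA, resultsA, modelB, resultsB) {
  const byIdB = new Map(resultsB.map((r) => [r.id, r.passed]));
  const rate = (rs) => (rs.length === 0 ? 0 : rs.filter((r) => r.passed).length / rs.length);

  // A disagreement is a case that passed on one model and failed on the other.
  const disagreements = resultsA
    .filter((r) => byIdB.has(r.id) && byIdB.get(r.id) !== r.passed)
    .map((r) => ({
      id: r.id,
      [modelA]: r.passed ? "PASS" : "FAIL",
      [modelB]: byIdB.get(r.id) ? "PASS" : "FAIL",
    }));

  const rateA = rate(resultsA);
  const rateB = rate(resultsB);
  // Round to whole points: (1 - 0.8) * 100 is 19.999999999999996 in floating point.
  const gapPoints = Math.round((rateA - rateB) * 100);
  return { rateA, rateB, gapPoints, disagreements };
}

const ids = ["simple-invoice", "european-invoice", "credit-note", "multi-page", "no-date"];
const claude = ids.map((id) => ({ id, passed: true }));
const gpt = ids.map((id) => ({ id, passed: id !== "european-invoice" }));
const d = diffRuns("claude-sonnet-4-6", claude, "gpt-4o", gpt);
console.log(d.gapPoints, d.disagreements.map((x) => x.id)); // 20 [ 'european-invoice' ]
```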
`schemalock list` shows all your contracts with their last run status.

```bash
schemalock list
schemalock list --models   # show available models + pricing
```

`schemalock report` shows the test run history for a contract.
```bash
schemalock report invoice-extractor             # last 5 runs
schemalock report invoice-extractor --last 20   # last 20 runs
schemalock report invoice-extractor --run 7     # full case detail for run #7
```

`schemalock delete` removes a contract and, optionally, its test history.
```bash
schemalock delete invoice-extractor                  # prompts for confirmation
schemalock delete invoice-extractor --yes            # skip prompt (CI/scripts)
schemalock delete invoice-extractor --keep-history   # remove contract, keep run data
```

| Flag | Description |
|---|---|
| `--yes` | Skip the confirmation prompt (safe for CI/scripts) |
| `--keep-history` | Keep test run history in the database |
`schemalock config` manages API keys and settings.

```bash
schemalock config set ANTHROPIC_API_KEY sk-ant-...
schemalock config set OPENAI_API_KEY sk-...
schemalock config set GROQ_API_KEY gsk_...
schemalock config get ANTHROPIC_API_KEY
schemalock config delete ANTHROPIC_API_KEY
schemalock config list-keys        # show all stored key names (values masked)
schemalock config update-pricing   # write ~/.schemalock/models.json pricing template
schemalock config env              # show active paths and env var overrides
```

Keys are stored in `~/.schemalock/.env`, so they persist across projects and terminals.
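Because keys live in a plain `.env` file, reading them and printing masked values (as `config list-keys` does) can be sketched as below. The parsing rules (one `KEY=value` per line, `#` comments, optional surrounding quotes) and the masking format are assumptions rather than schemalock's documented behavior; `parseEnv` and `maskValue` are hypothetical helpers.

```javascript
// Hypothetical sketch: parse KEY=value lines from a .env file and mask values.
// Assumed rules: blank lines and lines starting with # are skipped; matching
// single or double quotes around a value are stripped. Masking is illustrative.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" or an empty key: ignore the line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    vars[key] = value;
  }
  return vars;
}

// Show only the last 4 characters of a secret.
function maskValue(value) {
  return value.length <= 4 ? "****" : "*".repeat(8) + value.slice(-4);
}

const env = parseEnv('# schemalock keys\nANTHROPIC_API_KEY=sk-ant-abc123\nGROQ_API_KEY="gsk_xyz789"\n');
for (const [key, value] of Object.entries(env)) {
  console.log(`${key}=${maskValue(value)}`);
}
// ANTHROPIC_API_KEY=********c123
// GROQ_API_KEY=********z789
```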
A test case file (passed with `--cases`) is a JSON array of cases:

```json
[
  {
    "id": "simple-invoice",
    "input": "Invoice from Acme Corp. Date: Jan 15 2024. Total: $100 USD",
    "expected": {
      "total_amount": 100,
      "currency": "USD",
      "vendor_name": "Acme Corp"
    }
  }
]
```

- `id` - unique identifier (shown in test output)
- `input` - the user message sent to the LLM
- `expected` - optional key/value pairs that must match the parsed output
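A minimal sketch of the `expected` check, assuming every expected key must appear in the parsed output with a deeply and strictly equal value (so `100` and `"100"` do not match) and that extra output keys are ignored. The real validator may compare more loosely; `matchExpected` is a hypothetical name.

```javascript
const { isDeepStrictEqual } = require("node:util");

// Hypothetical sketch of the `expected` check: every expected key must exist in
// the parsed model output with a deeply equal value. Extra output keys are fine.
function matchExpected(parsedOutput, expected = {}) {
  const mismatches = [];
  for (const [key, want] of Object.entries(expected)) {
    const got = parsedOutput == null ? undefined : parsedOutput[key];
    if (!isDeepStrictEqual(got, want)) {
      mismatches.push({ key, expected: want, got });
    }
  }
  return { pass: mismatches.length === 0, mismatches };
}

const testCase = {
  id: "simple-invoice",
  expected: { total_amount: 100, currency: "USD", vendor_name: "Acme Corp" },
};
// The model returned the total as a string, which strict matching rejects.
const modelOutput = JSON.parse(
  '{"total_amount": "100", "currency": "USD", "vendor_name": "Acme Corp", "date": "2024-01-15"}'
);
console.log(matchExpected(modelOutput, testCase.expected).mismatches);
// → [ { key: 'total_amount', expected: 100, got: '100' } ]
```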
Built-in models and pricing (USD per 1M tokens):

Anthropic

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `claude-sonnet-4-6` | $3.00 | $15.00 |
| `claude-opus-4-6` | $5.00 | $25.00 |
| `claude-haiku-4-5` | $1.00 | $5.00 |
GPT-5 (latest)

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gpt-5` | $1.25 | $10.00 |
| `gpt-5-mini` | $0.25 | $2.00 |
| `gpt-5-nano` | $0.05 | $0.40 |

GPT-4.1

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gpt-4.1` | $2.00 | $8.00 |
| `gpt-4.1-mini` | $0.40 | $1.60 |
| `gpt-4.1-nano` | $0.10 | $0.40 |

GPT-4o (previous generation)

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |

o-series reasoning

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `o3` | $2.00 | $8.00 |
| `o4-mini` | $1.10 | $4.40 |
| `o3-mini` | $1.10 | $4.40 |
| `o1` | $15.00 | $60.00 |
Groq

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `meta-llama/llama-4-scout-17b-16e-instruct` | $0.11 | $0.34 |
| `llama-3.3-70b-versatile` | $0.59 | $0.79 |
| `llama-3.1-8b-instant` | $0.05 | $0.08 |
| `mixtral-8x7b-32768` | $0.24 | $0.24 |
| `gemma2-9b-it` | $0.20 | $0.20 |
Mistral

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `mistral-large-latest` | $0.50 | $1.50 |
| `mistral-medium-latest` | $0.40 | $2.00 |
| `codestral-latest` | $0.30 | $0.90 |
| `mistral-small-latest` | $0.10 | $0.30 |
Google Gemini

| Model | Input $/1M | Output $/1M |
|---|---|---|
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.30 | $2.50 |
| `gemini-2.0-flash` | $0.10 | $0.40 |
| `gemini-2.0-flash-lite` | $0.075 | $0.30 |

Requires `GOOGLE_API_KEY` from Google AI Studio.
Ollama (local)

| Model | Notes |
|---|---|
| `ollama/llama4` | Requires `ollama serve` running locally |
| `ollama/llama3.3` | Requires `ollama serve` running locally |
| `ollama/llama3.2` | Requires `ollama serve` running locally |
| `ollama/mistral` | Requires `ollama serve` running locally |
| `ollama/phi4` | Requires `ollama serve` running locally |
| `ollama/qwen2.5` | Requires `ollama serve` running locally |

Any model served by Ollama works with `--model ollama/<model-name>`.
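The `ollama/<model-name>` convention implies that the model name alone decides which provider and API key are used. A rough sketch of such routing follows. The prefix rules are assumptions drawn from the model tables, not from schemalock's `models.js`; Ollama's OpenAI-compatible API on `http://localhost:11434/v1` is Ollama's default, and `resolveProvider` is a hypothetical name.

```javascript
// Hypothetical sketch: route a --model value to a provider and the env var
// holding its API key. Rules are assumptions based on the model tables.
const GROQ_MODELS = new Set([
  "meta-llama/llama-4-scout-17b-16e-instruct",
  "llama-3.3-70b-versatile",
  "llama-3.1-8b-instant",
  "mixtral-8x7b-32768",
  "gemma2-9b-it",
]);

function resolveProvider(model, { baseUrl } = {}) {
  // An explicit --base-url wins: treat the endpoint as generic OpenAI-compatible.
  if (baseUrl) return { provider: "openai-compatible", model, envVar: null, baseUrl };
  if (model.startsWith("ollama/")) {
    // Ollama's OpenAI-compatible API listens on port 11434 by default; no key needed.
    return { provider: "ollama", model: model.slice("ollama/".length), envVar: null, baseUrl: "http://localhost:11434/v1" };
  }
  // baseUrl: null stands for "the provider's default endpoint" in this sketch.
  const route = (provider, envVar) => ({ provider, model, envVar, baseUrl: null });
  if (GROQ_MODELS.has(model)) return route("groq", "GROQ_API_KEY");
  if (model.startsWith("claude-")) return route("anthropic", "ANTHROPIC_API_KEY");
  if (model.startsWith("gpt-") || /^o\d/.test(model)) return route("openai", "OPENAI_API_KEY");
  if (model.startsWith("gemini-")) return route("google", "GOOGLE_API_KEY");
  if (model.startsWith("mistral-") || model.startsWith("codestral-")) return route("mistral", "MISTRAL_API_KEY");
  throw new Error(`Unknown model "${model}"; pass --base-url for a custom endpoint`);
}

for (const m of ["claude-sonnet-4-6", "o4-mini", "llama-3.3-70b-versatile", "ollama/llama3.2"]) {
  const { provider, model, envVar } = resolveProvider(m);
  console.log(`${m} -> ${provider} (model: ${model}, key: ${envVar ?? "none"})`);
}
```

Checking the Groq list before the prefix rules keeps IDs like `meta-llama/...` from falling through to the unknown-model error.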
Any OpenAI-compatible endpoint (LM Studio, Together AI, Fireworks AI, vLLM, etc.) can be used by passing `--base-url` and `--api-key`:

```bash
schemalock test my-contract \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --base-url https://api.together.xyz/v1 \
  --api-key "$TOGETHER_API_KEY"
```

Run `schemalock list --models` to see all built-in models with current pricing.
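Since the pricing tables are in USD per 1M tokens, the cost of one call is `inputTokens × inputPrice / 1e6 + outputTokens × outputPrice / 1e6`. A minimal sketch, assuming token counts come from the provider's usage data; `PRICING` copies a few rows from the tables, `callCostUsd` is a hypothetical name, and treating Ollama models as free is an assumption (their table lists no prices).

```javascript
// USD per 1M tokens, copied from a few rows of the pricing tables.
const PRICING = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "llama-3.1-8b-instant": { input: 0.05, output: 0.08 },
};

// Cost of a single call. Local Ollama models are assumed to cost nothing.
function callCostUsd(model, inputTokens, outputTokens) {
  if (model.startsWith("ollama/")) return 0;
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// 400 input + 150 output tokens on claude-sonnet-4-6:
// (400 * 3 + 150 * 15) / 1e6 = (1200 + 2250) / 1e6 = 0.00345
console.log(callCostUsd("claude-sonnet-4-6", 400, 150).toFixed(5)); // 0.00345
```

Summing this per case gives a run's total cost, which is the kind of figure shown as `Total Cost` in `diff` output.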
GitHub Actions example:

```yaml
# .github/workflows/test-prompts.yml
- name: Run schemalock contract tests
  run: |
    npx schemalock test invoice-extractor \
      --model claude-sonnet-4-6 \
      --threshold 0.9 \
      --output json > schemalock-results.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The `--output json` flag produces machine-readable output:
```json
{
  "runId": 12,
  "contract": "invoice-extractor",
  "model": "claude-sonnet-4-6",
  "passRate": 0.95,
  "passedCount": 19,
  "total": 20,
  "passed": true,
  "totalCostUsd": 0.0041,
  "avgLatencyMs": 1230,
  "cases": [...]
}
```

| Variable | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `GROQ_API_KEY` | Groq API key |
| `MISTRAL_API_KEY` | Mistral API key |
| `GOOGLE_API_KEY` | Google AI Studio API key (Gemini models) |
| `TOGETHER_API_KEY` | Together AI API key |
| `FIREWORKS_API_KEY` | Fireworks AI API key |
| `SCHEMALOCK_DB` | Override the SQLite DB path (useful for per-project isolation in CI) |

Keys can also be stored persistently with `schemalock config set <KEY> <value>`.
All data is stored locally in `~/.schemalock/`:

- `contracts/` - YAML contract definitions
- `results.db` - SQLite database of all test runs and case results
- `.env` - API keys set via `schemalock config set`
- `models.json` - optional pricing overrides (created by `schemalock config update-pricing`)
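The `SCHEMALOCK_DB` variable and the default `results.db` location suggest a simple precedence rule: the env var wins, otherwise the database lives under `~/.schemalock/`. A sketch of that rule, assuming `~` means `os.homedir()` and a relative override resolves against the current directory; `resolveDbPath` is a hypothetical name, not the actual function in `utils/config.js`.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch: SCHEMALOCK_DB overrides the default DB location.
function resolveDbPath(env = process.env, home = os.homedir()) {
  const override = env.SCHEMALOCK_DB && env.SCHEMALOCK_DB.trim();
  if (override) return path.resolve(override); // relative paths resolve against cwd
  return path.join(home, ".schemalock", "results.db");
}

console.log(resolveDbPath({}, "/home/ci"));                     // /home/ci/.schemalock/results.db on POSIX
console.log(resolveDbPath({ SCHEMALOCK_DB: "/tmp/run-42.db" })); // /tmp/run-42.db on POSIX
```

Under this rule, per-project isolation in CI is just `SCHEMALOCK_DB=./ci-results.db schemalock test ...` for each job.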
```
src/
  cli.js              # Commander entry point
  commands/
    define.js         # schemalock define
    test.js           # schemalock test
    diff.js           # schemalock diff
    list.js           # schemalock list
    report.js         # schemalock report
    delete.js         # schemalock delete
    config.js         # schemalock config
  core/
    runner.js         # Anthropic + OpenAI API calls, timeout, client cache
    validator.js      # JSON Schema + field validation + expected value checks
    store.js          # SQLite persistence (WAL mode, busy_timeout 30s)
    contracts.js      # YAML contract load/save/delete
  utils/
    config.js         # ~/.schemalock/ directory + DB path management
    models.js         # Model registry, pricing, pricing overrides
    cases.js          # Case ID sanitization, count guards, structure validation
```
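`utils/cases.js` is described as handling case ID sanitization, count guards, and structure validation. The sketch below shows one way those three checks could fit together. The allowed ID characters, the 64-character ID cap, the 500-case limit, and the name `loadCases` are all assumptions for illustration, not schemalock's actual rules.

```javascript
// Hypothetical sketch of case-file checks: structure, ID sanitization, count guard.
const MAX_CASES = 500; // assumed limit, not schemalock's real value

function sanitizeId(id) {
  // Keep letters, digits, dot, dash, underscore; replace other runs with "-".
  return String(id).trim().replace(/[^A-Za-z0-9._-]+/g, "-").slice(0, 64);
}

function loadCases(json) {
  const cases = JSON.parse(json);
  if (!Array.isArray(cases)) throw new TypeError("case file must be a JSON array");
  if (cases.length === 0) throw new RangeError("case file has no cases");
  if (cases.length > MAX_CASES) throw new RangeError(`too many cases (${cases.length} > ${MAX_CASES})`);

  const seen = new Set();
  return cases.map((c, i) => {
    if (c === null || typeof c !== "object" || typeof c.input !== "string") {
      throw new TypeError(`case ${i}: "input" must be a string`);
    }
    const exp = c.expected;
    if (exp !== undefined && (exp === null || typeof exp !== "object" || Array.isArray(exp))) {
      throw new TypeError(`case ${i}: "expected" must be an object if present`);
    }
    const id = sanitizeId(c.id ?? "");
    if (id === "" || seen.has(id)) throw new Error(`case ${i}: missing or duplicate id "${id}"`);
    seen.add(id);
    return { id, input: c.input, expected: exp ?? {} };
  });
}

const loaded = loadCases('[{"id": "simple invoice/1", "input": "Invoice from Acme Corp. Total: $100 USD"}]');
console.log(loaded[0].id); // simple-invoice-1
```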
License: MIT