
feat: two-stage evaluation pipeline with prefilter (#14)#21

Open
don-petry wants to merge 4 commits into Joaolfelicio:main from don-petry:feat/prefilter-pipeline

Conversation

@don-petry
Collaborator

@don-petry don-petry commented Apr 4, 2026

Why

Every user interaction is currently sent through full LLM evaluation, even routine code-assistance messages that contain no persistent preferences or rules. This wastes LLM calls and adds unnecessary latency and cost. A lightweight classification step can filter out the majority of non-rule interactions cheaply, targeting a 50%+ reduction in full evaluation calls.

Summary

  • Adds a lightweight Stage 1 pre-filter in BaseEvaluator that classifies interactions as rule-bearing or not before running full extraction
  • Non-rule interactions with confidence > 0.8 skip Stage 2 entirely
  • --skip-prefilter CLI flag to disable Stage 1
  • Dashboard shows prefilter skip rate metrics
  • Fixes the bool("false") == True bug from PR #20 with a proper _parse_bool() function
  • All evaluator subclasses updated to accept and forward **kwargs
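
The two-stage flow above can be sketched as follows. Method and class names (`evaluate_interaction`, `_pre_evaluate`, `PrefilterResult`) follow the PR description, but the bodies are illustrative stand-ins, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrefilterResult:
    is_rule_bearing: Optional[bool]  # None => classifier could not decide
    confidence: float

SKIP_THRESHOLD = 0.8  # matches the "confidence > 0.8" rule in the summary

def _pre_evaluate(interaction: str) -> PrefilterResult:
    # Stand-in for the Stage 1 LLM classifier call (one cheap LLM call).
    rule_words = ("always", "never", "use", "prefer")
    hit = any(w in interaction.lower() for w in rule_words)
    return PrefilterResult(is_rule_bearing=hit, confidence=0.9)

def _full_evaluate(interaction: str) -> dict:
    # Stand-in for the Stage 2 full rule extraction call.
    return {"rule": interaction}

def evaluate_interaction(interaction: str, skip_prefilter: bool = False):
    if not skip_prefilter:
        result = _pre_evaluate(interaction)
        # Only skip when the classifier is confidently negative.
        if result.is_rule_bearing is False and result.confidence > SKIP_THRESHOLD:
            return None  # non-rule interaction: Stage 2 never runs
    return _full_evaluate(interaction)  # Stage 2: full extraction
```

Note the skip condition checks `is False`, not falsiness, so an undecided prefilter (`None`) still falls through to full evaluation.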

Replaces PR #20 (rebuilt cleanly on upstream/main using the BaseEvaluator pattern).

Closes #14

Testing evidence (live, Python 3.12 + mcp, Claude CLI)

| Check | Result |
| --- | --- |
| Full test suite: 84/84 passed | ✅ PASS |
| Live: non-rule interaction skipped by prefilter (5.7s, 1 LLM call) | ✅ PASS |
| Live: rule interaction passed through, extracted correctly (13.2s, 2 LLM calls) | ✅ PASS |
| Live: skip_prefilter=True bypasses Stage 1 (metrics untouched) | ✅ PASS |
| Live: prefilter skip rate = 50% (1 skipped / 2 total) | ✅ PASS |
| Live: extracted rule scope=GLOBAL, content includes snake_case directive | ✅ PASS |

Live test details

Test 1 (non-rule "help me debug"): prefilter skipped=1, result=None, 5.7s
Test 2 (rule "use snake_case"):    prefilter passed=1, scope=GLOBAL, 13.2s
Test 3 (skip_prefilter=True):      metrics 0/0, full eval only, 12.4s

Issues found and fixed during testing

  • test_daemons.py: patched evaluator classes no longer in main.py → fixed to patch get_evaluator
  • test_gemini_cli_llm.py: prefilter adds 2nd subprocess call → fixed with skip_prefilter=True
  • Dashboard generate_layout: MagicMock metrics caused TypeError → fixed with isinstance guard

🤖 Generated with Claude Code

Adds a lightweight Stage 1 pre-filter to BaseEvaluator that classifies
interactions as rule-bearing or not before running full extraction.
Non-rule interactions with confidence > 0.8 skip Stage 2, targeting
50%+ reduction in LLM calls.

Key design decisions:
- Prefilter is implemented in BaseEvaluator, so all evaluators get it
- _parse_bool() fixes the bool("false")==True bug from PR Joaolfelicio#20
- Fail-open: prefilter errors pass through to full eval
- --skip-prefilter CLI flag to disable Stage 1
- Dashboard shows prefilter skip rate metrics

Closes Joaolfelicio#14

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 4, 2026 23:41

Copilot AI left a comment


Pull request overview

Implements a two-stage evaluation pipeline by adding a lightweight prefilter step in BaseEvaluator to decide whether to skip full rule extraction, plus CLI/dashboard plumbing to expose prefilter behavior.

Changes:

  • Added PrefilterResult / PrefilterMetrics models and integrated Stage 1 prefiltering into BaseEvaluator.evaluate_interaction().
  • Added --skip-prefilter flag and surfaced prefilter skipped/total stats in the dashboard footer.
  • Added a new prefilter prompt template and a new test suite covering prefilter logic and _parse_bool().

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| context_scribe/evaluator/base_evaluator.py | Adds prefilter stage, metrics tracking, and _parse_bool() for LLM JSON booleans. |
| context_scribe/models/evaluator_models.py | Introduces prefilter dataclasses and metrics helpers. |
| context_scribe/evaluator/prefilter_template.md | Provides the Stage 1 prompt template for rule-bearing classification. |
| context_scribe/evaluator/__init__.py | Updates get_evaluator() to accept and forward **kwargs. |
| context_scribe/main.py | Wires --skip-prefilter through to evaluator creation and displays prefilter stats in the dashboard. |
| tests/test_prefilter.py | Adds unit + integration-style tests for prefilter and boolean parsing. |


All evaluator subclasses now accept and forward **kwargs to
BaseEvaluator.__init__(), allowing skip_prefilter and future
params to propagate correctly via get_evaluator().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
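
The forwarding pattern this commit describes can be sketched as follows. The subclass name `ClaudeCliEvaluator` and the registry contents are hypothetical, chosen for illustration; only `BaseEvaluator`, `get_evaluator()`, and `skip_prefilter` come from the PR itself:

```python
class BaseEvaluator:
    def __init__(self, skip_prefilter: bool = False, **kwargs):
        self.skip_prefilter = skip_prefilter

class ClaudeCliEvaluator(BaseEvaluator):  # hypothetical subclass
    def __init__(self, model: str = "default", **kwargs):
        # Forward unconsumed kwargs so skip_prefilter (and any future
        # BaseEvaluator params) propagate without touching each subclass.
        super().__init__(**kwargs)
        self.model = model

def get_evaluator(name: str, **kwargs) -> BaseEvaluator:
    # Factory: kwargs pass through unchanged to the chosen subclass.
    registry = {"claude-cli": ClaudeCliEvaluator}
    return registry[name](**kwargs)

ev = get_evaluator("claude-cli", skip_prefilter=True)
```

The point of `**kwargs` here is that adding a new BaseEvaluator parameter later requires no change to any subclass signature.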
@don-petry
Collaborator Author

@Joaolfelicio - What do you think about this enhancement idea?

- test_daemons.py: patch get_evaluator instead of removed direct imports
- test_gemini_cli_llm.py: use skip_prefilter=True in CLI flag test
  (prefilter adds a second subprocess.run call)
- main.py: guard prefilter metrics sync with isinstance check to
  prevent TypeError when evaluator is mocked

Found via full test suite run with Python 3.12 + mcp installed.
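
The isinstance guard mentioned above can be sketched like this. The helper name `sync_prefilter_stats` and the `PrefilterMetrics` fields are assumptions based on the PR description:

```python
from unittest.mock import MagicMock

class PrefilterMetrics:
    def __init__(self, skipped: int = 0, total: int = 0):
        self.skipped = skipped
        self.total = total

def sync_prefilter_stats(evaluator):
    """Read prefilter metrics only when they are the real metrics object.

    Under test, evaluator may be a MagicMock whose .prefilter_metrics
    attribute is itself a MagicMock; doing arithmetic/formatting on it
    raises TypeError, so we guard with isinstance and skip the update.
    """
    metrics = getattr(evaluator, "prefilter_metrics", None)
    if not isinstance(metrics, PrefilterMetrics):
        return None  # mocked or absent: leave the dashboard untouched
    rate = metrics.skipped / metrics.total if metrics.total else 0.0
    return f"prefilter skip rate: {rate:.0%} ({metrics.skipped}/{metrics.total})"
```

With the 1-skipped / 2-total numbers from the live tests, this renders "prefilter skip rate: 50% (1/2)"; a MagicMock evaluator short-circuits to None.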

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 5, 2026 17:28

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.



Address two Copilot review comments:

1. _parse_bool() now returns None for unrecognised/null values instead
   of defaulting to False. _pre_evaluate() treats None as pass-through
   to full evaluation, ensuring malformed LLM output doesn't silently
   skip rule extraction (fail-open behaviour).

2. Template loading in BaseEvaluator.__init__() now uses
   importlib.resources.files() instead of __file__-relative Path reads,
   so templates are accessible in packaged installs (wheel/zip).
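
A sketch of the packaging-safe template load described in point 2. The package and filename below follow the reviewed file list (`context_scribe/evaluator/prefilter_template.md`) and are assumptions, not verified paths:

```python
from importlib import resources

def load_prefilter_template() -> str:
    """Load the Stage 1 prompt template in a packaging-safe way.

    importlib.resources.files() resolves package data even when the
    package is installed as a wheel/zip, unlike Path(__file__).parent
    reads, which assume an on-disk source tree.
    """
    return (
        resources.files("context_scribe.evaluator")  # assumed package path
        .joinpath("prefilter_template.md")
        .read_text(encoding="utf-8")
    )
```

The returned object from `files()` is a Traversable, so the same `joinpath(...).read_text(...)` chain works whether the data lives on disk or inside an archive.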

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
