feat: multi-dimensional quality scorer for structured outputs #14

Open
lokrim wants to merge 1 commit into Mint-Claw:main from lokrim:feat/quality-scorer

Conversation


@lokrim lokrim commented Feb 28, 2026

Summary

This pull request implements a high-performance, multi-dimensional quality scoring engine for structured submissions (JSON, Markdown, Code, Plain Text), addressing the requirements outlined in Issue #1.

Architecture

  • scoring.py: A pure-Python implementation with zero external dependencies. It contains the core functional logic for evaluating content against the five specified dimensions, plus a detect_format() heuristics engine that uses regex density checks to identify the format of incoming strings.
  • test_scoring.py: A unified test suite housing both the unittest validations and the high-volume performance benchmarks.
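
To make the regex-density idea concrete, here is a minimal sketch of how such a detector might work. The function name matches the PR, but the specific signals and thresholds below are assumptions, not the PR's actual heuristics:

```python
import json
import re

def detect_format(text: str) -> str:
    """Illustrative regex-density format detector.

    Thresholds and signal choices are hypothetical; the PR's
    actual detect_format() in scoring.py may differ.
    """
    stripped = text.strip()
    # JSON: the cheapest reliable check is simply trying to parse.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    lines = stripped.splitlines() or [""]
    # Markdown: density of headers, list markers, and inline links.
    md_hits = sum(bool(re.match(r"^(#{1,6}\s|[-*+]\s|\d+\.\s)", ln)) for ln in lines)
    md_hits += len(re.findall(r"\[[^\]]+\]\([^)]+\)", stripped))
    if md_hits / len(lines) > 0.2:
        return "markdown"
    # Code: density of braces, semicolons, assignments, and keywords.
    code_hits = sum(
        bool(re.search(r"[{};]|=\s|^\s*(def|class|return)\b", ln)) for ln in lines
    )
    if code_hits / len(lines) > 0.3:
        return "code"
    return "text"
```
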

Scoring Dimensions

| Dimension | Weight | Measurement Focus |
| --- | --- | --- |
| Completeness | 0.30 | Structural depth, baseline length, header count, and dictionary keys. |
| Format Compliance | 0.20 | Adherence to format-specific syntax conventions (e.g., valid JSON, standard Markdown spacing, code indentation). |
| Coverage | 0.25 | Topic breadth and unique vocabulary density. |
| Clarity | 0.15 | Readability, pacing, line lengths, and logical whitespace usage. |
| Validity | 0.10 | Detection of imbalanced brackets/quotes, trailing spaces, and structural anomalies. |
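
The weights above combine into a single score by weighted average. The weight values come straight from the table; the function name and constant below are stand-ins for whatever scoring.py actually calls them:

```python
# Weights from the dimensions table; they sum to exactly 1.0.
WEIGHTS = {
    "completeness": 0.30,
    "format_compliance": 0.20,
    "coverage": 0.25,
    "clarity": 0.15,
    "validity": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-dimension scores, each in [0.0, 1.0]."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 4)
```

Applied to the per-dimension scores in the sample output below (0.442, 1.0, 0.9, 0.9, 1.0), this reproduces its `weighted_score` of 0.7926.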

Output Format

The engine returns a structured JSON response matching the requested schema:

{
  "weighted_score": 0.7926,
  "quality_rating": "B",
  "scores": {
    "completeness": 0.442,
    "format_compliance": 1.0,
    "coverage": 0.9,
    "clarity": 0.9,
    "validity": 1.0
  },
  "feedback": [
    "Detected format: JSON",
    "Completeness: Submission is too brief or lacks expected structural elements.",
    "Clarity: Clear, readable structure with appropriate spacing.",
    "Coverage: Excellent vocabulary range denoting good topic coverage.",
    "Submission meets the required quality baseline."
  ],
  "pass_threshold": true,
  "format_detected": "json"
}
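
The `quality_rating` field maps the weighted score to a letter grade. The PR does not show its cutoffs, so the bands below are purely hypothetical, chosen only to be consistent with the sample's "B" at 0.7926:

```python
def quality_rating(weighted: float) -> str:
    # Hypothetical grade bands; the actual cutoffs in scoring.py are not shown.
    bands = [(0.9, "A"), (0.75, "B"), (0.6, "C"), (0.4, "D")]
    for cutoff, grade in bands:
        if weighted >= cutoff:
            return grade
    return "F"
```
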

Bonus Implementation: NLP Feedback Generation

I have implemented the generate_nlp_feedback() function to dynamically produce natural-language feedback strings. Each string is mapped to the score thresholds of the evaluated dimension, giving the end user clear, actionable feedback.
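
A threshold-to-message mapping of this kind might look like the sketch below. The function name matches the PR, but the bands and message table are assumptions (only the top clarity message is taken from the sample output above):

```python
# Hypothetical bands, highest cutoff first; real wording lives in scoring.py.
FEEDBACK_BANDS = {
    "clarity": [
        (0.8, "Clarity: Clear, readable structure with appropriate spacing."),
        (0.5, "Clarity: Readable, but line lengths or spacing could improve."),
        (0.0, "Clarity: Dense or uneven formatting hampers readability."),
    ],
}

def generate_nlp_feedback(dimension: str, score: float) -> str:
    """Return the message for the highest band the score reaches."""
    for cutoff, message in FEEDBACK_BANDS[dimension]:
        if score >= cutoff:
            return message
    return f"{dimension.title()}: no feedback band matched."
```
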

Performance Benchmarking

The implementation exceeds the performance requirements:

  • Target: 100 submissions in <10s
  • Actual: 104 submissions processed in 0.0023 seconds (~0.02ms per submission)
  • Relying strictly on the Python standard library keeps import overhead and cold-start latency minimal.
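
A benchmark of this shape can be reproduced with `time.perf_counter`. The harness below uses a trivial stand-in scorer (`len`) just to show the timing pattern; the PR's reported numbers come from running its actual scorer in test_scoring.py:

```python
import time

def benchmark(score_fn, submissions):
    """Time score_fn over a batch; returns elapsed seconds."""
    start = time.perf_counter()
    for text in submissions:
        score_fn(text)
    return time.perf_counter() - start

# 104 tiny JSON-ish submissions against a stand-in scorer.
submissions = ['{"id": %d}' % i for i in range(104)]
elapsed = benchmark(len, submissions)
assert elapsed < 10.0  # the bounty's <10 s target
```
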

Test Coverage

The included test_scoring.py suite covers:

  • Format detection against varied heuristics (4 tests).
  • JSON scoring with corrupted string fallbacks (5 samples).
  • Markdown scoring handling lists and links (5 samples).
  • Code scoring testing comment and bracket validation (5 samples).
  • Text scoring checking repetitive vocabulary scenarios (5 samples).
  • Strict verification that all aggregate scores stay within the [0.0, 1.0] float range.
  • Verification that the predefined weights sum precisely to 1.0.
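
The last two invariants translate directly into unittest cases. This is a sketch of how such checks could look (the WEIGHTS constant and sample values are stand-ins mirroring the table and sample output; the PR's actual tests live in test_scoring.py):

```python
import unittest

# Stand-in for the module's weight table (values from the dimensions table).
WEIGHTS = {"completeness": 0.30, "format_compliance": 0.20,
           "coverage": 0.25, "clarity": 0.15, "validity": 0.10}

class TestInvariants(unittest.TestCase):
    def test_weights_sum_to_one(self):
        self.assertAlmostEqual(sum(WEIGHTS.values()), 1.0)

    def test_scores_within_bounds(self):
        # Per-dimension values from the sample output above.
        sample = {"completeness": 0.442, "format_compliance": 1.0,
                  "coverage": 0.9, "clarity": 0.9, "validity": 1.0}
        for value in sample.values():
            self.assertGreaterEqual(value, 0.0)
            self.assertLessEqual(value, 1.0)

# Run with: python -m unittest test_scoring
```
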

Design Decisions

  • Zero external dependencies: Heuristics are extracted using Python's re standard-library module rather than heavy data-science packages like NumPy or scikit-learn. This keeps the module lightweight and easy to deploy in any environment.
  • Modular Functional Design: Each dimension is completely isolated into its own dedicated function (e.g., score_completeness(), score_clarity()). This allows new rules or platforms to be smoothly integrated in the future without risking regressions in overlapping business logic.
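
As an illustration of the per-dimension isolation, here is a toy version of one such function. The name matches the PR and the signals mirror the table's "Completeness" row, but the scoring math is invented for the example:

```python
import re

def score_completeness(text: str) -> float:
    """Illustrative isolated dimension function.

    Signals (baseline length, header count) follow the table's
    description; exact weights here are hypothetical.
    """
    score = 0.0
    score += min(len(text) / 500, 1.0) * 0.5               # baseline length
    headers = len(re.findall(r"^#{1,6}\s", text, re.M))
    score += min(headers / 3, 1.0) * 0.5                   # structural depth proxy
    return round(min(score, 1.0), 3)
```

Because each dimension is a pure function of the input text, a new rule can be added or tuned without touching the other scorers.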

Closes #1



Development

Successfully merging this pull request may close these issues.

[BOUNTY $10] Multi-Dimensional Quality Scoring for Structured Outputs