feat: multi-dimensional quality scorer for structured outputs #14

Open
lokrim wants to merge 1 commit into Mint-Claw:main from lokrim:feat/quality-scorer

Conversation


@lokrim lokrim commented Feb 28, 2026

Summary

This pull request implements a high-performance, multi-dimensional quality scoring engine for structured submissions (JSON, Markdown, Code, Plain Text), addressing the requirements outlined in Issue #1.

Architecture

  • scoring.py: A pure-Python implementation with zero external dependencies. It contains the core functional logic for evaluating content against the five specified dimensions, plus a detect_format() heuristics engine that uses regex density checks to identify the format of incoming strings.
  • test_scoring.py: A unified test suite housing both the unittest validations and the high-volume performance benchmarks.
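
To make the regex-density idea concrete, here is a minimal sketch of how such a detector might work. The function name matches the PR, but the specific signals and thresholds below are assumptions, not the PR's actual heuristics:

```python
import json
import re

def detect_format(text: str) -> str:
    """Illustrative regex-density format detector.

    Thresholds and signal choices are hypothetical; the PR's
    actual detect_format() in scoring.py may differ.
    """
    stripped = text.strip()
    # JSON: the cheapest reliable check is simply trying to parse.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    lines = stripped.splitlines() or [""]
    # Markdown: density of headers, list markers, and inline links.
    md_hits = sum(bool(re.match(r"^(#{1,6}\s|[-*+]\s|\d+\.\s)", ln)) for ln in lines)
    md_hits += len(re.findall(r"\[[^\]]+\]\([^)]+\)", stripped))
    if md_hits / len(lines) > 0.2:
        return "markdown"
    # Code: density of braces, semicolons, assignments, and keywords.
    code_hits = sum(
        bool(re.search(r"[{};]|=\s|^\s*(def|class|return)\b", ln)) for ln in lines
    )
    if code_hits / len(lines) > 0.3:
        return "code"
    return "text"
```
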

Scoring Dimensions

| Dimension | Weight | Measurement Focus |
| --- | --- | --- |
| Completeness | 0.30 | Structural depth, baseline length, header count, and dictionary keys. |
| Format Compliance | 0.20 | Adherence to format-specific syntax conventions (e.g., valid JSON, standard Markdown spacing, code indentation). |
| Coverage | 0.25 | Topic breadth and unique vocabulary density. |
| Clarity | 0.15 | Readability, pacing, line lengths, and logical whitespace usage. |
| Validity | 0.10 | Detection of imbalanced brackets/quotes, trailing spaces, and structural anomalies. |
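
The weights above combine into a single score by weighted average. The weight values come straight from the table; the function name and constant below are stand-ins for whatever scoring.py actually calls them:

```python
# Weights from the dimensions table; they sum to exactly 1.0.
WEIGHTS = {
    "completeness": 0.30,
    "format_compliance": 0.20,
    "coverage": 0.25,
    "clarity": 0.15,
    "validity": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-dimension scores, each in [0.0, 1.0]."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 4)
```

Applied to the per-dimension scores in the sample output below (0.442, 1.0, 0.9, 0.9, 1.0), this reproduces its `weighted_score` of 0.7926.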

Output Format

The engine returns a structured JSON response matching the requested schema:

{
  "weighted_score": 0.7926,
  "quality_rating": "B",
  "scores": {
    "completeness": 0.442,
    "format_compliance": 1.0,
    "coverage": 0.9,
    "clarity": 0.9,
    "validity": 1.0
  },
  "feedback": [
    "Detected format: JSON",
    "Completeness: Submission is too brief or lacks expected structural elements.",
    "Clarity: Clear, readable structure with appropriate spacing.",
    "Coverage: Excellent vocabulary range denoting good topic coverage.",
    "Submission meets the required quality baseline."
  ],
  "pass_threshold": true,
  "format_detected": "json"
}
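
The `quality_rating` field maps the weighted score to a letter grade. The PR does not show its cutoffs, so the bands below are purely hypothetical, chosen only to be consistent with the sample's "B" at 0.7926:

```python
def quality_rating(weighted: float) -> str:
    # Hypothetical grade bands; the actual cutoffs in scoring.py are not shown.
    bands = [(0.9, "A"), (0.75, "B"), (0.6, "C"), (0.4, "D")]
    for cutoff, grade in bands:
        if weighted >= cutoff:
            return grade
    return "F"
```
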

Bonus Implementation: NLP Feedback Generation

I have implemented the generate_nlp_feedback() function to dynamically produce natural-language feedback strings. Each string is mapped to the score thresholds of the evaluated dimension, giving the end user clear, actionable feedback.
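
A threshold-to-message mapping of this kind might look like the sketch below. The function name matches the PR, but the bands and message table are assumptions (only the top clarity message is taken from the sample output above):

```python
# Hypothetical bands, highest cutoff first; real wording lives in scoring.py.
FEEDBACK_BANDS = {
    "clarity": [
        (0.8, "Clarity: Clear, readable structure with appropriate spacing."),
        (0.5, "Clarity: Readable, but line lengths or spacing could improve."),
        (0.0, "Clarity: Dense or uneven formatting hampers readability."),
    ],
}

def generate_nlp_feedback(dimension: str, score: float) -> str:
    """Return the message for the highest band the score reaches."""
    for cutoff, message in FEEDBACK_BANDS[dimension]:
        if score >= cutoff:
            return message
    return f"{dimension.title()}: no feedback band matched."
```
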

Performance Benchmarking

The implementation exceeds the performance requirements:

  • Target: 100 submissions in <10s
  • Actual: 104 submissions processed in 0.0023 seconds (~0.02ms per submission)
  • Relying strictly on the Python standard library keeps import overhead and cold-start latency minimal.
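
A benchmark of this shape can be reproduced with `time.perf_counter`. The harness below uses a trivial stand-in scorer (`len`) just to show the timing pattern; the PR's reported numbers come from running its actual scorer in test_scoring.py:

```python
import time

def benchmark(score_fn, submissions):
    """Time score_fn over a batch; returns elapsed seconds."""
    start = time.perf_counter()
    for text in submissions:
        score_fn(text)
    return time.perf_counter() - start

# 104 tiny JSON-ish submissions against a stand-in scorer.
submissions = ['{"id": %d}' % i for i in range(104)]
elapsed = benchmark(len, submissions)
assert elapsed < 10.0  # the bounty's <10 s target
```
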

Test Coverage

The included test_scoring.py suite covers:

  • Format detection against varied heuristics (4 tests).
  • JSON scoring with corrupted string fallbacks (5 samples).
  • Markdown scoring handling lists and links (5 samples).
  • Code scoring testing comment and bracket validation (5 samples).
  • Text scoring checking repetitive vocabulary scenarios (5 samples).
  • Strict verification that all aggregate scores stay within the [0.0, 1.0] float range.
  • Verification that the predefined weights sum precisely to 1.0.
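
The last two invariants translate directly into unittest cases. This is a sketch of how such checks could look (the WEIGHTS constant and sample values are stand-ins mirroring the table and sample output; the PR's actual tests live in test_scoring.py):

```python
import unittest

# Stand-in for the module's weight table (values from the dimensions table).
WEIGHTS = {"completeness": 0.30, "format_compliance": 0.20,
           "coverage": 0.25, "clarity": 0.15, "validity": 0.10}

class TestInvariants(unittest.TestCase):
    def test_weights_sum_to_one(self):
        self.assertAlmostEqual(sum(WEIGHTS.values()), 1.0)

    def test_scores_within_bounds(self):
        # Per-dimension values from the sample output above.
        sample = {"completeness": 0.442, "format_compliance": 1.0,
                  "coverage": 0.9, "clarity": 0.9, "validity": 1.0}
        for value in sample.values():
            self.assertGreaterEqual(value, 0.0)
            self.assertLessEqual(value, 1.0)

# Run with: python -m unittest test_scoring
```
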

Design Decisions

  • Zero external dependencies: Heuristics are extracted using Python's re standard-library module rather than heavy data-science packages like NumPy or scikit-learn. This keeps the module lightweight and easy to deploy in any environment.
  • Modular Functional Design: Each dimension is completely isolated into its own dedicated function (e.g., score_completeness(), score_clarity()). This allows new rules or platforms to be smoothly integrated in the future without risking regressions in overlapping business logic.
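
As an illustration of the per-dimension isolation, here is a toy version of one such function. The name matches the PR and the signals mirror the table's "Completeness" row, but the scoring math is invented for the example:

```python
import re

def score_completeness(text: str) -> float:
    """Illustrative isolated dimension function.

    Signals (baseline length, header count) follow the table's
    description; exact weights here are hypothetical.
    """
    score = 0.0
    score += min(len(text) / 500, 1.0) * 0.5               # baseline length
    headers = len(re.findall(r"^#{1,6}\s", text, re.M))
    score += min(headers / 3, 1.0) * 0.5                   # structural depth proxy
    return round(min(score, 1.0), 3)
```

Because each dimension is a pure function of the input text, a new rule can be added or tuned without touching the other scorers.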

Closes #1



Development

Successfully merging this pull request may close these issues.

[BOUNTY $10] Multi-Dimensional Quality Scoring for Structured Outputs