feat: Use path info to complete partial Skaldleita results (#127) #157
deucebucket merged 5 commits into develop
When Skaldleita returns a truncated name (e.g., "James S. A" instead of "James S. A. Corey"), the file path often contains the full name. This adds _complete_result_from_path() which:
- Completes truncated author names when the path has a longer version that starts with the SL result
- Completes truncated titles using the same starts-with logic
- Extracts series info from path components when SL returned none
- Gives a small confidence boost when the path corroborates SL results
- Never replaces a longer SL result with a shorter path fragment
- Requires a minimum of 4 characters to avoid false matches on trivial prefixes

Applied at three points in the pipeline:
1. SL requeue with partial ID
2. SL full identification after sanity check
3. AI fallback after sanity check
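The starts-with rule described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper names `is_truncated_version` and `complete_field` are hypothetical, and the case-insensitive comparison is an assumption that the description does not confirm.

```python
MIN_PREFIX_LEN = 4  # assumed from the "minimum 4 characters" rule in the description


def is_truncated_version(short: str, full: str, min_len: int = MIN_PREFIX_LEN) -> bool:
    """Return True if `short` looks like a cut-off prefix of `full`."""
    short_norm = short.strip().lower()  # case-insensitive match is an assumption
    full_norm = full.strip().lower()
    if len(short_norm) < min_len:
        return False  # too short to trust as a prefix
    if len(full_norm) <= len(short_norm):
        return False  # never swap in something shorter or of equal length
    return full_norm.startswith(short_norm)


def complete_field(sl_value: str, path_value: str) -> str:
    """Prefer the path value only when it extends the Skaldleita value."""
    if sl_value and path_value and is_truncated_version(sl_value, path_value):
        return path_value.strip()
    return sl_value
```

With these definitions, `complete_field("James S. A", "James S. A. Corey")` yields the full name, while `complete_field("The Way of Kings", "Way of")` keeps the longer Skaldleita value.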
🔍 Vibe Check Review
Context
This PR adds a new function
Codebase Patterns I Verified
Pattern checks performed:
✅ What's Good
🚨 Issues Found
Detailed explanation of MEDIUM issue: The series extraction logic at line 100 iterates through ALL path components:
parts = path_obj.parts
# Look at path components between author and book for series folders
for part in parts:
This will check every path component, including the filesystem root and deep ancestors.
Suggested fix:
# Check only the relevant parent directories (skip filesystem root and deep ancestors)
# Typical: [..., 'library', 'Author', 'Series', 'BookTitle']
for part in parts[-4:-1]:  # Check up to 3 parent dirs above book folder
📋 Scope Verification
Scope Status: SCOPE_OK - All requirements from Issue #127 are addressed. The PR completes truncated names and extracts series info from paths exactly as requested.
📝 Documentation Check
Suggested CHANGELOG entry:
## [Version] - Date
### Added
- **Issue #127: Path-based completion for partial results** - When Skaldleita returns truncated names (e.g., "James S. A" instead of "James S. A. Corey"), the system now uses folder path information to complete the full name. Also extracts series information from path structure when missing from audio identification.
🎯 Verdict
REQUEST_CHANGES - The code is SAFE and FUNCTIONAL, but needs:
The MEDIUM issue (inefficient path iteration) is a performance concern, not a correctness issue. The code will work correctly but will waste cycles checking irrelevant path components. Given this is a pipeline optimization feature, the performance inefficiency should be addressed.
Strengths:
The code quality is high. Fix the CHANGELOG and consider the path optimization, then this is ready to ship.
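To make the path-slicing suggestion in this review concrete, here is a small sketch of which components `parts[-4:-1]` selects. The function name `candidate_series_dirs`, the example path, and the assumption that the path ends at the book folder are all hypothetical and not taken from the PR.

```python
from pathlib import PurePosixPath


def candidate_series_dirs(book_path: str, max_parents: int = 3) -> list[str]:
    """Return up to `max_parents` directory names directly above the book folder."""
    parts = PurePosixPath(book_path).parts
    # parts[-1] is assumed to be the book folder itself, so it is excluded.
    # Negative slicing is safe on short paths: it simply returns fewer items.
    return list(parts[-(max_parents + 1):-1])


# Hypothetical layout matching the "Typical" comment in the suggested fix.
print(candidate_series_dirs("/mnt/media/library/Author/Series/BookTitle"))
print(candidate_series_dirs("Author/BookTitle"))
```

The slice skips the filesystem root and distant ancestors without any explicit length check, which is why a short path such as `Author/BookTitle` still works.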
🔍 Vibe Check Review
Context
This PR adds functionality to complete truncated Skaldleita results using folder path information (e.g., "James S. A" → "James S. A. Corey"), extract missing series info from path patterns, and boost confidence when path corroborates API results.
Codebase Patterns I Verified
✅ Good
🚨 Issues Found
📋 Scope Verification
Scope Status: SCOPE_OK - All requirements from Issue #127 are fully addressed.
📝 Documentation Check
🎯 Verdict
REQUEST_CHANGES - Fix the confidence handling to prevent potential issues:
Suggested fix for lines 149-165:
try:
    if isinstance(raw_conf, str):
        # Try to parse as numeric first
        try:
            numeric_conf = float(raw_conf)
            result['confidence'] = min(0.95, numeric_conf + 0.05) if numeric_conf <= 1 else min(95, numeric_conf + 5)
        except ValueError:
            # String confidence levels - bump up one tier
            if raw_conf == 'low':
                result['confidence'] = 'medium'
            elif raw_conf == 'medium':
                result['confidence'] = 'high'
            # 'high' stays high
    elif isinstance(raw_conf, (int, float)):
        # Numeric confidence - small boost (5%) for path agreement, cap at 0.95
        if raw_conf <= 1:
            result['confidence'] = min(0.95, raw_conf + 0.05)
        else:
            result['confidence'] = min(95, raw_conf + 5)
except (ValueError, TypeError):
    pass  # Leave confidence unchanged if we can't parse it
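For readers who want to try the confidence handling in isolation, the suggestion above could be wrapped in a standalone function along these lines. The name `boost_confidence` and the tier table are hypothetical. One deliberate deviation is labeled in the code: this sketch never lowers a value that already exceeds the cap, whereas the suggestion as written would reduce 0.99 to 0.95.

```python
from typing import Union

Confidence = Union[str, int, float, None]

# Hypothetical tier table: bump one level, 'high' stays 'high'.
_TIER_BUMP = {"low": "medium", "medium": "high", "high": "high"}


def boost_confidence(raw_conf: Confidence) -> Confidence:
    """Return a slightly boosted confidence, leaving unknown values unchanged."""
    if isinstance(raw_conf, bool):
        return raw_conf  # bool is an int subclass; do not treat it as a score
    if isinstance(raw_conf, str):
        try:
            raw_conf = float(raw_conf)  # numeric strings such as "0.9"
        except ValueError:
            # Non-numeric string: bump known tiers, pass anything else through.
            return _TIER_BUMP.get(raw_conf.strip().lower(), raw_conf)
    if isinstance(raw_conf, (int, float)):
        if raw_conf <= 1:
            boosted = min(0.95, raw_conf + 0.05)  # 0-1 scale, capped at 0.95
        else:
            boosted = min(95, raw_conf + 5)  # 0-100 scale, capped at 95
        # Deviation from the suggestion: never lower an already-high value.
        return max(raw_conf, boosted)
    return raw_conf  # None or unexpected types are left untouched
```

Returning the new value instead of mutating `result['confidence']` keeps the helper easy to test; a caller would assign the return value back to the result dict.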
🔍 Vibe Check Review
Context
PR #157 adds path-based completion for truncated Skaldleita results. When Skaldleita returns partial names like "James S. A", the system uses folder path information to complete them to "James S. A. Corey". It also extracts series info from the path structure when it is missing.
Codebase Patterns I Verified
✅ Good
🚨 Issues Found
📋 Scope Verification
Scope Status: SCOPE_OK - PR fully addresses the original problem and scope expansions.
📝 Documentation Check
🎯 Verdict
APPROVE with minor suggestions
The code is safe to ship. The two issues flagged are:
Neither issue is blocking. The core logic is sound:
Recommendation: Merge as-is. The MEDIUM issue can be addressed in a follow-up if needed, but the negative slicing is Python-safe and unlikely to cause issues in practice.
🔍 Vibe Check Review
Context
PR #157 adds path-based completion to handle truncated Skaldleita results by using folder path information to complete partial author names and titles and to extract missing series information.
Codebase Patterns I Verified
Error Handling:
Import Patterns:
Validation Patterns:
Logging:
✅ Good
🚨 Issues Found
📋 Scope Verification
Scope Status: SCOPE_OK - All aspects of issue #127 are addressed. The implementation is conservative (only completes, never replaces) and includes multiple safety checks.
📝 Documentation Check
🎯 Verdict
APPROVE (with recommendations)
The implementation is solid and addresses issue #127 comprehensively. The code follows existing patterns and includes appropriate safety checks.
Recommendations for Future Improvement:
# Complete author if truncated AND path_author is valid
sl_author = result.get('author', '') or ''
if (sl_author and path_author
        and not is_placeholder_author(path_author)  # <- Add this check
        and _is_truncated_version(sl_author, path_author)):

try:
    path_obj = Path(book_path) if book_path else None
except (ValueError, OSError):
    path_obj = None
These are quality improvements, not blockers. The current implementation is safe because:
Merge Status: ✅ READY TO MERGE
Summary
- Adds _complete_result_from_path() in layer_audio_id.py to fill in truncated author names, titles, and missing series info using the file path
Examples
- "James S. A" + path has "James S. A. Corey" → uses full path version
- "The Expanse 01" → extracts series info
- "The Way of Kings" + path has "Way of" → keeps longer SL version (safety check)
Test plan
- test-naming-issues.py (281/281 pass)
- ruff check --select=F821 clean