feat: Use path info to complete partial Skaldleita results (#127) by deucebucket · Pull Request #157 · deucebucket/library-manager

deucebucket · 2026-02-16T06:25:08Z

Summary

Adds _complete_result_from_path() in layer_audio_id.py to fill in truncated author names, titles, and missing series info using the file path
Only completes when the path version starts with the SL version (never replaces longer results with shorter fragments)
Small confidence boost (+5%) when path corroborates SL results
Integrated at three points: SL requeue, SL full ID, and AI fallback

Examples

SL returns "James S. A" + path has "James S. A. Corey" → uses full path version
SL returns no series + path has "The Expanse 01" → extracts series info
SL returns "The Way of Kings" + path has "Way of" → keeps longer SL version (safety check)

Test plan

Test with chaos library books that have truncated SL results
Verify author/title completion only fires on starts-with match
Verify series extraction from path patterns
Confirm confidence scoring adjustments are reasonable
Run test-naming-issues.py (281/281 pass)
ruff check --select=F821 clean

When Skaldleita returns a truncated name (e.g., "James S. A" instead of "James S. A. Corey"), the file path often contains the full name. This adds _complete_result_from_path() which: - Completes truncated author names when path has a longer version that starts with the SL result - Completes truncated titles using the same starts-with logic - Extracts series info from path components when SL returned none - Gives a small confidence boost when path corroborates SL results - Never replaces a longer SL result with a shorter path fragment - Requires minimum 4 characters to avoid false matches on trivial prefixes Applied at three points in the pipeline: 1. SL requeue with partial ID 2. SL full identification after sanity check 3. AI fallback after sanity check

bucket-agent · 2026-02-16T06:28:01Z

🔍 Vibe Check Review

Context

This PR adds a new function _complete_result_from_path() that completes truncated author/title names and extracts missing series information from filesystem paths when Skaldleita returns partial results.

Codebase Patterns I Verified

Pattern checks performed:

Type hints: ✅ Existing function _validate_ai_result_against_path() uses -> Dict - new function matches
Import style: ✅ Moving import re to top-level imports is CORRECT - 18 other files do this, nested imports are not the pattern
Error handling: ✅ except (ValueError, TypeError): is used in 8 places across 4 files - this is the established pattern
Logging: ✅ Uses logger.info() pattern consistent with the file (36 instances in this file)
Return early pattern: ✅ Lines 43-44 match existing defensive patterns

✅ What's Good

Excellent defensive programming - Checks for None/empty at function entry (lines 43-44)
Safe confidence boost - Caps confidence at 0.95/95 to prevent overconfidence (lines 139, 141)
Proper import cleanup - Moving import re to top is correct (nested import at line 149 was the anti-pattern)
Thorough docstring - Clear explanation of purpose, args, and behavior
Minimal scope - Only completes truncated results, never replaces longer values (line 64 check)
Logging visibility - All completions are logged for debugging (lines 73, 81, 118)
Integration points - Called at 3 appropriate locations (after SL results, after validation, after AI)

🚨 Issues Found

Severity	Location	Exact Code Quote	Issue	Fix
MEDIUM	layer_audio_id.py:100-123	`for part in parts:`	Iterates ALL path parts instead of filtering to relevant ones	Change to `for part in parts[-4:-1]:` to check only the 3 parent dirs above the book folder
LOW	layer_audio_id.py:115-117	`if (len(potential_series) >= 3`	Multi-line conditional hard to read	Minor - acceptable as-is, but could be clearer with intermediate variable

Detailed explanation of MEDIUM issue:

The series extraction logic at line 100 iterates through ALL path components:

parts = path_obj.parts
# Look at path components between author and book for series folders
for part in parts:

This will check /, home, library, Author, Series, Book - most of which are irrelevant. The typical structure is library/Author/SeriesOrTitle/BookTitle, so only the last 3-4 components matter. Checking system paths like /home/library wastes cycles and could theoretically match spurious patterns.

Suggested fix:

# Check only the relevant parent directories (skip filesystem root and deep ancestors)
# Typical: [..., 'library', 'Author', 'Series', 'BookTitle']
for part in parts[-4:-1]:  # Check up to 3 parent dirs above book folder

📋 Scope Verification

Issue	Problem	Addressed?	Notes
#127	Use path info to complete truncated author names	✅	Lines 71-76 handle author completion
#127	Complete truncated titles	✅	Lines 79-84 handle title completion
#127	Extract missing series info from path	✅	Lines 86-123 extract series from path patterns

Scope Status: SCOPE_OK

All requirements from Issue #127 are addressed. The PR completes truncated names and extracts series info from paths exactly as requested.

📝 Documentation Check

CHANGELOG.md: ❌ Missing - This is a feat: PR and should document the new path completion feature
README.md: N/A - Internal pipeline improvement, no user-facing UI changes

Suggested CHANGELOG entry:

## [Version] - Date

### Added

- **Issue #127: Path-based completion for partial results** - When Skaldleita returns truncated names (e.g., "James S. A" instead of "James S. A. Corey"), the system now uses folder path information to complete the full name. Also extracts series information from path structure when missing from audio identification.

🎯 Verdict

REQUEST_CHANGES - The code is SAFE and FUNCTIONAL, but needs:

REQUIRED: Update CHANGELOG.md (feat: PRs must document changes)
RECOMMENDED: Optimize series extraction to check only relevant path components (line 100)

The MEDIUM issue (inefficient path iteration) is a performance concern, not a correctness issue. The code will work correctly but will waste cycles checking irrelevant path components. Given this is a pipeline optimization feature, the performance inefficiency should be addressed.

Strengths:

Excellent defensive programming
Safe confidence handling
Clear logging
Well-integrated at appropriate call points
Matches all codebase patterns

The code quality is high. Fix the CHANGELOG and consider the path optimization, then this is ready to ship.

bucket-agent · 2026-02-16T07:23:44Z

🔍 Vibe Check Review

Context

This PR adds functionality to complete truncated Skaldleita results using folder path information (e.g., "James S. A" → "James S. A. Corey"), extract missing series info from path patterns, and boost confidence when path corroborates API results.

Codebase Patterns I Verified

Logging: Uses logger = logging.getLogger(__name__) pattern (line 42)
Import placement: re module imported at top-level (line 17), removed from nested function (line 31 deletion in diff)
Type hints: Used on public functions with -> Dict return types (e.g., line 126, 191)
Error handling: Uses specific exception types with try/except (ValueError, TypeError) patterns (seen in existing code at line 464-466)
Dict safety: Uses .get() with defaults like result.get('author', '') or '' pattern (line 93)
Regex patterns: Already used extensively in codebase (imported in multiple modules)

✅ Good

Excellent defensive programming: Checks for None/empty values before processing (if not result or not folder_hint: return result)
Safe prefix matching: 4-character minimum prevents false matches on trivial prefixes (line 86)
Conservative approach: Only completes truncated names, never replaces longer SL results with shorter path fragments
Type hint consistency: Uses -> Dict return type matching codebase patterns
Proper import placement: Moved import re to top-level (line 17) instead of nested import (removed line 31 in diff)
Integration points: Called at three strategic locations (SL requeue, bookdb result, AI result) without disrupting existing flow
Confidence boost logic: Handles both string ('low'/'medium'/'high') and numeric confidence values safely with try/except
Comprehensive path parsing: Checks multiple parent directories for series patterns, handles various formats
Safe regex: Patterns are well-formed, use re.IGNORECASE for flexibility

🚨 Issues Found

Severity	File:Line	Exact Code Quote	Issue	Fix
MEDIUM	layer_audio_id.py:149-151	`if isinstance(raw_conf, str): if raw_conf == 'low': result['confidence'] = 'medium'`	Confidence boost logic may not properly handle all SL confidence values. Existing code at line 463 shows SL can return "float, string, or garbage" but this only checks for string 'low'/'medium'/'high' - no handling for numeric strings like "0.7"	Add fallback: convert string numbers to float first, or handle ValueError more broadly
LOW	layer_audio_id.py:136-138	`if (len(potential_series) >= 3 and potential_series.lower() != sl_title.lower() and potential_series.lower() != (result.get('author') or '').lower()):`	Magic number `3` for minimum series name length has no explanation	Add comment: `# Minimum length to avoid false matches like "The"`

📋 Scope Verification

Issue	Problem	Addressed?	Notes
#127	Use folder path to complete truncated author/title names from Skaldleita	✅	Lines 92-106 handle author/title completion with 4-char minimum prefix match
#127	Extract series info from path when missing from audio ID	✅	Lines 110-144 extract series from path patterns like "Series Name 01"
#127	Handle truncated titles	✅	Lines 100-106 handle title completion using same logic as author

Scope Status: SCOPE_OK - All requirements from Issue #127 are fully addressed.

📝 Documentation Check

CHANGELOG.md: ✅ Updated (lines 9-18, clear description with examples)
README.md: N/A (Internal pipeline enhancement, not user-facing config change)

🎯 Verdict

REQUEST_CHANGES - Fix the confidence handling to prevent potential issues:

MEDIUM issue: The confidence boost logic at lines 149-165 needs to handle numeric strings that SL might return. The existing code at line 463 already documents that SL can return "float, string, or garbage" but the string handling only checks for literal 'low'/'medium'/'high'. Add handling for numeric strings like "0.7" or add better fallback.

Suggested fix for line 149-165:

try:
    if isinstance(raw_conf, str):
        # Try to parse as numeric first
        try:
            numeric_conf = float(raw_conf)
            result['confidence'] = min(0.95, numeric_conf + 0.05) if numeric_conf <= 1 else min(95, numeric_conf + 5)
        except ValueError:
            # String confidence levels - bump up one tier
            if raw_conf == 'low':
                result['confidence'] = 'medium'
            elif raw_conf == 'medium':
                result['confidence'] = 'high'
            # 'high' stays high
    elif isinstance(raw_conf, (int, float)):
        # Numeric confidence - small boost (5%) for path agreement, cap at 0.95
        if raw_conf <= 1:
            result['confidence'] = min(0.95, raw_conf + 0.05)
        else:
            result['confidence'] = min(95, raw_conf + 5)
except (ValueError, TypeError):
    pass  # Leave confidence unchanged if we can't parse it

LOW issue: Add explanatory comment for magic number at line 136.

…ents

bucket-agent · 2026-02-16T07:28:09Z

🔍 Vibe Check Review

Context

PR #157 adds path-based completion for truncated Skaldleita results. When Skaldleita returns partial names like "James S. A", the system uses folder path information to complete them to "James S. A. Corey". Also extracts series info from path structure when missing.

Codebase Patterns I Verified

Logging: Uses logger = logging.getLogger(__name__) pattern (line 42) ✅
Error handling: Project uses try/except Exception with logging, not bare except ✅
Type hints: Used on function signatures (-> Dict, -> bool) ✅
Imports: Module-level imports at top, with local import re previously in nested function (being fixed) ✅
Path handling: Uses pathlib.Path consistently throughout the file ✅
Confidence handling: Existing code handles both string ('low'/'medium'/'high') and numeric (0-100 or 0-1.0) confidence formats ✅

✅ Good

Safety-first approach: 4-character minimum prefix match prevents false positives on trivial matches
Conservative logic: Only completes (never replaces longer results with shorter path fragments)
Defensive null handling: Proper checks for None and empty strings throughout (or '' pattern)
Maintains existing confidence formats: Handles both string and numeric confidence values with try/except
Good documentation: Clear docstrings explain the function purpose and edge cases
Issue tracking: All integration points reference Issue [FEATURE] Use path info to complete partial Skaldleita results #127 for traceability
Import cleanup: Moves import re from nested function to top-level (proper Python style)
Path safety: Uses pathlib.Path which handles platform differences automatically
Selective application: Only called after results exist, checks sanity_failed flag before applying

🚨 Issues Found

Severity	File:Line	Exact Code Quote	Issue	Fix
MEDIUM	layer_audio_id.py:147	`for part in parts[-4:-1]:`	Hardcoded path depth could miss series in deeper/shallower structures. No validation that parts has enough elements.	Add length check: `if len(parts) >= 4: for part in parts[-4:-1]:` or use safer slicing `parts[max(0, len(parts)-4):-1]`
LOW	layer_audio_id.py:162-164	`if (len(potential_series) >= 3 and potential_series.lower() != sl_title.lower() and potential_series.lower() != (result.get('author') or '').lower())`	Magic number `3` for minimum series name length is undocumented	Add comment explaining why 3 chars, or define constant `MIN_SERIES_NAME_LENGTH = 3`

📋 Scope Verification

Issue	Problem	Addressed?	Notes
#127	Use folder path info to complete truncated Skaldleita results (e.g., "James S. A" → "James S. A. Corey")	✅	Fully implemented with `_is_truncated_version()` checking 4-char minimum prefix match
#127	Extract series info from path when missing	✅	Extracts from parent directory patterns like "Series Name 01" or "Series Name - Book 01"

Scope Status: SCOPE_OK - PR fully addresses the original problem and scope expansions.

📝 Documentation Check

CHANGELOG.md: ✅ Updated with version 0.9.0-beta.127 and clear description
README.md: ✅ Updated version badge to 0.9.0-beta.127

🎯 Verdict

APPROVE with minor suggestions

The code is safe to ship. The two issues flagged are:

MEDIUM: Path depth check - While unlikely to crash (Python's negative indexing is forgiving), adding an explicit length check is defensive programming best practice
LOW: Magic number documentation - Nice-to-have for maintainability

Neither issue is blocking. The core logic is sound:

Safe prefix matching with 4-char minimum
Conservative "only complete, never replace" approach
Proper null/empty handling
Confidence boost logic handles existing string/numeric formats
Integration points are well-placed (after Skaldleita, after AI, checking sanity_failed)

Recommendation: Merge as-is. The MEDIUM issue can be addressed in a follow-up if needed, but the negative slicing is Python-safe and unlikely to cause issues in practice.

bucket-agent · 2026-02-16T07:58:04Z

🔍 Vibe Check Review

Context

PR #157 adds path-based completion to handle truncated Skaldleita results by using folder path information to complete partial author names, titles, and extract missing series information.

Codebase Patterns I Verified

Error Handling:

Existing code uses try/except (ValueError, TypeError) for type conversions (line 466 of layer_audio_id.py)
Silent failures with pass are acceptable for non-critical operations (e.g., language detection, path validation)
Functions return early on None/empty inputs rather than raising exceptions

Import Patterns:

Top-level imports at module start (lines 14-20)
The diff correctly moves import re from nested function (line 149) to top-level (line 17)
This follows the existing pattern and improves performance

Validation Patterns:

Minimum length checks (4 chars for authors in validation.py line 264)
Case-insensitive string comparisons with .lower()
Null/empty guards at function entry

Logging:

Uses logger.info() for successful operations
Uses logger.warning() for validation failures
Consistent [TAG] prefixes for categorization

✅ Good

Safe prefix matching - Requires minimum 4-char match to avoid false positives like "The" matching "Therapy"
Never replaces longer with shorter - Only completes truncated data, never overwrites complete data
Defensive null checks - Guards against None/empty inputs at function entry
Import optimization - Moves import re from nested function to module-level (performance improvement)
Comprehensive documentation - Clear docstrings explaining the purpose and edge cases
Confidence boost logic - Handles both string ('low', 'medium', 'high') and numeric (0-1 or 0-100) confidence formats
Proper error handling - try/except (ValueError, TypeError) for confidence parsing with silent fallback
Series extraction safeguards - Multiple validation checks:
- Minimum 3-char series name to avoid "The", "No"
- Doesn't match title or author
- Checks last 3-4 path components only (not filesystem root)
Integration points - Applied at three key locations:
- After Skaldleita requeue results
- After successful Skaldleita identification
- After AI fallback (only if sanity check passed)
CHANGELOG updated - Comprehensive entry describing the feature

🚨 Issues Found

Severity	Location	Issue	Fix
MEDIUM	layer_audio_id.py:116-130	No validation that `path_author` or `path_title` aren't garbage before using them to complete results. If folder is named "Unknown - Audiobook", this would replace valid SL data with garbage.	Add validation: `if path_author and not is_placeholder_author(path_author) and _is_truncated_version(...)`
LOW	layer_audio_id.py:139	`Path(book_path)` could raise exception if `book_path` is invalid. While wrapped in `if book_path`, malformed paths could still crash.	Wrap in try/except or add path validation
LOW	layer_audio_id.py:145-168	Complex nested logic (two regex patterns, multi-level conditionals) makes the series extraction fragile and hard to test.	Consider extracting series logic into a separate `_extract_series_from_path()` helper function

📋 Scope Verification

Issue	Problem	Addressed?	Notes
#127	Use folder path to complete truncated author names from Skaldleita	✅	Lines 116-122 handle author completion with 4-char minimum prefix match
#127	Complete truncated titles	✅	Lines 124-130 handle title completion with same safety checks
#127	Extract series info from path when missing	✅	Lines 134-168 extract series from parent directory patterns like "Series Name 01"

Scope Status: SCOPE_OK

All aspects of issue #127 are addressed. The implementation is conservative (only completes, never replaces) and includes multiple safety checks.

📝 Documentation Check

CHANGELOG.md: ✅ Updated (lines 10-17) - Comprehensive description of feature
README.md: ✅ Updated (version bump to 0.9.0-beta.127)

🎯 Verdict

APPROVE (with recommendations)

The implementation is solid and addresses issue #127 comprehensively. The code follows existing patterns and includes appropriate safety checks.

Recommendations for Future Improvement:

Add garbage validation to prevent edge cases where placeholder folder names could pollute results:

# Complete author if truncated AND path_author is valid
sl_author = result.get('author', '') or ''
if (sl_author and path_author
    and not is_placeholder_author(path_author)  # <- Add this check
    and _is_truncated_version(sl_author, path_author)):

Wrap Path() in try/except to handle malformed paths gracefully:

try:
    path_obj = Path(book_path) if book_path else None
except (ValueError, OSError):
    path_obj = None

Consider refactoring series extraction into separate function for testability and clarity.

These are quality improvements, not blockers. The current implementation is safe because:

The 4-char minimum prevents most garbage matches
It only completes (never replaces longer data with shorter)
Integration at 3 points ensures all audio identification paths are covered
The import re is correctly moved to module level (performance improvement)

Merge Status: ✅ READY TO MERGE

Address vibe-check review: limit path iteration scope, add CHANGELOG

ddd1db0

deucebucket added 2 commits February 16, 2026 01:24

Bump version to beta.127

2565924

Address vibe-check review: handle numeric string confidence, add comm…

d9380a6

…ents

Merge develop, resolve version conflicts to beta.127

378b044

deucebucket merged commit 78065aa into develop Feb 16, 2026
3 checks passed

deucebucket deleted the feature/issue-127-path-completion branch February 16, 2026 07:54

deucebucket mentioned this pull request Feb 16, 2026

[FEATURE] Use path info to complete partial Skaldleita results #127

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

feat: Use path info to complete partial Skaldleita results (#127)#157

feat: Use path info to complete partial Skaldleita results (#127)#157
deucebucket merged 5 commits intodevelopfrom
feature/issue-127-path-completion

deucebucket commented Feb 16, 2026

Uh oh!

bucket-agent bot commented Feb 16, 2026

Uh oh!

bucket-agent bot commented Feb 16, 2026

Uh oh!

bucket-agent bot commented Feb 16, 2026

Uh oh!

Uh oh!

bucket-agent bot commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Comments

Conversation

deucebucket commented Feb 16, 2026

Summary

Examples

Test plan

Uh oh!

bucket-agent bot commented Feb 16, 2026

🔍 Vibe Check Review

Context

Codebase Patterns I Verified

✅ What's Good

🚨 Issues Found

📋 Scope Verification

📝 Documentation Check

🎯 Verdict

Uh oh!

bucket-agent bot commented Feb 16, 2026

🔍 Vibe Check Review

Context

Codebase Patterns I Verified

✅ Good

🚨 Issues Found

📋 Scope Verification

📝 Documentation Check

🎯 Verdict

Uh oh!

bucket-agent bot commented Feb 16, 2026

🔍 Vibe Check Review

Context

Codebase Patterns I Verified

✅ Good

🚨 Issues Found

📋 Scope Verification

📝 Documentation Check

🎯 Verdict

Uh oh!

Uh oh!

bucket-agent bot commented Feb 16, 2026

🔍 Vibe Check Review

Context

Codebase Patterns I Verified

✅ Good

🚨 Issues Found

📋 Scope Verification

📝 Documentation Check

🎯 Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant