feat: Folder triage for smart pre-processing (#110) #149
Categorize folder names as clean/messy/garbage to adjust processing strategy. Messy/garbage folders skip path-derived hints and rely on audio metadata and AI instead.

- New module: library_manager/folder_triage.py
- Regex patterns for scene tags, torrent markers, hash names, etc.
- Integrated into scanning, AI identification, Whisper hints, and queue
- Stored per-book in database, exposed via API
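For readers who want a concrete picture of the triage described above, here is a minimal sketch. It is not the code in library_manager/folder_triage.py: the pattern lists, the function names `triage_folder` and `should_use_path_hints`, and the exact matching rules are all assumptions made for illustration.

```python
import re
from typing import Literal

Triage = Literal["clean", "messy", "garbage"]

# Hypothetical patterns; the real module's regexes are not shown in this PR page.
_GARBAGE_PATTERNS = [
    re.compile(r"^[0-9a-f]{16,}$", re.IGNORECASE),  # hash-like names
    re.compile(r"^(?:new folder|untitled|tmp|temp)(?: \(\d+\))?$", re.IGNORECASE),
]
_MESSY_PATTERNS = [
    re.compile(r"\b(?:torrent|rarbg)\b", re.IGNORECASE),  # torrent markers
    re.compile(r"\b\d{2,3}\s?kbps\b", re.IGNORECASE),  # bitrate tags
    re.compile(r"[\[({](?:mp3|m4b|flac|unabridged)[\])}]", re.IGNORECASE),  # format tags
]


def triage_folder(name: str) -> Triage:
    """Classify a folder name as clean, messy, or garbage."""
    stripped = name.strip()
    if not stripped or any(p.search(stripped) for p in _GARBAGE_PATTERNS):
        return "garbage"
    if any(p.search(stripped) for p in _MESSY_PATTERNS):
        return "messy"
    return "clean"


def should_use_path_hints(triage: Triage) -> bool:
    """Only clean folder names are trusted as a source of path-derived hints."""
    return triage == "clean"
```

Compiling the patterns once at module import keeps per-folder classification cheap during a full library scan.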
🔍 Vibe Check Review

Context

PR #149 implements folder triage (Issue #110 Part 2), a new system to categorize folder names as clean/messy/garbage and adjust processing strategy accordingly. Clean folders use path hints normally, messy folders skip path parsing, and garbage folders get confidence penalties.

Codebase Patterns I Verified

Error Handling: Database migrations use bare
Type Hints: Mixed usage - newer modules like
Logging: Uses module-level
Database Changes: Schema updates follow the established pattern: add the column with try/except, a default value, and an inline comment explaining its purpose. The folder_triage column addition matches this exactly.

✅ Good
🚨 Issues Found
📋 Scope Verification
Scope Status: SCOPE_PARTIAL - This PR correctly implements Part 2 of the 3-part feature as described in the issue.

📝 Documentation Check

🎯 Verdict

REQUEST_CHANGES

Required fixes:

Recommended (not blocking):

Why REQUEST_CHANGES: The type hints issue is significant because this is new code that will establish patterns for future maintenance. The parentheses regex could cause false positives on common audiobook naming conventions, potentially degrading the quality of AI hints for legitimately clean folders.
- Add type hints to all folder_triage.py functions
- Narrow parentheses regex to avoid false positives on legitimate names like "Foundation (Book 1)"; it now only matches known messy keywords (narrator, unabridged, rip, scene, kbps)
- Remove redundant 'or clean' fallback since DB DEFAULT handles it
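The narrowed parentheses check can be sketched as follows. This is an illustrative guess at the shape of the fix, not the regex from the commit: the name `MESSY_PAREN`, the helper `has_messy_parentheses`, and the exact pattern are assumptions. Only the keyword list comes from the commit message.

```python
import re

# Hypothetical pattern: flag a parenthesised group only when it contains one
# of the known messy keywords, so "(Book 1)" is left alone.
MESSY_PAREN = re.compile(
    r"\([^)]*(?:\b(?:narrator|unabridged|rip|scene)\b|kbps\b)[^)]*\)",
    re.IGNORECASE,
)


def has_messy_parentheses(name: str) -> bool:
    """Return True if any parenthesised group contains a messy keyword."""
    return MESSY_PAREN.search(name) is not None
```

The word boundaries keep "rip" from matching inside words such as "Script", while `kbps` is allowed to follow digits directly, as in "(64kbps)".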
🔍 Vibe Check Review

Context

PR #149 implements Issue #110 Part 2: a folder triage system that categorizes folder names as clean/messy/garbage to control whether path-derived hints should be trusted during audiobook identification.

Codebase Patterns I Verified
✅ Good
🚨 Issues Found
📋 Scope Verification
Scope Status: SCOPE_OK

📝 Documentation Check

🎯 Verdict

APPROVE

This is clean, well-integrated code that matches existing codebase patterns:

No changes required. Ship it.
🔍 Vibe Check Review

Context

PR #149 implements Part 2 of Issue #110. It adds a folder triage system that categorizes folder names as clean/messy/garbage and uses this to decide whether to trust path-derived hints during book identification.

Codebase Patterns I Verified
✅ What's Good
🚨 Issues Found

NONE - This PR is clean and ready to ship.

📋 Scope Verification

Original Problem: "Implement a smart pre-processing pipeline that validates audiobook files, triages folders, and applies corrections automatically."

Part 2 (Folder Triage): ✅ COMPLETE

Part 3 (Push corrections):

Scope Status: SCOPE_OK - PR fully addresses Part 2 as specified

📝 Documentation Check

🎯 Verdict

APPROVE

This is excellent work. The implementation is:

No changes requested. Ship it! 🚀
Summary
Changes
- library_manager/folder_triage.py: regex pattern matching with compiled patterns
- app.py: scanning stores triage result, AI skips unreliable folder names, Whisper skips bad hints
- library_manager/database.py: folder_triage column migration
- library_manager/pipeline/layer_ai_queue.py: triage-aware prompt building

Test plan
Closes #110 (Part 2)
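As a rough illustration of the folder_triage column migration mentioned in the reviews (add the column inside try/except, with a default value and an inline comment), a sketch could look like the one below. It assumes an SQLite database accessed through Python's `sqlite3` module and a table named `books`; neither is confirmed by this PR page, and `migrate_add_folder_triage` is a hypothetical name. The `'clean'` default follows the commit note that the DB DEFAULT makes an `or 'clean'` fallback redundant.

```python
import sqlite3


def migrate_add_folder_triage(conn: sqlite3.Connection) -> None:
    """Add the folder_triage column if it does not exist yet (idempotent)."""
    try:
        # folder_triage: clean/messy/garbage rating of the source folder name
        # (table name `books` is an assumption for this sketch)
        conn.execute(
            "ALTER TABLE books ADD COLUMN folder_triage TEXT DEFAULT 'clean'"
        )
        conn.commit()
    except sqlite3.OperationalError as exc:
        # SQLite reports an existing column as "duplicate column name: ..."
        if "duplicate column" not in str(exc).lower():
            raise
```

Re-raising anything other than the duplicate-column error avoids silently hiding problems such as a missing table.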