Add comprehensive hybrid retrieval (BM25 + Vector) evaluation system for RAG pipeline optimization #2
This PR transforms the repository into a complete hybrid retrieval showcase for RAG systems, enabling quantifiable performance claims for technical interviews and recruitment.
🚀 What's New
Hybrid Retrieval Implementation
- `rank-bm25` library with local corpus persistence
- `alpha` parameter (hybrid_score = α × norm_vector + (1-α) × norm_bm25)
- `ChunkResult` dataclass standardizing all retrieval methods
Advanced Evaluation Framework
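One way the substring-based relevance matching in this section could work is sketched below. Only the `relevant_substrings` field name comes from the PR. The helper names, the case-insensitive comparison, and the choice of hit@k and reciprocal rank as metrics are illustrative assumptions.

```python
from typing import List, Sequence


def is_relevant(chunk_text: str, relevant_substrings: Sequence[str]) -> bool:
    """Return True if the chunk contains any expected substring.

    Case-insensitive matching is an assumption of this sketch.
    """
    text = chunk_text.lower()
    return any(s.lower() in text for s in relevant_substrings)


def hit_at_k(retrieved: List[str], relevant_substrings: Sequence[str], k: int) -> bool:
    """Return True if at least one of the top-k chunks is relevant."""
    return any(is_relevant(chunk, relevant_substrings) for chunk in retrieved[:k])


def reciprocal_rank(retrieved: List[str], relevant_substrings: Sequence[str]) -> float:
    """Return 1/rank of the first relevant chunk, or 0.0 if none is relevant."""
    for rank, chunk in enumerate(retrieved, start=1):
        if is_relevant(chunk, relevant_substrings):
            return 1.0 / rank
    return 0.0
```

Because relevance is decided by string containment rather than by human judgment at evaluation time, the same evaluation set can be replayed against vector-only, BM25-only, and hybrid retrieval without re-labeling.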
- `relevant_substrings` for objective relevance matching
Production-Ready Architecture
📊 Sample Output
🛠 Technical Highlights
Score Normalization & Combination
Implements min-max normalization within candidate sets to ensure fair hybrid scoring:
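A minimal sketch of this normalization and the α-weighted combination, assuming each retriever returns a mapping from chunk ID to raw score. The function names `min_max_normalize` and `hybrid_scores`, the handling of ties, and scoring a chunk missing from one retriever as 0 are illustrative assumptions, not details taken from the PR.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] within a single candidate set."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie, so there is no spread to rescale.
        # Mapping ties to 0.0 is an assumed convention.
        return {chunk_id: 0.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_scores(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Compute hybrid_score = alpha * norm_vector + (1 - alpha) * norm_bm25."""
    norm_vector = min_max_normalize(vector_scores)
    norm_bm25 = min_max_normalize(bm25_scores)
    # Score the union of candidates; a chunk absent from one retriever
    # contributes 0.0 for that component (an assumption of this sketch).
    candidates = set(norm_vector) | set(norm_bm25)
    return {
        chunk_id: alpha * norm_vector.get(chunk_id, 0.0)
        + (1 - alpha) * norm_bm25.get(chunk_id, 0.0)
        for chunk_id in candidates
    }
```

Normalizing within the candidate set matters because raw BM25 scores are unbounded while similarity scores usually fall in a narrow range. Without rescaling, the BM25 term would tend to dominate regardless of `alpha`.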
Flexible Evaluation
- `--alpha`, `--top-k`, `--eval-file`
Enhanced Documentation
- `docs/hybrid_vs_vector.md`: Deep dive into hybrid approach, scoring formulas, and tuning guidance
- `docs/evaluation.md`: Comprehensive guide for building evaluation sets and interpreting metrics
🔧 Usage Examples
📁 New Files & Structure
🎯 Business Impact
This implementation enables demonstrable, quantifiable retrieval optimization with:
It is well suited to showcasing advanced RAG engineering skills in technical interviews and to demonstrating a systematic approach to ML system optimization.