Feat/prefilter logic v2 335 #339

Open
ZohaibHassan16 wants to merge 2 commits into Hawksight-AI:main from ZohaibHassan16:feat/prefilter-logic-v2-335

@ZohaibHassan16
Collaborator

Description

This PR implements Two-Stage Scoring to fast-fail weak candidate pairs before they reach the computationally expensive multi-factor semantic scoring logic.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Performance improvement

Related Issues


Changes Made

1. Prefilter Gates

Added _prefilter_pair, which runs three cheap checks on each candidate pair before any expensive scoring: entity type mismatch, name length ratio, and token overlap.
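The three gates could look roughly like the sketch below. This is not the code from the PR: the function name prefilter_pair, the dict-shaped entities with "type" and "name" keys, the gate labels, and both threshold values are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical thresholds; the PR description does not state the real values.
MIN_LENGTH_RATIO = 0.5
MIN_TOKEN_OVERLAP = 0.2


def prefilter_pair(a: dict, b: dict) -> Optional[str]:
    """Return the name of the first gate that rejects the pair, or None if it passes."""
    # Gate 1: assume entities of different types cannot be duplicates.
    if a.get("type") != b.get("type"):
        return "type_mismatch"

    name_a = a.get("name", "").strip().lower()
    name_b = b.get("name", "").strip().lower()
    if not name_a or not name_b:
        return "empty_name"

    # Gate 2: ratio of the shorter name length to the longer one.
    ratio = min(len(name_a), len(name_b)) / max(len(name_a), len(name_b))
    if ratio < MIN_LENGTH_RATIO:
        return "length_ratio"

    # Gate 3: Jaccard overlap of whitespace-separated tokens.
    tokens_a, tokens_b = set(name_a.split()), set(name_b.split())
    overlap = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    if overlap < MIN_TOKEN_OVERLAP:
        return "token_overlap"

    return None
```

Returning the rejecting gate's name rather than a bare boolean is a design choice of this sketch; it makes it easy to log why a pair was dropped.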

2. Scoring Orchestration

Updated calculate_similarity to respect the prefilter_enabled configuration. When the prefilter is enabled and a pair fails it, the string, property, and embedding checks are skipped for that pair.
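A minimal sketch of that control flow follows. Only the names calculate_similarity and prefilter_enabled come from this PR; the helper functions, the returned dict shape, and the use of difflib as a stand-in for the expensive multi-factor scorer are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def cheap_gate(a: dict, b: dict) -> Optional[str]:
    """Hypothetical stand-in for the prefilter: reject on type mismatch only."""
    if a["type"] != b["type"]:
        return "type_mismatch"
    return None


def expensive_score(a: dict, b: dict) -> float:
    """Stand-in for the string/property/embedding scoring stage."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def calculate_similarity(a: dict, b: dict, prefilter_enabled: bool = False) -> dict:
    # Stage 1: optional fast-fail. Disabled by default, so legacy behavior is unchanged.
    if prefilter_enabled:
        reason = cheap_gate(a, b)
        if reason is not None:
            return {"score": 0.0, "prefiltered": True, "reason": reason}

    # Stage 2: the full scoring runs only for pairs that survive stage 1.
    return {"score": expensive_score(a, b), "prefiltered": False, "reason": None}
```

With the flag off, every pair still reaches the expensive stage, which is how a default of False would preserve existing results.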

3. Explainability

Added an opt-in score_breakdown field to the metadata output, exposing how a score was computed for easier debugging and transparency.
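One way an opt-in breakdown could be attached is sketched here. Only the score_breakdown key is taken from this PR; the function name, the include_breakdown flag, the two components, and their weights are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical component weights, chosen only for illustration.
WEIGHTS = {"name": 0.75, "type": 0.25}


def score_pair(a: dict, b: dict, include_breakdown: bool = False) -> dict:
    components = {
        "name": SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio(),
        "type": 1.0 if a["type"] == b["type"] else 0.0,
    }
    score = sum(WEIGHTS[key] * value for key, value in components.items())

    result = {"score": score, "metadata": {}}
    if include_breakdown:
        # Opt-in: per-component detail is attached only when requested,
        # so the default output shape stays the same.
        result["metadata"]["score_breakdown"] = {
            key: {"raw": value, "weight": WEIGHTS[key], "weighted": WEIGHTS[key] * value}
            for key, value in components.items()
        }
    return result
```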


Benchmark Results

V1 Candidate Gen vs. V2 Prefilter

| Metric | V1 (Legacy) | V2 (Prefilter) | Improvement |
| --- | --- | --- | --- |
| Total Suite Runtime | 11m 11s | 9m 26s | ~15% Faster |
| Hierarchical Clustering | 124.3s | 103.8s | -20.5s |
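As a quick check on the reported numbers, the relative reductions can be recomputed from the timings above. The helper names below are hypothetical; the inputs are the values from the benchmark results.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction in runtime relative to the baseline."""
    return (before - after) / before * 100


# Total suite: 11m 11s (671 s) down to 9m 26s (566 s).
suite = pct_reduction(to_seconds(11, 11), to_seconds(9, 26))

# Hierarchical clustering: 124.3 s down to 103.8 s.
clustering = pct_reduction(124.3, 103.8)

print(f"suite: {suite:.1f}% faster, clustering: {clustering:.1f}% faster")
```

Both reductions come out at roughly 16%, in line with the ~15% figure reported for the full suite.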

Testing & Quality Assurance

  • Tested locally: Benchmarks verified on local environment.
  • Package builds: python -m build successful.
  • Style Guidelines: Code adheres to project standards.
  • Breaking Changes: None. Prefilter defaults to False to preserve legacy behavior.


Additional Notes

This PR is part of a stacked sequence for the Deduplication v2 Epic. Please note the following for review:
