Integrate improvements from cquest-11rc1 branch #9
This PR integrates relevant changes from the long-forgotten `cquest-11rc1` branch (commits from 2019-2021), with careful refinements to avoid regressions.

📝 Summary
The `cquest-11rc1` branch contained valuable improvements to synonyms and phonemicization rules that were never merged. After a thorough review, we extracted and refined the pertinent changes while keeping the `lru_cache`-based cache implementation from `main`.

✨ Changes
1. New Synonyms (14 additions)
Add commonly used street type abbreviations:
- `cd` => chemin departemental
- `chem` => chemin (additional variant)
- `clef` => cle, `clefs` => cles (orthographic variants)
- `dept` => departement
- `gir` => giratoire
- `habit` => habitation (additional variant)
- `periph` => peripherique
- `prl` => parc residentiel de loisirs (official street type)
- `prm` => promenade (additional variant)
- `rd` => route departementale
- `rn` => route nationale
- `rdpt` => rond point (additional variant)

2. Enhanced Phonemicization Rules
Improved French phonemicization with targeted rules:
New transformations:
- `vowel + mp + consonant` → `vowel + n + consonant` (e.g., champvallon → chanvalon)
- `gn` → `ni`, but only after `ei` (e.g., seigneur → senieur)
- `je + vowel` → `j + vowel` (e.g., georges → jorj instead of jeorj)
- `anc$` → `an` (e.g., blanc → blan)
- `y` at word beginning → `i`
- `eim` → `aim` (e.g., pforzheim → pforzaim)
- `ae`/`ei` → `e` conversion in word context
- `oe`/`oeu` → `eu` handling (e.g., oeufs → beu)

11 new test cases cover the improved patterns.
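To make the rule list above concrete, here is a minimal Python sketch of the new transformations as standalone regexes. It is an illustrative reconstruction, not the code in this PR: the rule order, the consonant class, the global (rather than context-restricted) `ae`/`ei` rule, the `apply_new_rules` name and the cache size `10_000` are all assumptions. The function is wrapped in `functools.lru_cache`, in the spirit of the `main` cache, with `10_000` standing in for the real `PHONEMICIZE_CACHE_SIZE`. The real phonemicizer applies many other rules, which is why the full-pipeline examples above differ slightly (champvallon → chanvalon also loses an `l`, pforzheim → pforzaim its `h`); `jeorj`, the previous output for georges, is used as input to exercise the `je` rule.

```python
import re
from functools import lru_cache

# Hypothetical value standing in for PHONEMICIZE_CACHE_SIZE on main.
PHONEMICIZE_CACHE_SIZE = 10_000

CONSONANT = "[bcdfghjklmnpqrstvwxz]"

# Illustrative regex versions of the new rules, applied in order.
# Order matters: the "gn" lookbehind and the "eim" rule both need the
# "ei" to still be present, so the generic "ae|ei" -> "e" rule runs later.
NEW_RULES = [
    (re.compile(r"^y"), "i"),                               # y at word start -> i
    (re.compile(rf"(?<=[aeiouy])mp(?={CONSONANT})"), "n"),  # champvallon -> chanvallon
    (re.compile(r"(?<=ei)gn([aeiouy])"), r"ni\1"),          # seigneur -> seinieur
    (re.compile(r"je([aeiouy])"), r"j\1"),                  # jeorj -> jorj
    (re.compile(r"anc$"), "an"),                            # blanc -> blan
    (re.compile(r"eim"), "aim"),                            # pforzheim -> pforzhaim
    (re.compile(r"ae|ei"), "e"),                            # seinieur -> senieur (assumed global)
    (re.compile(r"oeu|oe"), "eu"),                          # boeuf -> beuf
]


@lru_cache(maxsize=PHONEMICIZE_CACHE_SIZE)
def apply_new_rules(word: str) -> str:
    """Apply only the rules sketched above to a lowercase, accent-free word."""
    for pattern, replacement in NEW_RULES:
        word = pattern.sub(replacement, word)
    return word


if __name__ == "__main__":
    for w in ("champvallon", "seigneur", "jeorj", "blanc", "pforzheim", "yvette", "montagne"):
        print(w, "->", apply_new_rules(w))
```

Because each rule is a plain `re.sub`, ordering is part of the design: running the generic `ei` → `e` rule first would erase the context that both the `gn` lookbehind and the `eim` rule depend on.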
🔍 What We Didn't Keep
Cache Implementation:
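The `cquest-11rc1` branch used a simple dictionary cache. We kept the `lru_cache` implementation from `main`, which evicts the least recently used entries once it is full, so memory use stays bounded; its size is set via `PHONEMICIZE_CACHE_SIZE`.

Original gn→ni rule: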
The original rule `gn([aeio])` → `ni\1` was too broad: it would transform "montagne" → "montani" (a regression). Our version only applies after "ei" (`(?<=ei)gn([aeiouy])` → `ni\1`), preserving common French patterns such as "montagne" while still fixing specific cases such as "seigneur".

✅ Testing
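A minimal regression check for the two `gn` rules in isolation could look like the sketch below. It is illustrative, not the PR's actual test suite; `apply_rule` and `check_gn_rules` are hypothetical helpers, while the two regexes are the ones quoted above. Applied on its own, the broad rule turns montagne into `montanie`; the "montani" quoted above presumably also reflects later rules in the pipeline.

```python
import re

# Rule from the cquest-11rc1 branch, as quoted in this PR.
BROAD_GN = (re.compile(r"gn([aeio])"), r"ni\1")
# Refined rule kept in this PR: only rewrite "gn" right after "ei".
REFINED_GN = (re.compile(r"(?<=ei)gn([aeiouy])"), r"ni\1")


def apply_rule(rule, word: str) -> str:
    """Apply a single (pattern, replacement) rule to a word."""
    pattern, replacement = rule
    return pattern.sub(replacement, word)


def check_gn_rules() -> None:
    # The broad rule rewrites the "gn" in common words such as "montagne"...
    assert apply_rule(BROAD_GN, "montagne") == "montanie"
    # ...while the refined rule leaves them untouched.
    assert apply_rule(REFINED_GN, "montagne") == "montagne"
    # Both rules handle the "ei" + "gn" case the change was meant for.
    assert apply_rule(BROAD_GN, "seigneur") == "seinieur"
    assert apply_rule(REFINED_GN, "seigneur") == "seinieur"


if __name__ == "__main__":
    check_gn_rules()
    print("gn rules behave as expected")
```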
📚 Related
- `cquest-11rc1` branch
- The `cquest-11rc1` branch can be archived/deleted after this merge

🎯 Impact
These changes improve address search quality for: