Supervised fuzzy matching with machine learning: TF-IDF-based feature engineering feeds a classification model that returns the best match for a query.

PuffBear/ml-fuzzy-match

ml-fuzzy-matching

📂 Files

  • eda_cleaning.ipynb: Data loading, exploratory data analysis (EDA), cleaning, and column finalization.
  • utils.py
  • data.py: Loads and prepares the training data (positive and negative pairs).
  • features.py: Extracts similarity features from string pairs.
  • model.py: Trains and evaluates logistic regression (LR) and XGBoost models, then saves whichever performs better.
  • evaluate.py: CLI driver for the comparison.
  • match.py: Inference: takes a user query and returns the best match.
  • config.py: Centralized constants (thresholds, paths, etc.).
  • demo.ipynb: Notebook that tests everything end to end and serves as a demo of how to work with this repository.
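
To make the role of features.py more concrete, here is a minimal sketch of how similarity features could be extracted from a string pair using character n-gram TF-IDF. It is not the repository's code: the function names (`char_ngrams`, `fit_idf`, `tfidf_vector`, `cosine`, `pair_features`), the choice of trigrams, and the two extra features are all assumptions made for illustration. The project itself is built with Scikit-Learn; this sketch swaps in the standard library only, so it runs without dependencies.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(corpus, n=3):
    """Compute smoothed inverse document frequencies over a list of strings."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(char_ngrams(doc, n)))
    total = len(corpus)
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text, idf, n=3):
    """Map a string to a sparse TF-IDF vector (dict of n-gram -> weight)."""
    counts = Counter(char_ngrams(text, n))
    # n-grams that were not seen when fitting the IDF table are ignored.
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(a, b, idf):
    """Hypothetical feature vector for one (query, candidate) pair."""
    return [
        cosine(tfidf_vector(a, idf), tfidf_vector(b, idf)),   # TF-IDF similarity
        SequenceMatcher(None, a.lower(), b.lower()).ratio(),  # edit-based similarity
        abs(len(a) - len(b)) / max(len(a), len(b), 1),        # relative length gap
    ]


# Illustrative data only; not taken from the repository.
candidates = ["Apple Inc.", "Microsoft Corporation", "Alphabet Inc."]
idf = fit_idf(candidates)
query = "Microsft Corp"
best = max(candidates, key=lambda c: pair_features(query, c, idf)[0])
print(best)
```

In a supervised setup like the one described here, feature vectors of this kind would be computed for the positive and negative pairs and passed to the classifier, and the candidate with the highest predicted match probability would be returned. The sketch ranks by TF-IDF cosine similarity alone only to stay short.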

💻 Built with

  • Python
  • Scikit-Learn
  • XGBoost

Working pipeline, in order:

  1. evaluate.py
  2. demo.ipynb
