Duplicate Information Detection

Project Description

This project detects duplicates among data rows that consist of reported distress calls for disaster victims. It helps a human reviewer pick out highly similar rows by grouping rows according to similarity attributes and by providing similarity rates between rows based on name and address information.

At this point, the similarity analysis mainly relies on the term frequency–inverse document frequency (TF-IDF) measure combined with meticulous preprocessing, and this approach works for the current data. The preprocessing phase can also make use of a well-functioning named-entity recognition (NER) model.
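
To illustrate the general idea of TF-IDF based similarity on name and address text, here is a minimal sketch using only the Python standard library. It is not the project's actual implementation. The function names (`normalize`, `char_ngrams`, `tfidf_vectors`, `similar_pairs`), the choice of character trigrams, the smoothed IDF formula, the `0.5` threshold, and the sample rows are all assumptions made for illustration. The real preprocessing is likely far more involved than the simple normalization shown here.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Illustrative only; language-specific normalization is not handled.
    """
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    """Split a string into overlapping character n-grams."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(rows, n=3):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per row."""
    docs = [Counter(char_ngrams(normalize(row), n)) for row in rows]
    # Document frequency: in how many rows each n-gram occurs.
    doc_freq = Counter(term for doc in docs for term in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        weights = {
            term: count * (math.log((1 + total) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def similar_pairs(rows, threshold=0.5):
    """Return (i, j, score) for every row pair scoring at or above threshold."""
    vectors = tfidf_vectors(rows)
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical rows; not taken from the project's data.
    sample = [
        "Ayse Yilmaz, Ataturk Cad. No 12",
        "ayse yilmaz ataturk caddesi no:12",
        "Mehmet Demir, Cumhuriyet Mah. 5. Sokak",
    ]
    for i, j, score in similar_pairs(sample):
        print(f"rows {i} and {j} look similar (score {score:.2f})")
```

Character n-grams are used in this sketch because they tolerate small spelling and abbreviation differences better than whole-word tokens. The pairwise loop is quadratic in the number of rows, so a larger dataset would need blocking or a sparse matrix approach instead.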



High-Level Process Flow (Tentative)

High-Level Preprocess Flow (Tentative)


acikyazilimagi/duplicate-info-detection
