Repository for CS 838 (Spring 2017) Data Science project
Updated Apr 1, 2017 - Jupyter Notebook
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number). A shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
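As a rough illustration of the idea, the sketch below links records from two sources that share no common identifier by comparing normalized names. It is a minimal example under stated assumptions, not how any repository listed here works: the function names, the sample records, and the 0.85 threshold are all hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for the more robust similarity measures that real entity resolution tools use.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right-hand match if the score
    reaches the threshold (the threshold value is an illustrative assumption)."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 3)))
    return links


# Hypothetical sample data: two sources with unrelated keys.
crm = {"c1": "Acme Corp.", "c2": "Globex Corporation"}
billing = {"b7": "ACME  Corp", "b9": "Initech LLC"}

links = link_records(crm, billing)
print(links)
```

Comparing every pair of records is quadratic in the number of records, so this approach only suits small inputs. Larger data sets typically need a blocking or indexing step first to limit the candidate pairs.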
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
Weka Comparator to match rules to test data with filtering abilities
Undergraduate final project (README needs updating). Scientific paper to be included soon.
A maximum-strength name parser for record linkage.
A collection of awesome resources regarding Record Linkage.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
Unstructured Record Linkage using Siamese Networks and Large Language Models (LLMs) such as LLAMA3 and ChatGPT-4o.
🔎 Finds fuzzy matches between datasets
Service for automatically matching two data sets without mapping
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
Crawl, match, and explore data about jobs in Viet Nam.
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Welcome to Snowman App – a Data Matching Benchmark Platform.
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
Created by Halbert L. Dunn
Released 1946