An implementation for cleaning large-scale public face datasets
This is an unofficial implementation based on the paper "A Community Detection Approach to Cleaning Extremely Large Face Database".
To run the experiments, first prepare embedding files for your face dataset and for LFW using a pre-trained face recognition network.
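The exact network and file layout are up to you; below is a minimal sketch of this preparation step, assuming facenet-pytorch as the pre-trained network and dataset_embeddings.npy / dataset_paths.npy as output file names (both are illustrative assumptions, not requirements of the scripts in this repository).

```python
# Sketch: extract L2-normalized embeddings for every image under a dataset folder.
# facenet-pytorch is only one example of "a pre-trained face recognition network";
# the repository scripts may expect a different model or file format.
import os
import numpy as np
import torch
from PIL import Image
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='vggface2').eval()

def embed_image(path):
    img = Image.open(path).convert('RGB').resize((160, 160))
    x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float()
    x = (x - 127.5) / 128.0              # normalization expected by facenet-pytorch
    with torch.no_grad():
        emb = model(x.unsqueeze(0)).squeeze(0).numpy()
    return emb / np.linalg.norm(emb)     # L2-normalize so dot product = cosine similarity

paths, embeddings = [], []
for root, _, files in os.walk('face_dataset'):           # assumed dataset root
    for name in sorted(files):
        if name.lower().endswith(('.jpg', '.png')):
            p = os.path.join(root, name)
            paths.append(p)
            embeddings.append(embed_image(p))

np.save('dataset_embeddings.npy', np.stack(embeddings))  # assumed output file names
np.save('dataset_paths.npy', np.array(paths))
```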
Use lfw_far_thresholding.py to determine the similarity threshold for deciding whether two face images belong to the same identity.
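The idea is to pick the threshold at a target false accept rate (FAR) on LFW impostor pairs. A minimal sketch of that selection is below; the target FAR value and the file of impostor-pair similarities are assumptions, not necessarily what lfw_far_thresholding.py uses.

```python
# Sketch of FAR-based threshold selection: given cosine similarities of LFW
# impostor (different-identity) pairs, pick the threshold at which roughly
# target_far of those pairs would still be accepted as "same identity".
import numpy as np

def far_threshold(impostor_sims, target_far=1e-3):
    """Similarity threshold at which ~target_far of impostor pairs have sim >= threshold."""
    return float(np.quantile(np.asarray(impostor_sims), 1.0 - target_far))

# Usage (hypothetical file produced from the LFW embeddings):
# impostor_sims = np.load('lfw_impostor_similarities.npy')
# print('similarity threshold @ FAR=1e-3:', far_threshold(impostor_sims))
```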
Then run dataset_adjacency_build.py to save the image-pair similarity information in CSV files, which dataset_cleaner.py then uses to build the graphs and remove small communities.
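A combined sketch of these two steps is shown below. The CSV layout, threshold value, minimum community size, and choice of community-detection algorithm (greedy modularity here) are assumptions; the repository's scripts, and the paper's method, may differ (in particular, the actual cleaner likely works per identity folder, while this sketch builds one global graph for brevity).

```python
# Sketch: (1) write pairwise-similarity edges above the threshold to a CSV, in the
# spirit of dataset_adjacency_build.py, and (2) build a graph from that CSV and keep
# only images in sufficiently large communities, in the spirit of dataset_cleaner.py.
import csv
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

THRESHOLD = 0.55          # assumed value; take it from lfw_far_thresholding.py
MIN_COMMUNITY_SIZE = 5    # assumed minimum size for a community to be kept

embeddings = np.load('dataset_embeddings.npy')   # L2-normalized, from the step above
paths = np.load('dataset_paths.npy')

# (1) similarity matrix -> edge-list CSV (only pairs above the threshold)
sims = embeddings @ embeddings.T
with open('adjacency.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['image_a', 'image_b', 'similarity'])
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if sims[i, j] >= THRESHOLD:
                writer.writerow([paths[i], paths[j], f'{sims[i, j]:.4f}'])

# (2) graph from the edge list; keep only images in large enough communities
graph = nx.Graph()
with open('adjacency.csv') as f:
    for row in csv.DictReader(f):
        graph.add_edge(row['image_a'], row['image_b'], weight=float(row['similarity']))

clean_images = []
for community in greedy_modularity_communities(graph):
    if len(community) >= MIN_COMMUNITY_SIZE:
        clean_images.extend(sorted(community))

with open('clean_list.txt', 'w') as f:
    f.write('\n'.join(clean_images) + '\n')
```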
A small tool is provided to move the original images into a separate folder according to the cleaned data list.
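A minimal sketch of such a relocation step is shown below, assuming the clean list contains one image path per line (as produced in the sketch above); the folder names are illustrative assumptions.

```python
# Sketch: move images named in the clean list into a separate folder,
# preserving the identity/filename structure under the new root.
import os
import shutil

SRC_ROOT = 'face_dataset'        # assumed original dataset root
DST_ROOT = 'face_dataset_clean'  # assumed destination for the cleaned copy

with open('clean_list.txt') as f:
    for line in f:
        src = line.strip()
        if not src:
            continue
        rel_path = os.path.relpath(src, SRC_ROOT)
        dst = os.path.join(DST_ROOT, rel_path)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)    # use shutil.copy2 instead to keep the original intact
```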
A first version of the cleaned VGGFace2 training and testing image lists can be downloaded from Google Drive.