face-dataset-cleaner

An implementation to clean large scale public face dataset

This is an unofficial implementation based on the paper A Community Detection Approach to Cleaning Extremely Large Face Database

To do the experiment, first prepare your face-dataset and LFW embedding files using a pre-trained face recognition network.

Use the lfw_far_thresholding.py to determine the similarity threshold between different face images.

Then run the dataset_adjacency_build.py to save the image pair similarity information in csv files, which will then be used in dataset_cleaner.py to build the graphs and do small community cleaning.

A small tool is provided to move original images to a separate folder according to the clean data list.

A first version of cleaned VGGFace2 training and testing image lists can be downloaded at Google Drive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

face-dataset-cleaner

Files

README.md

Latest commit

History

README.md

File metadata and controls

face-dataset-cleaner