This is the official repository of the paper: DiffProb: Data Pruning for Face Recognition (accepted at FG 2025)

Pruned Dataset Files

You can request access to the files containing the indexes of the kept samples for each pruning strategy applied in this work here. Please share your name, affiliation, and official email in the request form.

Results

Impact of Face Data Pruning

Generalizability Across Different Losses

Generalizability Across Different Network Architectures

Impact of Data Cleaning

Download CASIA-WebFace

You can download the CASIA-WebFace dataset here.

How to Run?

1. Run train_everything.py to train the original model (set config.is_original_train=True in config/config.py), whose predictions are used to perform the pruning (in the paper, ResNet-50 + CosFace loss). This script automatically generates the files needed to perform DynUnc pruning.

DynUnc

2. Run coreset_dynunc.py to generate the kept sample list for the selected pruning percentage.
3. Run label_mapping.py if you want to confirm that the number of ids has not been altered (this step is not mandatory).
4. Run train_everything.py under the desired settings.
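
For orientation, dynamic uncertainty is usually described as the standard deviation of each sample's ground-truth class probability inside a sliding window of training epochs, averaged over all windows, with the lowest-scoring samples pruned. The sketch below illustrates that idea in plain Python. It is not the code in coreset_dynunc.py: the function names, the `window` parameter, the use of the population standard deviation, and the list-of-lists input layout are all assumptions made for illustration.

```python
from statistics import pstdev


def dynunc_scores(probs_per_epoch, window):
    """Average, over sliding windows of epochs, of the std of each sample's probability.

    probs_per_epoch[e][i] is the ground-truth class probability of sample i
    at epoch e (hypothetical layout, not the repository's file format).
    """
    num_epochs = len(probs_per_epoch)
    num_samples = len(probs_per_epoch[0])
    if not 1 <= window <= num_epochs:
        raise ValueError("window must be between 1 and the number of epochs")
    scores = []
    for i in range(num_samples):
        history = [epoch[i] for epoch in probs_per_epoch]
        stds = [pstdev(history[s:s + window]) for s in range(num_epochs - window + 1)]
        scores.append(sum(stds) / len(stds))
    return scores


def dynunc_kept_indexes(scores, prune_ratio):
    """Keep the samples with the highest scores; return their indexes in ascending order."""
    num_kept = len(scores) - int(round(prune_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_kept])
```

With four samples tracked over four epochs, a sample whose probability never changes gets a score of 0 and is the first candidate for pruning, while a sample whose probability oscillates between epochs is kept.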

Rand

Note: keep in mind that Rand can be applied before performing step 1.

2. Run coreset_rand.py to generate the kept sample list for the selected pruning percentage.
3. Run label_mapping.py if you want to confirm that the number of ids has not been altered (this step is not mandatory).
4. Run train_everything.py under the desired settings.
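
Since Rand does not use model predictions, the kept sample list can be drawn from the dataset size alone. A minimal sketch of that selection, together with the kind of identity count that the optional check is meant to confirm, could look as follows. The function names, the `seed` parameter, and the in-memory `labels` list are hypothetical and do not reflect the interface of coreset_rand.py or label_mapping.py.

```python
import random


def rand_kept_indexes(num_samples, prune_ratio, seed=0):
    """Draw a uniformly random subset of sample indexes to keep, in ascending order."""
    num_kept = num_samples - int(round(prune_ratio * num_samples))
    rng = random.Random(seed)  # fixed seed so the kept list is reproducible
    return sorted(rng.sample(range(num_samples), num_kept))


def count_ids(labels, kept):
    """Number of distinct identities that still have at least one kept sample."""
    return len({labels[i] for i in kept})
```

Comparing `count_ids(labels, kept)` with the number of identities in the full dataset shows whether random pruning removed an identity entirely.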

DiffProb (ours)

2. Run eval_trainset.py to generate the ground truth prediction of the pre-trained FR model for each sample.
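
One plausible reading of "ground truth prediction" is the softmax probability that the model assigns to a sample's labeled identity. The sketch below computes that quantity from a vector of logits under this assumption. How eval_trainset.py actually obtains its logits (backbone, CosFace head, scaling) is not covered here, and `ground_truth_probability` is a hypothetical helper name.

```python
import math


def ground_truth_probability(logits, label):
    """Softmax probability assigned to the ground-truth class `label`."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return exps[label] / sum(exps)
```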

Without Cleaning

3. Run eval_simprobs.py to generate the kept sample list for the selected pruning percentage.
4. Run label_mapping.py if you want to confirm that the number of ids has not been altered (this step is not mandatory).
5. Run train_everything.py under the desired settings.
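
The paper describes DiffProb as pruning, within each identity, the samples whose prediction probabilities are identical or close to each other. A rough sketch of that idea is shown below. The greedy rule, the fixed `threshold` parameter, and the function name are assumptions made for illustration; how eval_simprobs.py turns a pruning percentage into a selection is not reproduced here.

```python
from collections import defaultdict


def diffprob_kept_indexes(probs, labels, threshold):
    """Within each identity, keep a sample only if its probability differs
    from the last kept sample of that identity by at least `threshold`."""
    by_id = defaultdict(list)
    for idx, label in enumerate(labels):
        by_id[label].append(idx)

    kept = []
    for idxs in by_id.values():
        idxs.sort(key=lambda i: probs[i])  # ascending ground-truth probability
        last = None
        for i in idxs:
            if last is None or probs[i] - last >= threshold:
                kept.append(i)
                last = probs[i]
    return sorted(kept)
```

In this sketch a larger threshold prunes more samples, so a target pruning percentage would have to be reached by adjusting the threshold.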

With Cleaning

3. Run clean_trainset.py to apply our auxiliary cleaning mechanism and generate the kept sample list for the selected pruning percentage.
4. Run generate_label_dict.py to generate a dictionary associating each identity (class label) with the indexes of its samples.
5. Run label_mapping.py to confirm the new number of ids and to generate a label map, as some identities might be eliminated (this step is mandatory).
6. Run train_everything.py under the desired settings.
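
The label dictionary and label map mentioned in the steps above can be pictured with the following sketch. It assumes the labels and kept indexes are available as in-memory Python lists and that the surviving identities are renumbered contiguously from 0, which is a common requirement of a classification head but is only presumed here. The function names are hypothetical and the file formats used by generate_label_dict.py and label_mapping.py are not shown.

```python
def build_label_dict(labels, kept):
    """Map each surviving identity to the indexes of its kept samples."""
    label_dict = {}
    for idx in kept:
        label_dict.setdefault(labels[idx], []).append(idx)
    return label_dict


def build_label_map(label_dict):
    """Map each surviving original label to a new contiguous label 0..K-1."""
    return {old: new for new, old in enumerate(sorted(label_dict))}
```

Here `len(label_map)` gives the new number of ids, and an identity whose samples were all removed by cleaning simply does not appear in either structure.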

IJB-C Evaluation

Run eval_ijbc.py to perform IJB-C evaluation

Citation

If you use any of the code, pruned datasets or models provided in this repository, please cite the following paper:

@misc{caldeira2025diffprobdatapruningface,
      title={DiffProb: Data Pruning for Face Recognition}, 
      author={Eduarda Caldeira and Jan Niklas Kolf and Naser Damer and Fadi Boutros},
      year={2025},
      eprint={2505.15272},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.15272}, 
}

License

This project is licensed under the terms of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. 
Copyright (c) 2025 Fraunhofer Institute for Computer Graphics Research IGD Darmstadt
