Code for the paper on robust data pruning.
Requires Python 3+.
- Create a conda environment: `conda env create -f environment.yml`
- Activate the environment: `conda activate environment`
The project implements both active learning (AL, `--strategy 0`) and data pruning (DP, `--strategy 1`). The command-line flag `--auto_config` fills in the appropriate hyperparameters based on the model specification and is recommended. The general flow of an experiment is as follows:
1. Trains a query model (possibly across multiple initializations, `--num_inits`) and retrieves sample scores;
2. Acquires (for AL) or deletes (for DP) samples based on the scores and other factors (e.g., class-wise quotas);
3. Potentially repeats steps 1-2 across multiple iterations (`--iterations`, common for AL; see the sketch after this list);
4. Once the final dataset is determined, trains the final model and saves its metrics in JSON format.
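The usage examples further below all exercise DP. As a sketch of what an AL run could look like under the same conventions (the iteration count, model, and scorer here are illustrative assumptions, and any acquisition-budget flags the project may additionally require are not shown):

```
# Illustrative AL sketch: 5 query/acquire rounds with EL2N scoring.
# The specific values are assumptions for demonstration only; any
# acquisition-budget options the project requires are not shown here.
python -m fair-data-pruning.main --auto_config --use_gpu --strategy 0 --iterations 5 --model_name VGG16 --scorer_name EL2N
```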
Here are a few simple usage examples. The commands should be executed from the parent directory of the project's folder.
- Prune 30% of CIFAR-10 using VGG-16 and the EL2N scorer: `python -m fair-data-pruning.main --auto_config --use_gpu --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name EL2N`
- Randomly prune 30% of CIFAR-10 using VGG-16 and MetriQ class-wise ratios, with the query model retrained 5 times: `python -m fair-data-pruning.main --auto_config --use_gpu --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name Random --quoter_name MetriQ --num_inits 5`
- Prune 30% of CIFAR-10 using VGG-16 and the Forgetting scorer, and train the final model with the cost-sensitive optimization algorithm CDB-W: `python -m fair-data-pruning.main --auto_config --use_gpu --cdbw_final --strategy 1 --final_frac 0.7 --model_name VGG16 --scorer_name Forgetting`
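The final model's metrics are saved in JSON format; a minimal sketch for inspecting them, assuming a hypothetical output path (the actual file name and location depend on the run configuration):

```
import json
from pathlib import Path

# Hypothetical path: adjust to wherever your run writes its metrics JSON.
metrics_path = Path("fair-data-pruning/results/metrics.json")

# Load the saved metrics and print each top-level entry.
with metrics_path.open() as f:
    metrics = json.load(f)

for name, value in metrics.items():
    print(f"{name}: {value}")
```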
Coming soon.