More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data

A short introduction

A common heuristic in semi-supervised deep learning (SSDL) is to select unlabelled data based on a notion of semantic similarity to the labelled data. For example, labelled images of numbers should be paired with unlabelled images of numbers instead of, say, unlabelled images of cars. We refer to this practice as semantic data set matching. In this work, we demonstrate the limits of semantic data set matching and show that it can sometimes even degrade the performance of a state-of-the-art SSDL algorithm. We present and make available a comprehensive simulation sandbox, called non-IID-SSDL, for stress-testing an SSDL algorithm under different degrees of distribution mismatch between the labelled and unlabelled data sets. In addition, we demonstrate that simple density-based dissimilarity measures in the feature space of a generic classifier offer a promising and more reliable quantitative matching criterion for selecting unlabelled data before SSDL training.
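To make the idea of a density-based dissimilarity measure in feature space more concrete, here is a minimal sketch. It is not taken from this repository and is not necessarily the measure used in the paper. It assumes that feature vectors for both data sets have already been extracted with some generic classifier, approximates the per-dimension feature densities with histograms, and averages the Jensen-Shannon divergence over dimensions. The function names, the `bins` parameter, and the choice of Jensen-Shannon divergence are all illustrative assumptions.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) of two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only bins with a > 0 contribute; b (= m) is positive there.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def feature_density_dissimilarity(feats_a, feats_b, bins=20):
    """Hypothetical helper: mean per-dimension JS divergence.

    feats_a, feats_b: arrays of shape (n_samples, n_features), assumed to be
    feature vectors produced by the same generic (e.g. pretrained) classifier.
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    scores = []
    for d in range(feats_a.shape[1]):
        lo = min(feats_a[:, d].min(), feats_b[:, d].min())
        hi = max(feats_a[:, d].max(), feats_b[:, d].max())
        if hi == lo:
            # Constant feature in both sets: no measurable difference.
            scores.append(0.0)
            continue
        # Shared bin edges so both histograms are comparable.
        edges = np.linspace(lo, hi, bins + 1)
        hist_a, _ = np.histogram(feats_a[:, d], bins=edges)
        hist_b, _ = np.histogram(feats_b[:, d], bins=edges)
        scores.append(js_divergence(hist_a.astype(float), hist_b.astype(float)))
    return float(np.mean(scores))
```

Under these assumptions, a candidate unlabelled data set with a lower score relative to the labelled data would be preferred; the score is a ranking aid only and says nothing by itself about final SSDL accuracy.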

Data access

To reproduce any of the experiments, run the experiment script ood_experiment_at_scale_script.sh, which automatically downloads the required data sets based on the experiment you choose. An overview of the different data sets can be found below. Note that we used the training split of each data set as the basis to construct our own training and test splits for each experimental run. The Gaussian and Salt and Pepper data sets were created with the following parameters: mean 0 and variance 10 for the Gaussian noise, and equal Bernoulli probability for pixel values 0 and 255 for the Salt and Pepper noise.
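The noise parameters above can be illustrated with a short NumPy sketch. This is not the repository's generation code: the function names, the image shape of 32x32x3, the seeding, and the decision to leave the Gaussian samples as unclipped floats are assumptions made for illustration only.

```python
import numpy as np


def gaussian_noise_images(n, shape=(32, 32, 3), mean=0.0, variance=10.0, seed=0):
    """Hypothetical helper: n images of i.i.d. Gaussian noise.

    Uses mean 0 and variance 10 as stated above; the image shape is assumed.
    Values are left as floats and are not clipped to a pixel range.
    """
    rng = np.random.default_rng(seed)
    # NumPy expects the standard deviation, i.e. the square root of the variance.
    return rng.normal(loc=mean, scale=np.sqrt(variance), size=(n, *shape))


def salt_and_pepper_images(n, shape=(32, 32, 3), p_white=0.5, seed=0):
    """Hypothetical helper: n images whose pixels are 0 or 255.

    Each pixel is drawn from a Bernoulli distribution; p_white=0.5 gives
    equal probability to the values 0 and 255.
    """
    rng = np.random.default_rng(seed)
    white = rng.random(size=(n, *shape)) < p_white
    return np.where(white, 255, 0).astype(np.uint8)
```

For example, `salt_and_pepper_images(100)` returns a `uint8` array of shape `(100, 32, 32, 3)` containing only the values 0 and 255.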

Code

We provide the experiment script ood_experiment_at_scale_script.sh for your convenience; in it, you can select the types of experiments you would like to run.

Cite as

@misc{calderonramirez2021meets,
  title={More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data}, 
  author={Saul Calderon-Ramirez and Luis Oala},
  year={2021},
  eprint={2104.10223},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

For any questions, feel free to open an issue or contact us.

