Implementation of semi-supervised learning techniques (UDA, MixMatch, Mean Teacher), with a focus on NLP.
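All three methods share a consistency-training core: a supervised loss on labeled data plus a term that keeps predictions stable under perturbation of unlabeled data. A minimal, hypothetical PyTorch sketch of a UDA-style step (`model`, the batch names, and `lambda_u` are illustrative, not this repo's API):

```python
# Hypothetical sketch of one UDA-style training step; names are
# illustrative, not this repo's actual interface.
import torch
import torch.nn.functional as F

def uda_step(model, labeled_x, labeled_y, unlabeled_x, unlabeled_x_aug, lambda_u=1.0):
    """Supervised cross-entropy on labeled data plus a KL consistency
    term between predictions on clean and augmented unlabeled data."""
    # Supervised cross-entropy on the labeled batch.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # Predictions on the clean unlabeled batch act as fixed targets.
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=-1)

    # Consistency: KL between the fixed targets and predictions on the
    # augmented version of the same unlabeled examples.
    log_pred_aug = F.log_softmax(model(unlabeled_x_aug), dim=-1)
    consistency = F.kl_div(log_pred_aug, target, reduction="batchmean")

    return sup_loss + lambda_u * consistency
```

Mean Teacher replaces the fixed targets with predictions from an exponential-moving-average copy of the model, and MixMatch averages and sharpens predictions over several augmentations before mixing examples.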
Notes:
- Instead of mixup as in the original paper, I use Manifold Mixup, which is better suited to NLP applications (a minimal sketch follows these notes).
- Any encoder can be used: Transformer, LSTM, etc. The default is LSTMWeightDrop, as used in AWD-LSTM, inspired by fast.ai-v1 (the weight-drop idea is sketched after these notes).
- Since this repo is mainly concerned with exploring SSL techniques, a Transformer can be overkill: it could dominate the gains made by SSL, not to mention the long training time.
- There are many data augmentation techniques in computer vision, but far fewer in NLP; strong data augmentation for NLP is still an open research problem. So far, the most effective technique I have found is back-translation, which the UDA paper confirms. There are many ways to perform back-translation; one simple way is to use MarianMT, shipped with the excellent huggingface-transformers (a sketch follows these notes).
- Some data augmentation techniques I would like to explore (a rough TF-IDF replacement sketch follows these notes):
  - TF-IDF word replacement
  - Sentence permutation
  - Nearest neighbor sentence replacement
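Here is a minimal sketch of the Manifold Mixup idea for text, assuming a hypothetical `encoder` that maps token ids to a fixed-size sentence representation and a `classifier` head. Hidden representations are mixed instead of raw token ids, which would be meaningless for text:

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def manifold_mixup_loss(encoder, classifier, x, y_onehot, alpha=0.4):
    """Mix hidden representations of a batch with a shuffled copy of
    itself, and mix the one-hot targets with the same coefficient."""
    lam = Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))

    h = encoder(x)                               # (batch, hidden)
    h_mixed = lam * h + (1 - lam) * h[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]

    # Cross-entropy against the soft, mixed targets.
    logits = classifier(h_mixed)
    return -(y_mixed * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```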
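The weight-drop trick behind LSTMWeightDrop applies dropout to the recurrent (hidden-to-hidden) weight matrix rather than to activations, i.e. DropConnect. A simplified sketch, not the exact implementation used here (fastai's WeightDropout and the AWD-LSTM codebase handle multi-layer and cuDNN details):

```python
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTM(nn.Module):
    """Sketch of DropConnect on the recurrent weights (AWD-LSTM style)."""

    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.weight_p = weight_p
        # Move the recurrent weight to a "raw" name so a freshly dropped
        # copy can be installed on every forward pass.
        raw = self.lstm.weight_hh_l0
        del self.lstm._parameters["weight_hh_l0"]
        self.lstm.register_parameter("weight_hh_l0_raw", nn.Parameter(raw.data))

    def forward(self, x, hidden=None):
        raw = self.lstm.weight_hh_l0_raw
        # Dropout on the weight matrix itself (a no-op in eval mode).
        self.lstm.weight_hh_l0 = F.dropout(
            raw.clone(), p=self.weight_p, training=self.training
        )
        return self.lstm(x, hidden)
```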
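For back-translation, a round trip through another language yields a paraphrase. A small example using the Helsinki-NLP MarianMT checkpoints from huggingface-transformers (greedy decoding for brevity; sampling tends to give more diverse paraphrases):

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model, tokenizer):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# English -> French -> English round trip.
en_fr, fr_en = "Helsinki-NLP/opus-mt-en-fr", "Helsinki-NLP/opus-mt-fr-en"
tok_fwd = MarianTokenizer.from_pretrained(en_fr)
mt_fwd = MarianMTModel.from_pretrained(en_fr)
tok_bwd = MarianTokenizer.from_pretrained(fr_en)
mt_bwd = MarianMTModel.from_pretrained(fr_en)

sentences = ["The movie was surprisingly good."]
augmented = translate(translate(sentences, mt_fwd, tok_fwd), mt_bwd, tok_bwd)
print(augmented)
```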
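As a starting point for TF-IDF word replacement, here is a rough scikit-learn sketch. It uses IDF alone as a crude proxy for the per-word TF-IDF scores in the UDA paper, and replaces words uniformly at random rather than with the paper's frequency-based distribution:

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_replace(corpus, replace_frac=0.2, seed=0):
    """Replace the least informative words (lowest IDF, used here as a
    crude proxy for TF-IDF) with random words from the vocabulary."""
    rng = random.Random(seed)
    vec = TfidfVectorizer()
    vec.fit(corpus)
    idf = dict(zip(vec.get_feature_names_out(), vec.idf_))
    vocab = list(idf)

    augmented = []
    for text in corpus:
        words = text.split()
        # Lowest-IDF words carry the least class information.
        scored = sorted(words, key=lambda w: idf.get(w.lower(), float("inf")))
        to_replace = set(scored[: int(len(words) * replace_frac)])
        augmented.append(
            " ".join(rng.choice(vocab) if w in to_replace else w for w in words)
        )
    return augmented
```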
Citations:

```bibtex
@article{xie2019unsupervised,
  title={Unsupervised Data Augmentation for Consistency Training},
  author={Xie, Qizhe and Dai, Zihang and Hovy, Eduard and Luong, Minh-Thang and Le, Quoc V},
  journal={arXiv preprint arXiv:1904.12848},
  year={2019}
}

@article{berthelot2019mixmatch,
  title={MixMatch: A Holistic Approach to Semi-Supervised Learning},
  author={Berthelot, David and Carlini, Nicholas and Goodfellow, Ian and Papernot, Nicolas and Oliver, Avital and Raffel, Colin},
  journal={arXiv preprint arXiv:1905.02249},
  year={2019}
}
```