This is the code for our paper https://arxiv.org/abs/1801.07175
We are focusing on generating adversarial texts for text classification models. However, our framework can be easily extended to other text models. Basically we search for potential adversarials in the text/character embedding space, and reconstruct the texts by nearest neighbor search. The reconstructed texts are verified against the target model to filter out unsuccessful adversarial samples. Searching for potential adversarials in the embedding space is the same as generating adversarial images.
Our codes are carefully optimized and modularized to make it easy to reuse (hopefully).
These are the packages I used for my experiment.
- Python 3.6.3 https://www.python.org/
- TensorFlow v1.8 https://www.tensorflow.org/
- NLTK v3.2 http://www.nltk.org/, tokenizing.
- gensim https://radimrehurek.com/gensim/, WMD, word2vec interface.
- annoy https://github.com/spotify/annoy, fast approximate nearest neighbor search, much faster than naive nearest neighbor search.
- Numpy http://www.numpy.org/
All python files are backbone codes, while all bash files are instance codes. In bash files, we call corresponding python scripts with appropriate parameters. They all follow certain naming conventions to make the code structure clear and easy to understand. Please refer to READMEs in each sub-folder for doc.
- The core codes (i.e., model training/testing, generating adversarial texts) reside in current direction.
- The adversarial methods reside in attacks/ folder.
- The data pipeline codes reside in data_utils/ folder.
- The other miscellaneous helper codes reside in utils/ folder.
- MODEL.py contains the model definition.
- wordcnn.py defines the word-level model
WordCNN
. - charlstm.py defines character-aware model
CharLSTM
.
- wordcnn.py defines the word-level model
- run_MODEL.py contains model training code. Should have been named
train_MODEL.py.
- run_wordcnn.py trains word-level model.
- run_charlstm.py trains character-aware model.
- eval_MODEL.py contains model evaluating code. This code is almost identify
to the run_MODEL.py code except that it loads the model instead of training
it from scratch. Besides, we only need test data for evaluation.
- eval_wordcnn.py evaluates word-level model.
- eval_charlstm.py evaluates character-aware model.
- MODEL_ADV.py contains the code to generate adversarial samples for model.
For example, wordcnn_fgm.py generates adversarial texts for
WordCNN
model viaFGM
method.
- Codes to train each model on different dataset
- MODEL_DATA.bash for classification problems with more than 2 labels, e.g,
charlstm_reuters5.bash trains
CharLSTM
on Reuters5 dataset. - MODEL_DATA_sigm.bash or MODEL_DATA_tanh.bash for binary classification problems since DeepFool requires a bipolar output, i.e., [-1, 1], while other adversarial methods require an unipolar output, i.e., [0, 1]. E.g., charlstm_reuters2_sigm.bash.
- MODEL_DATA.bash for classification problems with more than 2 labels, e.g,
charlstm_reuters5.bash trains
- MODEL_DATA_ADV.bash contains code to generate adversarial samples for
different models and different dataset, e.g., charlstm_reuters2_fgsm.bash
generates adversarial texts for
CharLSTM
model on the Reuters2 dataset viaFGM
method.