This repository is an attempt to reproduce the results presented in the technical report from Microsoft Research Asia describing a complex neural network called R-NET.
As of 2017, R-NET was the best single model (i.e. comparing stand-alone models, without any ensembling) on the Stanford QA dataset, SQuAD.
The SQuAD dataset uses two performance metrics, Exact Match (EM) and F1 score (F1). Human performance is estimated at EM = 82.3% and F1 = 91.2% on the test/dev set.
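As a quick illustration of the two metrics, here is a simplified sketch of how SQuAD-style EM and F1 are computed for a single prediction (the official evaluation script additionally lower-cases, strips punctuation and articles, and takes the maximum over multiple ground-truth answers; those normalizations are omitted here):

```python
from collections import Counter

def exact_match(prediction, ground_truth):
    """1.0 if the answers match exactly (case-insensitive), else 0.0."""
    return float(prediction.strip().lower() == ground_truth.strip().lower())

def f1_score(prediction, ground_truth):
    """Token-overlap F1 between predicted and ground-truth answer."""
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Denver Broncos", "denver broncos"))  # 1.0
print(f1_score("in the park", "the park"))              # 0.8
```

A partially correct span thus still earns F1 credit for overlapping tokens, while EM is all-or-nothing.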
The R-NET of March 2017 has one additional BiGRU between the self-matching attention layer and the pointer network, and reaches EM = 72.3% and F1 = 80.7% on the test/dev set.
At present (on the SQuAD-explorer leaderboard) R-NET reaches EM = 82.136% and F1 = 88.126%, which means R-NET development continued after March 2017; ensembling the models also helped it reach higher scores.
The best performance I have obtained so far is
- EM = 42.16% and F1 = 51.064%

Reasons for such low metrics:
- No hyperparameter tuning was carried out, so there is likely room for further improvement.
- The model was trained for only 29 epochs due to the huge training time (about 3 hrs per epoch on an Nvidia K40c).
- Further technical reasons can be found in the blog.
I have attached a PDF document explaining the model architecture and the current limitations.
- Parse and split the data

```shell
python parse_data.py data/train-v1.1.json --train_ratio 0.9 --outfile data/train_parsed.json --outfile_valid data/valid_parsed.json
python parse_data.py data/dev-v1.1.json --outfile data/dev_parsed.json
```
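The `--train_ratio 0.9` flag holds out 10% of the official training file for validation. The exact behavior lives in `parse_data.py`; as a rough illustration only, a ratio-based split of parsed records (the shuffling, seed, and record format here are assumptions, not the repo's actual logic) could look like:

```python
import random

def split_train_valid(records, train_ratio=0.9, seed=42):
    """Shuffle a list of parsed QA records and split it by ratio.
    Illustrative sketch: parse_data.py may split differently
    (e.g. by article, or without shuffling)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, valid = split_train_valid(list(range(100)), train_ratio=0.9)
print(len(train), len(valid))  # 90 10
```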
- Preprocess the data

```shell
python preprocessing.py data/train_parsed.json data/valid_parsed.json data/dev_parsed.json \
--outfile data/train_data_str.pkl data/valid_data_str.pkl data/dev_data_str.pkl --include_str
```
- Train the model

```shell
python train.py --hdim 45 --batch_size 50 --nb_epochs 50 --optimizer adadelta --lr 1 --dropout 0.2 --char_level_embeddings --train_data data/train_data_str.pkl --valid_data data/valid_data_str.pkl
```
- Predict on dev/test set samples

```shell
python predict.py --batch_size 100 --dev_data data/dev_data_str.pkl models/29-t3.742772511577511-v4.2209280522167525.model prediction.json
```
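`predict.py` writes `prediction.json`, which (following the official SQuAD convention, presumably shared by this repo's `evaluate.py`) is a single JSON object mapping each question id to the predicted answer string. A sketch with a made-up id and answer:

```python
import json

# Placeholder prediction dictionary; the key is a hypothetical
# question id, not a real SQuAD id.
predictions = {
    "example-question-id": "a placeholder answer",
}

# Write it in the format the evaluation step expects to read.
with open("prediction.json", "w") as f:
    json.dump(predictions, f)
```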
- Evaluate on dev/test set samples

```shell
python evaluate.py data/dev-v1.1.json --predfile prediction.json
```
The best model can be downloaded from: Model