Implementation of the Dynamic Memory Network+ (DMN+) for question answering, using TensorFlow.
The implementation is based on the model proposed in Dynamic Memory Networks for Visual and Textual Question Answering (Xiong, Merity & Socher, 2016; https://arxiv.org/abs/1603.01417).
The original Dynamic Memory Network was introduced in Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (Kumar et al.; https://arxiv.org/abs/1506.07285), which I also had to refer to.
This DMN+ Model uses:
- Word vectors in facts are positionally encoded and summed to create sentence representations (see the positional-encoding sketch below).
- A bi-directional GRU is run over the sentence representations in the input fusion layer; the forward and backward hidden states are summed.
- An attention-based GRU is used in the episodic memory module (see the sketch below).
- A linear layer with ReLU activation and untied weights is used to update the memory for the next pass (see the memory-update sketch below).
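
For reference, here is a minimal NumPy sketch of the positional encoding scheme used to build sentence representations, following the formulation in the DMN+ paper; the function and variable names are illustrative, not the actual code in DMN+.py:

```python
import numpy as np

def positional_encode(sentence_vectors):
    """Combine the word vectors of one sentence into a single fact vector.

    sentence_vectors: array of shape (M, D) -- M words, D-dimensional embeddings.
    Returns the (D,)-dimensional sentence representation
        f = sum_j l_j * w_j,  with  l_jd = (1 - j/M) - (d/D) * (1 - 2j/M),
    using 1-based word positions j and embedding dimensions d.
    """
    M, D = sentence_vectors.shape
    j = np.arange(1, M + 1, dtype=np.float32)[:, None]   # positions, shape (M, 1)
    d = np.arange(1, D + 1, dtype=np.float32)[None, :]   # dimensions, shape (1, D)
    l = (1.0 - j / M) - (d / D) * (1.0 - 2.0 * j / M)
    return (l * sentence_vectors).sum(axis=0)
```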
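
Similarly, a sketch of a single attention-based GRU step, where the usual update gate is replaced by the scalar attention gate computed for the current fact (again illustrative, not the actual code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gru_step(fact, h_prev, gate, params):
    """One step of the attention-based GRU used in the episodic memory module.

    `gate` is the scalar attention gate for this fact; it replaces the GRU's
    usual update gate.  `params` holds the weight matrices and biases
    (illustrative names, not those in DMN+.py).
    """
    Wr, Ur, br = params['Wr'], params['Ur'], params['br']
    W, U, b = params['W'], params['U'], params['b']
    r = sigmoid(Wr.dot(fact) + Ur.dot(h_prev) + br)          # reset gate
    h_tilde = np.tanh(W.dot(fact) + r * U.dot(h_prev) + b)   # candidate hidden state
    return gate * h_tilde + (1.0 - gate) * h_prev            # attention gate blends old and new
```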
I also included layer normalization (Layer Normalization, Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton) before every activation, except for the pre-activation of the final layer.
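
The memory update for each pass, with layer normalization applied to the pre-activation as described above, roughly amounts to the following sketch (names are illustrative, and W_t, b_t stand for the pass-specific, untied weights):

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-6):
    """Layer normalization (Ba et al., 2016) over the feature dimension."""
    return gain * (x - x.mean()) / (x.std() + eps) + bias

def memory_update(m_prev, context, question, W_t, b_t, gain, bias):
    """Untied memory update for pass t:
        m_t = ReLU(LayerNorm(W_t [m_{t-1} ; c_t ; q] + b_t))
    where W_t, b_t are specific to pass t and layer normalization is applied
    to the pre-activation.
    """
    z = W_t.dot(np.concatenate([m_prev, context, question])) + b_t
    return np.maximum(layer_norm(z, gain, bias), 0.0)
```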
I used pre-trained GloVe embeddings (the 100-dimensional vectors), downloaded from the Stanford NLP GloVe project page.
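
Loading the GloVe vectors boils down to something like the following sketch, assuming the standard GloVe text format (one token per line followed by its 100 space-separated floats, e.g. glove.6B.100d.txt):

```python
import numpy as np

def load_glove(path, dim=100):
    """Load pre-trained GloVe vectors into a {word: vector} dict."""
    embeddings = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split(' ')
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings
```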
I trained the model on the basic induction task (QA16) from the bAbI tasks dataset.
The hyperparameters differ from those in the original implementation:
- Hidden size = 100
- Embedding dimensions = 100
- Learning rate = 0.001
- Passes = 3
- Mini Batch Size = 128
- L2 Regularization = 0.0001
- Dropout Rate = 0.1
- Initialization = 0 for biases, Xavier for weights
(The last 10% of the data samples are used for validation.)
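
A TensorFlow 1.x sketch of how these hyperparameters could be wired together is below; the Adam optimizer is an assumption (it is what the DMN+ paper uses), and all variable names are illustrative rather than the actual ones in DMN+.py:

```python
import tensorflow as tf

HIDDEN_SIZE = 100
EMBEDDING_DIM = 100
LEARNING_RATE = 0.001
NUM_PASSES = 3
BATCH_SIZE = 128
L2_LAMBDA = 0.0001
KEEP_PROB = 0.9      # i.e. a dropout rate of 0.1
VOCAB_SIZE = 40      # illustrative; the real value comes from the bAbI vocabulary

# Xavier initialization for weights, zeros for biases.
weight_init = tf.contrib.layers.xavier_initializer()
bias_init = tf.zeros_initializer()

# Stand-ins for the answer-module input and the gold answers;
# in the real model these come from the rest of the graph.
final_memory = tf.placeholder(tf.float32, [None, HIDDEN_SIZE])
answer_labels = tf.placeholder(tf.int32, [None])

W_a = tf.get_variable('W_a', [HIDDEN_SIZE, VOCAB_SIZE], initializer=weight_init)
b_a = tf.get_variable('b_a', [VOCAB_SIZE], initializer=bias_init)
logits = tf.matmul(tf.nn.dropout(final_memory, KEEP_PROB), W_a) + b_a

# Cross-entropy loss plus an L2 penalty on the weights.
cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=answer_labels,
                                                   logits=logits))
loss = cross_entropy + L2_LAMBDA * tf.nn.l2_loss(W_a)

train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
```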
I trained the model in a weakly supervised fashion; that is, the model is not told which supporting facts are relevant for the inductive reasoning needed to derive an answer.
The network starts to overfit around the 35th epoch: the validation cost starts to increase while the training cost keeps decreasing.
The published classification error of the DMN+ model on QA task 16 (basic induction) of the bAbI dataset is 45.3% (see page 7 of https://arxiv.org/pdf/1603.01417.pdf).
From the paper:
One notable deficiency in our model is that of QA16: Basic Induction. In Sukhbaatar et al. (2015), an untied model using only summation for memory updates was able to achieve a near perfect error rate of 0.4. When the memory update was replaced with a linear layer with ReLU activation, the end-to-end memory network’s overall mean error decreased but the error for QA16 rose sharply. Our model experiences the same difficulties, suggesting that the more complex memory update component may prevent convergence on certain simpler tasks.
My implementation of the model, using the pre-trained 100-dimensional GloVe vectors, reaches about 51% classification accuracy on the test data for the induction task (see DMN+.ipynb), i.e. a classification error of 49%.
This error is lower than that of the original DMN model (55.1%) as reported in the paper, but still higher than the errors achieved by the original implementations of the improved DMN variants (DMN1, DMN2, DMN3, DMN+).
This could be due to the different hyperparameters and embeddings, or I may have missed something in my implementation.
QA_PreProcess.py\QA_PreProcess.ipynb: Converts the raw induction task data set into separate ndarrays of questions, answers, and facts, with all words represented by their pre-trained GloVe vectors.
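
For context, parsing the raw bAbI files amounts to something like the following sketch, assuming the standard bAbI text format (numbered lines, with question lines containing a tab-separated question, answer, and supporting-fact ids); QA_PreProcess.py additionally maps the words to their GloVe vectors:

```python
def parse_babi(path):
    """Parse a bAbI task file into (facts, question, answer) triples."""
    samples, story = [], []
    with open(path) as f:
        for line in f:
            idx, text = line.rstrip('\n').split(' ', 1)
            if int(idx) == 1:           # a new story begins
                story = []
            if '\t' in text:            # question line: question \t answer \t support
                question, answer, _ = text.split('\t')
                samples.append((list(story), question, answer))
            else:                       # fact line
                story.append(text)
    return samples
```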
DMN+.py\DMN+.ipynb: The DMN+ model, along with training, validation and testing.
- Tensorflow 1.4
- Numpy 1.13.3
- Python 2.7.12