Overview

This code integrates a question classifier with the convolutional neural network architecture for learning to match question and answer sentences, as implemented by Aliaksei Severyn. The original repository of the question answering system can be found here.

Requirements

python 2.7+
numpy
scikit-learn (sklearn)
tensorflow
theano
keras
pandas
tqdm
fish
numba

If any of the above packages is not installed, it can be installed with pip: pip install <package-name>
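
One quick way to see which of the listed packages still need installing is to try importing each of them. The sketch below is not part of the repository: the helper name `missing_packages` is hypothetical, and the import names are assumed to match the package names listed above, with scikit-learn imported as `sklearn`.

```python
import importlib

# Import names for the packages listed above (assumption: scikit-learn
# is imported as "sklearn"; the others use their package name).
REQUIRED = ["numpy", "sklearn", "tensorflow", "theano", "keras",
            "pandas", "tqdm", "fish", "numba"]


def missing_packages(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is not installed (or not importable).
            missing.append(name)
    return missing
```

Calling `missing_packages(REQUIRED)` returns the names that still have to be passed to pip; an empty list means every package could be imported.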

Embedding

The pre-trained embeddings used in the QA system can be downloaded here.

Build

Download the word embeddings from the link above and place them in a folder named 'embeddings'.

To build the train/dev/test sets in the format required by the network, run:

$ sh run_build_datasets.sh

Deployment - without Question Classifier

To train the model in the TRAIN setting, run:

$ python run_nnet.py TRAIN

To train the model in the TRAIN-ALL setting, which uses 53,417 QA pairs, run:

$ python run_nnet.py TRAIN-ALL

The parameters of the trained network are dumped under the 'exp.out' folder.

TRAIN: MAP: 0.7325 MRR: 0.8018

TRAIN-ALL: MAP: 0.7538 MRR: 0.8078
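
For readers unfamiliar with the two metrics, the sketch below illustrates the standard definitions of MAP (mean average precision) and MRR (mean reciprocal rank). It is not the evaluation script used to produce the numbers above. The function names and the input format (one list of 0/1 relevance labels per question, ordered by the model's ranking) are assumptions made for illustration.

```python
def average_precision(labels):
    """Average precision for one question.

    `labels` holds 1 (correct) or 0 (incorrect) for each candidate
    answer, in the order the model ranked them.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / float(rank)
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def map_mrr(rankings):
    """Return (MAP, MRR) averaged over a list of per-question rankings."""
    n = float(len(rankings))
    mean_ap = sum(average_precision(r) for r in rankings) / n
    mean_rr = sum(reciprocal_rank(r) for r in rankings) / n
    return mean_ap, mean_rr
```

For example, `map_mrr([[1, 0, 1], [0, 1, 0]])` averages an AP of 5/6 and 1/2 and a reciprocal rank of 1 and 1/2 over the two questions.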

Deployment - with Question Classifier

Download the pre-trained question classification models from here. The download contains three folders: vocab, TREC and MT.

QC_models/vocab - contains the vocabulary files used to train the models
QC_models/TREC - contains the pre-trained model trained on TREC data only
QC_models/MT - contains the pre-trained model trained using multitask learning

The question classifier uses a different vocabulary and embedding from the QA system. Because the QA model receives its input as vocabulary indices (vocab_index) rather than as words, the embedding of the classifier has to be converted to match the vocabulary of the existing system:

$ python convert_embeddings.py

This will create an appropriate embedding for each of the existing embeddings in the models folder.
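
The idea behind the conversion can be sketched as a re-indexing of the classifier's embedding table by the QA system's vocabulary. The code below is a minimal illustration, not the contents of convert_embeddings.py: the function name `remap_embeddings`, the dictionary-based vocabularies, and the choice of a zero vector for words missing from the classifier's vocabulary are all assumptions.

```python
def remap_embeddings(qc_vocab, qc_vectors, qa_vocab, dim):
    """Re-index classifier embeddings by the QA system's vocabulary.

    qc_vocab   : dict mapping word -> row index in `qc_vectors`
    qc_vectors : list of embedding rows from the question classifier
    qa_vocab   : dict mapping word -> index used by the QA system
                 (assumed to cover 0 .. len(qa_vocab) - 1)
    dim        : embedding dimensionality
    """
    # Assumption: words unknown to the classifier get a zero vector.
    table = [[0.0] * dim for _ in range(len(qa_vocab))]
    for word, qa_index in qa_vocab.items():
        qc_index = qc_vocab.get(word)
        if qc_index is not None:
            # Copy the classifier's vector into the QA system's slot.
            table[qa_index] = list(qc_vectors[qc_index])
    return table
```

After such a step, a vocab_index produced by the QA pipeline selects the classifier's vector for the same word, so the classifier can consume the existing index-based input unchanged.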

To train the model with one of these question classifier models, run:

$ python run_nnet.py <train_data> <trained_QC_model_path> <network_QC_was_trained_on>

Example:

$ python run_nnet.py TRAIN QC_models/TREC/LSTM/ LSTM

train_data : TRAIN or TRAIN-ALL

trained_QC_model_path : the location of the pre-trained model, which also holds the embedding.

network_QC_was_trained_on : LSTM or GRU
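
The argument contract described above can be summarised in a few lines of validation code. This is a hypothetical helper written for illustration and does not reflect how run_nnet.py actually reads its arguments; the function name `parse_run_args` and the returned dictionary keys are assumptions.

```python
def parse_run_args(argv):
    """Validate arguments given in the order documented above.

    Accepts either [train_data] (no question classifier) or
    [train_data, trained_QC_model_path, network_QC_was_trained_on].
    """
    if len(argv) not in (1, 3):
        raise ValueError("expected 1 or 3 arguments, got %d" % len(argv))

    train_data = argv[0]
    if train_data not in ("TRAIN", "TRAIN-ALL"):
        raise ValueError("train_data must be TRAIN or TRAIN-ALL")

    args = {"train_data": train_data,
            "qc_model_path": None,
            "qc_network": None}

    if len(argv) == 3:
        if argv[2] not in ("LSTM", "GRU"):
            raise ValueError("network must be LSTM or GRU")
        args["qc_model_path"] = argv[1]
        args["qc_network"] = argv[2]
    return args
```

With the example command above, `parse_run_args(["TRAIN", "QC_models/TREC/LSTM/", "LSTM"])` would yield the TRAIN setting, the TREC model path, and the LSTM network type.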

Best result:

$ python run_nnet.py TRAIN-ALL QC_models/MT/LSTM/ LSTM

TRAIN: MAP: 0.7452 MRR: 0.8080

TRAIN-ALL: MAP: 0.7779 MRR: 0.8093

