GitHub - ddl-donglin/PERQA: A Conversational QA in PER

A Conversational Question Answering on Phrase Ellipsis Recovery

Baseline

To train on your own data, follow the GloVe and spaCy setup instructions and generate your own features.

Base projects	flowQA	coQA	DrQA	QuAC (dataset)

Modified projects	flowQA_4_PERQA	coQA_4_PERQA	QuAC_4_PERQA	DrQA_4_PERQA

QuAC

parameters:

gpu5 quac_perq0:
    CoVe_opt=1, 
    MTLSTM_path='glove/MT-LSTM.pth', 
    answer_type_num=1, 
    batch_size=3, 
    concat_rnn=False, 
    cuda=True, 
    deep_att_hidden_size_per_abstr=250, 
    deep_inter_att_do_similar=0, 
    dev_dir='QuAC_data/', 
    do_hierarchical_query=True, 
    do_not_save=False, 
    do_prealign=True, 
    do_ptr_update=True, 
    do_residual=False, 
    do_residual_everything=False, 
    do_residual_rnn=False, 
    do_seq_dropout=True, 
    dropout_emb=0.4, 
    elmo_batch_size=12, 
    elmo_lambda=0.0, 
    epoches=30, 
    eval_per_epoch=1, 
    explicit_dialog_ctx=2, 
    final_output_att_hidden=250, 
    fix_embeddings=False, 
    grad_clipping=10, 
    hidden_size=125, 
    learning_rate=0.1, 
    log_file='output.log', 
    log_per_updates=20, 
    max_len=35, 
    model_dir='models', 
    momentum=0, 
    my_dropout_p=0.4, 
    name='', 
    ner_dim=8, 
    ner_size=19, 
    no_dialog_flow=False, 
    no_em=False, 
    optimizer='adamax', 
    pos_dim=12, 
    pos_size=51, 
    prealign_hidden=300, 
    prealign_option=2, 
    pretrain='', 
    ptr_net_attn_type='Bilinear', 
    ptr_net_indep_attn=True, 
    question_merge='linear_self_attn', 
    question_normalize=True, 
    reduce_lr=0.0, 
    resume='', 
    resume_options=False, 
    rnn_layers=1, 
    rnn_type='lstm', 
    save_best_only=True, 
    save_for_predict=False, 
    seed=1023, 
    self_attention_opt=1, 
    task_name='QuAC', 
    train_dir='QuAC_data/', 
    tune_partial=400, 
    use_dialog_act=False, 
    use_elmo=True, 
    use_ner=True, 
    use_pos=True, 
    use_wemb=True, 
    weight_decay=0

result:

    Train:  F1 = 3.298,     NA = 100.000,   TH = -18.768
    Test:   F1 = 3.30,      HEQ_Q = 2.74,   HEQ_D = 0.81

CoQA

parameters:

gpu1 coqa_perq0:
    CoVe_opt=1, 
    MTLSTM_path='glove/MT-LSTM.pth', 
    answer_type_num=4, 
    batch_size=1, 
    concat_rnn=False, 
    cuda=True, 
    deep_att_hidden_size_per_abstr=250, 
    deep_inter_att_do_similar=0, 
    dev_dir='CoQA/', 
    do_hierarchical_query=True, 
    do_not_save=False, 
    do_prealign=True, 
    do_ptr_update=True, 
    do_residual=False, 
    do_residual_everything=False, 
    do_residual_rnn=False, 
    do_seq_dropout=True, 
    dropout_emb=0.4, 
    elmo_batch_size=12, 
    elmo_lambda=0.0, 
    epoches=30, 
    eval_per_epoch=1, 
    explicit_dialog_ctx=1, 
    final_output_att_hidden=250, 
    fix_embeddings=False, 
    grad_clipping=10, 
    hidden_size=125, 
    learning_rate=0.1, 
    log_file='output.log', 
    log_per_updates=20, 
    max_len=15, 
    model_dir='models', 
    momentum=0, 
    my_dropout_p=0.4, 
    name='', 
    ner_dim=8, 
    ner_size=19, 
    no_dialog_flow=False, 
    no_em=False, 
    optimizer='adamax', 
    pos_dim=12, 
    pos_size=51, 
    prealign_hidden=300, 
    prealign_option=2, 
    pretrain='', 
    ptr_net_attn_type='Bilinear', 
    ptr_net_indep_attn=True, 
    question_merge='linear_self_attn', 
    question_normalize=True, 
    rationale_lambda=0.0, 
    reduce_lr=0.0, 
    resume='', 
    resume_options=False, 
    rnn_layers=1, 
    rnn_type='lstm', 
    save_best_only=True, 
    save_for_predict=False, 
    seed=1023, 
    self_attention_opt=1, 
    task_name='CoQA', 
    train_dir='CoQA/', 
    tune_partial=400, 
    use_elmo=True, 
    use_ner=True, 
    use_pos=True, 
    use_wemb=True, 
    weight_decay=0
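
The QuAC and CoQA parameter dumps above share most values and differ in only a handful of settings. The following is a minimal sketch of how such dumps could be parsed and compared, assuming they are pasted as `key=value` text. `parse_params`, `diff_params`, and `ABSENT` are hypothetical names that are not part of FlowQA, and only an excerpt of each dump is used.

```python
import ast

ABSENT = "<absent>"  # hypothetical marker for a key that one run does not define


def parse_params(dump):
    """Parse a 'key=value, key=value' dump into a dict (hypothetical helper).

    Assumes no value contains a comma, which holds for the dumps in this README.
    """
    params = {}
    for item in dump.replace("\n", " ").split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        # literal_eval turns "3" into 3, "False" into False, "'QuAC'" into 'QuAC'
        params[key.strip()] = ast.literal_eval(value.strip())
    return params


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k, ABSENT), b.get(k, ABSENT))
        for k in keys
        if a.get(k, ABSENT) != b.get(k, ABSENT)
    }


# Excerpts copied from the two parameter dumps above.
QUAC_EXCERPT = """
    answer_type_num=1, batch_size=3, explicit_dialog_ctx=2, hidden_size=125,
    max_len=35, task_name='QuAC', use_dialog_act=False
"""
COQA_EXCERPT = """
    answer_type_num=4, batch_size=1, explicit_dialog_ctx=1, hidden_size=125,
    max_len=15, rationale_lambda=0.0, task_name='CoQA'
"""

differences = diff_params(parse_params(QUAC_EXCERPT), parse_params(COQA_EXCERPT))
for key, (quac_value, coqa_value) in differences.items():
    print(f"{key}: QuAC={quac_value!r} CoQA={coqa_value!r}")
```

Settings that are identical in both runs, such as `hidden_size`, are left out of the result, so only the task-specific choices remain.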

result:

    Train:  F1 = 4.599
    Test:   F1 = 4.605

Environment

TensorFlow 1.12 + CUDA 9.0 + Python 3.6 + Tensor2Tensor

Download and install Anaconda

You can find more installation details on my blog.

sudo bash Anaconda3-5.2.0-Linux-x86_64.sh

conda --version

conda create -n tensorflow pip python=3.6

source activate tensorflow

pip install --upgrade pip

(tensorflow)$ pip install --ignore-installed --upgrade https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp36-cp36m-linux_x86_64.whl
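
After the installation steps above, it can help to confirm that the active interpreter matches the listed Python version and that TensorFlow is importable. The following is a minimal sketch using only the standard library. `check_environment` is a hypothetical helper, and it assumes the installed wheel is importable under the module name `tensorflow`.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 6)):
    """Report whether the interpreter and TensorFlow look as expected (hypothetical helper)."""
    found = tuple(sys.version_info[:2])
    return {
        "python_version": "%d.%d" % found,
        "python_matches": found == tuple(required_python),
        # find_spec returns None when the package is not installed
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
    }


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

Run it inside the activated `tensorflow` conda environment so that it inspects the right interpreter.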

Encoding

Bert

Download pre-trained encoder model: BERT_4_Chinese

To extract your own proposal data, run utils/prepare.py (some file paths in the script need to be modified first):

source activate tensorflow
pip install -r requirements.txt
python prepare.py

GloVe

Our modified GloVe for Chinese is available at glove-tools.

To extract your own proposal data, see glove-tools for more details.
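
GloVe vectors are usually distributed as plain text, with one token per line followed by its vector components separated by spaces. The following is a minimal sketch of loading such a file, assuming the vectors produced by glove-tools use the same layout (not verified here). `load_glove` is a hypothetical helper, and the two sample vectors are made up for illustration.

```python
import io


def load_glove(lines, dim=None):
    """Read 'token v1 v2 ... vN' lines into a dict of token -> list of floats.

    Rows whose length does not match `dim` (when given) are skipped as malformed.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # empty line or a token without any components
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is not None and len(values) != dim:
            continue
        vectors[word] = values
    return vectors


# In-memory stand-in for a real vector file; the values are illustrative only.
sample = io.StringIO("问题 0.1 -0.2 0.3\n回答 0.4 0.5 -0.6\n")
vectors = load_glove(sample, dim=3)
print(len(vectors), vectors["问题"])
```

For a file on disk, the same function accepts an open file handle, for example `open(path, encoding="utf-8")`, where `path` is whatever location your vectors were written to.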

Because the base projects lack support for Chinese, we adopted LTP and LTP_py.

LTP can be downloaded from Google_Drive

CoVE

CoVE project: cove

ELMo

Bidirectional LSTM-CRF and ELMo project: anago

Whole project file structure

.
├── baseline
│   ├── DrQA
│   │   ├── download.sh
│   │   ├── drqa
│   │   │   ├── __init__.py
│   │   │   ├── pipeline
│   │   │   │   ├── drqa.py
│   │   │   │   └── __init__.py
│   │   │   ├── reader
│   │   │   │   ├── config.py
│   │   │   │   ├── data.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── layers.py
│   │   │   │   ├── model.py
│   │   │   │   ├── predictor.py
│   │   │   │   ├── rnn_reader.py
│   │   │   │   ├── utils.py
│   │   │   │   └── vector.py
│   │   │   ├── retriever
│   │   │   │   ├── doc_db.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── tfidf_doc_ranker.py
│   │   │   │   └── utils.py
│   │   │   └── tokenizers
│   │   │       ├── corenlp_tokenizer.py
│   │   │       ├── __init__.py
│   │   │       ├── regexp_tokenizer.py
│   │   │       ├── simple_tokenizer.py
│   │   │       ├── spacy_tokenizer.py
│   │   │       └── tokenizer.py
│   │   ├── img
│   │   │   └── drqa.png
│   │   ├── install_corenlp.sh
│   │   ├── LICENSE
│   │   ├── PATENTS
│   │   ├── README.md
│   │   ├── requirements.txt
│   │   ├── scripts
│   │   │   ├── convert
│   │   │   │   ├── squad.py
│   │   │   │   └── webquestions.py
│   │   │   ├── distant
│   │   │   │   ├── check_data.py
│   │   │   │   ├── generate.py
│   │   │   │   └── README.md
│   │   │   ├── pipeline
│   │   │   │   ├── eval.py
│   │   │   │   ├── interactive.py
│   │   │   │   └── predict.py
│   │   │   ├── reader
│   │   │   │   ├── interactive.py
│   │   │   │   ├── predict.py
│   │   │   │   ├── preprocess.py
│   │   │   │   ├── README.md
│   │   │   │   └── train.py
│   │   │   └── retriever
│   │   │       ├── build_db.py
│   │   │       ├── build_tfidf.py
│   │   │       ├── eval.py
│   │   │       ├── interactive.py
│   │   │       ├── prep_wikipedia.py
│   │   │       └── README.md
│   │   └── setup.py
│   └── FlowQA
│       ├── CoQA_eval.py
│       ├── download.sh
│       ├── general_utils.py
│       ├── predict_CoQA.py
│       ├── predict_QuAC.py
│       ├── preprocess_CoQA.py
│       ├── preprocess_QuAC.py
│       ├── QA_model
│       │   ├── detail_model.py
│       │   ├── layers.py
│       │   ├── model_CoQA.py
│       │   ├── model_QuAC.py
│       │   └── utils.py
│       ├── README.md
│       ├── requirements.txt
│       ├── train_CoQA.py
│       └── train_QuAC.py
├── bert
│   ├── chinese_L-12_H-768_A-12
│   │   ├── bert_config.json
│   │   ├── bert_model.ckpt.data-00000-of-00001
│   │   ├── bert_model.ckpt.index
│   │   ├── bert_model.ckpt.meta
│   │   └── vocab.txt
│   ├── extract_features.py
│   ├── __init__.py
│   ├── modeling.py
│   ├── __pycache__
│   │   ├── extract_features.cpython-35.pyc
│   │   ├── extract_features.cpython-36.pyc
│   │   ├── __init__.cpython-35.pyc
│   │   ├── __init__.cpython-36.pyc
│   │   ├── modeling.cpython-35.pyc
│   │   ├── modeling.cpython-36.pyc
│   │   └── tokenization.cpython-36.pyc
│   ├── run_classifier.py
│   ├── tmp_in.txt
│   ├── tmp_out.txt
│   ├── tokenization.py
│   └── tokenization_test.py
├── data
│   ├── chzhu.pkl
│   ├── dldi.pkl
│   ├── jwhu.pkl
│   ├── lmzhang.pkl
│   ├── xfduan.pkl
│   ├── xichen.pkl
│   ├── zh_session_ano.json
│   ├── zqzhu.pkl
│   └── zyzhao.pkl
├── __init__.py
├── README.md
├── requirements.txt
├── tree.txt
└── utils
    ├── __init__.py
    ├── PERQAInstance.py
    ├── prepare.py
    ├── __pycache__
    │   ├── PERQAInstance.cpython-35.pyc
    │   └── PERQAInstance.cpython-36.pyc
    ├── tmp_in.txt
    └── tmp_out.txt
