Multi-turn Dialog Generation

Dataset

The preprocessing script for these datasets can be found under the data folder.

DailyDialog dataset
Ubuntu corpus
PersonaChat

Metric

PPL: test perplexity
BLEU(1-4): n-gram overlap with the reference response, for n = 1 to 4
Embedding-based metrics: Average, Extrema, Greedy
Distinct-1/2
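
As a rough illustration of two of the metrics above, here is a minimal sketch of Distinct-n and perplexity. It is not the code in the metric folder: the function names are made up for this example, Distinct-n is computed over the whole set of responses (a per-response average is another common convention), and perplexity assumes per-token negative log-likelihoods in natural log.

```python
import math
from typing import Iterable, Sequence


def distinct_n(responses: Iterable[Sequence[str]], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over all responses (corpus level)."""
    total = 0
    unique = set()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def perplexity(token_nlls: Sequence[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log assumed)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy, already-tokenized responses (invented for illustration).
responses = [["i", "am", "fine"], ["i", "am", "ok"]]
d1 = distinct_n(responses, 1)  # 4 unique unigrams out of 6
d2 = distinct_n(responses, 2)  # 3 unique bigrams out of 4
```

A model that assigns probability 1/4 to every token has perplexity 4 under this definition.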

Requirements

PyTorch 1.2+
Python 3.6.1+
tqdm
numpy
nltk 3.4+
scipy
sklearn (optional)
GoogleNews word2vec or GloVe 300-dimensional word vectors (optional)
tensorboard (for PyTorch 1.2+)

Dataset format

Three multi-turn open-domain dialogue datasets are supported: DailyDialog, PersonaChat, and UbuntuV2. DailyDialog and PersonaChat can be obtained from this link. UbuntuV2 can be obtained from this link. The preprocessing script process.py for these datasets can be found under the data/ folder.

Each dataset contains 6 files:

src-train.txt
tgt-train.txt
src-dev.txt
tgt-dev.txt
src-test.txt
tgt-test.txt
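
A quick way to check that all six files are in place before training is sketched below. The helper names are hypothetical, and the directory layout (one folder per dataset, such as data/DailyDialog) is an assumption based on the folders created in the Ready step, not something run.sh is confirmed to require.

```python
from pathlib import Path
from typing import List

SPLITS = ("train", "dev", "test")
SIDES = ("src", "tgt")


def expected_files(dataset_dir: str) -> List[Path]:
    """Paths of the six files listed above, in the same order."""
    return [
        Path(dataset_dir) / f"{side}-{split}.txt"
        for split in SPLITS
        for side in SIDES
    ]


def missing_files(dataset_dir: str) -> List[Path]:
    """Subset of the expected files that does not exist on disk."""
    return [p for p in expected_files(dataset_dir) if not p.is_file()]


# Assumed location; adjust to wherever the processed files actually live.
names = [p.name for p in expected_files("data/DailyDialog")]
```

Calling missing_files("data/DailyDialog") returns an empty list once every file is present.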

In all the files, each line contains exactly one dialogue context (src) or one dialogue response (tgt). More details can be found in the example files. Each sentence must begin with one of the special tokens <user0> or <user1>, which denotes the speaker. The __eou__ token separates the sentences in the conversation context.
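
The line format can be illustrated with a small parser. This is only a sketch: the example line is invented, parse_context is a hypothetical helper, and it assumes whitespace-tokenized text with a space after the speaker token. The example files in the repository remain the authoritative reference.

```python
from typing import List, Tuple

SPEAKERS = ("<user0>", "<user1>")
EOU = "__eou__"


def parse_context(line: str) -> List[Tuple[str, str]]:
    """Split one src line into (speaker, utterance) pairs."""
    turns = []
    for chunk in line.strip().split(EOU):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing separator
        speaker, _, text = chunk.partition(" ")
        if speaker not in SPEAKERS:
            raise ValueError(f"utterance does not start with a speaker token: {chunk!r}")
        turns.append((speaker, text.strip()))
    return turns


# Invented example of a three-turn context line.
example = "<user0> how are you ? __eou__ <user1> fine , thanks . __eou__ <user0> good to hear ."
turns = parse_context(example)
```

A line whose sentence lacks a speaker token raises ValueError, which makes the helper usable as a format check on preprocessed files.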

How to use

Model names: PHAED
Dataset names: DailyDialog, PersonaChat, Ubuntu

0. Ready

Before running the following commands, make sure the essential folders are created:

mkdir -p processed/$DATASET
mkdir -p data/$DATASET
mkdir -p tblogs/$DATASET
mkdir -p ckpt/$DATASET

The variable DATASET holds the name of the dataset that you want to process.
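
If you prefer to prepare the folders from Python, the same four directories can be created as sketched below. make_dataset_dirs is a hypothetical helper, not part of the repository; the demo writes into a temporary directory so it does not touch your working tree.

```python
import tempfile
from pathlib import Path
from typing import List

ROOTS = ("processed", "data", "tblogs", "ckpt")


def make_dataset_dirs(dataset: str, base: str = ".") -> List[Path]:
    """Create <base>/<root>/<dataset> for each root, like mkdir -p."""
    created = []
    for root in ROOTS:
        path = Path(base) / root / dataset
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# Demo in a scratch directory; use base="." in the repository root.
with tempfile.TemporaryDirectory() as scratch:
    dirs = make_dataset_dirs("DailyDialog", base=scratch)
    all_exist = all(d.is_dir() for d in dirs)
    relative = [d.relative_to(scratch).as_posix() for d in dirs]
```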

1. Generate the vocab of the dataset

./run.sh vocab <dataset>

# get the vocab of the DailyDialog dataset
./run.sh vocab DailyDialog
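
For readers unfamiliar with this step, vocabulary generation usually amounts to counting tokens and assigning ids by frequency. The sketch below shows that general idea only; it is not the repository's implementation, and the special tokens, max_size, and min_freq are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical special tokens; the repository may use different ones.
SPECIALS = ["<pad>", "<unk>", "<sos>", "<eos>"]


def build_vocab(lines: Iterable[str], max_size: int = 20000, min_freq: int = 1) -> Dict[str, int]:
    """Map tokens to ids: specials first, then tokens by descending frequency."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: idx for idx, tok in enumerate(SPECIALS)}
    for tok, freq in counts.most_common():
        if freq < min_freq or len(vocab) >= max_size:
            break  # counts are sorted, so nothing later qualifies either
        vocab.setdefault(tok, len(vocab))
    return vocab


# Invented two-line corpus.
vocab = build_vocab(["<user0> hi there", "<user1> hi"])
frequent_only = build_vocab(["<user0> hi there", "<user1> hi"], min_freq=2)
```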

2. Train the model on the corresponding dataset

./run.sh train <dataset> <model> <cuda>

# train the PHAED model on the DailyDialog dataset using GPU 0
./run.sh train DailyDialog PHAED 0

3. Translate the test dataset:

./run.sh translate <dataset> <model> <cuda>

# generate responses: translate mode, DailyDialog dataset, PHAED model, GPU 0
./run.sh translate DailyDialog PHAED 0

4. Evaluate the result of the translated utterances

./run.sh eval <dataset> <model> <cuda>

# get the BLEU, Distinct, and embedding-based metric results for the generated sentences, using GPU 0
./run.sh eval DailyDialog PHAED 0
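
The four steps can also be chained from a small driver script. The sketch below only reuses the run.sh invocations shown above; pipeline_commands and run_pipeline are hypothetical helpers, and the script assumes it is started from the repository root so that ./run.sh resolves.

```python
import subprocess
from typing import List


def pipeline_commands(dataset: str, model: str = "PHAED", cuda: int = 0) -> List[List[str]]:
    """Command lines for vocab, train, translate, and eval, in that order."""
    gpu = str(cuda)
    return [
        ["./run.sh", "vocab", dataset],
        ["./run.sh", "train", dataset, model, gpu],
        ["./run.sh", "translate", dataset, model, gpu],
        ["./run.sh", "eval", dataset, model, gpu],
    ]


def run_pipeline(dataset: str, model: str = "PHAED", cuda: int = 0) -> None:
    """Run each step in turn; check=True stops at the first failing step."""
    for cmd in pipeline_commands(dataset, model, cuda):
        subprocess.run(cmd, check=True)


# Building the commands has no side effects; call run_pipeline to execute them.
commands = pipeline_commands("DailyDialog")
```

Stopping at the first failure avoids evaluating stale outputs when training or translation did not finish.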

Acknowledgements

This project builds on MultiTurnDialogZoo, embedding_metric, and transformer_xl.

ZihaoW123/PHAED
