Create a Persona dialogue dataset from a Reddit dataset

This repository provides the data processing scripts.

1. Data Preparation

The processing is provided both as Python scripts (.py) and in Jupyter notebook form (.ipynb). Both perform the same processing, so use whichever you prefer.

python PersonaGeneration.py

The default input path is ./reddit_data//.json; to read from a different location, change INPUT_PATH. The default output path is ./outputs/.
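As a rough illustration of what a configurable input location and output directory can look like, here is a minimal sketch. It is not taken from PersonaGeneration.py: the names INPUT_DIR, OUTPUT_DIR, collect_json_files, and prepare_output_dir are hypothetical, and the sketch simply assumes the input is a set of .json files somewhere under ./reddit_data.

```python
from pathlib import Path

# Hypothetical values for illustration only; the real INPUT_PATH
# constant is defined in PersonaGeneration.py and may differ.
INPUT_DIR = Path("./reddit_data")
OUTPUT_DIR = Path("./outputs")


def collect_json_files(input_dir: Path) -> list[Path]:
    """Return every .json file under input_dir (searched recursively), sorted."""
    if not input_dir.is_dir():
        return []
    return sorted(input_dir.rglob("*.json"))


def prepare_output_dir(output_dir: Path) -> Path:
    """Create the output directory if it does not exist yet."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    files = collect_json_files(INPUT_DIR)
    prepare_output_dir(OUTPUT_DIR)
    print(f"Found {len(files)} JSON file(s) under {INPUT_DIR}")
```

Running a check like this before the main script can help confirm that the input location actually contains the files you expect.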

python PreprocessOfTest.py

The input file is the output of PersonaGeneration.py, so run that script first.