GitHub - ahclab/PersonaGeneration: Create a Persona dataset from English-language Reddit movie-category comments

Create a Persona dialogue dataset from a Reddit dataset

This repository provides the data-processing scripts.

The scripts come in two forms: Python scripts (.py) and Jupyter notebooks (.ipynb). Both perform the same processing, so you can use whichever you prefer.

python PersonaGeneration.py

The default input path is ./reddit_data//.json. To read from a different input file, change INPUT_PATH. The default output path is ./outputs/.
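As an illustration of the path settings described above, here is a minimal sketch of how input files might be collected and the output directory prepared. It is not taken from PersonaGeneration.py: only the name INPUT_PATH and the ./outputs/ default come from this README, while OUTPUT_PATH, find_input_files, prepare_output_dir, and the assumption that the inputs are .json files somewhere under ./reddit_data are hypothetical.

```python
from pathlib import Path

# Hypothetical stand-ins for the script's settings. Only the name INPUT_PATH
# and the ./outputs/ default are mentioned in the README; the directory value
# below is an assumed placeholder, not the script's real default.
INPUT_PATH = Path("./reddit_data")
OUTPUT_PATH = Path("./outputs")  # hypothetical name for the output setting


def find_input_files(input_path):
    """Return every .json file under input_path, in sorted order."""
    return sorted(Path(input_path).rglob("*.json"))


def prepare_output_dir(output_path):
    """Create the output directory if it is missing and return it as a Path."""
    out = Path(output_path)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling find_input_files(INPUT_PATH) before running the script is a quick way to check that the input path points at the files you expect.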

python PreprocessOfTest.py

The input file is the output of PersonaGeneration.py, so run PersonaGeneration.py first.
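Because the second script consumes the output of the first, the two commands have to run in order. A minimal sketch of a wrapper that does this follows; the helper names build_commands and run_pipeline are hypothetical and not part of this repository, and the sketch assumes both scripts run without extra arguments from the repository root.

```python
import subprocess
import sys

# Script names as listed in this README, in the order they must run.
SCRIPTS = ["PersonaGeneration.py", "PreprocessOfTest.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Build one command line per script, using the given Python interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in order; check=True raises on the first failure,
    so the second step never runs on missing or partial output."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_pipeline(build_commands()) from the repository root would then be equivalent to running the two python commands above one after the other.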

Repository files:
.gitignore
PersonaGeneration.ipynb
PersonaGeneration.py
PreprocessOfTest.ipynb
PreprocessOfTest.py
README.md
requirements.txt