Skip to content

This repository contains a script to convert chat conversations from a JSON file to a text file, with formatting suitable for training a language model using the FastChat framework.

License

Notifications You must be signed in to change notification settings

practical-dreamer/vicuna_fastchat_preprocess

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 

Repository files navigation

FastChat Conversation Converter

This repository contains a script to convert chat conversations from a JSON file to a text file, with formatting suitable for training a language model using the FastChat framework. The script relies on the FastChat repository to preprocess and tokenize the conversations before writing them to the output file.

Dependencies

  • Python 3.6 or higher
  • IJSON library
  • FastChat repository (specifically, the train.py script located under the fastchat/train/ directory)

Installation

1. Setup Conda Environment (Optional but recommended)

conda create -n fastchat-conversation-converter python=3.10.9
conda activate fastchat-conversation-converter

2. Clone this repo and install dependencies

git clone https://github.com/practicaldreamer/fastchat-conversation-converter
cd fastchat-conversation-converter
pip install ijson

3. Clone FastChat Repo and install FastChat

mkdir repos
cd repos
git clone https://github.com/lm-sys/FastChat
cd FastChat
pip install -e .
cd ..
cd ..

Note: My script was built around FastChat commit 5ccf842

Usage

python process_conversations.py \
--model_path '/home/user/Documents/models/llama-7b' \
--input_json_path '/home/user/Downloads/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json' \
--output_txt_path '/home/user/Documents/output.txt'

Replace the arguments with the appropriate values for your use case.

Credits

This script is built upon the FastChat project. Please refer to the original repository for more information about the framework and its usage.

About

This repository contains a script to convert chat conversations from a JSON file to a text file, with formatting suitable for training a language model using the FastChat framework.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages