GitHub - practical-dreamer/vicuna_to_alpacan: Conversion script adapting vicuna dataset into alpaca format for use with oobabooga's trainer

Description

This conversion script is designed to convert vicuna datasets to a more alpaca-like format. To be used with the trainer found here: https://github.com/oobabooga/text-generation-webui/wiki/Using-LoRAs#training-a-lora This was designed to conform to SOME of the format from the conv_vicuna_v1_1 format from the FastChat Github repo (https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) while working within the format of Ooba's Trainer. Some liberties were taken on this format adaptation...

3 Different versions. Not sure which is best. (B is probably the best for booga)

Make sure whatever version you pick matches the vicuna-format JSON

Convert vicuna json to alpaca format with python format_B.py --input <path_to_vicuna_dataset>
Copy the datasets folder to text-generation-webui/training/datasets
Copy the formats folder to text-generation-webui/training/formats

Still ~~Totally~~ Mostly Untested WIP

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
datasets		datasets
formats		formats
LICENSE		LICENSE
README.md		README.md
format_A.py		format_A.py
format_B.py		format_B.py
format_C.py		format_C.py
training_logData.diff		training_logData.diff
training_padHotFix.diff		training_padHotFix.diff