Description

This conversion script is designed to convert vicuna datasets to a more alpaca-like format. To be used with the trainer found here: https://github.com/oobabooga/text-generation-webui/wiki/Using-LoRAs#training-a-lora This was designed to conform to SOME of the format from the conv_vicuna_v1_1 format from the FastChat Github repo (https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) while working within the format of Ooba's Trainer. Some liberties were taken on this format adaptation...

Note about versions

3 Different versions. Not sure which is best. (B is probably the best for booga)

Version A - My original Script
Version B - Most Closely Matches Ooba's Vicuna format here: https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.yaml
Version C - Most Closely Matches Fastchat's Vicuna 1.1 format here: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py

Make sure whatever version you pick matches the vicuna-format JSON

Convert vicuna json to alpaca format with python format_B.py --input <path_to_vicuna_dataset>
Copy the datasets folder to text-generation-webui/training/datasets
Copy the formats folder to text-generation-webui/training/formats

Still ~~Totally~~ Mostly Untested WIP