Skip to content

Conversion script adapting vicuna dataset into alpaca format for use with oobabooga's trainer

License

Notifications You must be signed in to change notification settings

practical-dreamer/vicuna_to_alpacan

Repository files navigation

Description

This conversion script is designed to convert vicuna datasets to a more alpaca-like format. To be used with the trainer found here: https://github.com/oobabooga/text-generation-webui/wiki/Using-LoRAs#training-a-lora This was designed to conform to SOME of the format from the conv_vicuna_v1_1 format from the FastChat Github repo (https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) while working within the format of Ooba's Trainer. Some liberties were taken on this format adaptation...

Note about versions

3 Different versions. Not sure which is best. (B is probably the best for booga)

Make sure whatever version you pick matches the vicuna-format JSON

How to use Script

  1. Convert vicuna json to alpaca format with python format_B.py --input <path_to_vicuna_dataset>
  2. Copy the datasets folder to text-generation-webui/training/datasets
  3. Copy the formats folder to text-generation-webui/training/formats

Still Totally Mostly Untested WIP

Patch Training Data Logging

  1. Copy the training_logData.diff file to "text-generation-webui/modules/"
  2. Run command patch training.py training_logData.diff
  3. Logs now sent to "text-generation-webui/logs/"

About

Conversion script adapting vicuna dataset into alpaca format for use with oobabooga's trainer

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages