fin-rlhf

Run SFT with:

accelerate launch sft.py

Merge SFT LORA weights with

python merge.py --checkpoint_path="results/sft-finqa/checkpoint-550" --merged_path="results/sft-finqa/final_model

Generate dataset with

accelerate launch feedback/generate.py --model_name="results/sft-finqa/final_model" --tokenizer_name="mistralai/Mistral-7B-v0.1" --dataset_name="gbharti/finance-alpaca" --save_path="feedback/finance-alpaca-unlabeled.csv" --num_steps=100

Annotate dataset with

python feedback/annotate.py --unlabeled_path="feedback/finance-alpaca-unlabeled.csv" --labels_path="feedback/finance-alpaca-labels.csv"

Merge generations with annotations and upload to huggingface with

python feedback/merge_labels.py --hf_repo="danyoung/finance-feedback"

Run DPO with

accelerate launch dpo.py

Upload model with

huggingface-cli upload danyoung/finance-qa results/sft-finqa/final_model

Evaluate a model with

accelerate launch evaluation/evaluate.py --model_name="results/sft-finqa/final_model" --tokenizer_name="mistralai/Mistral-7B-v0.1"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

fin-rlhf

Files

README.md

Latest commit

History

README.md

File metadata and controls

fin-rlhf