Update README.md (#9)
uralik authored Sep 27, 2024
1 parent 94894f5 commit 5b0497e
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion projects/self_taught_evaluator/README.md
@@ -8,7 +8,9 @@ Instructions and materials presented here correspond to the [Self-Taught Evaluat

**2024-09-26**

-We release the Self-Taught Evaluator model via the Hugging-Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model is trained iteratively with supervised fine-tuning (SFT) and direct preference optimization (DPO).
+We release the Self-Taught Evaluator model via the Hugging Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model was trained using supervised fine-tuning (SFT) and direct preference optimization (DPO).
+
+First, the model was trained on data comprising responses and evaluation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher-quality evaluation plans for the preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on the preference finetuning data using a combination of DPO and NLL losses. Checkpoint selection was done using pairwise judge accuracy computed over the HelpSteer2 validation set.

## Inference and Evaluation

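The final training stage described in the diff combines a DPO preference loss with an NLL term on the chosen response. As a rough per-example illustration (a sketch, not the paper's or the repo's implementation; `beta` and `nll_weight` are hypothetical hyperparameters), such a combined objective can be written in plain Python:

```python
import math

def dpo_nll_loss(chosen_logp, rejected_logp,
                 ref_chosen_logp, ref_rejected_logp,
                 beta=0.1, nll_weight=1.0):
    """Sketch of a combined DPO + NLL loss for a single preference pair.

    Inputs are summed sequence log-probabilities of the chosen and
    rejected responses under the policy and a frozen reference model.
    """
    # DPO term: reward margin of chosen over rejected, relative to the reference.
    margin = beta * ((chosen_logp - ref_chosen_logp)
                     - (rejected_logp - ref_rejected_logp))
    dpo = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    # NLL term: keep the likelihood of the chosen response high.
    nll = -chosen_logp
    return dpo + nll_weight * nll
```

A pair where the policy favors the chosen response yields a lower loss than one where it favors the rejected response, which is what drives the preference optimization.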
