Update README.md (#9)
uralik authored Sep 27, 2024
1 parent 94894f5 commit 5b0497e
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion projects/self_taught_evaluator/README.md
@@ -8,7 +8,9 @@ Instructions and materials presented here correspond to the [Self-Taught Evaluat

**2024-09-26**

-We release the Self-Taught Evaluator model via the Hugging-Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model is trained iteratively with supervised fine-tuning (SFT) and direct preference optimization (DPO).
+We release the Self-Taught Evaluator model via the Hugging Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model was trained using supervised fine-tuning (SFT) and direct preference optimization (DPO).
+
+First, the model was trained on data comprising responses and evaluation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher-quality evaluation plans for the preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on the preference finetuning data using a combination of DPO and NLL losses. Checkpoint selection was done using pairwise judge accuracy computed over the HelpSteer2 validation set.

## Inference and Evaluation

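The final training stage described in the diff combines a DPO preference loss with an NLL term on the chosen response. As a rough per-example illustration (a sketch, not the paper's or the repo's implementation; `beta` and `nll_weight` are hypothetical hyperparameters), such a combined objective can be written in plain Python:

```python
import math

def dpo_nll_loss(chosen_logp, rejected_logp,
                 ref_chosen_logp, ref_rejected_logp,
                 beta=0.1, nll_weight=1.0):
    """Sketch of a combined DPO + NLL loss for a single preference pair.

    Inputs are summed sequence log-probabilities of the chosen and
    rejected responses under the policy and a frozen reference model.
    """
    # DPO term: reward margin of chosen over rejected, relative to the reference.
    margin = beta * ((chosen_logp - ref_chosen_logp)
                     - (rejected_logp - ref_rejected_logp))
    dpo = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    # NLL term: keep the likelihood of the chosen response high.
    nll = -chosen_logp
    return dpo + nll_weight * nll
```

A pair where the policy favors the chosen response yields a lower loss than one where it favors the rejected response, which is what drives the preference optimization.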
