
Commit

rlhf link
jonmay committed Oct 6, 2024
1 parent 46b4efe commit c3292c1
Showing 1 changed file with 1 addition and 1 deletion.
_modules/week-07.md: 2 changes (1 addition, 1 deletion)
@@ -35,7 +35,7 @@ The announcement can be made red for due dates as follows
-->

Oct 7
- : Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)
+ : [Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)]({{site.baseurl}}assets/files/rlhf.pptx)
: [Ziegler RLHF Paper]({{site.baseurl}}assets/files/ziegler.pdf), [DPO Paper]({{site.baseurl}}assets/files/dpo.pdf)
: Emily Weiss - [Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration](https://arxiv.org/abs/2402.00367)
: Questions by: Yifan Jiang
