From c3292c102477df05051f956a32cd8a7e8de09730 Mon Sep 17 00:00:00 2001
From: jonmay
Date: Sun, 6 Oct 2024 13:55:23 -0700
Subject: [PATCH] rlhf link

---
 _modules/week-07.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_modules/week-07.md b/_modules/week-07.md
index dd7c431..d475068 100644
--- a/_modules/week-07.md
+++ b/_modules/week-07.md
@@ -35,7 +35,7 @@ The announcement can be made red for due dates as follows -->
 
 Oct 7
-: Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)
+: [Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)]({{site.baseurl}}assets/files/rlhf.pptx)
 : [Ziegler RLHF Paper]({{site.baseurl}}assets/files/ziegler.pdf), [DPO Paper]({{site.baseurl}}assets/files/dpo.pdf)
 : Emily Weiss - [Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration](https://arxiv.org/abs/2402.00367)
 : Questions by: Yifan Jiang