
Commit

rlhf link
jonmay committed Oct 6, 2024
1 parent 46b4efe commit c3292c1
Showing 1 changed file with 1 addition and 1 deletion.
_modules/week-07.md: 2 changes (1 addition, 1 deletion)
@@ -35,7 +35,7 @@ The announcement can be made red for due dates as follows
-->

Oct 7
- : Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)
+ : [Reinforcement Learning with Human Feedback: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO)]({{site.baseurl}}assets/files/rlhf.pptx)
: [Ziegler RLHF Paper]({{site.baseurl}}assets/files/ziegler.pdf), [DPO Paper]({{site.baseurl}}assets/files/dpo.pdf)
: Emily Weiss - [Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration](https://arxiv.org/abs/2402.00367)
: Questions by: Yifan Jiang
