Comparing various RLHF methods
reinforcement-learning
transformers
transformer
ppo
dpo
llm
llms
rlhf
reinforcement-learning-from-human-feedback
reinforcement-learning-from-ai-feedback
Updated Sep 23, 2024 - Jupyter Notebook