[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
Updated Jun 10, 2024 - Python
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Developing an LLM response-ranking reward model using RLHF, except that the feedback comes from GPT-3.5 instead of a human.
Proof-of-concept library built on TextRL for easy training and use of fine-tuned models with RLHF, a reward model, and PPO.