
RLHFlow

Code for the Workflow of Reinforcement Learning from Human Feedback (RLHF)

Popular repositories

  1. RLHF-Reward-Modeling

    Recipes to train reward models for RLHF.

    Python · 1.5k stars · 102 forks

  2. Online-RLHF

    A recipe for online RLHF and online iterative DPO.

    Python · 536 stars · 49 forks

  3. Online-DPO-R1

    Codebase for iterative DPO using rule-based rewards.

    Python · 260 stars · 33 forks

  4. Minimal-RL

    Python · 247 stars · 12 forks

  5. Self-rewarding-reasoning-LLM

    Recipes to train self-rewarding reasoning LLMs.

    Python · 227 stars · 11 forks

  6. Reinforce-Ada

    An adaptive sampling framework for Reinforce-style LLM post-training.

    Python · 78 stars · 15 forks

Repositories

Showing 10 of 12 repositories
  • Reinforce-Ada

    An adaptive sampling framework for Reinforce-style LLM post-training.

    Python · 78 stars · 15 forks · Apache-2.0 · Updated Oct 28, 2025
  • Reinforce-Ada-Tinker

    Python · 6 stars · 0 forks · Apache-2.0 · Updated Oct 16, 2025
  • GVM

    Python · 15 stars · 0 forks · Apache-2.0 · Updated Jul 29, 2025
  • RLHFlow.github.io

    Webpage for RLHFlow

    HTML · 9 stars · 0 forks · Updated Jun 20, 2025
  • Minimal-RL

    Python · 247 stars · 12 forks · Apache-2.0 · Updated May 14, 2025
  • RLHF-Reward-Modeling

    Recipes to train reward models for RLHF.

    Python · 1,476 stars · 102 forks · Apache-2.0 · Updated Apr 24, 2025
  • Online-DPO-R1

    Codebase for iterative DPO using rule-based rewards.

    Python · 260 stars · 33 forks · Updated Apr 11, 2025
  • Self-rewarding-reasoning-LLM

    Recipes to train self-rewarding reasoning LLMs.

    Python · 227 stars · 11 forks · Updated Mar 2, 2025
  • Online-RLHF

    A recipe for online RLHF and online iterative DPO.

    Python · 536 stars · 49 forks · Updated Dec 28, 2024
  • Directional-Preference-Alignment

    Directional Preference Alignment

    57 stars · 3 forks · Apache-2.0 · Updated Sep 23, 2024

People

This organization has no public members. You must be a member to see who’s a part of this organization.

Top languages

Python, HTML
