trl

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.

python transformer adapters huggingface trl safetensors text-generation-inference unsloth qwen2-5 grpo

Updated Apr 7, 2025
Jupyter Notebook

YanCotta / post_training_llms

Star

Different post-training techniques for LLMs, including: SFT, DPO and Online RL

reinforcement-learning pytorch alignment fine-tuning sft dpo huggingface trl huggingface-transformers llm

Updated Sep 5, 2025
Python

torotoki / reasoning-minimal

Star

Minimal code to train reasoning model with reinforcement learning.

python reinforcement-learning transformers huggingface trl llm

Updated Aug 9, 2025
Python

rasyosef / phi-2-sft-and-dpo

Star

Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)

transformers pytorch huggingface trl llm supervised-finetuning direct-preference-optimization

Updated Nov 27, 2024
Jupyter Notebook

ethicalabs-ai / BlossomTuneLLM

Sponsor

Star

Federated Supervised Fine-Tuning for Small Language Models (SLMs)

transformers language-models federated-learning trl federated-learning-framework large-language-models supervised-finetuning small-language-models

Updated Sep 13, 2025
Python

WCoetser / Trl.TermDataRepresentation

Star

The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly correspond to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.

syntax-tree term-rewriting trl term-database

Updated Dec 16, 2020
C#

SharathHebbar / sft_mathgpt2

Star

Supervised Fine tuning using TRL library

decoder transformers text-generation sft gpt2 trl llm mathgpt

Updated Jan 24, 2024
Jupyter Notebook

Josephrp / SmolFactory

Star

finetune gpt-oss and smollm3 on your data easily and cheaply

oss ai ml transformers publishing gpt datasets gradio deployment-automation mlops huggingface trl smollm3 gpt-oss trackio

Updated Aug 12, 2025
Python

nabeelshan78 / reinforcement-learning-human-feedback-scratch

Star

End-to-end implementation of Reinforcement Learning with Human Feedback (RLHF) to align a GPT-2 model with human preferences — covering Supervised Fine-Tuning (SFT), Reward Modeling, and PPO-based alignment — built from scratch in Python.

nlp reinforcement-learning deep-learning pytorch lora ppo peft ai-alignment gpt2 trl huggingface-transformers large-language-models rlhf supervised-finetuning reward-modeling

Updated Sep 25, 2025
Jupyter Notebook

Mikesterner87 / Nano-R1

Star

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.

python build openwrt transformer adapters nanopi huggingface trl nanopi-r1s nanopi-r1 safetensors text-generation-inference unsloth grpo

Updated Nov 22, 2025
Jupyter Notebook

SharathHebbar / dpo_chatgpt2

Star

Direct Preference Optimization of ChatGPT2 using TRL Library

decoder transformers text-generation dpo gpt2 trl llm rlhf chatgpt2

Updated Jan 24, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the trl topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the trl topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trl

Here are 40 public repositories matching this topic...

jasonvanf / llama-trl

argilla-io / notus

GAD-cell / vlm-grpo

sugarandgugu / Simple-Trl-Training

Shekswess / tiny-reasoning-language-model

RobinSmits / Dutch-LLMs

yflyzhang / simpleR1

ssbuild / llm_rlhf

LegendLeoChen / llm-finetune

Akshint0407 / Nano-R1

YanCotta / post_training_llms

torotoki / reasoning-minimal

rasyosef / phi-2-sft-and-dpo

ethicalabs-ai / BlossomTuneLLM

WCoetser / Trl.TermDataRepresentation

SharathHebbar / sft_mathgpt2

Josephrp / SmolFactory

nabeelshan78 / reinforcement-learning-human-feedback-scratch

Mikesterner87 / Nano-R1

SharathHebbar / dpo_chatgpt2

Improve this page

Add this topic to your repo