Convert any model into an R1-like reasoning agent. agentgym leverages TRL, Hugging Face, and various other libraries. This is a work in progress; our goal is to make it easy to train any model into a reasoning agent.
```sh
pip3 install -U agentgym
```
```python
from agentgym.r1_pipeline import R1Pipeline, SFTConfig

r1_pipeline = R1Pipeline(
    sft_model="Qwen/Qwen2-0.5B-Instruct",
    tokenizer_name="Qwen/Qwen2-0.5B-Instruct",
    sft_dataset="trl-lib/tldr",
    sft_args=SFTConfig(output_dir="/tmp"),
    only_grpo=True,
    model_name="Qwen/Qwen2-0.5B-Instruct",
)
r1_pipeline.run()
```
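Conceptually, `run()` chains the training stages in order. The sketch below is a hypothetical stand-in, not agentgym's actual implementation, and it assumes `only_grpo=True` means the SFT stage is skipped:

```python
# Hypothetical stand-in for the stage chaining inside R1Pipeline.run().
# Each Stage here is a placeholder for a real trainer (SFT or GRPO).
class Stage:
    def __init__(self, name):
        self.name = name

    def train(self, model):
        # A real stage would fine-tune and return updated weights;
        # here we just record that the stage ran.
        return f"{model}+{self.name}"

def run_pipeline(model, only_grpo=False):
    stages = [Stage("sft"), Stage("grpo")]
    if only_grpo:
        # Assumption: only_grpo skips supervised fine-tuning entirely.
        stages = stages[1:]
    for stage in stages:
        model = stage.train(model)
    return model

print(run_pipeline("base"))        # base+sft+grpo
print(run_pipeline("base", True))  # base+grpo
```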
The architecture is as follows:
- SFT: Supervised Fine-Tuning
- GRPO: Group Relative Policy Optimization

model -> sft -> grpo -> reasoning model
```mermaid
graph TD;
    A[model] --> B[sft]
    B --> C[grpo]
    C --> D[reasoning model]
```
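The core idea behind GRPO can be sketched without any training code: several completions are sampled per prompt, scored by a reward function, and each completion's advantage is its reward standardized against the other completions in the same group. This is a minimal illustrative sketch; the function name is not part of agentgym's API:

```python
# Minimal sketch of the group-relative advantage GRPO is built on.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other rewards in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, scored by some reward function:
# the best completion gets a positive advantage, the worst a negative one.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions with above-average reward are reinforced and below-average ones are penalized, without needing a separate value model.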
Licensed under the MIT License.