A framework for converting any LLM into a reasoning agent in the style of o1 or DeepSeek's R1

The-Swarm-Corporation/AgentGym

Agent Gym

Convert any model into an R1-like reasoning agent. Agent Gym builds on TRL, Hugging Face libraries, and other tooling. This is a work in progress; the goal is to make it easy to train any model into a reasoning agent.

Installation

pip3 install -U agentgym

Usage

from agentgym.r1_pipeline import R1Pipeline, SFTConfig

# Model, tokenizer, and dataset names are Hugging Face Hub identifiers.
r1_pipeline = R1Pipeline(
    sft_model="Qwen/Qwen2-0.5B-Instruct",
    tokenizer_name="Qwen/Qwen2-0.5B-Instruct",
    sft_dataset="trl-lib/tldr",
    sft_args=SFTConfig(output_dir="/tmp"),  # output directory for the SFT stage
    only_grpo=True,
    model_name="Qwen/Qwen2-0.5B-Instruct",
)

# Start the training pipeline.
r1_pipeline.run()

Architecture

The pipeline chains two training stages, applied in order:

  • SFT: Supervised Fine-Tuning
  • GRPO: Group Relative Policy Optimization
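To make the GRPO stage more concrete, here is a minimal sketch of its central idea: several completions are sampled for the same prompt, and each completion's reward is normalized against the rest of its group instead of against a learned value function. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for illustration; this is not taken from Agent Gym or TRL, and real implementations may differ in detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of sampled completions.

    Hypothetical helper for illustration only; not part of Agent Gym or TRL.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean get a negative one.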

model -> sft -> grpo -> reasoning model

graph TD;
    A[model] --> B[sft]
    B --> C[grpo]
    C --> D[reasoning model]
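The flow in the diagram can be read as plain function composition: each stage takes a model and returns an updated one. The sketch below uses stand-in functions that only tag a string, so no training happens; `sft`, `grpo`, and `run_pipeline` are hypothetical names for illustration and are not Agent Gym's API.

```python
from typing import Callable, List

# A stage maps a model to an updated model; a string stands in for the model.
Stage = Callable[[str], str]


def sft(model: str) -> str:
    """Stand-in for supervised fine-tuning (hypothetical, no training)."""
    return f"{model}+sft"


def grpo(model: str) -> str:
    """Stand-in for the GRPO stage (hypothetical, no training)."""
    return f"{model}+grpo"


def run_pipeline(model: str, stages: List[Stage]) -> str:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        model = stage(model)
    return model


result = run_pipeline("base-model", [sft, grpo])
print(result)  # base-model+sft+grpo
```

Passing a shorter stage list, such as `[grpo]`, runs only that stage.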

License

MIT
