jacobthebanana
Collaborator

This pull request includes:

  • A minimal, stripped-down TRL/VeRL that runs a single PPO optimization step, given a token array, a per-token advantage array, a model checkpoint, and PPO hyperparameters.
  • Agent SDK integration: define the environment using the familiar OpenAI Agent SDK and run RL on the LLM powering the agent. Not yet tested on multi-agent setups (agent-as-tool or handoff).
  • Extensive typing for simpler function signatures and better IDE support: static type checking, pyright lints, and proper autocompletion, even within the training loop.
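
For reviewers less familiar with PPO, here is a rough sketch of the kind of computation a single optimization step performs on per-token advantages. It is not the code in this pull request: it uses the standard PPO clipped surrogate objective, and `PPOHyperparameters`, `ppo_clipped_loss`, and all argument names are hypothetical, chosen only for illustration. It assumes per-token log-probabilities under the current and the rollout policy have already been computed from the token array.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PPOHyperparameters:
    """Hypothetical hyperparameter container; field names are illustrative."""

    clip_epsilon: float = 0.2


def ppo_clipped_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    loss_mask: Sequence[bool],
    hparams: PPOHyperparameters,
) -> float:
    """Mean clipped-surrogate PPO loss over the tokens selected by loss_mask."""
    lengths = {len(new_logprobs), len(old_logprobs), len(advantages), len(loss_mask)}
    if len(lengths) != 1:
        raise ValueError("All per-token inputs must have the same length.")

    eps = hparams.clip_epsilon
    total = 0.0
    count = 0
    for new_lp, old_lp, adv, keep in zip(
        new_logprobs, old_logprobs, advantages, loss_mask
    ):
        if not keep:
            # Skip tokens the policy did not generate (e.g. prompt or tool output).
            continue
        # Probability ratio between the current policy and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # PPO maximizes min(ratio * A, clipped_ratio * A); negate to get a loss.
        total += -min(ratio * adv, clipped_ratio * adv)
        count += 1

    # Return 0.0 when no token is selected, to avoid dividing by zero.
    return total / count if count else 0.0
```

The explicit dataclass and annotated signature are meant to show the kind of typing described above, so that a type checker such as pyright can flag a mismatched argument before training starts.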

Vec-Inf wrapper integration will come in a separate pull request.
