Use Hydra for run configuration? #188

dimitri-voytan · 2025-08-05T04:49:37Z

Hydra is a great library for configuring complex applications.

I think it could help with running experiments, e.g. by providing reasonable defaults and a CLI to override learning rates, the KL regularization weight, the distributed setup, etc.

This would be a major change, so I'm starting a discussion here before working on it.

Sohailm25 · 2025-10-11T02:39:25Z

So just to make sure we're on the same page:

The current state:

- `GRPOConfig` is a dataclass that extends Hugging Face's `TrainingArguments` (`verifiers/trainers/grpo_config.py:12`)
- Training scripts are Python files that set config attributes by hand (`examples/grpo/train_*.py`)
- There are about 11 training scripts, each ~40-60 lines, mostly boilerplate plus parameter tweaks
- It already supports `HfArgumentParser` through the `TrainingArguments` inheritance from the first point (`grpo_config.py:19-21`)
- Prime-RL uses TOML configs (`README.md:271-283`)

Adding Hydra would give us:

- YAML config files with defaults
- Overrides from the command line
- Composition (combining base configs with experiment-specific ones)
- Sweeps for hyperparameter tuning

The main issue imo

I'm not sure it fits the codebase's 'fail fast, fail loud' mindset, since Hydra would add:

- another layer of abstraction
- more branching paths
- silent fallbacks if misconfigured (honestly the biggest issue, unless everything is heavily logged)
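If the "heavily logged" route were taken (with Hydra or any other config system), one low-cost mitigation is to log the fully resolved config at startup, plus every value that differs from its default. A minimal stdlib sketch follows; `RunConfig` and `log_resolved_config` are hypothetical names, `RunConfig` stands in for `GRPOConfig`, and the comparison assumes every field has a default.

```python
import json
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("run_config")


@dataclass
class RunConfig:
    # Hypothetical stand-in for GRPOConfig; every field must have a default.
    run_name: str = "wordle-grpo"
    learning_rate: float = 1e-6
    beta: float = 0.001
    num_generations: int = 16


def log_resolved_config(cfg) -> dict:
    """Log the full resolved config and return the values that differ from defaults."""
    resolved = asdict(cfg)
    defaults = asdict(type(cfg)())  # a fresh instance holds the declared defaults
    changed = {k: v for k, v in resolved.items() if v != defaults[k]}
    logger.info("resolved config: %s", json.dumps(resolved, sort_keys=True))
    logger.info("non-default values: %s", json.dumps(changed, sort_keys=True))
    return changed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_resolved_config(RunConfig(learning_rate=3e-6, beta=0.0))
```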

It's not the same, but `GRPOConfig` already inherits from `TrainingArguments`, which works with `HfArgumentParser`, i.e. you can already do:

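```python
from transformers import HfArgumentParser
from verifiers import GRPOConfig
parser = HfArgumentParser(GRPOConfig)
(training_args,) = parser.parse_args_into_dataclasses()  # one instance per dataclass
```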

which gives you CLI overrides (e.g. `--learning_rate 1e-6`) without adding any dependencies.
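Roughly, `HfArgumentParser` works by turning each dataclass field into a `--field_name` flag. The sketch below is a simplified stdlib-only illustration of that idea, not its actual implementation; `MiniConfig`, `build_parser`, and `parse_config` are hypothetical names. It handles only `str`/`int`/`float` fields, disables prefix abbreviations, and uses `parse_args()`, so an unknown or misspelled flag exits with an error instead of being ignored.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class MiniConfig:
    # Hypothetical stand-in for GRPOConfig; every field has a default.
    run_name: str = "wordle-grpo"
    learning_rate: float = 1e-6
    beta: float = 0.001
    num_generations: int = 16


def build_parser(cls) -> argparse.ArgumentParser:
    # allow_abbrev=False: "--learning" is rejected instead of matching "--learning_rate".
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for f in fields(cls):
        # One --<field> flag per field, converted with the type of its default
        # (fine for str/int/float; bools would need special handling).
        parser.add_argument(f"--{f.name}", type=type(f.default), default=f.default)
    return parser


def parse_config(cls, argv: list[str]):
    # parse_args() exits with status 2 on unknown flags or unparseable values.
    namespace = build_parser(cls).parse_args(argv)
    return cls(**vars(namespace))


if __name__ == "__main__":
    print(parse_config(MiniConfig, ["--learning_rate", "3e-6", "--beta", "0"]))
```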

prime-rl (the sister project) uses TOML configs, so Hydra would create some fragmentation imo:

- verifiers would use Hydra/YAML
- prime-rl would use TOML
- it'd be less user-friendly, since people would need to learn both systems

BUT I do get the appeal: I like SWE/OOP principles, so the modularity is nice, plus:

- Experiment tracking => sweeps across LR, `beta`, `num_generations`
- Reproducibility => config files are better than "modify the .py file"
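A grid sweep over those knobs mostly comes down to a Cartesian product over per-field value lists. Here is a minimal stdlib sketch, assuming a hypothetical frozen `RunConfig` dataclass standing in for `GRPOConfig` and a hypothetical `grid_sweep` helper; it only builds the run configs and does not launch anything.

```python
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical stand-in for GRPOConfig.
    run_name: str = "wordle-grpo"
    learning_rate: float = 1e-6
    beta: float = 0.001
    num_generations: int = 16


def grid_sweep(base: RunConfig, grid: dict[str, list]) -> list[RunConfig]:
    """One config per point in the Cartesian product of `grid`.

    run_name is derived from the swept values, so it must not appear in `grid`;
    an unknown key in `grid` makes replace() raise TypeError.
    """
    keys = sorted(grid)  # fixed order so run names are reproducible
    runs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        overrides = dict(zip(keys, values))
        suffix = "-".join(f"{k}={v}" for k, v in overrides.items())
        runs.append(replace(base, run_name=f"{base.run_name}-{suffix}", **overrides))
    return runs


if __name__ == "__main__":
    grid = {"learning_rate": [1e-6, 3e-6], "beta": [0.0, 0.001], "num_generations": [8, 16]}
    for cfg in grid_sweep(RunConfig(), grid):
        print(cfg.run_name)
```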

A better approach imho would be to leverage what's already there.

For example, a shared base config in `configs/wordle_base.py`:

```python
from verifiers import GRPOConfig


def get_config() -> GRPOConfig:
    """Shared defaults for Wordle GRPO runs."""
    args = GRPOConfig(
        run_name="wordle-grpo",
        per_device_train_batch_size=8,
        num_generations=16,
        # add other defaults here
    )
    return args
```

Then in `examples/grpo/train_wordle.py`:

```python
import argparse

from configs.wordle_base import get_config

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=None)
parser.add_argument("--beta", type=float, default=None)
# add other key overrides here

# parse_args() (unlike parse_known_args()) errors out on unknown flags
args = parser.parse_args()
training_args = get_config()
# compare against None so explicit zeros (e.g. --beta 0) are still applied
if args.lr is not None:
    training_args.learning_rate = args.lr
if args.beta is not None:
    training_args.beta = args.beta
```

OR, potentially better, match prime-rl's TOML approach (thoughts? @willccbb)

e.g. `configs/wordle.toml`:

```toml
[training]
run_name = "wordle-grpo"
per_device_train_batch_size = 8
learning_rate = 1e-6
beta = 0.001
```

Anyway, nothing hard and fast, just some thoughts. But I like the ideas for sure.
