Remove beta KL divergence from training loss#607

Open
arcticfly wants to merge 1 commit into main from fix/remove-beta-kl-divergence
Conversation

@arcticfly
Collaborator

Summary

  • Remove the beta parameter and the Schulman KL divergence estimator (`exp(r - n) - (r - n) - 1`) that was added directly to the training loss
  • The kl_penalty_coef mechanism (zero-mean advantage adjustment) remains as the preferred approach for KL regularization
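For context, a minimal sketch of the per-token estimator being removed. This is not the repo's actual code; the function name and the reading of `r` and `n` as reference and new (current-policy) log-probabilities are assumptions based on the formula quoted above. This is Schulman's "k3" estimator, which is non-negative and, in expectation under the new policy, equals KL(new || ref); before this PR the training loss reportedly added `beta` times the mean of this quantity.

```python
import math

def schulman_kl(ref_logprob: float, new_logprob: float) -> float:
    """Schulman k3 KL estimator for a single token.

    With r = ref_logprob and n = new_logprob, computes
    exp(r - n) - (r - n) - 1, which is always >= 0 and is an
    unbiased estimator of KL(new || ref) under samples from new.
    """
    log_ratio = ref_logprob - new_logprob  # r - n
    return math.exp(log_ratio) - log_ratio - 1.0

# Identical distributions give zero estimated divergence.
print(schulman_kl(-1.5, -1.5))  # → 0.0

# Any mismatch gives a positive penalty, in either direction.
print(schulman_kl(-1.0, -2.0) > 0.0)  # → True
print(schulman_kl(-2.0, -1.0) > 0.0)  # → True
```

By contrast, the retained `kl_penalty_coef` mechanism folds the KL term into the advantages rather than adding a separate loss term, which is why removing the `beta` path does not remove KL regularization entirely.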

Changes

  • src/art/types.py: Remove beta field from TrainConfig
  • src/art/loss.py: Remove mean_kl from Loss class and the KL divergence computation
  • src/art/local/backend.py: Remove beta parameter from LocalBackend.train()
  • src/art/serverless/backend.py: Remove beta parameter from ServerlessBackend.train()
  • src/art/unsloth/train.py: Remove beta * mean_kl loss addition and kl_div metric logging
  • src/art/megatron/train.py: Remove beta * mean_kl loss addition
  • src/art/preprocessing/inputs.py: Remove beta from warmup config override

Test plan

  • uv run prek run --all-files passes locally (ruff, ruff format, ty)
  • test_backend_train_api.py passed on H200 GPU cluster — model registration, trajectory gathering, training, and logging all succeeded

🤖 Generated with Claude Code

Remove the Schulman KL estimator (beta * KL) that was added directly
to the training loss. The kl_penalty_coef mechanism (advantage
adjustment) remains as the preferred approach for KL regularization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@arcticfly arcticfly requested a review from corbt March 9, 2026 17:12

2 participants