adding CFPO to verl #5252
base: main
Conversation
Code Review
This pull request adds support for the CFPO policy loss. The implementation is a good start, but it includes some unused code, likely from a copy-paste, which should be removed for clarity. I've provided specific suggestions for this. A more critical issue is that the new 'cfpo' loss mode is not added to the configuration files (PolicyLossConfig in verl/workers/config/actor.py and verl/trainer/config/actor/actor.yaml), which will prevent users from selecting it. This should be addressed to make the feature usable.
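To make the review's point concrete, here is a minimal sketch of what exposing the new mode in the config might look like. The name `PolicyLossConfig` and the `loss_mode` field come from the review comment; the surrounding fields and defaults are illustrative assumptions about verl's config layout, not the actual class.

```python
# Hypothetical sketch: exposing "cfpo" as a selectable loss mode.
# PolicyLossConfig / loss_mode are named in the review; the exact
# dataclass shape here is an assumption, not verl's real definition.
from dataclasses import dataclass


@dataclass
class PolicyLossConfig:
    # "cfpo" would need to be accepted alongside the existing modes,
    # and mirrored in verl/trainer/config/actor/actor.yaml.
    loss_mode: str = "vanilla"
    clip_ratio: float = 0.2


# A user selecting the new loss mode:
cfg = PolicyLossConfig(loss_mode="cfpo")
```

Without this registration step, the new loss function exists in `core_algos.py` but cannot be reached from a training config.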
verl/trainer/ppo/core_algos.py
Outdated
```python
clip_ratio_c = config.get(  # Lower bound of the ratio for dual-clip PPO. See https://arxiv.org/pdf/1912.09729.
    "clip_ratio_c", 3.0
)

cliprange = clip_ratio
cliprange_low = clip_ratio_low
cliprange_high = clip_ratio_high

assert clip_ratio_c > 1.0, (
    "The lower bound of the clip_ratio_c for dual-clip PPO should be greater than 1.0,"
    + f" but get the value: {clip_ratio_c}."
)
```
The clip_ratio_c parameter and its associated assertion are specific to dual-clip PPO and are not used in the CFPO loss calculation or its metrics. This code should be removed to avoid confusion and keep the implementation clean.
Suggested change:

```diff
-clip_ratio_c = config.get(  # Lower bound of the ratio for dual-clip PPO. See https://arxiv.org/pdf/1912.09729.
-    "clip_ratio_c", 3.0
-)
 cliprange = clip_ratio
 cliprange_low = clip_ratio_low
 cliprange_high = clip_ratio_high
-assert clip_ratio_c > 1.0, (
-    "The lower bound of the clip_ratio_c for dual-clip PPO should be greater than 1.0,"
-    + f" but get the value: {clip_ratio_c}."
-)
```
verl/trainer/ppo/core_algos.py
Outdated
```python
### This code is for logging purposes
pg_losses1 = -advantages * ratio
if cliprange_low is None:
    cliprange_low = cliprange
if cliprange_high is None:
    cliprange_high = cliprange
pg_losses2 = -advantages * torch.clamp(
    ratio, 1 - cliprange_low, 1 + cliprange_high
)  # - clip(ratio, 1-cliprange, 1+cliprange) * A
clip_pg_losses1 = torch.maximum(
    pg_losses1, pg_losses2
)  # max(-ratio * A, -clip(ratio, 1-cliprange, 1+cliprange) * A)
pg_clipfrac = verl_F.masked_mean(torch.gt(pg_losses2, pg_losses1).float(), response_mask)

pg_losses3 = -advantages * clip_ratio_c

pg_clipfrac_lower = verl_F.masked_mean(
    torch.gt(clip_pg_losses1, pg_losses3) * (advantages < 0).float(), response_mask
)
### This code is for logging purposes
```
This block of code, marked for "logging purposes", appears to be dead code from a copy-paste. The variables pg_clipfrac and pg_clipfrac_lower are calculated here but are immediately overwritten by a new calculation later in the function. Other variables defined in this block are not used outside of it. This block should be removed to improve clarity and avoid unnecessary computations.
CFPO in VERL
Overview
Adds support for CFPO (Clipping-Free Policy Optimization), a clipping-free alternative to PPO-style objectives. CFPO replaces ratio clipping with a smooth quadratic penalty, removing zero-gradient regions while maintaining stable updates.
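The PR itself does not reproduce the objective, so as a rough illustration of the idea only: a toy comparison of PPO's clipped surrogate against a quadratically penalized one. The penalty form and coefficient below are assumptions chosen to show the zero-gradient point, not the exact CFPO loss from the paper.

```python
# Illustrative sketch (plain Python scalars): why a quadratic penalty
# removes the zero-gradient region that ratio clipping creates.
# The penalty form below is an assumption, not the exact CFPO objective.

def ppo_clipped_loss(ratio, advantage, eps=0.2):
    """PPO surrogate: once the ratio leaves [1-eps, 1+eps] on the
    favorable side, the loss is flat, so the gradient is zero."""
    clipped = max(1 - eps, min(1 + eps, ratio))
    return max(-advantage * ratio, -advantage * clipped)


def quadratic_penalty_loss(ratio, advantage, beta=1.0):
    """Clipping-free variant: unclipped surrogate plus a smooth quadratic
    penalty on the ratio's deviation from 1, so the gradient never vanishes."""
    return -advantage * ratio + beta * (ratio - 1.0) ** 2


# With a positive advantage, the clipped loss is identical at ratio 2.0
# and 3.0 (flat region), while the penalized loss keeps growing and so
# keeps pulling the ratio back toward 1.
flat = ppo_clipped_loss(2.0, 1.0) == ppo_clipped_loss(3.0, 1.0)  # True
```

This is only meant to convey the mechanism the overview describes; the actual implementation in `verl/trainer/ppo/core_algos.py` operates on token-level tensors with masking, as in the snippets above.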
Features
Reference
https://arxiv.org/abs/2601.22801