[Bug] Geo3k multi-turn reward abnormal #1724

@Zhou-jiecheng

Bug Description

I'm trying to reproduce the reward curve of examples/geo3k_multi_turn, but have not been able to: the reward does not increase during training. Even the raw reward at the first step is around 0.1, whereas it is expected to be around 0.3.

Steps to Reproduce

  1. Use the official Docker container.
  2. Run the examples/geo3k_multi_turn training script.

Expected Behavior

The raw reward at the first step is around 0.3, and the reward increases over training.

Actual Behavior

The raw reward at the first step is around 0.1, and the reward does not increase over training.

Environment

  • slime version:
  • Python version:
  • PyTorch version:
  • CUDA/ROCm version:
  • GPU type and count:
  • OS:
  • SGLang version (if relevant):
  • Megatron-LM version (if relevant):

Logs

Additional Context

No response

Pre-submission Checklist

  • I have read the CONTRIBUTING.md and understand the collaboration scope.
  • I have read the documentation and my issue is not addressed there.
  • I have searched for existing issues and this is not a duplicate.
  • I have provided a minimal, reproducible example.
