[Bug] Geo3k multi turn reward abnormal #1724

### Bug Description

I'm trying to reproduce the reward curve of `examples/geo3k_multi_turn`, but the rewards do not increase. Even at the first step the raw reward is around 0.1, whereas it is expected to be around 0.3.

### Steps to Reproduce

1. Use the official Docker container.
2. Run the `examples/geo3k_multi_turn` training script.

### Expected Behavior

The first-step raw reward is around 0.3.

### Actual Behavior

The first-step raw reward is around 0.1.

### Environment

- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:
- SGLang version (if relevant):
- Megatron-LM version (if relevant):
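
The fields above are still empty. A minimal sketch of a script that could collect some of them inside the container is shown below. The helper name `collect_environment` is hypothetical, and the script assumes PyTorch may or may not be importable. It does not try to detect the slime, SGLang, or Megatron-LM versions, which would still need to be filled in by hand.

```python
import importlib
import platform
import sys


def collect_environment():
    """Return a dict of values for some of the Environment fields."""
    info = {
        "Python version": sys.version.split()[0],
        "OS": platform.platform(),
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed, so the GPU fields cannot be queried here.
        info["PyTorch version"] = "not installed"
        return info

    info["PyTorch version"] = torch.__version__
    # torch.version.cuda is None on CPU-only and ROCm builds;
    # torch.version.hip is set on ROCm builds.
    info["CUDA/ROCm version"] = (
        torch.version.cuda or getattr(torch.version, "hip", None) or "none"
    )
    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        info["GPU type and count"] = f"{torch.cuda.get_device_name(0)} x {count}"
    else:
        info["GPU type and count"] = "no GPU visible"
    return info


if __name__ == "__main__":
    # Print in the same bullet format as the issue template.
    for key, value in collect_environment().items():
        print(f"- {key}: {value}")
```

Running the script in the same container as the training job prints lines that can be pasted into this section.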


### Logs

```shell

```

### Additional Context

_No response_

### Pre-submission Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/THUDM/slime/blob/main/CONTRIBUTING.md) and understand the collaboration scope.
- [x] I have read the [documentation](https://thudm.github.io/slime/) and my issue is not addressed there.
- [x] I have searched for [existing issues](https://github.com/THUDM/slime/issues) and this is not a duplicate.
- [x] I have provided a minimal, reproducible example.
