RubricGroup.score_rollout and score_group accumulate reward and metrics across rubrics, but not timing. After N rubrics:
- scoring_ms = the last rubric's scoring time
- each rubric sees the cumulative total_ms, including the other rubrics' scoring_ms
- generation_ms + scoring_ms ≠ total_ms for a rubric group
This affects the following environments: MultiTurnEnv, ToolEnv, PythonEnv, SandboxEnv.
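
The mismatch listed above can be reproduced with a toy model. This is a minimal sketch, not the actual RubricGroup code: the function names `score_group_overwrite` and `score_group_accumulate`, the flat timing dict, and the per-rubric durations are all hypothetical, and the sketch assumes that each rubric overwrites scoring_ms while adding its time to total_ms, as the symptoms suggest.

```python
def score_group_overwrite(timing: dict, rubric_ms: list) -> dict:
    """Model of the reported behaviour (assumed, not the real implementation):
    each rubric overwrites scoring_ms but adds its time to total_ms."""
    timing = dict(timing)  # copy so the caller's dict is untouched
    for ms in rubric_ms:
        timing["scoring_ms"] = ms   # last rubric wins
        timing["total_ms"] += ms    # cumulative across rubrics
    return timing


def score_group_accumulate(timing: dict, rubric_ms: list) -> dict:
    """One possible fix: accumulate scoring_ms the same way as total_ms."""
    timing = dict(timing)
    for ms in rubric_ms:
        timing["scoring_ms"] += ms
        timing["total_ms"] += ms
    return timing


# Hypothetical rollout: 100 ms of generation, then three rubrics.
start = {"generation_ms": 100.0, "scoring_ms": 0.0, "total_ms": 100.0}
durations = [10.0, 20.0, 30.0]

buggy = score_group_overwrite(start, durations)
fixed = score_group_accumulate(start, durations)

# In the overwrite model the parts no longer sum to the total.
print(buggy["generation_ms"] + buggy["scoring_ms"], buggy["total_ms"])
# In the accumulate model they do.
print(fixed["generation_ms"] + fixed["scoring_ms"], fixed["total_ms"])
```

In the overwrite model, scoring_ms ends up as the last rubric's time only, so generation_ms + scoring_ms falls short of total_ms. Accumulating scoring_ms restores the invariant.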