
Conversation

@m2kulkarni

No description provided.

@greptile-apps

greptile-apps bot commented Nov 30, 2025

Greptile Overview

Greptile Summary

Implemented per-agent reward conditioning by replacing hardcoded reward values with dynamically sampled weight arrays. The changes enable agents to learn from varied reward functions during training.

Key changes:

  • Replaced static reward_vehicle_collision, reward_offroad_collision, and reward_goal values with per-agent collision_weights[i], offroad_weights[i], and goal_weights[i] arrays (see the sampling sketch after this list)
  • Fixed post-respawn goal reward scaling by multiplying reward_goal_post_respawn constant by goal_weights[i]
  • Corrected configuration bounds so that lb < ub: offroad_weight_lb/offroad_weight_ub now read -0.4/0.0, and discount_weight_lb/discount_weight_ub now read 0.80/0.98
  • Increased num_agents from 512 to 1024 for larger batch training
  • Extended SLURM array to test all 5 conditioning types (0-4 vs 1-2)
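
A minimal sketch of the per-agent sampling described in this list, in C to match the environment code. The Env fields, bound names (collision_weight_lb and friends), and the sample_uniform helper are illustrative assumptions; only the collision_weights[i], offroad_weights[i], and goal_weights[i] arrays come from the diff itself.

```c
#include <stdlib.h>

#define MAX_AGENTS 1024

/* Minimal stand-in for the environment struct; these field names are
 * assumptions made for illustration, not the actual drive.h layout. */
typedef struct {
    int num_agents;
    float collision_weight_lb, collision_weight_ub;
    float offroad_weight_lb,   offroad_weight_ub;
    float goal_weight_lb,      goal_weight_ub;
    float collision_weights[MAX_AGENTS];
    float offroad_weights[MAX_AGENTS];
    float goal_weights[MAX_AGENTS];
} Env;

/* Uniform draw in [lb, ub]; assumes lb < ub, which is why the bound swap in
 * drive.ini (e.g. offroad_weight_lb = -0.4 < offroad_weight_ub = 0.0) matters. */
static float sample_uniform(float lb, float ub) {
    return lb + (ub - lb) * ((float)rand() / (float)RAND_MAX);
}

/* Per-agent conditioning weights, drawn once per reset as in the sequence
 * diagram's c_reset() step. */
void sample_conditioning_weights(Env *env) {
    for (int i = 0; i < env->num_agents; i++) {
        env->collision_weights[i] = sample_uniform(env->collision_weight_lb, env->collision_weight_ub);
        env->offroad_weights[i]   = sample_uniform(env->offroad_weight_lb, env->offroad_weight_ub);
        env->goal_weights[i]      = sample_uniform(env->goal_weight_lb, env->goal_weight_ub);
    }
}
```

Because the weights are drawn at reset, each agent keeps one reward function for the whole episode, which is what allows the policy to condition on them during training.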

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • All changes are internally consistent and mathematically correct. The weight array replacements maintain the same reward computation logic while enabling per-agent conditioning. Configuration bounds were properly corrected to ensure lb < ub. No breaking changes or edge cases detected.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| pufferlib/config/ocean/drive.ini | 5/5 | Updated conditioning weight bounds and increased num_agents from 512 to 1024; swapped lb/ub values correctly for offroad and discount weights |
| pufferlib/ocean/drive/drive.h | 5/5 | Replaced hardcoded reward values with per-agent weight arrays for reward conditioning; correctly scaled post-respawn goal rewards |
| scripts/run.sh | 5/5 | Expanded SLURM array from 1-2 to 0-4 to test all five conditioning types (none, reward, entropy, discount, all) |
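
For reference, a hedged sketch of how the five SLURM array values (0-4) from the scripts/run.sh row could map onto the conditioning types it names. The enum and helper names are assumptions; only the five modes and the 0-4 range appear in the review.

```c
#include <stdbool.h>

/* Illustrative mapping from the SLURM array index (0-4) to a conditioning
 * mode; the enum and helpers are assumptions, only the five mode names and
 * the 0-4 range come from the review above. */
typedef enum {
    COND_NONE     = 0,  /* no conditioning */
    COND_REWARD   = 1,  /* condition on sampled reward weights */
    COND_ENTROPY  = 2,  /* condition on a sampled entropy coefficient */
    COND_DISCOUNT = 3,  /* condition on a sampled discount factor */
    COND_ALL      = 4,  /* condition on all of the above */
} ConditioningType;

static bool conditions_on_reward(ConditioningType t) {
    return t == COND_REWARD || t == COND_ALL;
}

static bool conditions_on_discount(ConditioningType t) {
    return t == COND_DISCOUNT || t == COND_ALL;
}
```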

Sequence Diagram

sequenceDiagram
    participant Config as drive.ini
    participant Reset as c_reset()
    participant Agent as Agent Loop
    participant Step as c_step()
    participant Rewards as Reward Calculation

    Config->>Reset: Load conditioning bounds (lb/ub)
    Reset->>Reset: Sample collision_weights[i]
    Reset->>Reset: Sample offroad_weights[i]
    Reset->>Reset: Sample goal_weights[i]
    Reset->>Agent: Initialize agents with weights
    
    Agent->>Step: Execute agent action
    Step->>Rewards: Check collision state
    alt Vehicle Collision
        Rewards->>Rewards: reward = collision_weights[i]
    else Offroad
        Rewards->>Rewards: reward = offroad_weights[i]
    end
    
    Step->>Rewards: Check goal distance
    alt Goal Reached (Post Respawn)
        Rewards->>Rewards: reward += reward_goal_post_respawn * goal_weights[i]
    else Goal Reached (Generate New)
        Rewards->>Rewards: reward += goal_weights[i]
    else Goal Reached (Stop)
        Rewards->>Rewards: reward = goal_weights[i]
    end
    
    Rewards->>Agent: Return per-agent rewards
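
A simplified C sketch of the branching shown in the diagram. The boolean state flags (hit_vehicle, went_offroad, reached_goal, respawned, stopping) are hypothetical inputs; the weight arrays and the reward_goal_post_respawn constant are the names used in the diagram.

```c
/* Simplified per-agent reward branching, mirroring the alt cases above. */
float compute_reward(const float *collision_weights,
                     const float *offroad_weights,
                     const float *goal_weights,
                     float reward_goal_post_respawn,
                     int i,
                     int hit_vehicle, int went_offroad,
                     int reached_goal, int respawned, int stopping) {
    float reward = 0.0f;

    /* Collision branches: the per-agent sampled weights replace the old
     * hardcoded reward_vehicle_collision / reward_offroad_collision values. */
    if (hit_vehicle) {
        reward = collision_weights[i];
    } else if (went_offroad) {
        reward = offroad_weights[i];
    }

    /* Goal branches, mirroring the three alt cases in the diagram. */
    if (reached_goal) {
        if (respawned) {
            /* post-respawn goal reward is scaled by the agent's goal weight */
            reward += reward_goal_post_respawn * goal_weights[i];
        } else if (stopping) {
            reward = goal_weights[i];
        } else {
            reward += goal_weights[i];
        }
    }
    return reward;
}
```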


@greptile-apps bot left a comment


3 files reviewed, no comments


Collaborator


Could you place the policy in the co-player policies folder?
