
Conversation

@charliemolony (Collaborator)

  • Enabled training one agent per world, with all other agents acting as co-players
  • The create_expert_overflow parameter now controls whether non-controlled agents are created
    • When set to False, the ego agent trains without expert human logs, as overflow agents are not instantiated
      • Weights and Biases Link

greptile-apps bot commented Nov 24, 2025

Greptile Overview

Greptile Summary

This PR implements a "one ego per scene" training mode where each world contains exactly one ego agent training alongside co-player agents, enabling multi-agent training scenarios. The implementation includes a new C function my_shared_one_ego_per_scene that creates environments with controlled agent allocation and placeholder slots.

Key Changes:

  • Added one_ego_per_scene mode that creates one ego + co-players per world with placeholder slots for unused agent indices
  • Introduced create_expert_overflow flag to control whether overflow agents beyond max_controlled_agents are created as experts or skipped entirely
  • Refactored configuration structure: moved from flat parameters to a nested dict for co_player_policy with conditioning and rnn subsections (a hypothetical sketch follows this list)
  • Enhanced C code in drive.h to skip non-controlled and overflow agents when create_expert_overflow=False
  • Added proper memory management for placeholder agent IDs in C bindings
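
For concreteness, here is a rough sketch of the nested structure described above. Only co_player_policy, its conditioning and rnn subsections, co_player_policy_func, and the two boolean flags come from this PR; every other key and value is a hypothetical placeholder.

```python
# Hypothetical sketch of the nested co-player config described above.
# Only "co_player_policy", "conditioning", "rnn", "co_player_policy_func",
# and the two boolean flags appear in the review; the rest is illustrative.
env_kwargs = {
    "one_ego_per_scene": True,        # one ego + co-players per world
    "create_expert_overflow": False,  # skip overflow agents instead of making experts
    "co_player_policy": {
        "co_player_policy_func": None,   # filled in by vector.py from a checkpoint
        "conditioning": {                # presence sets co_player_condition_type
            "condition_type": "discount",    # placeholder name/value
            "discount_weight_lb": 0.80,
            "discount_weight_ub": 0.98,
        },
        "rnn": {                         # presence triggers LSTM wrapping in vector.py
            "hidden_size": 128,          # placeholder value
        },
    },
}
```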

Critical Issues Found:

  • AttributeError bugs in drive.py at lines 220, 420, and 447, where self.co_player_condition_type is accessed without checking whether the attribute exists (occurs when co_player_enabled=False); a minimal reproduction is sketched below
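
A minimal reproduction of the reported pattern. Only the attribute names and line references come from the review; the surrounding class is illustrative, not the actual drive.py.

```python
# Sketch of the reported bug (attribute names from the review; the
# surrounding code is illustrative, not the actual drive.py).
class Drive:
    def __init__(self, co_player_policy=None):
        co_player_policy = co_player_policy or {}
        self.co_player_conditioning = co_player_policy.get("conditioning")
        if self.co_player_conditioning:  # ~lines 141-142: set only when truthy
            self.co_player_condition_type = self.co_player_conditioning["condition_type"]

    def step(self):
        # ~lines 220/420/447: unconditional access fails whenever
        # co_player_conditioning was None or falsy.
        if self.co_player_condition_type != "none":
            pass  # add conditioning to co-player observations

Drive(co_player_policy={}).step()  # raises AttributeError: co_player_condition_type
```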

Configuration Changes:

  • Increased num_agents from 1024 to 8192
  • Decreased minibatch_multiplier from 512 to 256
  • Swapped discount_weight_lb/ub bounds for co-player conditioning (0.98↔0.80)
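
For quick reference, the same deltas as an illustrative before/after map. The key names and values are from the summary above; the swap direction for the discount bounds is an assumption (shown as lb 0.98→0.80 so that lb ≤ ub afterwards, which the summary does not state explicitly).

```python
# Hyperparameter deltas from this PR as (before, after) pairs.
# Swap direction for the discount bounds is assumed, not stated.
config_changes = {
    "num_agents":           (1024, 8192),  # increased
    "minibatch_multiplier": (512, 256),    # decreased
    "discount_weight_lb":   (0.98, 0.80),  # swapped with ub (assumed direction)
    "discount_weight_ub":   (0.80, 0.98),
}
```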

Confidence Score: 2/5

  • This PR contains critical runtime bugs that will cause crashes when co_player_enabled=False
  • Score reflects three critical AttributeError bugs in drive.py that will crash training when co_player_enabled=False in the config. The attribute self.co_player_condition_type is only initialized when self.co_player_conditioning is truthy (lines 141-142), but it is accessed unconditionally at lines 220, 420, and 447. The C/C++ implementation appears solid, with proper memory management, but the Python logic errors need to be fixed before merging.
  • Pay close attention to pufferlib/ocean/drive/drive.py - contains critical AttributeError bugs that must be fixed

Important Files Changed

File Analysis

| Filename | Score | Overview |
|----------|-------|----------|
| pufferlib/ocean/drive/drive.py | 2/5 | Refactored configuration handling for co-player and conditioning settings; contains critical AttributeError bugs when co_player_conditioning is None |
| pufferlib/ocean/drive/binding.h | 4/5 | Implemented my_shared_one_ego_per_scene to create one ego per world with co-players; includes proper memory management and retry logic |
| pufferlib/ocean/drive/drive.h | 4/5 | Enhanced set_active_agents to support the create_expert_overflow flag; skips non-controlled agents when the flag is False (control flow sketched below) |
| pufferlib/vector.py | 4/5 | Refactored co-player policy loading to use a nested dict structure; fixed the check from population_play to co_player_enabled |
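
The drive.h change is easiest to see as control flow. Below is a Python rendering of the behavior described above; the real implementation is C in drive.h, so apart from create_expert_overflow and max_controlled_agents every name here is a stand-in.

```python
from types import SimpleNamespace

# Python rendering of the set_active_agents flow described in the table;
# the real code is C in drive.h. Entity fields are stand-ins.
def set_active_agents(entities, max_controlled_agents, create_expert_overflow):
    active, num_controlled = [], 0
    for entity in entities:
        if entity.controllable and num_controlled < max_controlled_agents:
            active.append(entity)      # normal controlled agent
            num_controlled += 1
        elif create_expert_overflow:
            entity.expert = True       # overflow/non-controlled agent replays expert logs
            active.append(entity)
        # else: flag is False -> agent is skipped entirely, no expert created
    return active

# Tiny usage check: 1 controlled slot, overflow disabled -> only 1 active agent.
ents = [SimpleNamespace(controllable=c, expert=False) for c in (True, True, False)]
assert len(set_active_agents(ents, 1, create_expert_overflow=False)) == 1
```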

Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Config File
    participant Main as Training Script
    participant Vector as vector.py
    participant Drive as drive.py
    participant Binding as binding.h/c
    participant DriveH as drive.h

    Main->>Config: Load adaptive.ini
    Config-->>Main: one_ego_per_scene=True<br/>co_player_enabled=True<br/>create_expert_overflow=False
    
    Main->>Vector: make() with env_kwargs
    
    alt co_player_enabled == True
        Vector->>Vector: Load co-player policy from checkpoint
        Vector->>Vector: Wrap policy with LSTM if rnn config exists
        Vector->>Vector: Store policy in env_kwargs["co_player_policy"]["co_player_policy_func"]
    end
    
    Vector->>Drive: Initialize Drive environments
    
    Drive->>Drive: Parse co_player_policy dict
    alt co_player_conditioning exists
        Drive->>Drive: Set co_player_condition_type
    else co_player_conditioning is None
        Note over Drive: BUG: co_player_condition_type<br/>not initialized!
    end
    
    Drive->>Binding: my_shared_population_play()
    
    alt one_ego_per_scene == True
        Binding->>Binding: my_shared_one_ego_per_scene()
        loop For each ego agent
            Binding->>Binding: Select random map
            Binding->>DriveH: set_active_agents()
            
            DriveH->>DriveH: Iterate through entities
            
            alt create_expert_overflow == False
                DriveH->>DriveH: Skip non-controlled agents
                DriveH->>DriveH: Skip overflow agents beyond max_controlled_agents
            else create_expert_overflow == True
                DriveH->>DriveH: Create overflow agents as experts
            end
            
            Binding->>Binding: Assign 1 ego + N co-players per world
            Binding->>Binding: Calculate placeholder slots
        end
        Binding-->>Drive: Return agent_offsets, map_ids, ego_ids, coplayer_ids
    else one_ego_per_scene == False
        Binding->>Binding: my_shared_split_numerically()
        Binding-->>Drive: Split agents across worlds
    end
    
    Drive->>Drive: Store ego_ids, co_player_ids, place_holder_ids
    Drive->>Drive: Initialize C environments with parameters
    
    loop Training Loop
        Main->>Drive: Step environments
        
        alt co_player_condition_type != "none"
            Note over Drive: BUG: AttributeError if<br/>co_player_condition_type not defined
            Drive->>Drive: Add conditioning to co-player obs
        end
        
        Drive->>Drive: Forward ego policy
        Drive->>Drive: Forward co-player policy
        Drive-->>Main: Return observations, rewards, dones
    end
```

@greptile-apps bot left a comment

Additional Comments (1)

  1. pufferlib/ocean/drive/drive.py, line 447

    logic: AttributeError when co_player_conditioning is None

    self.co_player_condition_type is only set when self.co_player_conditioning is truthy (lines 141-142). Check that the attribute exists before accessing it; one possible fix is sketched below.
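
One possible shape of the fix, sketched: default the attribute during initialization so later comparisons are always safe. The attribute names come from the review; the "condition_type" key and the surrounding class are illustrative.

```python
# Possible fix, sketched: initialize a "none" default so every access site
# is safe ("condition_type" key and surrounding class are illustrative).
class Drive:
    def __init__(self, co_player_policy=None):
        conditioning = (co_player_policy or {}).get("conditioning")
        self.co_player_conditioning = conditioning
        self.co_player_condition_type = (
            conditioning["condition_type"] if conditioning else "none"
        )

    def step(self):
        # ...or, defensively, guard each access site (e.g. line 447):
        if getattr(self, "co_player_condition_type", "none") != "none":
            pass  # add conditioning to co-player observations

Drive().step()  # no AttributeError when conditioning is absent
```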

9 files reviewed, 3 comments


charliemolony and others added 2 commits November 24, 2025 11:18
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>