
Conversation

@charliemolony (Collaborator)

  • Enabled training one agent per world, with all other agents acting as co-players
  • The create_expert_overflow parameter now controls whether non-controlled agents are created
    • When set to False, the ego agent trains without expert human logs, as overflow agents are not instantiated
      • Weights and Biases Link

greptile-apps bot commented Nov 24, 2025

Greptile Overview

Greptile Summary

This PR implements a "one ego per scene" training mode where each world contains exactly one ego agent training alongside co-player agents, enabling multi-agent training scenarios. The implementation includes a new C function my_shared_one_ego_per_scene that creates environments with controlled agent allocation and placeholder slots.

Key Changes:

  • Added one_ego_per_scene mode that creates one ego + co-players per world with placeholder slots for unused agent indices
  • Introduced create_expert_overflow flag to control whether overflow agents beyond max_controlled_agents are created as experts or skipped entirely
  • Refactored configuration structure: moved from flat parameters to a nested dict for co_player_policy with conditioning and rnn subsections (a hypothetical sketch follows this list)
  • Enhanced C code in drive.h to skip non-controlled and overflow agents when create_expert_overflow=False
  • Added proper memory management for placeholder agent IDs in C bindings
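
For concreteness, here is a rough sketch of the nested structure described above. Only co_player_policy, its conditioning and rnn subsections, co_player_policy_func, and the two boolean flags come from this PR; every other key and value is a hypothetical placeholder.

```python
# Hypothetical sketch of the nested co-player config described above.
# Only "co_player_policy", "conditioning", "rnn", "co_player_policy_func",
# and the two boolean flags appear in the review; the rest is illustrative.
env_kwargs = {
    "one_ego_per_scene": True,        # one ego + co-players per world
    "create_expert_overflow": False,  # skip overflow agents instead of making experts
    "co_player_policy": {
        "co_player_policy_func": None,   # filled in by vector.py from a checkpoint
        "conditioning": {                # presence sets co_player_condition_type
            "condition_type": "discount",    # placeholder name/value
            "discount_weight_lb": 0.80,
            "discount_weight_ub": 0.98,
        },
        "rnn": {                         # presence triggers LSTM wrapping in vector.py
            "hidden_size": 128,          # placeholder value
        },
    },
}
```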

Critical Issues Found:

  • AttributeError bugs in drive.py at lines 220, 420, and 447, where self.co_player_condition_type is accessed without checking whether the attribute exists (occurs when co_player_enabled=False); a minimal reproduction is sketched below
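
A minimal reproduction of the reported pattern. Only the attribute names and line references come from the review; the surrounding class is illustrative, not the actual drive.py.

```python
# Sketch of the reported bug (attribute names from the review; the
# surrounding code is illustrative, not the actual drive.py).
class Drive:
    def __init__(self, co_player_policy=None):
        co_player_policy = co_player_policy or {}
        self.co_player_conditioning = co_player_policy.get("conditioning")
        if self.co_player_conditioning:  # ~lines 141-142: set only when truthy
            self.co_player_condition_type = self.co_player_conditioning["condition_type"]

    def step(self):
        # ~lines 220/420/447: unconditional access fails whenever
        # co_player_conditioning was None or falsy.
        if self.co_player_condition_type != "none":
            pass  # add conditioning to co-player observations

Drive(co_player_policy={}).step()  # raises AttributeError: co_player_condition_type
```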

Configuration Changes:

  • Increased num_agents from 1024 to 8192
  • Decreased minibatch_multiplier from 512 to 256
  • Swapped discount_weight_lb/ub bounds for co-player conditioning (0.98↔0.80)
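
For quick reference, the same deltas as an illustrative before/after map. The key names and values are from the summary above; the swap direction for the discount bounds is an assumption (shown as lb 0.98→0.80 so that lb ≤ ub afterwards, which the summary does not state explicitly).

```python
# Hyperparameter deltas from this PR as (before, after) pairs.
# Swap direction for the discount bounds is assumed, not stated.
config_changes = {
    "num_agents":           (1024, 8192),  # increased
    "minibatch_multiplier": (512, 256),    # decreased
    "discount_weight_lb":   (0.98, 0.80),  # swapped with ub (assumed direction)
    "discount_weight_ub":   (0.80, 0.98),
}
```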

Confidence Score: 2/5

  • This PR contains critical runtime bugs that will cause crashes when co_player_enabled=False
  • Score reflects three critical AttributeError bugs in drive.py that will crash training when co_player_enabled=False in the config. The attribute self.co_player_condition_type is only initialized when self.co_player_conditioning is truthy (lines 141-142), but it is accessed unconditionally at lines 220, 420, and 447. The C/C++ implementation appears solid, with proper memory management, but the Python logic errors need to be fixed before merging.
  • Pay close attention to pufferlib/ocean/drive/drive.py - contains critical AttributeError bugs that must be fixed

Important Files Changed

File Analysis

| Filename | Score | Overview |
|----------|-------|----------|
| pufferlib/ocean/drive/drive.py | 2/5 | Refactored configuration handling for co-player and conditioning settings; contains critical AttributeError bugs when co_player_conditioning is None |
| pufferlib/ocean/drive/binding.h | 4/5 | Implemented my_shared_one_ego_per_scene to create one ego per world with co-players; includes proper memory management and retry logic |
| pufferlib/ocean/drive/drive.h | 4/5 | Enhanced set_active_agents to support the create_expert_overflow flag; skips non-controlled agents when the flag is False (control flow sketched below) |
| pufferlib/vector.py | 4/5 | Refactored co-player policy loading to use a nested dict structure; fixed the check from population_play to co_player_enabled |
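
The drive.h change is easiest to see as control flow. Below is a Python rendering of the behavior described above; the real implementation is C in drive.h, so apart from create_expert_overflow and max_controlled_agents every name here is a stand-in.

```python
from types import SimpleNamespace

# Python rendering of the set_active_agents flow described in the table;
# the real code is C in drive.h. Entity fields are stand-ins.
def set_active_agents(entities, max_controlled_agents, create_expert_overflow):
    active, num_controlled = [], 0
    for entity in entities:
        if entity.controllable and num_controlled < max_controlled_agents:
            active.append(entity)      # normal controlled agent
            num_controlled += 1
        elif create_expert_overflow:
            entity.expert = True       # overflow/non-controlled agent replays expert logs
            active.append(entity)
        # else: flag is False -> agent is skipped entirely, no expert created
    return active

# Tiny usage check: 1 controlled slot, overflow disabled -> only 1 active agent.
ents = [SimpleNamespace(controllable=c, expert=False) for c in (True, True, False)]
assert len(set_active_agents(ents, 1, create_expert_overflow=False)) == 1
```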

Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Config File
    participant Main as Training Script
    participant Vector as vector.py
    participant Drive as drive.py
    participant Binding as binding.h/c
    participant DriveH as drive.h

    Main->>Config: Load adaptive.ini
    Config-->>Main: one_ego_per_scene=True<br/>co_player_enabled=True<br/>create_expert_overflow=False
    
    Main->>Vector: make() with env_kwargs
    
    alt co_player_enabled == True
        Vector->>Vector: Load co-player policy from checkpoint
        Vector->>Vector: Wrap policy with LSTM if rnn config exists
        Vector->>Vector: Store policy in env_kwargs["co_player_policy"]["co_player_policy_func"]
    end
    
    Vector->>Drive: Initialize Drive environments
    
    Drive->>Drive: Parse co_player_policy dict
    alt co_player_conditioning exists
        Drive->>Drive: Set co_player_condition_type
    else co_player_conditioning is None
        Note over Drive: BUG: co_player_condition_type<br/>not initialized!
    end
    
    Drive->>Binding: my_shared_population_play()
    
    alt one_ego_per_scene == True
        Binding->>Binding: my_shared_one_ego_per_scene()
        loop For each ego agent
            Binding->>Binding: Select random map
            Binding->>DriveH: set_active_agents()
            
            DriveH->>DriveH: Iterate through entities
            
            alt create_expert_overflow == False
                DriveH->>DriveH: Skip non-controlled agents
                DriveH->>DriveH: Skip overflow agents beyond max_controlled_agents
            else create_expert_overflow == True
                DriveH->>DriveH: Create overflow agents as experts
            end
            
            Binding->>Binding: Assign 1 ego + N co-players per world
            Binding->>Binding: Calculate placeholder slots
        end
        Binding-->>Drive: Return agent_offsets, map_ids, ego_ids, coplayer_ids
    else one_ego_per_scene == False
        Binding->>Binding: my_shared_split_numerically()
        Binding-->>Drive: Split agents across worlds
    end
    
    Drive->>Drive: Store ego_ids, co_player_ids, place_holder_ids
    Drive->>Drive: Initialize C environments with parameters
    
    loop Training Loop
        Main->>Drive: Step environments
        
        alt co_player_condition_type != "none"
            Note over Drive: BUG: AttributeError if<br/>co_player_condition_type not defined
            Drive->>Drive: Add conditioning to co-player obs
        end
        
        Drive->>Drive: Forward ego policy
        Drive->>Drive: Forward co-player policy
        Drive-->>Main: Return observations, rewards, dones
    end
```

@greptile-apps bot left a comment

Additional Comments (1)

  1. pufferlib/ocean/drive/drive.py, line 447

    logic: AttributeError when co_player_conditioning is None

    self.co_player_condition_type is only set when self.co_player_conditioning is truthy (lines 141-142). Check that the attribute exists before accessing it; one possible fix is sketched below.
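
One possible shape of the fix, sketched: default the attribute during initialization so later comparisons are always safe. The attribute names come from the review; the "condition_type" key and the surrounding class are illustrative.

```python
# Possible fix, sketched: initialize a "none" default so every access site
# is safe ("condition_type" key and surrounding class are illustrative).
class Drive:
    def __init__(self, co_player_policy=None):
        conditioning = (co_player_policy or {}).get("conditioning")
        self.co_player_conditioning = conditioning
        self.co_player_condition_type = (
            conditioning["condition_type"] if conditioning else "none"
        )

    def step(self):
        # ...or, defensively, guard each access site (e.g. line 447):
        if getattr(self, "co_player_condition_type", "none") != "none":
            pass  # add conditioning to co-player observations

Drive().step()  # no AttributeError when conditioning is absent
```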

9 files reviewed, 3 comments


charliemolony and others added 2 commits November 24, 2025 11:18
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>