Summary

  • Fixed performance bug where RNAD/NFSP agents were being re-instantiated on every game iteration
  • Agents are now created once before the game loop and reused

Problem

The `RNaDAgentWrapper` and `NFSPAgentWrapper` were created inside the `for gi in range(games)` loop in `pyspiel_runner.py`. This caused:

  • JAX/Haiku model re-initialization on every game
  • Repeated JIT compilation overhead
  • 50 games × 2 agents = 100 model loads instead of 2

On GitHub Actions (AMD EPYC CPUs), JAX compilation is slow, causing runs to take hours instead of minutes.

Solution

Move agent instantiation before the game loop so models are loaded once and reused.
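
For illustration, a minimal sketch of the before/after structure (the constructor factories and `play_one_game` helper are stand-ins, not the actual signatures in `pyspiel_runner.py`):

```python
# Hypothetical sketch: hoist the expensive agent construction out of the game
# loop. Constructing the RNaD/NFSP wrappers triggers JAX/Haiku initialization
# and JIT compilation, so it should happen exactly once.
def run_games(games, make_rnad_agent, make_nfsp_agent, play_one_game):
    # Before this fix, both agents were built inside the loop: 2 * games loads.
    rnad_agent = make_rnad_agent()   # one JAX/Haiku init + JIT compile
    nfsp_agent = make_nfsp_agent()   # one JAX/Haiku init + JIT compile

    results = []
    for gi in range(games):
        # Agents are reused across all games.
        results.append(play_one_game(gi, rnad_agent, nfsp_agent))
    return results
```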

Test plan

  • Tested locally with docker compose - games progress smoothly
  • RNAD models now load only once at startup
  • Rebuild Docker image and retest on GitHub Actions


gsmithline and others added 30 commits November 29, 2025 14:25
…ndle non-JSON responses in remote negotiator
- Fix syntax error in nfsp.py (class indentation)
- Fix allocation logic in pyspiel_runner.py (correct who gets what on accept)
- Fix indentation in run_entire_matrix.py
- Remove dead code in pyspiel_runner.py
- Add missing __init__.py files for proper package structure
- Add GitHub Actions workflow for Docker image publishing to GHCR
- Add meta-game framework documentation to README
- Add SUBMISSION.md for competition abstract
- Update .gitignore to exclude runtime data directories

Simplifies Docker build by using pre-built wheel from PyPI.

- abseil-cpp -> open_spiel/abseil-cpp
- pybind11 -> pybind11
- pybind11_abseil -> open_spiel/pybind11_abseil

- Add double_dummy_solver clone from jblespiau/dds repo
- Add pybind11 Python binding source files
- Fix PYTHONPATH to remove undefined variable warning

- Rewrite main README to focus on meta-game evaluation framework
- Add quick start with local, Cloud Run, and platform options
- Document assessment configuration and purple agent requirements
- Enhance SUBMISSION.md with competition compliance details
- Add evaluation metrics explanation and result format

- Correct regret formula: Regret(π) = max(0, u(π, σ*) - u(σ*))
  (matches code implementation in mene_solver.py)
- Clarify M[i][j] = agent i's average payoff against agent j
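
A hedged sketch of how the corrected formula could be computed from the payoff matrix, assuming `sigma` is the MENE mixture and `u(σ*)` means the mixture's payoff against itself; the authoritative version is in mene_solver.py:

```python
import numpy as np

def regret_of_pure_strategy(M: np.ndarray, sigma: np.ndarray, i: int) -> float:
    """Regret(pi_i) = max(0, u(pi_i, sigma*) - u(sigma*)).

    M[i][j] is agent i's average payoff against agent j; sigma is the
    equilibrium mixture over agents. Illustrative only.
    """
    u_deviate = float(M[i] @ sigma)           # pure strategy i against sigma*
    u_equilibrium = float(sigma @ M @ sigma)  # sigma* against itself
    return max(0.0, u_deviate - u_equilibrium)
```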

Added required dependencies for MENE solver:
- cvxpy>=1.4.0
- numpy>=1.26.0
- clarabel (cvxpy solver backend)
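
As a quick sanity check that the Clarabel backend is reachable through cvxpy (the LP below is a toy problem, not the MENE program from mene_solver.py):

```python
import cvxpy as cp
import numpy as np

# Toy LP: maximize sum(x) with x nonnegative and summing to at most 1.
x = cp.Variable(3, nonneg=True)
problem = cp.Problem(cp.Maximize(np.ones(3) @ x), [cp.sum(x) <= 1])
problem.solve(solver=cp.CLARABEL)  # exercises the clarabel backend added above
print(problem.status, x.value)
```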

Removed the 'Interpreting Results' section from the README.

- Compute standard error = std / sqrt(n) for regrets and agent metrics
- Add SE to per-agent output in run_metagame_analysis
- Update SUBMISSION.md with CI format in result schema
- Following paper methodology from Wiedenbeck et al. 2014

Results now include mean±SE format matching IJCAI paper Figure 1.
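
A minimal sketch of the SE computation described above (the helper name and sample numbers are made up):

```python
import numpy as np

def mean_and_se(samples):
    """Mean and standard error of the mean for independent samples."""
    samples = np.asarray(samples, dtype=float)
    se = samples.std(ddof=1) / np.sqrt(len(samples))  # SE = std / sqrt(n)
    return samples.mean(), se

# Example with made-up per-game regrets for one agent:
mean, se = mean_and_se([0.12, 0.08, 0.15, 0.10])
print(f"{mean:.3f} ± {se:.3f}")
```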

Bootstrap standard error is the standard deviation of the bootstrap
distribution, not divided by sqrt(n). The division by sqrt(n) is for
standard error of the mean with independent samples, but in bootstrap
resampling the std of bootstrap estimates directly estimates the SE.
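
A sketch of the corrected bootstrap SE, using the std of the bootstrap estimates directly (function name and defaults are illustrative):

```python
import numpy as np

def bootstrap_se(samples, n_boot=1000, statistic=np.mean, seed=0):
    """SE via bootstrap: the std of the bootstrap distribution itself,
    with no extra division by sqrt(n)."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    estimates = [
        statistic(rng.choice(samples, size=len(samples), replace=True))
        for _ in range(n_boot)
    ]
    return float(np.std(estimates, ddof=1))
```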

- Add results/ directory with baseline_evaluation.json containing
  metrics for 6 baseline agents (soft, tough, aspiration, walk, nfsp, rnad)
- Add scenario.toml for leaderboard configuration
- Results include MENE regret, welfare metrics (UW, NW, NWA), and EF1

- Remove nested results structure that caused DuckDB binding errors
- Create separate JSON file per agent with flat structure
- All fields (agent_name, mene_regret, etc.) now at top level
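
A hedged sketch of the flat per-agent layout (field names beyond agent_name and mene_regret, and the example values, are made up):

```python
import json
from pathlib import Path

def write_agent_result(out_dir: Path, agent_name: str, metrics: dict) -> Path:
    """Write one flat JSON file per agent; every field sits at the top level."""
    record = {"agent_name": agent_name, **metrics}  # no nested "results" key
    path = out_dir / f"{agent_name}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# write_agent_result(Path("results"), "rnad", {"mene_regret": 0.02})  # made-up values
```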

…ated results

Results should be generated by CI workflow in a dedicated leaderboard repository,
not manually committed to the agent repo.

gsmithline and others added 10 commits January 13, 2026 10:45
The MILP solver can return solutions that are very close to Nash
equilibria but fail the strict 1e-6 regret check due to numerical
precision issues. Increasing tolerance to 1e-4 provides more robustness.

Instead of failing when regret exceeds tolerance, warn and return
the best solution found. This handles numerical precision issues
without failing the entire evaluation pipeline.
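
Taken together with the tolerance change above, the behavior is roughly this (names are illustrative, not the solver's actual API):

```python
import warnings

REGRET_TOLERANCE = 1e-4  # relaxed from 1e-6 to absorb MILP numerical noise

def accept_best_solution(best_sigma, best_regret, tol=REGRET_TOLERANCE):
    """Warn, rather than raise, when the best solution's regret exceeds tol."""
    if best_regret > tol:
        warnings.warn(
            f"best regret {best_regret:.2e} exceeds tolerance {tol:.0e}; "
            "returning the best solution found instead of failing"
        )
    return best_sigma
```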

When a remote agent named "challenger" raises RemoteNegotiatorError
(e.g., by returning {"action": "WALK"} without an allocation), the
fallback behavior now correctly treats it as a walk policy instead
of the default balanced policy.

This fixes a bug where challenger agents would inadvertently make
offers and accept deals when the remote agent failed to provide a
valid allocation, leading to incorrect NWA% and EF1% metrics.

The previous fix only checked for "challenger" in _propose_allocation
and _accepts, but _policy_kind("challenger") returned "balanced"
because none of the conditions matched.

Now _policy_kind recognizes "challenger" as a walk policy, so when
the remote agent fails, the fallback uses walk behavior (always reject,
zero allocation).
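
A rough sketch of the classification described here; the real _policy_kind may differ, the non-challenger branches are placeholders, and a later commit in this PR removes "challenger" from the walk branch so it plays real games instead:

```python
def _policy_kind(agent_name: str) -> str:
    """Classify the fallback policy used when a remote agent errors out (sketch)."""
    name = agent_name.lower()
    if "walk" in name or name == "challenger":
        # Walk fallback: always reject, zero allocation -- not the default
        # balanced policy.
        return "walk"
    if "tough" in name:
        return "tough"
    if "soft" in name:
        return "soft"
    return "balanced"
```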

The analysis code in main.py looks for traces at {pair_key}.jsonl,
but walk baseline traces were written to walk_baseline.jsonl only.
This caused all metrics (NW, UW, MENE regret) to show 0% for walk-type
agents because load_records() couldn't find the files.

Fix: Create symlinks from pair-specific paths to walk_baseline.jsonl
so the analysis code can find and process the walk traces correctly.
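
A minimal sketch of the symlink fix (helper name and arguments are illustrative):

```python
from pathlib import Path

def link_walk_traces(trace_dir: Path, pair_keys: list[str]) -> None:
    """Point each {pair_key}.jsonl at the shared walk_baseline.jsonl."""
    target = trace_dir / "walk_baseline.jsonl"
    for pair_key in pair_keys:
        link = trace_dir / f"{pair_key}.jsonl"
        if not link.exists():
            link.symlink_to(target)  # analysis code can now find the traces
```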

- Remove 'challenger' from walk policy detection in _policy_kind()
- Challenger now goes through run_pyspiel_pair_nfsp_with_traces() for real games
- This enables validation with actual game data from the reject-agent
- Check if agents are in remote_agent_urls before using walk baseline
- This ensures challenger/purple agents play actual games via A2A protocol
- Synthetic walk baseline only used for local walk agent vs local agents
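
A hedged sketch of the routing decision described in these two commits; remote_agent_urls and run_pyspiel_pair_nfsp_with_traces are named in the messages above, but the wiring below is a guess:

```python
def run_pair(agent_a: str, agent_b: str, remote_agent_urls: dict,
             run_real_games, run_walk_baseline):
    """Route a pairing to real games if either agent is a remote (A2A) agent."""
    if agent_a in remote_agent_urls or agent_b in remote_agent_urls:
        # Challenger / purple agents play actual games via the A2A protocol.
        return run_real_games(agent_a, agent_b)
    # Synthetic walk baseline only for a local walk agent vs local agents.
    return run_walk_baseline(agent_a, agent_b)
```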
Previously, RNaDAgentWrapper and NFSPAgentWrapper were instantiated
inside the `for gi in range(games)` loop, causing model re-initialization
on every game. This triggered repeated JAX/Haiku compilation, which is
extremely slow on certain CPUs (e.g., GitHub Actions AMD EPYC runners).

With 50 games, this meant 100 model loads instead of 2, causing runs
that should take minutes to take hours.

The fix moves agent creation before the loop so models are loaded once
and reused across all games.
