feat: multi-agent environment integration #79

Benjamin-eecs · 2025-08-31T09:48:18Z

This pull request integrates the TAU-BENCH Retail benchmark into the GEM (Gym for LLM Agents) framework as a GEM-native implementation for evaluating LLM agents on realistic retail customer service tasks. The changes include new documentation, data loaders, tool interfaces, and evaluation scripts, along with mock data assets and rule definitions that support the benchmark.

Key additions and improvements:

Documentation and Usage:

Added a detailed README.md to both the root and tau_bench_retail directories, describing the integration, architecture, usage instructions, model support, and research experiment setup for the TAU-BENCH Retail benchmark in GEM.

Benchmark Assets and Data Handling:

Introduced a new assets/data/__init__.py module for loading mock user, order, and product data from JSON files, supporting the benchmark's environment and tools.
Added a readme.md in assets/data/ explaining the mock data generation process and schema design philosophy.
Provided a .gitignore to exclude experiment results and Python cache files from version control.
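
As a rough illustration of the data-loading module described above, here is a minimal sketch. It assumes one JSON file per dataset (users.json, orders.json, products.json) in a single folder; the function name `load_data`, the `DATA_FILES` constant, and the file names are hypothetical and are not taken from the PR's code.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical file stems; the PR only states that the mock data lives in JSON files.
DATA_FILES = ("users", "orders", "products")


def load_data(folder: Path) -> Dict[str, Any]:
    """Read <name>.json for each mock dataset and return them keyed by name."""
    data: Dict[str, Any] = {}
    for name in DATA_FILES:
        with open(folder / f"{name}.json", encoding="utf-8") as f:
            data[name] = json.load(f)
    return data


# Usage: write empty stand-in files to a temporary folder, then load them.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in DATA_FILES:
        (folder / f"{name}.json").write_text(json.dumps({}), encoding="utf-8")
    print(sorted(load_data(folder)))
```

Keeping the loader as a plain function that returns a dictionary means the environment and every tool can share one in-memory copy of the mock database.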

Tooling and Evaluation Environment:

Implemented a minimal Tool base class in assets/base_tool.py to standardize tool interfaces for the TAU-bench environment.
Added assets/rules.py with explicit agent behavior and evaluation rules for the retail environment, ensuring agent actions are consistent with benchmark requirements.
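
To make the idea of a minimal Tool base class concrete, the sketch below shows one plausible shape for such an interface. It is an assumption, not the contents of assets/base_tool.py: the method names `invoke` and `get_info`, the example `GetOrderStatus` tool, and the order schema are all hypothetical.

```python
import abc
from typing import Any, Dict


class Tool(abc.ABC):
    """Hypothetical minimal interface that every tool implements."""

    @staticmethod
    @abc.abstractmethod
    def invoke(data: Dict[str, Any], **kwargs: Any) -> str:
        """Run the tool against the environment data and return a text result."""

    @staticmethod
    @abc.abstractmethod
    def get_info() -> Dict[str, Any]:
        """Return a function-calling style description of the tool."""


class GetOrderStatus(Tool):
    """Illustrative tool: look up the status of an order by its id."""

    @staticmethod
    def invoke(data: Dict[str, Any], order_id: str = "", **kwargs: Any) -> str:
        order = data["orders"].get(order_id)
        if order is None:
            return "Error: order not found"
        return order["status"]

    @staticmethod
    def get_info() -> Dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": "get_order_status",
                "description": "Look up the status of an order.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        }


# Usage with a tiny stand-in database.
mock_data = {"orders": {"#W1": {"status": "pending"}}}
print(GetOrderStatus.invoke(mock_data, order_id="#W1"))
```

Using static methods keeps tools stateless: all mutable state lives in the shared data dictionary, so resetting the environment only requires reloading the data.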

These changes collectively enable robust, reproducible benchmarking of LLM agents in a realistic retail scenario, supporting multiple providers and facilitating research on user instruction clarity and tool-based reasoning.

examples/multiagent/collaboration.py

gem/multiagent/parallel_env.py

N00bcak · 2025-09-19T10:01:46Z

Hi there, nice work!

There are just two conceptual details that I think may improve the design of the API a bit and I hope you can consider them here.


lkevinzc

LGTM, great work!

Benjamin-eecs added 12 commits August 31, 2025 08:16

feat(multi-agent): add multi-agent design docs

58b2019

feat(multi-agent): init multi-agent env

2c10414

feat(multi-agent): add multi-agent example

ec9342e

feat(multi-agent): add multi-agent env testing code

986273e

feat(multi-agent): add multi-agent env testing code

f8e8c58

feat(multi-agent): init multi-agent env

32ec41e

feat(multi-agent): add multi-agent env example

228ba43

feat(multi-agent): init multi-agent env

78644e6

feat(multi-agent): add multi-agent env example

07d31a1

feat(multi-agent): add multi-agent env testing code

c8d670d

feat(multi-agent): add multi-agent env docs

8890677

chore: clean code base

139b862

Benjamin-eecs marked this pull request as ready for review August 31, 2025 09:48

Benjamin-eecs added 6 commits August 31, 2025 10:00

chore: clean code base

d2ab9ff

chore: clean code base

cafb7d9

chore: add license

5f909ee

chore: add license

d34c74d

docs: update design docs

66c5f3a

docs: update README

2ec9c7d

lkevinzc reviewed Sep 4, 2025

examples/multiagent/collaboration.py (outdated, resolved)

gem/multiagent/parallel_env.py (outdated, resolved)

Benjamin-eecs added 2 commits September 4, 2025 19:08

fix: gem multi-agent env

4234f6c

fix: gem multi-agent example

0ee8ef8

Benjamin-eecs requested a review from lkevinzc September 4, 2025 19:36

Benjamin-eecs added 6 commits September 10, 2025 02:19

fix: update tests

fa7d4fe

refactor: unified multi-agent env api design

e3948f1

refactor: unified multi-agent env api design

1cc46b8

Merge remote-tracking branch 'upstream/main' into feature/multi_agent

8bad403

fix: update tests

4e52e70

refactor: update multi-agent env example

f018879

Benjamin-eecs added 9 commits September 29, 2025 07:11

feat: multi-agent env code

017d0b9

Merge remote-tracking branch 'upstream/main' into feature/multi_agent

a9e081e

feat: multi-agent env code

d36f2f1

feat: multi-agent env code

7969822

merge: resolve conflicts

b3f2de2

feat: multi-agent env example

7e14108

merge: resolve conflicts

d32732c

chore: make format

194a473

Merge branch 'axon-rl:main' into feature/multi_agent

df03390

lkevinzc linked an issue Oct 11, 2025 that may be closed by this pull request

feat: Add the TAU-bench retail Environment #106

Closed

Benjamin-eecs added 3 commits October 12, 2025 20:29

Merge remote-tracking branch 'upstream/main' into feature/multi_agent

3b0324e

Merge remote-tracking branch 'origin/feature/multi_agent' into feature/multi_agent

29e9157

feat: multi-agent env code

501c415

Benjamin-eecs requested a review from cameron-chen October 12, 2025 20:49

lkevinzc approved these changes Oct 13, 2025


lkevinzc merged commit e2f5556 into axon-rl:main Oct 13, 2025
