@Benjamin-eecs
Collaborator

@Benjamin-eecs Benjamin-eecs commented Aug 31, 2025

This pull request integrates the TAU-bench Retail benchmark into the GEM (Gym for LLM Agents) framework as a GEM-native implementation for evaluating LLM agents on realistic retail customer service tasks. It adds documentation, data loaders, tool interfaces, and evaluation scripts, along with the mock data assets and rule definitions that the benchmark depends on.

Key additions and improvements:

Documentation and Usage:

  • Added a detailed README.md in both the root and tau_bench_retail directories, describing the integration, architecture, usage instructions, model support, and research experiment setup for the TAU-bench Retail benchmark in GEM.

Benchmark Assets and Data Handling:

  • Introduced a new assets/data/__init__.py module for loading mock user, order, and product data from JSON files, supporting the benchmark's environment and tools.
  • Added a readme.md in assets/data/ explaining the mock data generation process and schema design philosophy.
  • Provided a .gitignore to exclude experiment results and Python cache files from version control.
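
For readers unfamiliar with this kind of loader, the following is a minimal sketch of how a module might read mock data from JSON files. The file names (`users.json`, `orders.json`, `products.json`), the `DATA_FILES` constant, and the `load_data` function are assumptions made for illustration; the PR description does not state the actual names or signatures.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical file stems; the PR description does not list the real file names.
DATA_FILES = ("users", "orders", "products")


def load_data(folder: Path) -> Dict[str, Any]:
    """Read each <name>.json in `folder` and return a dict keyed by name."""
    data: Dict[str, Any] = {}
    for name in DATA_FILES:
        with open(folder / f"{name}.json", encoding="utf-8") as f:
            data[name] = json.load(f)
    return data
```

A loader placed in `assets/data/__init__.py` could call such a function with `Path(__file__).parent`, so the JSON files are resolved relative to the module rather than the working directory.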

Tooling and Evaluation Environment:

  • Implemented a minimal Tool base class in assets/base_tool.py to standardize tool interfaces for the TAU-bench environment.
  • Added assets/rules.py with explicit agent behavior and evaluation rules for the retail environment, ensuring agent actions are consistent with benchmark requirements.
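
To make the idea of a standardized tool interface concrete, here is a minimal sketch of what a tool base class and one tool built on it might look like. The method names `invoke` and `get_info`, the `GetOrderStatus` tool, and the shape of the data dictionary are all assumptions for illustration; the class actually added in assets/base_tool.py may differ.

```python
import abc
import json
from typing import Any, Dict


class Tool(abc.ABC):
    """Hypothetical minimal tool interface; the PR's actual class may differ."""

    @staticmethod
    @abc.abstractmethod
    def invoke(data: Dict[str, Any], **kwargs: Any) -> str:
        """Run the tool against the shared data store and return a string."""

    @staticmethod
    @abc.abstractmethod
    def get_info() -> Dict[str, Any]:
        """Return a schema describing the tool's name and parameters."""


class GetOrderStatus(Tool):
    """Illustrative tool, not part of the PR."""

    @staticmethod
    def invoke(data: Dict[str, Any], order_id: str = "", **kwargs: Any) -> str:
        order = data.get("orders", {}).get(order_id)
        if order is None:
            return "Error: order not found"
        return json.dumps({"order_id": order_id, "status": order["status"]})

    @staticmethod
    def get_info() -> Dict[str, Any]:
        return {
            "name": "get_order_status",
            "description": "Look up the status of an order by id.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        }
```

Keeping both methods static means a tool carries no state of its own: the environment passes in the data store on every call, which makes tool behavior easier to reproduce across evaluation runs.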

Together, these changes enable reproducible benchmarking of LLM agents in a realistic retail scenario across multiple model providers, and support research on user instruction clarity and tool-based reasoning.

@Benjamin-eecs Benjamin-eecs marked this pull request as ready for review August 31, 2025 09:48
@N00bcak
Contributor

N00bcak commented Sep 19, 2025

Hi there, nice work!

There are just two conceptual details that I think may improve the design of the API a bit and I hope you can consider them here.

@lkevinzc lkevinzc linked an issue Oct 11, 2025 that may be closed by this pull request
Contributor

@lkevinzc lkevinzc left a comment

LGTM, great work!

@lkevinzc lkevinzc merged commit e2f5556 into axon-rl:main Oct 13, 2025
Development

Successfully merging this pull request may close these issues.

feat: Add the TAU-bench retail Environment

3 participants