The ML Research Benchmark Baseline Agent is an agentic system designed to serve as a baseline for a range of AI and machine learning tasks. It provides a reference point for comparing and evaluating agents on machine learning research and development tasks.
- 📎 ML Research Benchmark Paper
- 🤖 ML Research Agent
- ✅ ML Research Tasks
- 📈 ML Research Evaluation
- 📓 Full Documentation
- Supports multiple AI/ML tasks
- Compatible with different LLM providers (OpenAI, Anthropic)
- Dockerized for easy deployment and reproducibility
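Since the agent supports multiple LLM providers, the corresponding API key must be available in the environment. The sketch below is a hypothetical helper, not the repository's actual configuration code; it assumes the standard `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` environment variables used by the OpenAI and Anthropic SDKs.

```python
import os

# Standard environment variables for the OpenAI and Anthropic SDKs.
# The repository's real .env handling may differ; this is illustrative.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def check_provider_key(provider: str) -> bool:
    """Return True if an API key for the chosen provider is set."""
    env_var = PROVIDER_ENV_KEYS.get(provider)
    if env_var is None:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return bool(os.environ.get(env_var))
```

A check like this, run before launching a task, fails fast instead of erroring mid-run when the first model call is made.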
The baseline agent can perform the following tasks:
- LLM Efficiency
- Baby Language Model (LM)
- Mini Pile
- LLM Merging
- Edge LLM Compression
- Edge LLM Training
- Math Reasoning (Autoformalization, Autoinformalization, Autotheorem Generation)
Mini versions of several tasks are also available for quick testing and development.
Please find the full list of tasks along with their prompts and descriptions here: ML-Research-Agent-Tasks
The ML Research Benchmark Baseline Agent comes equipped with a variety of tools to assist in different AI and machine learning tasks:
- Bash Tool: Executes bash commands and scripts.
- Code Tool: Manages code operations including writing, inserting, replacing, and deleting code.
- GitHub Tool: Interacts with GitHub repositories to get README files, list files, and retrieve file contents.
- Semantic Scholar Tool: Searches for academic papers, retrieves paper details and citations, and downloads papers.
- Python Tool: Executes Python code.
- Return Function Tool: Handles task completion.
- Scratchpad Tool: Provides a scratchpad for experiment note-taking and temporary storage.
- Thought Tool: Allows the agent to process and record thoughts.
- Long-Term Memory Tool: Manages long-term memory storage and retrieval.
These tools can be used individually or in combination to tackle a wide range of AI research and benchmark tasks. The agent can seamlessly switch between tools as needed for complex operations.
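To illustrate how such tools can be combined, here is a minimal, hypothetical sketch of a tool registry with a dispatch function, using simplified stand-ins for the Scratchpad and Long-Term Memory tools. The class and method names are assumptions for illustration and do not reflect the repository's actual tool interfaces.

```python
class Scratchpad:
    """Minimal scratchpad: append-only notes for an experiment."""
    def __init__(self):
        self.notes = []

    def write(self, entry: str) -> str:
        self.notes.append(entry)
        return f"noted: {entry}"

class LongTermMemory:
    """Minimal key-value long-term memory store."""
    def __init__(self):
        self.store = {}

    def save(self, key: str, value: str) -> str:
        self.store[key] = value
        return f"saved: {key}"

    def recall(self, key: str) -> str:
        return self.store.get(key, "")

def run_tool(registry: dict, name: str, method: str, *args) -> str:
    """Dispatch a call to a named tool, as an agent loop might."""
    tool = registry[name]
    return getattr(tool, method)(*args)

# The agent records an observation, then persists a result for later runs.
registry = {"scratchpad": Scratchpad(), "memory": LongTermMemory()}
run_tool(registry, "scratchpad", "write", "lr=3e-4 diverges")
run_tool(registry, "memory", "save", "best_lr", "1e-4")
```

A registry of this shape lets the agent switch between tools by name, which is what makes chaining them across a multi-step task straightforward.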
- Python 3.x
- Docker (for containerized execution)
- Clone this repository:

  git clone https://github.com/AlgorithmicResearchGroup/ML-Research-Agent.git
  cd ML-Research-Agent

- Install dependencies:

  pip install -r requirements.txt
To run the agent without Docker, use the following command:
python3 run.py --task_name llm_efficiency --benchmark full_benchmark --provider openai
bash run.sh <image_name> <benchmark> <provider> <gpu_ids> <task_name> <time_limit> <huggingface_token> <env_file_path>
Example:
bash run.sh ghcr.io/algorithmicresearchgroup/ml-research-agent full_benchmark \
openai \
0 \
math_reasoning \
24h \
<huggingface_token> \
/home/ubuntu/.env
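If you are launching runs from Python (for example, to sweep over tasks), the invocation above can be assembled programmatically. The helper below is a hypothetical sketch that only builds the argument list in the positional order documented above; actually executing it (e.g. via `subprocess.run`) requires the repository, Docker, and a GPU to be set up.

```python
def build_run_command(image_name, benchmark, provider, gpu_ids,
                      task_name, time_limit, huggingface_token,
                      env_file_path):
    """Assemble the `bash run.sh ...` argument list in the
    positional order expected by run.sh."""
    return ["bash", "run.sh", image_name, benchmark, provider,
            str(gpu_ids), task_name, time_limit, huggingface_token,
            env_file_path]

cmd = build_run_command(
    image_name="ghcr.io/algorithmicresearchgroup/ml-research-agent",
    benchmark="full_benchmark",
    provider="openai",
    gpu_ids=0,
    task_name="math_reasoning",
    time_limit="24h",
    huggingface_token="hf_xxx",   # placeholder, not a real token
    env_file_path="/home/ubuntu/.env",
)
```

Keeping the argument order in one place avoids the easy mistake of transposing positional parameters when scripting many runs.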
For a full list of available tasks and their corresponding Docker run commands, please refer to the tasks repository: ML-Research-Agent-Tasks
Contributions to improve the baseline agent or add new tasks are welcome. Please submit a pull request or open an issue to discuss proposed changes.
AGPL-3.0
For questions or support, please contact Algorithmic Research Group at matt@algorithmicresearchgroup.com