This repo provides the source code for our paper: MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents. [PDF] [Twitter] If you discuss or use MLR-Copilot in your research, please cite us:
```bibtex
@misc{li2024mlrcopilotautonomousmachinelearning,
      title={MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents},
      author={Ruochen Li and Teerth Patel and Qingyun Wang and Xinya Du},
      year={2024},
      eprint={2408.14033},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.14033},
}
```

MLR-Copilot is a framework in which LLMs mimic researchers' thought processes. It is designed to enhance the productivity of machine learning research by automating the generation and implementation of research ideas.
Starting from a research paper, it autonomously generates and validates research ideas, incorporating human feedback along the way to help reach executable research outcomes.
MLR-Copilot operates in three integrated phases:
- Research Idea Generation: LLM-powered agents generate research hypotheses and experimental plans based on existing research papers.
- Experiment Implementation: Translates experimental plans into executable experiments using retrieved prototype code and models.
- Implementation Execution: Runs the experiments with mechanisms for human feedback and iterative debugging.

GUI Demo with Pre-defined Examples
Demo video: `demo_rec_compress.mov`
- Begin by cloning this repository.
- Place the following in a `.env` file at the root of this project (a minimal sketch follows this list):
  - `CLAUDE_API_KEY`
  - `OPENAI_API_KEY`
- Configure the Hugging Face token as needed so that `huggingface_hub.login()` works, if you intend to use Llama.
- Install requirements: `pip install -r requirements.txt`
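A minimal sketch of this setup, assuming the two keys above are the only required `.env` entries; the placeholder values and the `HF_TOKEN` variable name are illustrative, not prescribed by the repo (check the code for the exact login flow it expects):

```bash
# Minimal sketch of the .env file (values are placeholders, not real keys).
cat > .env <<'EOF'
CLAUDE_API_KEY=your-anthropic-key-here
OPENAI_API_KEY=your-openai-key-here
EOF

# Optional, only if you intend to use Llama: make a Hugging Face token
# available. HF_TOKEN is a common convention read by huggingface_hub;
# verify which variable or login flow this repo expects.
export HF_TOKEN=your-hf-token-here
```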
- Obtain the Docker image `tortcode/nlp-coresearcher`:
  - Build it: `docker build . -t 'tortcode/nlp-coresearcher'`
  - Or pull it from Docker Hub: `docker pull 'tortcode/nlp-coresearcher'`
- Run `bash container.sh` to start the container.
- Place the research idea in the file `problems/<task_name>`.
- Run any preparation scripts as needed.
- Place all starter code in the directory `workspaces/<task_name>` (see the sketch after this list).
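As a concrete illustration, the layout for a hypothetical task could be prepared like this; the task name, idea text, and starter-code path below are all made up for the example:

```bash
# Hypothetical task name; substitute your own.
TASK=my_task

# The research idea (problem description) goes in problems/<task_name>.
mkdir -p problems
echo "Improve robustness of model X on dataset Y under distribution shift." > "problems/$TASK"

# All starter code goes in workspaces/<task_name>.
mkdir -p "workspaces/$TASK"
cp -r path/to/your/starter_code/. "workspaces/$TASK/"
```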
- To run the agent with a specific task and LLM (Claude, GPT-4, or Llama), execute `bash run_demo.sh <task_name> <llm_name>`. You must have access to the Meta Llama 3.1 models on Hugging Face to run Llama.
- To suppress error logging, redirect stderr to `/dev/null`: `bash run_demo.sh <task_name> <llm_name> 2>/dev/null`
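For example, a hypothetical invocation might look like the following; the task name and the exact `<llm_name>` string accepted by `run_demo.sh` are assumptions, so check the script for the values it recognizes:

```bash
# Run the hypothetical task "my_task" with Claude, discarding stderr.
bash run_demo.sh my_task claude 2>/dev/null
```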
- Full logs are under `logs/<task_name>/<start_timestamp>/agent_log/full_log.jsonl`.
- Other logs are under `logs/<task_name>/<start_timestamp>/env_log/`.
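To skim a run's full log, something like the sketch below works, assuming (based on the `.jsonl` extension) that each line of the file is a standalone JSON object; the task name is the same hypothetical one used above:

```bash
# Pretty-print the most recent entry of the latest run's full log.
LATEST=$(ls -td logs/my_task/*/ | head -n 1)
tail -n 1 "$LATEST/agent_log/full_log.jsonl" | python -m json.tool
```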
Figure 1: The autonomous machine learning research task. We take the research paper as input and output the research idea (i.e., research hypothesis and experiment plan) with execution results.
Figure 2: Our MLR-Copilot Framework. LLM IdeaAgent (leftmost grey component) performs research idea generation, including hypothesis and experimental design (Stage 1). ExperimentAgent implements and executes the experiments.

MLR-Copilot incorporates some components from MLAgentBench (MIT License) and Prompt2Model (Apache License 2.0), with modifications to files and API calls.