This repo contains experiments with the Voyager agent for coding tasks.
Reference: "Voyager: An Open-Ended Embodied Agent with Large Language Models" by Wang et. al. (2023) https://github.com/MineDojo/Voyager
The report can be found in the `report/` directory of this repo.
- Clone the repository with submodules:

  ```bash
  git clone --recursive https://github.com/nicholaschenai/voyager_coder_experiments.git
  cd voyager_coder_experiments
  ```

- Create and activate the conda environment using the provided `environment.yml`:

  ```bash
  conda env create -f environment.yml
  conda activate voyager_coder
  ```

- Set up API keys as described below.
For OpenAI:

```bash
export OPENAI_API_KEY="YOUR_KEY_HERE"
```

For Azure OpenAI:

```bash
export AZURE_OPENAI_API_KEY="YOUR_KEY_HERE"
export AZURE_OPENAI_ENDPOINT="ENDPOINT_URL_HERE"
export OPENAI_API_VERSION="2023-07-01-preview"
export AZURE_OPENAI_DEPLOYMENT_NAME="{'gpt-4-1106-preview': 'YOUR_DEPLOYMENT_NAME_HERE', 'gpt-4o-mini-2024-07-18': 'YOUR_DEPLOYMENT_NAME_HERE'}"
```
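`AZURE_OPENAI_DEPLOYMENT_NAME` maps model names to your Azure deployment names as a Python-dict-style string. A minimal sketch of reading it (an illustration only, not necessarily how this repo parses it):

```python
# Sketch only: read the model -> deployment-name mapping from the environment.
# Assumes the variable holds a Python-dict-style string as shown above.
import ast
import os

deployment_map = ast.literal_eval(os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"])
print(deployment_map["gpt-4-1106-preview"])  # -> your Azure deployment name
```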
To run the ReAct tests:

GPT-4o-mini:

```bash
bash scripts/react_test_4o_mini.sh
```

GPT-4-1106:

```bash
bash scripts/react_test_4_1106.sh
```

For the Voyager agent, first run training to get the skills library:

```bash
bash scripts/voyager_train_4_1106.sh
```

Then proceed to use the skills library for evaluation:

GPT-4-1106:

```bash
bash scripts/voyager_proc_4_1106_test_4_1106.sh
```

GPT-4o-mini:

```bash
bash scripts/voyager_proc_4_1106_test_4o_mini.sh
```

Plot accuracies and get the error analysis:

```bash
python scripts/results_analysis.py
```

To check for multi-function solutions:

```bash
python scripts/find_multi_fn_solns.py
```
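For intuition, "multi-function solutions" are generated solutions that define more than one function. A minimal sketch of such a check (an illustration only, not the implementation in `scripts/find_multi_fn_solns.py`):

```python
# Sketch only: flag solutions that define more than one function,
# using Python's ast module to parse the generated code.
import ast

def is_multi_function(solution_code: str) -> bool:
    try:
        tree = ast.parse(solution_code)
    except SyntaxError:
        return False
    n_defs = sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                 for node in ast.walk(tree))
    return n_defs > 1

print(is_multi_function("def helper():\n    pass\n\ndef solve(x):\n    return helper()"))  # True
```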
Experiment outputs will be saved in the following locations:

- Logs: `logs/`
- Results: `results/`
Example structure of a result folder (e.g. `voyager_proc_4_1106_test_4_1106/`):

```
voyager_proc_4_1106_test_4_1106/
├── args.json                   # Experiment configuration and hyperparameters
├── samples_eval_results.json   # MBPP Plus evaluation results
├── samples.jsonl               # Agent output used for evaluation
├── result_dict.json            # Public test case execution results
├── ckpt/                       # Checkpoint directory containing:
│   ├── curriculum/             # QA db, completed and failed tasks
│   └── skill/                  # Learned skills: code, description, and vector DB
└── test_outputs/               # Logs and final output per problem
```
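A minimal sketch of inspecting one run's outputs, assuming the folder sits under `results/` as described above (no assumptions are made about the fields inside each file):

```python
# Sketch only: count entries in a run's output files.
# Assumes the folder layout shown above, under results/.
import json
from pathlib import Path

run_dir = Path("results/voyager_proc_4_1106_test_4_1106")

# samples.jsonl: one JSON object per line (agent output used for evaluation)
with (run_dir / "samples.jsonl").open() as f:
    samples = [json.loads(line) for line in f if line.strip()]
print(f"{len(samples)} samples")

# result_dict.json: public test case execution results
result_dict = json.loads((run_dir / "result_dict.json").read_text())
print(f"{len(result_dict)} entries in result_dict.json")
```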
The accuracy plot is saved to `report/assets/accuracy_comparison.png`.
The error analysis is saved to `report/assets/4o_mini_status_differences.csv` and `report/assets/4_1106_status_differences.csv`.
```
.
├── cognitive_base/     # Cognitive architecture primitives
├── agent_expt_suite/   # Tools for running experiments with agents
├── voyager_coder/      # Voyager agent architecture for coding tasks
├── scripts/            # Scripts for running the experiments
├── report/             # Documentation and analysis
└── final_results/      # Experiment outputs
```
TODO