This repo contains experiments with the Voyager agent for coding tasks.
Reference: "Voyager: An Open-Ended Embodied Agent with Large Language Models" by Wang et. al. (2023) https://github.com/MineDojo/Voyager
The report can be found in the `report/` directory of this repo.
- Clone the repository with submodules:

  ```bash
  git clone --recursive https://github.com/nicholaschenai/voyager_coder_experiments.git
  cd voyager_coder_experiments
  ```

- Create and activate the conda environment using the provided `environment.yml`:

  ```bash
  conda env create -f environment.yml
  conda activate voyager_coder
  ```

- Set up API keys as described below.
For OpenAI:

```bash
export OPENAI_API_KEY="YOUR_KEY_HERE"
```

For Azure OpenAI:

```bash
export AZURE_OPENAI_API_KEY="YOUR_KEY_HERE"
export AZURE_OPENAI_ENDPOINT="ENDPOINT_URL_HERE"
export OPENAI_API_VERSION="2023-07-01-preview"
export AZURE_OPENAI_DEPLOYMENT_NAME="{'gpt-4-1106-preview': 'YOUR_DEPLOYMENT_NAME_HERE', 'gpt-4o-mini-2024-07-18': 'YOUR_DEPLOYMENT_NAME_HERE'}"
```
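`AZURE_OPENAI_DEPLOYMENT_NAME` maps model names to your Azure deployment names as a Python-dict-style string. A minimal sketch of reading it (an illustration only, not necessarily how this repo parses it):

```python
# Sketch only: read the model -> deployment-name mapping from the environment.
# Assumes the variable holds a Python-dict-style string as shown above.
import ast
import os

deployment_map = ast.literal_eval(os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"])
print(deployment_map["gpt-4-1106-preview"])  # -> your Azure deployment name
```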
To run the ReAct tests:

GPT-4o-mini:

```bash
bash scripts/react_test_4o_mini.sh
```

GPT-4-1106:

```bash
bash scripts/react_test_4_1106.sh
```

For the Voyager agent, first run training to get the skills library:

```bash
bash scripts/voyager_train_4_1106.sh
```

Then proceed to use the skills library for evaluation:

GPT-4-1106:

```bash
bash scripts/voyager_proc_4_1106_test_4_1106.sh
```

GPT-4o-mini:

```bash
bash scripts/voyager_proc_4_1106_test_4o_mini.sh
```

Plot accuracies and get the error analysis:

```bash
python scripts/results_analysis.py
```

To check for multi-function solutions:

```bash
python scripts/find_multi_fn_solns.py
```
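For intuition, "multi-function solutions" are generated solutions that define more than one function. A minimal sketch of such a check (an illustration only, not the implementation in `scripts/find_multi_fn_solns.py`):

```python
# Sketch only: flag solutions that define more than one function,
# using Python's ast module to parse the generated code.
import ast

def is_multi_function(solution_code: str) -> bool:
    try:
        tree = ast.parse(solution_code)
    except SyntaxError:
        return False
    n_defs = sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                 for node in ast.walk(tree))
    return n_defs > 1

print(is_multi_function("def helper():\n    pass\n\ndef solve(x):\n    return helper()"))  # True
```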
Experiment outputs will be saved in the following locations:

- Logs: `logs/`
- Results: `results/`
Example structure of a result folder (e.g. `voyager_proc_4_1106_test_4_1106/`):

```
voyager_proc_4_1106_test_4_1106/
├── args.json                   # Experiment configuration and hyperparameters
├── samples_eval_results.json   # MBPP Plus evaluation results
├── samples.jsonl               # Agent output used for evaluation
├── result_dict.json            # Public test case execution results
├── ckpt/                       # Checkpoint directory containing:
│   ├── curriculum/             # QA db, completed and failed tasks
│   └── skill/                  # Learned skills: code, description, and vector DB
└── test_outputs/               # Logs and final output per problem
```
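A minimal sketch of inspecting one run's outputs, assuming the folder sits under `results/` as described above (no assumptions are made about the fields inside each file):

```python
# Sketch only: count entries in a run's output files.
# Assumes the folder layout shown above, under results/.
import json
from pathlib import Path

run_dir = Path("results/voyager_proc_4_1106_test_4_1106")

# samples.jsonl: one JSON object per line (agent output used for evaluation)
with (run_dir / "samples.jsonl").open() as f:
    samples = [json.loads(line) for line in f if line.strip()]
print(f"{len(samples)} samples")

# result_dict.json: public test case execution results
result_dict = json.loads((run_dir / "result_dict.json").read_text())
print(f"{len(result_dict)} entries in result_dict.json")
```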
The accuracy plot is saved to `report/assets/accuracy_comparison.png`.
The error analysis is saved to `report/assets/4o_mini_status_differences.csv` and `report/assets/4_1106_status_differences.csv`.
```
.
├── cognitive_base/     # Cognitive architecture primitives
├── agent_expt_suite/   # Tools for running experiments with agents
├── voyager_coder/      # Voyager agent architecture for coding tasks
├── scripts/            # Scripts for running the experiments
├── report/             # Documentation and analysis
└── final_results/      # Experiment outputs
```
TODO