🥰 If you are interested in our work, feel free to star ⭐ or watch 👓 our repo for the latest updates🤗!!
[2024-04-03] 🔥🔥🔥 MineDreamer code is released. Let's enjoy the Imagination ability of the embodied agent!
[2024-03-19] MineDreamer is released on arXiv.
[2024-03-15] The project page is set up here.
The code and checkpoints are released and the open-source contents include the following:
- ✅ MineDreamer agent and Baseline Code (i.e., VPT, STEVE-1, Multi-Modal Memory)
- ✅ MineDreamer Goal Drift Dataset and MineDreamer weights, including MineDreamer-7B of Imaginator and Prompt Generator.
- ✅ MineDreamer Training Scripts, including The Imaginator training stages 2 and 3.
- Note: For Imaginator training stage 1, we only provide pre-trained Q-Former weights. For Prompt Generator, we only provide the weights and if you want to train your own Prompt Generator, please refer to STEVE-1 to collect data and train it.
.
├── README.md
├── minedreamer
│ ├── All agent code, including baseline and MineDreamer.
├── imaginator
│ ├── All imaginator code including training and inference.
│
├── play: Scripts for running the agent for all evaluations.
│ ├── programmatic: run the inference code of Programmatic Evaluation
│ │
│ ├── chaining: run the inference code of Command-Switching Evaluation
│
├── scripts
│ ├── Scripts for training and inference of Imaginator.
│
├── download_baseline_weights.sh: download baseline weights.
│
├── download_minedreamer_weights.sh: download minedreamer and other pre-trained weights for Imaginator training.
We provide MineDreamer models for you to play with, including checkpoints for all three training stages, and datasets. They can be downloaded from the following links:
model | training stage | size | HF weights🤗 | HF dataset 🤗 |
---|---|---|---|---|
Pre-trained Q-Former | 1 | 261MB | Pretrained-QFormer | |
InstructPix2Pix U-Net | 2 | 3.44GB | InstructPix2Pix-Unet | Goal-Drift-Dataset |
MineDreamer-Imaginator-7B | 3 | 17.7GB | MineDreamer-7B | Goal-Drift-Dataset |
Note that if you only want to train or test the Imaginator, you can skip Step 1.
- We provide two methods for installing the MineRL environment. Detailed instructions can be found in this repo. Please make sure you complete the final test; otherwise the agent will not function correctly.
- Download the weights (baseline weights + Prompt Generator weights):

      sh download_baseline_weights.sh

- Run the baseline. If you use a cluster scheduler such as Slurm, replace `sudo` with `srun -p <your virtual partition> --gres=gpu:1`.

      # If you use the Normal Installation Procedure to install MineRL Env and the server is headful
      sh play/programmatic/steve1_play_w_text_prompt.sh mine_block_wood
      # If you use the Normal Installation Procedure to install MineRL Env and the server is headless
      sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood
      # If you use the container to install MineRL Env
      sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood
      # If you use the container to install MineRL Env and run by GPU rendering
      sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env bash setupvgl.sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood

  Afterwards, `data/play` will contain the intermediate results and the final video of the agent acting on the instruction.
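The four launch variants above, together with the Slurm substitution, can be sketched as a small Python helper. This is an illustration only and not part of the repository: the function name `baseline_command`, the mode labels, and the partition name `gpu` are hypothetical, and applying the `srun` substitution only to commands that start with `sudo` is an assumption.

```python
import shlex

SCRIPT_DIR = "play/programmatic"


def baseline_command(mode, task, repo="/path/to/MineDreamer", partition=None):
    """Return the argv list for one of the four launch variants listed above.

    mode is a hypothetical label: "headful", "headless", "container",
    or "container_gpu". partition is an optional Slurm partition name.
    """
    headful = f"{SCRIPT_DIR}/steve1_play_w_text_prompt.sh"
    headless = f"{SCRIPT_DIR}/XVFB_steve1_play_w_text_prompt.sh"
    # Common prefix of the two container-based commands.
    container = ["sudo", "apptainer", "exec", "-w", "--nv",
                 "--bind", f"{repo}:{repo}", "vgl-env"]
    variants = {
        "headful": ["sh", headful, task],
        "headless": ["sh", headless, task],
        "container": container + ["sh", headless, task],
        "container_gpu": container + ["bash", "setupvgl.sh", headless, task],
    }
    argv = variants[mode]
    # Assumption: on Slurm, only the leading `sudo` is swapped for `srun`.
    if partition is not None and argv[0] == "sudo":
        argv = ["srun", "-p", partition, "--gres=gpu:1"] + argv[1:]
    return argv


# Print the container variant as it would be typed on a Slurm cluster.
print(shlex.join(baseline_command("container", "mine_block_wood", partition="gpu")))
```

Returning an argv list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.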
This codebase has strict environment requirements; we recommend following the tutorial below step by step.
- We recommend running on Linux in a conda environment with Python 3.9: `conda create -n imaginator python=3.9`.
- Install PyTorch for CUDA 11.8:

      pip install --pre torch==2.2.0.dev20231010+cu118 torchvision==0.17.0.dev20231010+cu118 torchaudio==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118

  - Note: The available torch versions may change over time. If you get an error saying that the version above does not exist, switch to a valid version based on the error message.
- Install additional packages:

      pip install -r requirements.txt
- Install DeepSpeed:

      DS_BUILD_AIO=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed

  - Note: This step often fails because it requires specific versions of CUDA and GCC. `cuda118` and `gcc-7.5.0` are expected. To avoid errors when running the scripts later, add the commands that activate these versions to your `~/.bashrc` file. Below is a reference for the content to include in `~/.bashrc`:

        ...
        export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
        export PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/bin:$PATH
        export PATH=/mnt/petrelfs/share/cuda-11.8/bin:$PATH
        export LD_LIBRARY_PATH=/mnt/petrelfs/share/cuda-11.8/lib64:$LD_LIBRARY_PATH
        ...

    After installation, you can run `ds_report`. If the output appears as shown below, the installation is correct:

        fused_adam ............. [YES] ...... [OKAY]
- Download the weights (Imaginator weights + pre-trained weights for training) with `sh download_minedreamer_weights.sh`, and remove the original LoRA parameters from Hugging Face's LLaVA with `bash scripts/pre_llava.sh`.
- Try running inference with the Imaginator (and InstructPix2Pix). You can find the generated images in the `inference_valid_*` folder.

      # InstructPix2Pix
      bash scripts/inference_IP2P.sh
      # Imaginator
      bash scripts/inference_MineDreamer.sh
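The `ds_report` check from the DeepSpeed step can also be automated. The following is a minimal sketch, assuming the report has been captured as plain text and that each op line has the shape shown in that step (name, dots, one bracketed status, dots, a second bracketed status). The function names are hypothetical, and real `ds_report` output may contain extra lines or terminal color codes that this pattern does not handle.

```python
import re

# One op line, e.g. "fused_adam ............. [YES] ...... [OKAY]".
OP_LINE = re.compile(r"^\s*(\w+)\s*\.+\s*\[(\w+)\]\s*\.+\s*\[(\w+)\]\s*$")


def parse_ds_report(text):
    """Map each op name to its two bracketed status values."""
    ops = {}
    for line in text.splitlines():
        match = OP_LINE.match(line)
        if match:
            name, first, second = match.groups()
            ops[name] = (first, second)
    return ops


def op_is_ready(ops, name):
    """True only if the op is reported as [YES] and [OKAY]."""
    return ops.get(name) == ("YES", "OKAY")


# The sample line is copied from the installation step above.
sample = "fused_adam ............. [YES] ...... [OKAY]"
print(op_is_ready(parse_ds_report(sample), "fused_adam"))
```

Lines that do not match the pattern are ignored, so header and separator lines in the captured text do no harm.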
- To run the MineDreamer agent, first launch the backend service of the Imaginator.

      # InstructPix2Pix
      bash scripts/minedreamer_backend_IP2P.sh
      # Imaginator
      bash scripts/minedreamer_backend_MLLMSD.sh

  At this point, you'll receive a backend address in a line similar to `Running on http://10.140.1.104:25547 (Press CTRL+C to quit)`. Insert this address into the `dreamer_url` field of the `minedreamer/play/config/programmatic/mine_block_wood.yaml` file, similar to: `dreamer_url: http://10.140.1.104:25547/`
- Run the MineDreamer agent. The process is the same as running the baseline in Step 1, but this time execute the `*_dreamer_play_w_text_prompt.sh` script.
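Copying the backend address into the config by hand is easy to get wrong, so the step can be sketched in Python. This is an illustration under assumptions, not code from the repository: the function names are hypothetical, the config is treated as plain text with `dreamer_url:` on its own line, and the `task` key in the sample config is invented for the example.

```python
import re


def extract_backend_url(log_line):
    """Pull the URL out of a 'Running on http://...' log line, or return None."""
    match = re.search(r"Running on (https?://\S+)", log_line)
    if match is None:
        return None
    # Normalize to exactly one trailing slash, as in the config example above.
    return match.group(1).rstrip("/") + "/"


def set_dreamer_url(yaml_text, url):
    """Replace the value of the dreamer_url field, leaving other lines untouched."""
    new_text, count = re.subn(
        r"(?m)^([ \t]*dreamer_url:).*$",
        lambda m: f"{m.group(1)} {url}",
        yaml_text,
    )
    if count == 0:
        raise KeyError("dreamer_url field not found")
    return new_text


log = "Running on http://10.140.1.104:25547 (Press CTRL+C to quit)"
config = "task: mine_block_wood\ndreamer_url: http://127.0.0.1:1/\n"  # hypothetical content
print(set_dreamer_url(config, extract_backend_url(log)))
```

Editing the text with a regular expression, rather than loading and re-dumping the YAML, keeps comments and key order in the file unchanged.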
- First, download the Goal Drift Dataset, place it in the `data/mllm_diffusion_dataset` directory, and unzip it.
- To train the U-Net parameters of InstructPix2Pix, execute: `bash scripts/train_InstructPix2Pix_minecraft.sh`. This checkpoint can also be used as a baseline.
- Train Imaginator-7B by running: `bash scripts/train_MineDreamer.sh`.
More demo videos and Imagination visual results are on our project webpage.
A generalist embodied agent should have a high-level planner capable of perception and planning in an open world, as well as a low-level controller able to act in complex environments. The MineDreamer agent can steadily follow short-horizon text instructions, making it suitable as a low-level controller for generating control signals. For the high-level planner, which covers perception and task planning in an open world, one can look to the methods presented in MP5 (CVPR 2024), whose code is also released. MP5 is adept at planning tasks that require long-horizon sequencing and extensive environmental awareness. Combining MP5 with MineDreamer is therefore a promising approach to developing more generalist embodied agents.
This repository is built upon the codebase of LLaVA, STEVE-1 and SmartEdit.
If you find MineDreamer and MP5 useful for your research and applications, please cite using this BibTeX:
@article{zhou2024minedreamer,
title={MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control},
author={Zhou, Enshen and Qin, Yiran and Yin, Zhenfei and Huang, Yuzhou and Zhang, Ruimao and Sheng, Lu and Qiao, Yu and Shao, Jing},
journal={arXiv preprint arXiv:2403.12037},
year={2024}
}
@inproceedings{qin2024mp5,
title={MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception},
author={Qin, Yiran and Zhou, Enshen and Liu, Qichang and Yin, Zhenfei and Sheng, Lu and Zhang, Ruimao and Qiao, Yu and Shao, Jing},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={16307--16316},
year={2024}
}