MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

🥰 If you are interested in our work, feel free to star ⭐ or watch 👓 our repo for the latest updates 🤗!

arXiv | Project page | Hugging Face weights

🔥 Updates

[2024-04-03] 🔥🔥🔥 MineDreamer code is released. Enjoy the imagination ability of the embodied agent!

[2024-03-19] MineDreamer is released on arXiv.

[2024-03-15] The project page is set up.

😋 Try MineDreamer

The code and checkpoints have been released. The open-source release includes:

  • The MineDreamer agent and baseline code (i.e., VPT, STEVE-1, and Multi-Modal Memory).

  • The MineDreamer Goal Drift Dataset and MineDreamer weights, including the MineDreamer-7B Imaginator and the Prompt Generator.

  • MineDreamer training scripts for Imaginator training stages 2 and 3.

  • Note: For Imaginator training stage 1, we provide only the pre-trained Q-Former weights. For the Prompt Generator, we provide only the weights; to train your own Prompt Generator, follow STEVE-1 to collect data and train it.

Directory Structure:

.
├── README.md
├── minedreamer: all agent code, including the baselines and MineDreamer.
├── imaginator: all Imaginator code, including training and inference.
├── play: scripts for running the agent in all evaluations.
│   ├── programmatic: runs inference for the Programmatic Evaluation.
│   └── chaining: runs inference for the Command-Switching Evaluation.
├── scripts: scripts for Imaginator training and inference.
├── download_baseline_weights.sh: downloads the baseline weights.
└── download_minedreamer_weights.sh: downloads the MineDreamer weights and other pre-trained weights for Imaginator training.

Model Zoo and Dataset

We provide MineDreamer models for you to play with, including checkpoints for all three training stages, as well as the dataset. They can be downloaded from the following links:

| Model | Training stage | Size | HF weights 🤗 | HF dataset 🤗 |
|---|---|---|---|---|
| Pre-trained Q-Former | 1 | 261 MB | Pretrained-QFormer | - |
| InstructPix2Pix U-Net | 2 | 3.44 GB | InstructPix2Pix-Unet | Goal-Drift-Dataset |
| MineDreamer-Imaginator-7B | 3 | 17.7 GB | MineDreamer-7B | Goal-Drift-Dataset |

Step 1: Install MineRL Env and Run Baseline

If you only want to train or test the Imaginator, you can skip Step 1.

  1. We provide two methods for installing the MineRL environment; detailed instructions can be found in this repo. Make sure you complete the final test; otherwise, the agent will not function correctly.

  2. Download the weights (Baseline weights + Prompt Generator weights): sh download_baseline_weights.sh

  3. Run the baseline. If you use a cluster managed by Slurm, replace sudo with srun -p <your virtual partition> --gres=gpu:1.

    # If you use the Normal Installation Procedure to install MineRL Env and the server is headful
    sh play/programmatic/steve1_play_w_text_prompt.sh mine_block_wood
    
    # If you use the Normal Installation Procedure to install MineRL Env and the server is headless
    sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood
    
    # If you use the container to install MineRL Env
    sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood
    
    # If you use the container to install MineRL Env and want GPU rendering
    sudo apptainer exec -w --nv --bind /path/to/MineDreamer:/path/to/MineDreamer vgl-env bash setupvgl.sh play/programmatic/XVFB_steve1_play_w_text_prompt.sh mine_block_wood

    The intermediate outputs and the final video of the agent following the instruction are saved in data/play.
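If you switch between these setups often, the four variants above can be captured in a small helper. The sketch below is hypothetical and not part of the repo: `launch_cmd` only prints the command for a given setup, applying the Slurm rule above by swapping `sudo` for `srun -p <partition> --gres=gpu:1`. It assumes the headless script carries an `XVFB_` prefix (as the STEVE-1 script does) and that `REPO_DIR` (default `/path/to/MineDreamer`, the placeholder used above) points at your checkout.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the MineDreamer repo): print the Step 1
# launch command for one of the four setups so it can be reviewed before running.
# Usage: launch_cmd MODE SCRIPT TASK [SLURM_PARTITION]
#   MODE:   headful | headless | container | container-gpu
#   SCRIPT: script name without the XVFB_ prefix, e.g. steve1_play_w_text_prompt.sh
launch_cmd() {
  local mode="$1" script="$2" task="$3" partition="${4:-}"
  # Assumption: REPO_DIR points at your checkout (default mirrors the placeholder above).
  local repo="${REPO_DIR:-/path/to/MineDreamer}"
  local launcher="sudo"
  # On a Slurm cluster, sudo is replaced with srun on one GPU.
  if [ -n "$partition" ]; then
    launcher="srun -p $partition --gres=gpu:1"
  fi
  case "$mode" in
    headful)
      echo "sh play/programmatic/$script $task" ;;
    headless)
      echo "sh play/programmatic/XVFB_$script $task" ;;
    container)
      echo "$launcher apptainer exec -w --nv --bind $repo:$repo vgl-env sh play/programmatic/XVFB_$script $task" ;;
    container-gpu)
      echo "$launcher apptainer exec -w --nv --bind $repo:$repo vgl-env bash setupvgl.sh play/programmatic/XVFB_$script $task" ;;
    *)
      echo "launch_cmd: unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

For example, `launch_cmd container steve1_play_w_text_prompt.sh mine_block_wood my_partition` prints the container command with `srun` in place of `sudo`. Printing rather than executing keeps the helper easy to audit; once the output looks right, `eval "$(launch_cmd headless steve1_play_w_text_prompt.sh mine_block_wood)"` runs it, which is only safe with trusted arguments.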

Step 2: Install Imaginator Env and Run MineDreamer Agent

This codebase has strict environment requirements; we recommend following the steps below in order.

  1. We recommend running on Linux in a conda environment with Python 3.9: conda create -n imaginator python=3.9, then activate it with conda activate imaginator.
  2. Install PyTorch for CUDA 11.8:
    pip install --pre torch==2.2.0.dev20231010+cu118 torchvision==0.17.0.dev20231010+cu118 torchaudio==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118
    
    • Note: Nightly PyTorch builds can become unavailable over time. If pip reports that the pinned version does not exist, choose an available version from the list in the error message.
  3. Install additional packages: pip install -r requirements.txt
  4. Install DeepSpeed: DS_BUILD_AIO=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed
    • Note: This step often fails because it requires specific versions of CUDA and GCC; CUDA 11.8 and GCC 7.5.0 are expected. To make sure later scripts also run with these versions, add the commands that activate them to your ~/.bashrc. For reference, the ~/.bashrc entries look like this (adjust the paths to your system):
      ...
      export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
      export PATH=/mnt/petrelfs/share/gcc/gcc-7.5.0/bin:$PATH
      
      export PATH=/mnt/petrelfs/share/cuda-11.8/bin:$PATH
      export LD_LIBRARY_PATH=/mnt/petrelfs/share/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
      ...
      
      After installation, run ds_report. If its output includes the following line, the installation is correct:
      fused_adam ............. [YES] ...... [OKAY]
      
  5. Download the weights (Imaginator weights + pre-trained weights for training) with sh download_minedreamer_weights.sh, then remove the original LoRA parameters from Hugging Face's LLaVA with bash scripts/pre_llava.sh.
  6. Try running inference with the Imaginator (and with InstructPix2Pix). Generated images are saved in the inference_valid_* folders.
    # InstructPix2Pix 
    bash scripts/inference_IP2P.sh
    
    # Imaginator
    bash scripts/inference_MineDreamer.sh
  7. To run the MineDreamer agent, first launch the Imaginator backend service:
    # InstructPix2Pix 
    bash scripts/minedreamer_backend_IP2P.sh
    
    # Imaginator
    bash scripts/minedreamer_backend_MLLMSD.sh
    
    The backend then prints its address, e.g. Running on http://10.140.1.104:25547 (Press CTRL+C to quit). Put this address in the dreamer_url field of minedreamer/play/config/programmatic/mine_block_wood.yaml, for example:
    dreamer_url: http://10.140.1.104:25547/
    
  8. Run the MineDreamer agent. The procedure is the same as running the baseline in Step 1, but this time execute the corresponding *_dreamer_play_w_text_prompt.sh script.
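The checks in steps 4 and 7 above can be scripted. The following is a sketch, not part of the repo, and all function names are hypothetical. `cuda_release` and `check_fused_adam` read the output of `nvcc --version` and `ds_report` on standard input; `check_fused_adam` mirrors the expected line shown above but matches the words YES and OKAY rather than the exact bracketed text, in case `ds_report` wraps them in colour codes. `backend_url_from_log` assumes the backend prints a `Running on http://<host>:<port>` line as shown in step 7 and skips loopback addresses; `set_dreamer_url` rewrites an existing `dreamer_url:` line in place.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Step 2 (not part of the MineDreamer repo).

# Print the CUDA release from `nvcc --version` output on stdin, e.g. "11.8".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Succeed if `ds_report` output on stdin shows fused_adam as [YES] ... [OKAY].
# Matches the words only, in case ds_report adds colour codes around them.
check_fused_adam() {
  grep -E '^[[:space:]]*fused_adam[[:space:]]' | grep -q 'YES.*OKAY'
}

# Print the first non-loopback "Running on http://<host>:<port>" address from
# backend output on stdin, normalised to end with a single "/".
backend_url_from_log() {
  grep -o 'Running on http://[^[:space:]]*' \
    | grep -v '127\.0\.0\.1' \
    | head -n 1 \
    | sed 's|^Running on ||; s|/*$|/|'
}

# Rewrite an existing "dreamer_url:" line in a YAML config, keeping its indentation.
# Assumes the URL contains no "|" or "&", which are special in the sed expression.
set_dreamer_url() {
  local cfg="$1" url="$2" tmp rc
  if [ -z "$url" ]; then
    echo "set_dreamer_url: empty URL" >&2
    return 1
  fi
  if ! grep -q '^[[:space:]]*dreamer_url:' "$cfg"; then
    echo "set_dreamer_url: no dreamer_url key in $cfg" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Write to a temp file, then copy back so the config keeps its permissions.
  sed "s|^\([[:space:]]*dreamer_url:\).*|\1 $url|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example usage (backend.log is a hypothetical file holding the backend output):
#   [ "$(nvcc --version | cuda_release)" = 11.8 ] && [ "$(gcc -dumpfullversion)" = 7.5.0 ]
#   ds_report | check_fused_adam && echo "fused_adam OK"
#   set_dreamer_url minedreamer/play/config/programmatic/mine_block_wood.yaml \
#     "$(backend_url_from_log < backend.log)"
```

`set_dreamer_url` deliberately refuses to append a missing key, so a typo in the config path or key name surfaces as an error instead of a silently ignored setting.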

Step 3: Train your own Imaginator

  1. Download the Goal Drift Dataset, place it in the data/mllm_diffusion_dataset directory, and unzip it.
  2. To train the U-Net parameters of InstructPix2Pix, run bash scripts/train_InstructPix2Pix_minecraft.sh. The resulting checkpoint can also serve as a baseline.
  3. Train Imaginator-7B by running: bash scripts/train_MineDreamer.sh.
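Step 3 can likewise be wrapped in a small script that refuses to start training when the dataset is missing. This is a hypothetical sketch, not part of the repo: `train_imaginator` only checks that `data/mllm_diffusion_dataset` (or the directory you pass) exists and is non-empty, which cannot confirm that the archive was unzipped correctly. The directory is not forwarded to the training scripts; we assume they read their data paths from their own configuration. It runs the two scripts in the order listed above, from the repository root, and stops at the first failure.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for Step 3 (not part of the MineDreamer repo).

# Fail unless the given directory exists and contains at least one entry.
require_nonempty_dir() {
  local dir="$1"
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "require_nonempty_dir: missing or empty directory: $dir" >&2
    return 1
  fi
}

# Run the training scripts in the order listed in Step 3, stopping at the
# first failure. Call from the repository root. The directory argument is
# only checked, not passed to the scripts.
train_imaginator() {
  local data_dir="${1:-data/mllm_diffusion_dataset}"
  require_nonempty_dir "$data_dir" || return 1
  # InstructPix2Pix U-Net (the resulting checkpoint can also serve as a baseline).
  bash scripts/train_InstructPix2Pix_minecraft.sh || return 1
  # Imaginator-7B.
  bash scripts/train_MineDreamer.sh
}
```

Checking up front is cheap compared with discovering a missing dataset after the training job has been scheduled on a GPU node.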

🕶️ Overview

The Overview of Chain-of-Imagination within MineDreamer

The Overview Framework of Imaginator within MineDreamer

📹 Demo video and Imagination Visual Results

More demo videos and Imagination visual results are on our project webpage.

Imagination Visual Results on Evaluation Set Compared to the Baseline

Imagination Visual Results During Agent Solving Open-ended Tasks

Building a more generalist embodied agent

A generalist embodied agent needs a high-level planner capable of perception and planning in an open world, as well as a low-level controller able to act in complex environments. The MineDreamer agent reliably follows short-horizon text instructions, which makes it suitable as a low-level controller that generates control signals. For the high-level planner, which covers perception and task planning in an open world, one can turn to MP5 (CVPR 2024), whose code has also been released. MP5 is adept at planning tasks that require long-horizon sequencing and extensive environmental awareness. Combining MP5 with MineDreamer is therefore a promising approach to building more generalist embodied agents.

Acknowledgment

This repository is built upon the codebases of LLaVA, STEVE-1, and SmartEdit.

📑 Citation

If you find MineDreamer and MP5 useful for your research and applications, please cite them using the following BibTeX:

@article{zhou2024minedreamer,
  title={MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control},
  author={Zhou, Enshen and Qin, Yiran and Yin, Zhenfei and Huang, Yuzhou and Zhang, Ruimao and Sheng, Lu and Qiao, Yu and Shao, Jing},
  journal={arXiv preprint arXiv:2403.12037},
  year={2024}
}

@inproceedings{qin2024mp5,
  title={MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception},
  author={Qin, Yiran and Zhou, Enshen and Liu, Qichang and Yin, Zhenfei and Sheng, Lu and Zhang, Ruimao and Qiao, Yu and Shao, Jing},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16307--16316},
  year={2024}
}