To generate captions, set up the baselines using the following commands:
```bash
git clone https://github.com/haotian-liu/LLaVA parent-folder
mv parent-folder/llava ./
rm -rf parent-folder
```
Please download the preprocessed weights for Vicuna-13B.
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4 parent-folder
mv parent-folder/minigpt4 ./
rm -rf parent-folder
```
Please download the preprocessed weights for Vicuna. After downloading the weights, change the following line in `minigpt4/configs/models/minigpt4.yaml`:

```yaml
16: llama_model: "path-to-llama-preprocessed-weights"
```
Please download the MiniGPT-4 weights here and change the checkpoint path in `eval_configs/minigpt4_eval.yaml`:

```yaml
11: ckpt: 'path-to-prerained_minigpt4_7b-weights'
```
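Both yaml edits above replace a single value in place. As a minimal scripted sketch (the `set_yaml_value` helper and the example paths are hypothetical, not part of either repo; the real configs may use different quoting):

```python
import re

def set_yaml_value(text: str, key: str, value: str) -> str:
    # Replace the value after a `key:` prefix, keeping the key and spacing.
    # Sketch only: assumes a simple one-line `key: value` entry.
    return re.sub(rf'({re.escape(key)}:\s*).*', rf'\g<1>"{value}"', text)

line = 'llama_model: "path-to-llama-preprocessed-weights"\n'
print(set_yaml_value(line, "llama_model", "/data/vicuna-13b"), end="")
```

Editing the files by hand as described above works just as well; this only helps when the paths change often.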
```bash
git clone https://github.com/CASIA-IVA-Lab/FastSAM parent-folder
mv parent-folder/FastSAM/fastsam ./
rm -rf parent-folder
```
Download the FastSAM weights from here.
```bash
pip3 install segment-anything
```
Download the SAM weights from here.
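Since the steps above download several checkpoints, it can help to fail fast on missing weight files before launching the pipeline. A small sketch (the filenames passed in are placeholders, not the exact checkpoint names):

```python
from pathlib import Path

def missing_weights(paths):
    # Return every expected weight file that does not exist on disk yet.
    return [p for p in paths if not Path(p).exists()]

# Placeholder filenames; substitute the checkpoints downloaded above.
print(missing_weights(["fastsam-checkpoint.pt", "sam-checkpoint.pth"]))
```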
To generate the base, please run the following commands:

```bash
cd data
python3 generate_base.py --data_path <path-to-nuscenes-v1.0-trainval> --save_path <path-to-save> --bev pred/gt
```

where `--bev` is either `pred` or `gt`.
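The flags above imply a command-line interface along these lines. This is a hypothetical `argparse` sketch inferred from the command, not the actual `generate_base.py` source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags shown above; help strings and the pred/gt
    # choices for --bev are inferred from the command, not the script.
    p = argparse.ArgumentParser(description="Generate the base for each scene")
    p.add_argument("--data_path", required=True, help="path to nuScenes v1.0-trainval")
    p.add_argument("--save_path", required=True, help="output directory for the base")
    p.add_argument("--bev", choices=["pred", "gt"], required=True,
                   help="use predicted or ground-truth BEV")
    return p

args = build_parser().parse_args(
    ["--data_path", "/data/nuscenes", "--save_path", "/data/base", "--bev", "gt"])
print(args.bev)  # -> gt
```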
To generate the captions for each scene object, please run the following command:

```bash
python3 generate_captions.py --model <captioning-model> --data_path <path-to-base-folder> --json_name pred/gt --start <start_index> --end <end_index>
```

where `--json_name` is either `pred` or `gt`.
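The `--start`/`--end` flags suggest the caption job can be sharded over scene indices. A hedged illustration of that pattern (the scene list and half-open slicing are assumptions about the script's behavior, not its actual code; nuScenes trainval does contain 850 scenes):

```python
# Assume the script captions the half-open slice [start, end) of the
# scene list, so consecutive shards do not overlap.
scenes = [f"scene-{i:04d}" for i in range(850)]

def select(scenes, start, end):
    # Half-open slice: shard [0,100) and shard [100,200) share no scene.
    return scenes[start:end]

print(len(select(scenes, 0, 100)))  # -> 100
print(select(scenes, 100, 102))     # -> ['scene-0100', 'scene-0101']
```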