This is a fork of BEVFusion that can be trained and evaluated on data from the CARLA Simulator generated by SimBEV.
BEVFusion requires the following libraries:
- Python >= 3.8, <3.9
- OpenMPI = 4.0.3 and mpi4py = 3.0.3 (needed for torchpack)
- Pillow = 8.4.0 (see here)
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
After installing these dependencies, run

```bash
python setup.py develop
```

to install the codebase.
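As a rough sketch, the dependencies above can be installed in a fresh Python 3.8 environment along the following lines. The exact pip package names (e.g. `mmcv-full` for mmcv, `nuscenes-devkit` for the nuscenes dev-kit) and the PyTorch/torchvision wheel pair are assumptions; pick the wheel that matches your CUDA version.

```bash
# Assumed pip package names for the dependency list above; verify before use.
pip install Pillow==8.4.0 mpi4py==3.0.3 tqdm torchpack nuscenes-devkit
pip install torch==1.10.2 torchvision==0.11.3  # choose the wheel matching your CUDA version
pip install mmcv-full==1.4.0 mmdet==2.20.0
```

OpenMPI itself comes from your system package manager, not pip, and must be installed before `mpi4py`.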
- Install Docker on your system.
- Install the Nvidia Container Toolkit. It exposes your Nvidia graphics card to Docker containers.
- Install the Nvidia Container Runtime and set it as the default runtime.
- In the `docker` folder, run

```bash
docker build --no-cache --rm -t bevfusion:develop .
```

You may need to replace the libnvidia-gl-550 and libnvidia-common-550 packages in the Dockerfile with ones compatible with your Nvidia driver version.
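To see which libnvidia packages match your system, you can query the installed driver version on the host first (assumes `nvidia-smi` is available):

```bash
# Prints the installed Nvidia driver version, e.g. "550.54.14";
# the major number (550 here) should match the libnvidia-* packages in the Dockerfile.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```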
The following build arguments (ARG) are available:
- `USER`: username inside each container, set to `bf` by default.
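For instance, to build with a different in-container username (the value `alice` below is just a placeholder):

```bash
# Run from inside the docker folder; --build-arg overrides the USER default of "bf".
docker build --no-cache --rm --build-arg USER=alice -t bevfusion:develop .
```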
Launch a container by running

```bash
docker run --privileged --gpus all --network=host -e DISPLAY=$DISPLAY \
    -v [path/to/BEVFusion]:/home/bevfusion \
    -v [path/to/dataset]:/dataset \
    --shm-size 32g -it bevfusion:develop /bin/bash
```

Then, in /home/bevfusion, run

```bash
python setup.py develop
```

to install the codebase.
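Before training, it is worth confirming that the container can actually see your GPUs, for example:

```bash
# Inside the container: should print True and a nonzero device count
# if --gpus all and the Nvidia runtime are working.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```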
For camera-only 3D object detection, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/det/centerhead/lssfpn/camera/256x704/swint/default.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

For camera-only BEV segmentation, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/seg/camera-bev256d2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

For lidar-only 3D object detection, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml
```

For lidar-only BEV segmentation, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/seg/lidar-centerpoint-bev128.yaml
```

For BEVFusion 3D object detection, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/simbev-lidar-only-det.pth
```

For BEVFusion BEV segmentation, run

```bash
torchpack dist-run -np 8 python tools/train.py configs/simbev/seg/fusion-bev256d2-lss.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

Replace 8 in `-np 8` with your number of GPUs. You can change the `samples_per_gpu` and `workers_per_gpu` values in configs/simbev/default.yaml based on your available GPU memory and number of CPU cores.
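Rather than hard-coding the GPU count, you can derive it at launch time; a sketch assuming `nvidia-smi` is on the PATH:

```bash
# Count the visible GPUs and pass the result to torchpack.
NGPU=$(nvidia-smi -L | wc -l)
torchpack dist-run -np "$NGPU" python tools/train.py configs/simbev/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml
```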
If you want to evaluate on or visualize the test set instead of the val set, change `data.test.ann_file` in configs/simbev/default.yaml to `${dataset_root + "infos/simbev_infos_test.json"}`.
For evaluation, run

```bash
torchpack dist-run -np [number of gpus] python tools/test.py [config file path] [checkpoint name] --eval [evaluation type]
```

For example, to evaluate the camera-only 3D object detection model, run

```bash
torchpack dist-run -np 8 python tools/test.py configs/simbev/det/centerhead/lssfpn/camera/256x704/swint/default.yaml pretrained/simbev-camera-only-det.pth --eval bbox
```

Or, to evaluate the BEVFusion BEV segmentation model, run

```bash
torchpack dist-run -np 8 python tools/test.py configs/simbev/seg/fusion-bev256d2-lss.yaml pretrained/simbev-bevfusion-seg.pth --eval map
```

For visualization, run

```bash
torchpack dist-run -np 8 python tools/visualize.py [config file path] --mode [mode] --checkpoint [checkpoint name] --split [data split] --out-dir [output directory path]
```

`mode` can be gt-simbev (to visualize the ground truth) or pred-simbev (to visualize model predictions). For example, to visualize the lidar-only 3D object detection model predictions, run

```bash
torchpack dist-run -np 8 python tools/visualize.py configs/simbev/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml --mode pred-simbev --checkpoint pretrained/simbev-lidar-only-det.pth --split test --bbox-score 0.1 --out-dir 'viz/lidar-only-det'
```

Or, to visualize the BEVFusion BEV segmentation model predictions, run

```bash
torchpack dist-run -np 8 python tools/visualize.py configs/simbev/seg/fusion-bev256d2-lss.yaml --mode pred-simbev --checkpoint pretrained/simbev-bevfusion-seg.pth --split test --map-score 0.5 --out-dir 'viz/bevfusion-seg'
```

| Class | AP (%) | ATE (m) | AOE (rad) | ASE | AVE (m/s) |
|---|---|---|---|---|---|
| Car | 23.3 | 0.824 | 0.896 | 0.217 | 4.95 |
| Truck | 20.4 | 0.751 | 0.695 | 0.148 | 5.55 |
| Bus | 18.7 | 0.829 | 1.185 | 0.022 | 5.54 |
| Motorcycle | 26.5 | 0.604 | 0.841 | 0.140 | 6.64 |
| Bicycle | 25.1 | 0.574 | 1.117 | 0.219 | 4.12 |
| Pedestrian | 18.9 | 0.883 | 1.529 | 0.073 | 1.10 |
| mean | 22.1 | 0.744 | 1.044 | 0.137 | 4.65 |
SDS: 25.1% / Checkpoint
| Class | AP (%) | ATE (m) | AOE (rad) | ASE | AVE (m/s) |
|---|---|---|---|---|---|
| Car | 46.1 | 0.165 | 0.109 | 0.127 | 1.44 |
| Truck | 46.3 | 0.162 | 0.045 | 0.110 | 1.75 |
| Bus | 34.1 | 0.169 | 0.049 | 0.072 | 2.40 |
| Motorcycle | 51.9 | 0.114 | 0.118 | 0.159 | 1.79 |
| Bicycle | 55.5 | 0.115 | 0.087 | 0.213 | 1.53 |
| Pedestrian | 54.5 | 0.141 | 0.392 | 0.120 | 0.47 |
| mean | 48.1 | 0.144 | 0.133 | 0.134 | 1.56 |
SDS: 56.4% / Checkpoint
| Class | AP (%) | ATE (m) | AOE (rad) | ASE | AVE (m/s) |
|---|---|---|---|---|---|
| Car | 46.5 | 0.162 | 0.106 | 0.125 | 1.40 |
| Truck | 46.2 | 0.168 | 0.049 | 0.106 | 1.74 |
| Bus | 34.3 | 0.176 | 0.040 | 0.063 | 2.44 |
| Motorcycle | 51.6 | 0.113 | 0.105 | 0.153 | 1.65 |
| Bicycle | 55.3 | 0.115 | 0.073 | 0.207 | 1.52 |
| Pedestrian | 54.8 | 0.141 | 0.362 | 0.109 | 0.47 |
| mean | 48.1 | 0.146 | 0.122 | 0.127 | 1.54 |
SDS: 56.6% / Checkpoint
Segmentation results (IoU per class, %) are provided for different IoU thresholds.
| Class | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| road | 59.5 | 67.1 | 71.5 | 74.5 | 76.0 | 75.2 | 72.6 | 68.9 | 62.3 |
| car | 3.5 | 8.0 | 18.8 | 22.4 | 17.2 | 11.3 | 9.7 | 8.7 | 6.3 |
| truck | 2.1 | 6.7 | 11.7 | 9.8 | 5.1 | 2.1 | 0.4 | 0.0 | 0.0 |
| bus | 2.1 | 9.0 | 19.9 | 24.6 | 22.9 | 16.8 | 10.3 | 6.0 | 1.1 |
| motorcycle | 0.3 | 0.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| bicycle | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| rider | 0.3 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| pedestrian | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| mIoU | 8.5 | 11.4 | 15.2 | 16.4 | 15.2 | 13.2 | 11.6 | 10.5 | 8.7 |
| Class | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| road | 48.6 | 55.9 | 66.1 | 85.1 | 87.7 | 87.2 | 84.9 | 81.1 | 74.6 |
| car | 5.3 | 37.8 | 56.5 | 67.1 | 70.6 | 63.6 | 51.8 | 37.6 | 18.8 |
| truck | 11.4 | 44.7 | 61.2 | 70.6 | 73.5 | 67.4 | 55.2 | 39.5 | 16.3 |
| bus | 19.1 | 56.9 | 72.0 | 79.7 | 81.5 | 78.1 | 69.7 | 59.4 | 44.1 |
| motorcycle | 4.8 | 13.7 | 23.6 | 32.7 | 32.5 | 15.8 | 0.8 | 0.0 | 0.0 |
| bicycle | 1.8 | 5.1 | 10.0 | 13.3 | 3.6 | 0.0 | 0.0 | 0.0 | 0.0 |
| rider | 4.6 | 11.7 | 20.8 | 30.4 | 18.4 | 0.5 | 0.0 | 0.0 | 0.0 |
| pedestrian | 3.1 | 9.6 | 17.6 | 28.4 | 18.9 | 0.1 | 0.0 | 0.0 | 0.0 |
| mIoU | 12.4 | 29.4 | 41.0 | 50.9 | 48.3 | 39.1 | 32.8 | 27.2 | 19.2 |
| Class | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| road | 59.7 | 72.0 | 80.0 | 85.5 | 88.4 | 88.1 | 85.9 | 82.4 | 76.3 |
| car | 11.7 | 39.4 | 58.6 | 69.4 | 72.7 | 65.5 | 54.0 | 40.1 | 20.5 |
| truck | 12.3 | 47.0 | 61.4 | 70.9 | 74.5 | 69.2 | 57.6 | 43.2 | 20.6 |
| bus | 19.7 | 56.8 | 70.3 | 78.2 | 80.0 | 77.2 | 68.9 | 59.0 | 44.1 |
| motorcycle | 5.1 | 13.5 | 23.6 | 34.6 | 36.3 | 18.3 | 1.5 | 0.0 | 0.0 |
| bicycle | 1.9 | 5.6 | 11.1 | 12.7 | 3.6 | 0.0 | 0.0 | 0.0 | 0.0 |
| rider | 4.8 | 11.9 | 21.0 | 31.0 | 23.3 | 1.2 | 0.0 | 0.0 | 0.0 |
| pedestrian | 3.1 | 9.9 | 18.3 | 28.7 | 20.2 | 0.2 | 0.0 | 0.0 | 0.0 |
| mIoU | 14.8 | 32.0 | 43.0 | 51.3 | 50.0 | 40.0 | 33.5 | 28.1 | 20.2 |
