July 2022
tl;dr: Multitask multicam with improved LSS.
Describe the overall impression of the paper.
- Summaries of the key ideas
- Joint training slightly hurts the performance of each task. The paper observes that the spatial distributions of objects and map elements are not strongly correlated, e.g., many cars are not in the drivable area. This is also observed in BEVFusion and PETRv2.
- Voxel pooling is improved to boost efficiency and reduce memory usage. Similar improvements have also been seen in BEVDepth and BEVFusion.
- Questions and notes on how to improve/revise the current work
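
For reference, the voxel pooling step mentioned in the key ideas can be sketched as a plain scatter-add of per-point features into a BEV grid. This is a minimal NumPy illustration of generic LSS-style sum pooling, not the paper's optimized implementation; the function name `voxel_sum_pool` and its parameters (`lower`, `voxel_size`, `grid_shape`) are hypothetical and chosen only for this example.

```python
import numpy as np

def voxel_sum_pool(points, feats, lower, voxel_size, grid_shape):
    """Sum-pool per-point features into a dense BEV grid of shape (X, Y, C).

    points:     (N, 3) xyz coordinates of lifted frustum points (assumed ego frame)
    feats:      (N, C) feature vector attached to each point
    lower:      (3,) lower corner of the voxel grid
    voxel_size: (3,) edge lengths of one voxel
    grid_shape: (X, Y, Z) number of voxels along each axis
    """
    points = np.asarray(points, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)
    lower = np.asarray(lower, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)

    # Integer voxel index of every point.
    idx = np.floor((points - lower) / voxel_size).astype(np.int64)

    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx, feats = idx[inside], feats[inside]

    # Collapse the height axis: every point in the same (x, y) pillar is summed.
    bev = np.zeros((grid_shape[0], grid_shape[1], feats.shape[1]))
    np.add.at(bev, (idx[:, 0], idx[:, 1]), feats)  # unbuffered scatter-add
    return bev

# Toy usage: four points, the last one lies outside the grid and is ignored.
pts = [[0.5, 0.5, 0.5], [0.6, 0.4, 1.5], [3.5, 0.5, 0.5], [10.0, 0.0, 0.0]]
fts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
bev = voxel_sum_pool(pts, fts, lower=[0, 0, 0], voxel_size=[1, 1, 1], grid_shape=(4, 4, 2))
```

The naive version materializes every frustum point and its features before pooling, which is where the memory cost comes from; the efficiency gains noted above come from avoiding or accelerating that step.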