MVRA: Multi-View Reprojection Architecture for Orientation Estimation

November 2019

tl;dr: Build the 2D/3D constraint optimization into the neural network and use an iterative method to refine cropped (truncated) cases.

Overall impression

This paper is heavily based on Deep3DBox and adds a few improvements to handle corner cases.

The paper has a very good introduction to mono 3DOD methods.

Key ideas

3D reconstruction layer: instead of solving an over-constrained system of equations, MVRA uses a reconstruction layer to lift 2D to 3D.
- IoU loss in perspective view, between the reprojected 3D bbox and the 2D bbox.
- L2 loss in BEV, between the estimated distance and the gt distance.
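
A minimal sketch of the two loss terms above, in plain Python for readability. Everything here is an assumption for illustration and not the paper's implementation: the function names, the KITTI-style camera frame (x right, y down, z forward), the box parameterized by its geometric center with dims as (h, w, l), `1 - IoU` as the form of the IoU loss, and the BEV distance measured in the x-z plane. In an actual network these steps would be written with differentiable tensor ops.

```python
import math

def box_corners(center, dims, yaw):
    """Eight corners of a 3D box in the camera frame (assumed: x right, y down, z forward)."""
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate about the vertical (y) axis, then translate to the center.
                corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners

def reproject_to_2d(corners, fx, fy, u0, v0):
    """Pinhole projection of the corners; returns the tight box (xmin, ymin, xmax, ymax).
    Assumes every corner lies in front of the camera (z > 0)."""
    us = [fx * x / z + u0 for x, y, z in corners]
    vs = [fy * y / z + v0 for x, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))

def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(center, dims, yaw, box_2d, intrinsics):
    """Perspective-view term: 1 - IoU(reprojected 3D box, 2D box). The form is an assumption."""
    projected = reproject_to_2d(box_corners(center, dims, yaw), *intrinsics)
    return 1.0 - iou_2d(projected, box_2d)

def bev_l2_loss(pred_center, gt_center):
    """BEV term: squared error between estimated and gt distance in the x-z plane."""
    pred_dist = math.hypot(pred_center[0], pred_center[2])
    gt_dist = math.hypot(gt_center[0], gt_center[2])
    return (pred_dist - gt_dist) ** 2
```
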
Iterative orientation refinement for truncated bbox: use only 3 constraints instead of 4, dropping xmin for left-truncated cars or xmax for right-truncated cars. Try orientations at a pi/8 interval and keep the best, then try a pi/32 interval and keep the best. After these two iterations, the performance is good enough.
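
The two-pass search can be sketched as below. This is an illustration under assumptions, not the paper's code: `score_fn` is a hypothetical callable that scores a candidate yaw (for example, the IoU obtained with the 3 remaining constraints), the coarse pass is assumed to cover the full circle, and the fine pass is assumed to cover one coarse step on either side of the coarse winner.

```python
import math

def coarse_to_fine_yaw(score_fn, coarse_step=math.pi / 8, fine_step=math.pi / 32):
    """Return the yaw with the highest score after a coarse pass and a fine pass."""
    # Pass 1: candidates over [-pi, pi) at the coarse interval.
    n_coarse = round(2 * math.pi / coarse_step)
    coarse = [-math.pi + i * coarse_step for i in range(n_coarse)]
    best = max(coarse, key=score_fn)

    # Pass 2: candidates at the fine interval, within one coarse step of the winner.
    k = round(coarse_step / fine_step)
    fine = [best + i * fine_step for i in range(-k, k + 1)]
    return max(fine, key=score_fn)

# Toy usage: a smooth score that peaks at yaw = 0.5 rad.
yaw = coarse_to_fine_yaw(lambda a: math.cos(a - 0.5))
```

With these defaults the search evaluates 16 coarse and 9 fine candidates, and for a well-behaved score the result lands within about half a fine step of the true optimum.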

Technical details

Bbox jitter to make the 3D reconstruction layer more robust.
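
One common way to jitter a 2D bbox is sketched below. The notes do not give the paper's jitter distribution or magnitude, so the uniform noise, the `max_shift` fraction, and the function name `jitter_bbox` are all assumptions.

```python
import random

def jitter_bbox(box, max_shift=0.05, rng=random):
    """Shift each edge of (xmin, ymin, xmax, ymax) by up to max_shift of the box size.
    Keeping max_shift below 0.5 guarantees the jittered box stays non-degenerate."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return (
        xmin + rng.uniform(-max_shift, max_shift) * w,
        ymin + rng.uniform(-max_shift, max_shift) * h,
        xmax + rng.uniform(-max_shift, max_shift) * w,
        ymax + rng.uniform(-max_shift, max_shift) * h,
    )

# Usage with a seeded generator so the augmentation is reproducible.
jittered = jitter_bbox((100.0, 50.0, 300.0, 150.0), rng=random.Random(0))
```
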

Notes

The use of IoU to pick the best configuration was proposed earlier in Shift R-CNN.
The BEV loss term can be used to incorporate radar into the training process.