January 2020
tl;dr: Camera + radar fusion based on AVOD.
The architecture is largely based on AVOD. It converts the radar returns into height and intensity maps, and uses this pseudo-image together with the camera image for region proposal.
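The radar-to-pseudo-image step can be sketched roughly as below. This is a minimal illustration, not the paper's code. It assumes each radar return is a row `(x, y, z, intensity)` in a z-up ego frame with heights measured above a flat ground. It also assumes a single max-height map, whereas AVOD itself uses several height slices. The function name, ranges, and 0.1 m cell size are all hypothetical.

```python
import numpy as np


def radar_to_bev_maps(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), resolution=0.1):
    """Hypothetical sketch: rasterize radar returns into BEV height/intensity maps.

    points: array of shape (N, 4) with columns x, y, z, intensity (assumed layout).
    Returns (height_map, intensity_map), each of shape (rows, cols). Each cell keeps
    the max height and max intensity of the returns falling into it; empty cells are 0.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # Drop returns outside the BEV grid.
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

    n_rows = int(round((x_range[1] - x_range[0]) / resolution))
    n_cols = int(round((y_range[1] - y_range[0]) / resolution))

    # Map metric coordinates to grid indices (clip guards against float round-off at the edge).
    rows = np.clip(((x - x_range[0]) / resolution).astype(int), 0, n_rows - 1)
    cols = np.clip(((y - y_range[0]) / resolution).astype(int), 0, n_cols - 1)

    height_map = np.zeros((n_rows, n_cols))
    intensity_map = np.zeros((n_rows, n_cols))
    # Unbuffered per-cell max, so several returns in one cell are reduced correctly.
    np.maximum.at(height_map, (rows, cols), z)
    np.maximum.at(intensity_map, (rows, cols), intensity)
    return height_map, intensity_map
```

Radar is much sparser than lidar, so most cells of these maps stay empty. That sparsity is one reason the camera branch matters for proposals.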
- The original AVOD paper uses a 10-dim bbox encoding: 4 BEV corner points with 2 coordinates each, plus 2 height values. However, this paper states that it uses 14 dims.
- Radar + camera fusion does not detect cars oriented perpendicular to the ego car well. However, it detects cars aligned with the ego car's heading much better.
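The AVOD-style 10-dim encoding (4 corners × 2 coordinates + 2 heights) can be sketched roughly as below. This is only a sketch under stated assumptions: a z-up frame, a flat ground plane at `ground_z`, and `(cx, cy, z_bottom)` taken as the center of the box's bottom face. The function name and argument layout are hypothetical. It does not reproduce the 14-dim variant, since the notes do not say what the extra dimensions are.

```python
import numpy as np


def encode_box_4c(cx, cy, z_bottom, length, width, height, yaw, ground_z=0.0):
    """Hypothetical sketch of an AVOD-style 10-dim box encoding.

    Returns [x1, x2, x3, x4, y1, y2, y3, y4, h1, h2]: the 4 BEV corners of the box
    followed by the bottom (h1) and top (h2) heights above the assumed flat ground.
    """
    # Corner offsets in the box frame (length along x, width along y).
    dx = np.array([1.0, 1.0, -1.0, -1.0]) * length / 2.0
    dy = np.array([1.0, -1.0, -1.0, 1.0]) * width / 2.0

    # Rotate the offsets by yaw, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    xs = cx + c * dx - s * dy
    ys = cy + s * dx + c * dy

    # Two height values relative to the ground plane: 8 + 2 = 10 dims in total.
    h1 = z_bottom - ground_z
    h2 = z_bottom + height - ground_z
    return np.concatenate([xs, ys, [h1, h2]])
```

For example, an axis-aligned 4 m × 2 m × 1.5 m car centered at (10, 2) on the ground encodes to corners x = [12, 12, 8, 8] and y = [3, 1, 1, 3], with heights [0, 1.5].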