
September 2020

tl;dr: Use a variational autoencoder for semantic occupancy grid map prediction.

Overall impression

The variational encoder-decoder (VED) encodes the front-view visual information of the driving scene and then decodes it into a BEV semantic occupancy grid.

The proposed method beats a vanilla SegNet (a relatively strong baseline for conventional semantic segmentation). A 2x1 pooling layer accommodates the different aspect ratios of the input and output.
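A 2x1 (vertical-only) pooling halves the feature-map height while keeping the width, which changes the aspect ratio between the front-view input and the BEV output. A minimal NumPy sketch of this pooling; the shapes below are illustrative, not taken from the paper:

```python
import numpy as np

def max_pool_2x1(x: np.ndarray) -> np.ndarray:
    """Max-pool with a 2x1 kernel and 2x1 stride: halve height, keep width."""
    h, w = x.shape
    assert h % 2 == 0, "height must be even"
    return x.reshape(h // 2, 2, w).max(axis=1)

# Illustrative front-view feature map: 256 tall x 512 wide.
feat = np.random.rand(256, 512)
pooled = max_pool_2x1(feat)
print(pooled.shape)  # (128, 512): height halved, width unchanged
```

Stacking several such layers lets the network reshape a wide camera view toward the output grid's aspect ratio without discarding horizontal resolution.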

GT generation uses disparity maps from stereo matching; this process may be noisy.

Key ideas

  • View transformation: VAE with sampling.
  • Binary occupancy grid is a decades old concept, but semantic occupancy grid is more powerful and enables more efficient and reliable navigation.
  • Variational AutoEncoder (VAE, referred to as VED in this paper) forces the latent space toward a normal distribution, so a KL-divergence loss can be added to encourage the latent distribution to match a standard normal. The paper mainly exploits the VED's sampling robustness to imperfect GT.
  • VED exhibits intrinsic invariance w.r.t. pitch and roll perturbations, compared to a monocular baseline with a flat-ground assumption.
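The KL term mentioned above has a closed form when the encoder outputs a diagonal Gaussian q(z|x) = N(mu, diag(sigma^2)) and the prior is a standard normal; the reparameterization trick keeps sampling differentiable. A NumPy sketch of these two pieces (not the paper's implementation):

```python
import numpy as np

def kl_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng) -> np.ndarray:
    """z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu, sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.zeros(8)       # illustrative encoder outputs for one input
log_var = np.zeros(8)  # log sigma^2 = 0  ->  sigma = 1
print(kl_to_standard_normal(mu, log_var))  # 0.0: latent already matches the prior
z = reparameterize(mu, log_var, rng)       # one latent sample fed to the decoder
```

During training this KL term is added to the reconstruction loss; sampling z rather than using mu directly is what gives the VED its robustness to noisy targets.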

Technical details

  • It is more robust to pitch and roll perturbations. It also generalizes better to unseen scenarios.
  • The PCA components of the latent space do encode some interpretable results.
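One way such an analysis can be done, as a sketch: collect latent codes over a dataset, run PCA, then decode perturbations of the mean code along the top components to see what each direction encodes. The PCA step in NumPy (the latent codes here are random, just to show the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 32))  # illustrative: latent codes for 1000 frames

Z_centered = Z - Z.mean(axis=0)
# Principal directions = right singular vectors of the centered code matrix.
_, s, Vt = np.linalg.svd(Z_centered, full_matrices=False)
explained = s**2 / np.sum(s**2)      # variance ratio per component

# Traverse the first component: these codes would be passed through the
# decoder to visualize what that direction encodes.
traversal = Z.mean(axis=0) + np.outer(np.linspace(-3, 3, 7), Vt[0])
print(traversal.shape)  # (7, 32)
```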

Notes

  • Questions and notes on how to improve/revise the current work