January 2023
tl;dr: Large-scale visual pretraining for policy learning.
The idea is interesting: how to use large-scale pretraining to extract driving-relevant information. The pipeline has two steps:
- Step 1: use large-scale, uncalibrated driving video to train a DepthNet and a PoseNet, à la SfMLearner. The input is two consecutive frames sampled at 1 Hz (see the pretraining sketch at the end of this note).
- Step 2: from a single image, predict the ego motion. --> This is highly questionable: it would be better to feed in multiple historical frames, together with historical ego-motion information (see the history-conditioned sketch at the end of this note). If historical information is important for prediction tasks, why not for planning?
- Summary of technical details
- Questions and notes on how to improve/revise the current work
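
A minimal sketch of the Step 1 pretraining setup, assuming an SfMLearner-style photometric objective. The module names, toy architectures, fixed intrinsics `K`, and small-angle pose parameterization below are placeholders for illustration, not the paper's implementation (the paper uses uncalibrated video, so intrinsics would also have to be estimated or learned, and the real loss includes SSIM and smoothness terms).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthNet(nn.Module):
    """Toy stand-in: predicts per-pixel depth from a single frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),
        )

    def forward(self, img):
        return self.net(img) + 0.1  # keep depth strictly positive

class PoseNet(nn.Module):
    """Toy stand-in: predicts 6-DoF ego motion (tx, ty, tz, rx, ry, rz)
    from two consecutive frames stacked along the channel dimension."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )

    def forward(self, img_t, img_tp1):
        return 0.01 * self.net(torch.cat([img_t, img_tp1], dim=1))

def inverse_warp(source, depth, pose, K):
    """Warp the source frame into the target view using predicted depth and
    ego motion (small-angle rotation approximation, single shared intrinsics K)."""
    B, _, H, W = source.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1)  # (1, 3, H*W)
    cam = torch.inverse(K) @ pix * depth.reshape(B, 1, -1)              # back-project
    t = pose[:, :3].unsqueeze(-1)
    rx, ry, rz = pose[:, 3], pose[:, 4], pose[:, 5]
    one = torch.ones_like(rx)
    R = torch.stack([
        torch.stack([one, -rz, ry], -1),
        torch.stack([rz, one, -rx], -1),
        torch.stack([-ry, rx, one], -1)], dim=1)                        # (B, 3, 3)
    proj = K @ (R @ cam + t)                                            # into source view
    px = proj[:, 0] / proj[:, 2].clamp(min=1e-6)
    py = proj[:, 1] / proj[:, 2].clamp(min=1e-6)
    grid = torch.stack([2 * px / (W - 1) - 1, 2 * py / (H - 1) - 1], -1).reshape(B, H, W, 2)
    return F.grid_sample(source, grid, align_corners=True)

# One self-supervised step: reconstruct frame t from frame t+1, compare photometrically.
depth_net, pose_net = DepthNet(), PoseNet()
K = torch.tensor([[100.0, 0.0, 64.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
img_t, img_tp1 = torch.rand(2, 3, 64, 128), torch.rand(2, 3, 64, 128)
depth = depth_net(img_t)
pose = pose_net(img_t, img_tp1)
recon = inverse_warp(img_tp1, depth, pose, K)
photometric_loss = (img_t - recon).abs().mean()  # L1 only; SSIM/smoothness omitted
photometric_loss.backward()
```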
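As a concrete version of the suggestion in Step 2, a history-conditioned ego-motion head could look like the sketch below. The interface, the number of history frames, and the fusion scheme are hypothetical, intended only to contrast with the paper's single-image input.

```python
import torch
import torch.nn as nn

class HistoryEgoMotionHead(nn.Module):
    """Hypothetical alternative: predict the next 6-DoF ego motion from the last
    n_frames images plus the last n_frames - 1 ego-motion vectors."""
    def __init__(self, n_frames=4, feat_dim=128):
        super().__init__()
        self.n_frames = n_frames
        self.img_enc = nn.Sequential(
            nn.Conv2d(3 * n_frames, 64, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.motion_enc = nn.Sequential(nn.Linear(6 * (n_frames - 1), feat_dim), nn.ReLU())
        self.head = nn.Linear(2 * feat_dim, 6)

    def forward(self, frames, past_motion):
        # frames: (B, n_frames, 3, H, W); past_motion: (B, n_frames - 1, 6)
        B = frames.shape[0]
        img_feat = self.img_enc(frames.reshape(B, -1, *frames.shape[-2:]))
        mot_feat = self.motion_enc(past_motion.reshape(B, -1))
        return self.head(torch.cat([img_feat, mot_feat], dim=-1))

# Example: 4 past frames and 3 past ego-motion vectors predict the next motion.
model = HistoryEgoMotionHead()
frames = torch.rand(2, 4, 3, 64, 128)
past_motion = torch.rand(2, 3, 6)
next_motion = model(frames, past_motion)  # (2, 6)
```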