This repository contains my paper reading notes on deep learning and machine learning. It is inspired by Denny Britz and Daniel Takeshi. A minimalistic webpage generated with GitHub Pages can be found here.
My name is Patrick Langechuan Liu. After about a decade of education and research in physics, I found my passion in deep learning and autonomous driving.
If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into this list of papers. I did so (see my notes) and it served me well.
Here is a list of trustworthy sources of papers in case I run out of papers to read.
I regularly update my blog on Towards Data Science.
- BEV Perception in Mass Production Autonomous Driving
- Challenges of Mass Production Autonomous Driving in China
- Vision-centric Semantic Occupancy Prediction for Autonomous Driving (related paper notes)
- Drivable Space in Autonomous Driving — The Industry
- Drivable Space in Autonomous Driving — The Academia
- Drivable Space in Autonomous Driving — The Concept
- Monocular BEV Perception with Transformers in Autonomous Driving (related paper notes)
- Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning
- Monocular 3D Lane Line Detection in Autonomous Driving (related paper notes)
- Deep-Learning based Object detection in Crowded Scenes (related paper notes)
- Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving (related paper notes)
- Deep Learning in Mapping for Autonomous Driving
- Monocular Dynamic Object SLAM in Autonomous Driving
- Monocular 3D Object Detection in Autonomous Driving — A Review
- Self-supervised Keypoint Learning — A Review
- Single Stage Instance Segmentation — A Review
- Self-paced Multitask Learning — A Review
- Convolutional Neural Networks with Heterogeneous Metadata
- Lifting 2D object detection to 3D in autonomous driving
- Multimodal Regression
- Paper Reading in 2019
- LINGO-1: Exploring Natural Language for Autonomous Driving [Notes] [Wayve, open-loop world model]
- LINGO-2: Driving with Natural Language [Notes] [Wayve, closed-loop world model]
- OpenVLA: An Open-Source Vision-Language-Action Model [open source RT-2]
- Parting with Misconceptions about Learning-based Vehicle Motion Planning CoRL 2023 [Simple non-learning based baseline]
- QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving [Waabi]
- MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving [Notes] ICRA 2015 [Behavior planning, UMich, May Mobility]
- MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction [Notes] RSS 2015 [Behavior planning]
- MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment RSS 2017 [Behavior planning]
- EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching [Notes] ICRA 2020 [Wenchao Ding, Shaojie Shen, Behavior planning]
- TPP: Tree-structured Policy Planning with Learned Behavior Models ICRA 2023 [Marco Pavone, Nvidia, Behavior planning]
- MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving [Notes] RAL 2023 [Shaojie Shen, Behavior planning]
- EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments TRO 2021 [Wenchao Ding, encyclopedia of pnc]
- trajdata: A Unified Interface to Multiple Human Trajectory Datasets NeurIPS 2023 [Marco Pavone, Nvidia]
- Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization [Xpeng]
- Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles [Notes] IROS 2019 Oral [Uber ATG, behavioral planning, motion planning]
- Enhancing End-to-End Autonomous Driving with Latent World Model
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [Jiwen Lu]
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision ICRA 2024
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [Sanja, Marco, NV]
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
- Trajeglish: Traffic Modeling as Next-Token Prediction ICLR 2024
- Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks ITSC 2021
- Learning-Based Approach for Online Lane Change Intention Prediction IV 2013 [SVM, LC intention prediction]
- Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario RAL 2023 [Wenchao Ding, Huawei, crowdsourced map]
- FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow ICRA 2023
- Hybrid A-star: Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments IJRR 2010 [Dolgov, Thrun, Searching]
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame ICRA 2010 [Werling, Thrun, Sampling] [MUST READ for planning folks]
- Autonomous Driving on Curvy Roads Without Reliance on Frenet Frame: A Cartesian-Based Trajectory Planning Method TITS 2022
- Baidu Apollo EM Motion Planner [Notes][Optimization]
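- Spatio-temporal Joint Planning Method for Intelligent Vehicles Based on Improved Hybrid A* (基于改进混合A*的智能汽车时空联合规划方法) Automotive Engineering (汽车工程): Planning & Decision-Making, 2023 [Joint optimization, search]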
- Enable Faster and Smoother Spatio-temporal Trajectory Planning for Autonomous Vehicles in Constrained Dynamic Environment JAE 2020 [Joint optimization, search]
- Focused Trajectory Planning for Autonomous On-Road Driving IV 2013 [Joint optimization, Iteration]
- SSC: Safe Trajectory Generation for Complex Urban Environments Using Spatio-Temporal Semantic Corridor RAL 2019 [Joint optimization, SSC, Wenchao Ding, Motion planning]
- AlphaGo: Mastering the game of Go with deep neural networks and tree search [Notes] Nature 2016 [DeepMind, MCTS]
- AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play Science 2018 [DeepMind]
- MuZero: Mastering Atari, Go, chess and shogi by planning with a learned model Nature 2020 [DeepMind]
- Grandmaster-Level Chess Without Search [DeepMind]
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving [Mobileye, desire and traj optimization]
- Comprehensive Reactive Safety: No Need For A Trajectory If You Have A Strategy IROS 2022 [Da Fang, Qcraft]
- BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning AAAI 2024
- LLM-MCTS: Large Language Models as Commonsense Knowledge for Large-Scale Task Planning NeurIPS 2023
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction CVPR 2022 [Zikang Zhou, agent-centric, motion prediction]
- QCNet: Query-Centric Trajectory Prediction [Notes] CVPR 2023 [Zikang Zhou, scene-centric, motion prediction]
- Genie: Generative Interactive Environments [Notes] [DeepMind, World Model]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving [Notes] [Jiwen Lu, World Model]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Notes] [Jiwen Lu, World Model]
- VideoPoet: A Large Language Model for Zero-Shot Video Generation [Like Sora, but LLM, NOT world model]
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models [Notes] CVPR 2023 [Sanja, Nvidia, VideoLDM, Video prediction]
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos NeurIPS 2022 [Notes] [OpenAI]
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge NeurIPS 2022 [Nvidia, Outstanding paper award]
- Humanoid Locomotion as Next Token Prediction [Notes] [Berkeley, EAI]
- RPT: Robot Learning with Sensorimotor Pre-training [Notes] CoRL 2023 Oral [Berkeley, EAI]
- MVP: Real-World Robot Learning with Masked Visual Pre-training [Notes] CoRL 2022 [Berkeley, EAI]
- BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [Notes] CoRL 2021 [Eric Jang, 1X]
- GenAD: Generalized Predictive Model for Autonomous Driving [Notes] CVPR 2024
- HG-DAgger: Interactive Imitation Learning with Human Experts [DAgger]
- DriveGAN: Towards a Controllable High-Quality Neural Simulation [Notes] CVPR 2021 oral [Nvidia, Sanja]
- VideoGPT: Video Generation using VQ-VAE and Transformers [Notes] [Pieter Abbeel]
- LLM, Vision Tokenizer and Vision Intelligence, by Lu Jiang [Notes] [Interview Lu Jiang]
- AV2.0: Reimagining an autonomous vehicle [Notes] [Wayve, Alex Kendall]
- Simulation for E2E AD [Wayve, Tech Sharing, E2E]
- E2E lateral planning [Comma.ai, E2E planning]
- Learning and Leveraging World Models in Visual Representation Learning [LeCun, JEPA series]
- LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models [Large Vision Models, Jitendra Malik]
- LWM: World Model on Million-Length Video And Language With RingAttention [Pieter Abbeel]
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving [Jiwen Lu, World Model]
- GenAD: Generative End-to-End Autonomous Driving
- TCP: Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline NeurIPS 2022 [E2E planning, Hongyang]
- Transfuser: Multi-Modal Fusion Transformer for End-to-End Autonomous Driving CVPR 2021 [E2E planning, Geiger]
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving [Wayve, LLM + AD]
- LingoQA: Video Question Answering for Autonomous Driving [Wayve, LLM + AD]
- Panacea: Panoramic and Controllable Video Generation for Autonomous Driving CVPR 2024 [Megvii]
- PlanT: Explainable Planning Transformers via Object-Level Representations CoRL 2022
- Scene as Occupancy ICCV 2023
- AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes [Baidu]
- The Shift from Models to Compound AI Systems
- Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach ICCV 2021
- LBC: Learning by Cheating CoRL 2019
- Learning to drive from a world on rails ICCV 2021 oral [Philipp Krähenbühl]
- Learning from All Vehicles CVPR 2022 [Philipp Krähenbühl]
- VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning [Horizon]
- VQ-VAE: Neural Discrete Representation Learning NeurIPS 2017 [Image Tokenizer]
- VQ-GAN: Taming Transformers for High-Resolution Image Synthesis CVPR 2021 [Image Tokenizer]
- ViT-VQGAN: Vector-quantized Image Modeling with Improved VQGAN ICLR 2022 [Image Tokenizer]
- MaskGIT: Masked Generative Image Transformer CVPR 2022 [LLM, non-autoregressive]
- MAGVIT: Masked Generative Video Transformer CVPR 2023 highlight [Video Tokenizer]
- MAGVIT-v2: Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation ICLR 2024 [Video Tokenizer]
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [Reverse Engineering of Sora]
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts ICML 2022 [MoE, LLM]
- Lifelong Language Pretraining with Distribution-Specialized Experts ICML 2023 [MoE, LLM]
- DriveLM: Drive on Language [Hongyang Li]
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling ICCV 2023 [Waymo, LLM + AD]
- AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes [No perception]
- CubeLLM: align 2D/3D with language
- EmerNeRF: ICLR 2024
- A Language Agent for Autonomous Driving
- Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal
- DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving CVPR 2024 [Zheng Zhu]
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [Zheng Zhu]
- End-to-end Autonomous Driving: Challenges and Frontiers [Notes] [Hongyang Li, Shanghai AI labs]
- DriveVLM: The convergence of Autonomous Driving and Large Vision-Language Models [Notes] [Hang Zhao]
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [Notes] [HKU]
- GAIA-1: A Generative World Model for Autonomous Driving [Notes] [Wayve, vision foundation model]
- ADriver-I: A General World Model for Autonomous Driving [Notes] [Megvii, Xiangyu]
- Drive-WM: Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [Notes]
- X [Notes] [E2E planning]
- ChatGPT for Robotics: Design Principles and Model Abilities [Notes] [Microsoft, LLM for robotics]
- RoboVQA: Multimodal Long-Horizon Reasoning for Robotics [Notes] [Google DeepMind, LLM for robotics]
- ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application [Microsoft Robotics]
- GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration [Notes] [LLM for robotics, Microsoft Robotics]
- LLM-Brain: LLM as A Robotic Brain: Unifying Egocentric Memory and Control [Notes]
- Voyager: An Open-Ended Embodied Agent with Large Language Models [Notes] [Reasoning Critique, Linxi Jim Fan]
- RetNet: Retentive Network: A Successor to Transformer for Large Language Models [Notes] [MSRA]
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [Notes] ICML 2020 [Linear attention]
- AFT: An Attention Free Transformer [Notes] [Apple]
- RT-1: Robotics Transformer for Real-World Control at Scale [Notes] [DeepMind]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [Notes] [DeepMind, end-to-end visuomotor]
- RWKV: Reinventing RNNs for the Transformer Era [Notes]
- MILE: Model-Based Imitation Learning for Urban Driving [Notes] NeurIPS 2022 [Alex Kendall]
- PaLM-E: An embodied multimodal language model [Notes] [Google Robotics]
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [Notes] [Fei-Fei Li]
- CaP: Code as Policies: Language Model Programs for Embodied Control [Notes] [Project]
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models ICRA 2023
- TidyBot: Personalized Robot Assistance with Large Language Models [Notes] [Project]
- SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Notes] [Project]
- End-to-end review by Shanghai AI Labs
- Pix2seq v2: A Unified Sequence Interface for Vision Tasks [Notes] NeurIPS 2022 [Geoffrey Hinton]
- 🦩 Flamingo: a Visual Language Model for Few-Shot Learning [Notes] NeurIPS 2022 [DeepMind]
- 😼 Gato: A Generalist Agent [Notes] TMLR 2022 [DeepMind]
- BC-SAC: Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios [Notes] NeurIPS 2022 [Waymo]
- MGAIL-AD: Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving [Notes] IROS 2022 [Waymo]
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [Notes] [Occupancy Network, Wei Yi, Jiwen Lu]
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving [Notes] [Occupancy Network, Hang Zhao]
- Occupancy Networks: Learning 3D Reconstruction in Function Space CVPR 2019 [Notes] [Andreas Geiger]
- OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction [Occupancy Network, PhiGent]
- Pix2seq: A Language Modeling Framework for Object Detection [Notes] ICLR 2022 [Geoffrey Hinton]
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks [Notes] [Jifeng Dai]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Notes]
- UniAD: Planning-oriented Autonomous Driving [Notes] [BEV, e2e, Hongyang Li]
- GPT-4 Technical Report [Notes] [OpenAI, GPT]
- OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception [Notes] [Occupancy Network, Jiwen Lu]
- VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion [Notes] CVPR 2023 highlight [Occupancy Network, Nvidia]
- MonoScene: Monocular 3D Semantic Scene Completion CVPR 2022 [Notes] [Occupancy Network, single cam]
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [Notes] ECCV 2020 oral
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning [Notes] [Epoch.ai industry report]
- Codex: Evaluating Large Language Models Trained on Code [Notes] [GPT, OpenAI]
- InstructGPT: Training language models to follow instructions with human feedback [Notes] [GPT, OpenAI]
- TPVFormer: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [Notes] CVPR 2023 [Occupancy Network, Jiwen Lu]
- PPGeo: Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [Notes] ICLR 2023
- nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles [Notes]
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [Notes] [PJLab]
- ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries [Notes] [BEV, perception + prediction, Hang Zhao]
- MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction [Notes] [Horizon, BEVNet]
- StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving ICRA 2022
- MOTR: End-to-End Multiple-Object Tracking with Transformer ECCV 2022 [Megvii, MOT]
- Anchor DETR: Query Design for Transformer-Based Object Detection [Notes] AAAI 2022 [Megvii]
- HOME: Heatmap Output for future Motion Estimation [Notes] ITSC 2021 [behavior prediction, Huawei Paris]
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [Notes] [BEVNet, lane line]
- VectorMapNet: End-to-end Vectorized HD Map Learning [Notes] [BEVNet, LLD, Hang Zhao]
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection [Notes] ECCV 2022 [BEVNet]
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images [Notes] [BEVNet, Megvii]
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [Notes] [BEVNet, Nvidia]
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection [Notes] [BEVNet, nuScenes SOTA, Megvii]
- CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation [Notes] CVPR 2022 oral [UT Austin, Philipp]
- Wayformer: Motion Forecasting via Simple & Efficient Attention Networks [Notes] [Behavior prediction, Waymo]
- BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection [Notes] [BEVNet]
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [Notes] [Jiwen Lu, BEVNet, perception + prediction]
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [Notes] [BEVNet, Song Han]
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [Notes] ECCV 2022 [BEVNet, Hongyang Li, Jifeng Dai]
- TNT: Target-driveN Trajectory Prediction [Notes] CoRL 2020 [prediction, Waymo, Hang Zhao]
- DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets [Notes] ICCV 2021 [prediction, Waymo, 1st place winner WOMD]
- Manydepth: The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth [Notes] CVPR 2021 [monodepth, Niantic]
- DEKR: Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [Notes] CVPR 2021
- BN-FFN-BN: Leveraging Batch Normalization for Vision Transformers [Notes] ICCVW 2021 [BN, transformers]
- PowerNorm: Rethinking Batch Normalization in Transformers [Notes] ICML 2020 [BN, transformers]
- MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction [Notes] ICRA 2022 [Waymo, behavior prediction]
- BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View [Notes]
- Translating Images into Maps [Notes] ICRA 2022 [BEVNet, transformers]
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [Notes] CoRL 2021 [BEVNet, transformers]
- Robust-CVD: Robust Consistent Video Depth Estimation CVPR 2021 oral [website]
- MAE: Masked Autoencoders Are Scalable Vision Learners [Notes] [Kaiming He, unsupervised learning]
- SimMIM: A Simple Framework for Masked Image Modeling [Notes] [MSRA, unsupervised learning, MAE]
- iBOT: Image BERT Pre-Training with Online Tokenizer
- STSU: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [Notes] ICCV 2021 [BEV feat stitching, Luc Van Gool]
- PanopticBEV: Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [Notes] RAL 2022 [BEVNet, vertical/horizontal features]
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving [Notes] ICCV 2021 [supplementary] [BEVNet]
- DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? [Notes] ICCV 2021 [mono3D, Toyota]
- EfficientDet: Scalable and Efficient Object Detection [Notes] CVPR 2020 [BiFPN, Tesla AI day]
- PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [Notes] CVPR 2020 [Uber ATG]
- MP3: A Unified Model to Map, Perceive, Predict and Plan [Notes] CVPR 2021 [Uber, planning]
- BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning [Notes] ICCV 2021 [BEVNet, surveillance]
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector [Notes] CVPR 2021 [TuSimple, Naiyan Wang]
- Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches [Notes] [corner cases]
- Systematization of Corner Cases for Visual Perception in Automated Driving [Notes] IV 2020 [corner cases]
- An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving [Notes] IV 2021 [corner cases]
- PYVA: Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation [Notes] CVPR 2021 [Supplementary] [BEVNet]
- YOLOF: You Only Look One-level Feature [Notes] CVPR 2021 [Megvii]
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [Notes] TITS 2021 [monoloco++]
- PifPaf: Composite Fields for Human Pose Estimation CVPR 2019
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [BEVNet]
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation CVPR 2021
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving CVPR 2021
- Conditional DETR for Fast Training Convergence
- Probabilistic and Geometric Depth: Detecting Objects in Perspective CoRL 2021
- EgoNet: Exploring Intermediate Representation for Monocular Vehicle Pose Estimation [Notes] CVPR 2021 [mono3D]
- MonoEF: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach [Notes] CVPR 2021 [mono3D]
- GAC: Ground-aware Monocular 3D Object Detection for Autonomous Driving [Notes] RAL 2021 [mono3D]
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [Notes] ICCVW 2021 [mono3D, SenseTime]
- GUPNet: Geometry Uncertainty Projection Network for Monocular 3D Object Detection [Notes] ICCV 2021 [mono3D, Wanli Ouyang]
- DARTS: Differentiable Architecture Search [Notes] ICLR 2019 [VGG author]
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search [Notes] CVPR 2019 [DARTS]
- FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions CVPR 2020
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining CVPR 2021
- Perceiver: General Perception with Iterative Attention [Notes] ICML 2021 [transformers, multimodal]
- Perceiver IO: A General Architecture for Structured Inputs & Outputs [Notes]
- PillarMotion: Self-Supervised Pillar Motion Learning for Autonomous Driving [Notes] CVPR 2021 [QCraft, Alan Yuille]
- SimTrack: Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [Notes] ICCV 2021 [QCraft, Alan Yuille]
- HDMapNet: An Online HD Map Construction and Evaluation Framework [Notes] CVPR 2021 workshop [youtube video only, Li Auto]
- FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras [Notes] ICCV 2021 [BEVNet, perception + prediction]
- Baidu's CNN seg [Notes]
- Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation [Notes] CVPR 2021 [Megvii]
- CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark CVPR 2019
- The Overlooked Elephant of Object Detection: Open Set WACV 2021
- Class-Agnostic Object Detection WACV 2021
- OWOD: Towards Open World Object Detection [Notes] CVPR 2021 oral
- FsDet: Frustratingly Simple Few-Shot Object Detection ICML 2020
- MonoFlex: Objects are Different: Flexible Monocular 3D Object Detection [Notes] CVPR 2021 [mono3D, Jiwen Lu, cropped]
- monoDLE: Delving into Localization Errors for Monocular 3D Object Detection [Notes] CVPR 2021 [mono3D]
- Exploring 2D Data Augmentation for 3D Monocular Object Detection
- OCM3D: Object-Centric Monocular 3D Object Detection [mono3D]
- FSM: Full Surround Monodepth from Multiple Cameras [Notes] ICRA 2021 [monodepth, Xnet]
- CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection [Notes] CVPR 2021 oral [mono3D, BEVNet]
- DSNT: Numerical Coordinate Regression with Convolutional Neural Networks [Notes] [differentiable spatial to numerical transform]
- Soft-Argmax: Human pose regression by combining indirect part detection and contextual information
- INSTA-YOLO: Real-Time Instance Segmentation [Notes] ICML workshop 2020 [single stage instance segmentation]
- CenterNet2: Probabilistic two-stage detection [Notes] [CenterNet, two-stage]
- Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection [Notes] [NMS]
- BoxInst: High-Performance Instance Segmentation with Box Annotations [Notes] CVPR 2021 [Chunhua Shen, Tian Zhi]
- 3DSSD: Point-based 3D Single Stage Object Detector [Notes] CVPR 2020
- RepVGG: Making VGG-style ConvNets Great Again [Notes] [Megvii, Xiangyu Zhang, ACNet]
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Notes] ICCV 2019
- BEV-Feat-Stitching: Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [Notes] [BEVNet, mono3D, Luc Van Gool]
- PSS: Object Detection Made Simpler by Eliminating Heuristic NMS [Notes] [Transformer, DETR]
- DeFCN: End-to-End Object Detection with Fully Convolutional Network [Notes] [Transformer, DETR]
- OneNet: End-to-End One-Stage Object Detection by Classification Cost [Notes] [Transformer, DETR]
- Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles [Notes] ICRA 2011 [traffic light, Sebastian Thrun]
- Towards lifelong feature-based mapping in semi-static environments [Notes] ICRA 2016
- How to Keep HD Maps for Automated Driving Up To Date [Notes] ICRA 2020 [BMW]
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection [Notes] CVPR 2021 [focal loss]
- Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning [Notes] CVPR 2018 workshop
- Centroid Voting: Object-Aware Centroid Voting for Monocular 3D Object Detection [Notes] IROS 2020 [mono3D, geometry + appearance = distance]
- Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras [Notes] [GM Israel, mono3D]
- DeepPS: Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset TIP 2018 [Parking slot detection, PS2.0 dataset]
- PSDet: Efficient and Universal Parking Slot Detection [Notes] IV 2020 [Zongmu, Parking slot detection]
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [Notes] ASPLOS 2020 [pruning]
- Scaled-YOLOv4: Scaling Cross Stage Partial Network [Notes] [yolo]
- Yolov5 by Ultralytics [Notes] [yolo, spatial2channel]
- PP-YOLO: An Effective and Efficient Implementation of Object Detector [Notes] [yolo, paddle-paddle, baidu]
- PointPainting: Sequential Fusion for 3D Object Detection [Notes] [nuScenes]
- MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps [Notes] CVPR 2020 [Unseen moving objects, BEV]
- Locating Objects Without Bounding Boxes [Notes] CVPR 2019 [weighted Hausdorff distance, NMS-free]
- TSP: Rethinking Transformer-based Set Prediction for Object Detection [Notes] ICCV 2021 [DETR, transformers, Kris Kitani]
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals [Notes] CVPR 2021 [DETR, Transformer]
- Unsupervised Monocular Depth Learning in Dynamic Scenes [Notes] CoRL 2020 [LearnK improved ver, Google]
- MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time [Notes] ICML 2020 [Mono3D, pairwise relationship]
- Argoverse: 3D Tracking and Forecasting with Rich Maps [Notes] CVPR 2019 [HD maps, dataset, CV lidar]
- The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes [Notes] ICRA 2019
- Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection CVPRW 2020 [dataset, Daimler, mono3D]
- NYC3DCars: A Dataset of 3D Vehicles in Geographic Context ICCV 2013
- Towards Fully Autonomous Driving: Systems and Algorithms IV 2011
- Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding [Notes] [mono3D, LID+DepJoint]
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection AAAI 2020 oral [mono3D]
- CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection [Notes] WACV 2021 [early fusion, camera, radar]
- 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation [Notes] NeurIPS 2020 workshop [GM Israel, 3D LLD]
- LSTR: End-to-end Lane Shape Prediction with Transformers [Notes] WACV 2021 [LLD, transformers]
- PIXOR: Real-time 3D Object Detection from Point Clouds [Notes] CVPR 2018 (birds eye view)
- HDNET/PIXOR++: Exploiting HD Maps for 3D Object Detection [Notes] CoRL 2018
- CPNDet: Corner Proposal Network for Anchor-free, Two-stage Object Detection ECCV 2020 [anchor free, two stage]
- MVF: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds [Notes] CoRL 2019 [Waymo, VoxelNet 1st author]
- Pillar-based Object Detection for Autonomous Driving [Notes] ECCV 2020
- Training-Time-Friendly Network for Real-Time Object Detection AAAI 2020 [anchor-free, fast training]
- Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies [Review of autonomous stack, Yu Huang]
- Dense Monocular Depth Estimation in Complex Dynamic Scenes CVPR 2016
- Probabilistic Future Prediction for Video Scene Understanding
- AB3DMOT: A Baseline for 3D Multi-Object Tracking IROS 2020 [3D MOT]
- Spatial-Temporal Relation Networks for Multi-Object Tracking ICCV 2019 [MOT, feature location over time]
- Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking ICRA 2018 [MOT, IIT, 3D shape]
- ST-3D: Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking CVPR 2020 [Peiliang Li, author of VINS and S3DOT]
- Augment Your Batch: Improving Generalization Through Instance Repetition CVPR 2020
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020 [MOT]
- Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks ECCV 2020 oral
- Depth Completion via Deep Basis Fitting WACV 2020
- BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation [monodepth, supervised]
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [monodepth, Xiaoming Liu]
- On the Continuity of Rotation Representations in Neural Networks CVPR 2019 [rotational representation]
- VDO-SLAM: A Visual Dynamic Object-aware SLAM System IJRR 2020
- Dynamic SLAM: The Need For Speed
- Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction ECCV 2020
- Traffic Light Mapping and Detection [Notes] ICRA 2011 [traffic light, Google, Chris Urmson]
- Traffic light recognition exploiting map and localization at every stage [Notes] Expert Systems 2017 [traffic light, 鲜于明镐,徐在圭,郑浩奇]
- Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars [Notes] IJCNN 2019 [traffic light, Espirito Santo Brazil]
- TSM: Temporal Shift Module for Efficient Video Understanding [Notes] ICCV 2019 [Song Han, video, object detection]
- WOD: Waymo Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset [Notes] CVPR 2020
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [Notes] NeurIPS 2020 [classification as regression]
- A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection NeurIPS 2020 spotlight
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning NeurIPS 2020
- RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd [Notes] CVPR 2018 [crowd detection, Megvii]
- Adaptive NMS: Refining Pedestrian Detection in a Crowd [Notes] CVPR 2019 oral [crowd detection, NMS]
- AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd [Notes] ECCV 2018 [crowd detection]
- CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions [Notes] CVPR 2020 oral [crowd detection, Megvii, Earth mover's distance]
- R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing [Notes] CVPR 2020
- Double Anchor R-CNN for Human Detection in a Crowd [Notes] [head-body bundle]
- Review: AP vs MR
- SKU110K: Precise Detection in Densely Packed Scenes [Notes] CVPR 2019 [crowd detection, no occlusion]
- GossipNet: Learning non-maximum suppression CVPR 2017
- TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation ECCV 2018
- Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels GCPR 2020 [mono3D, Daniel Cremers, TUM]
- CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection [Notes] [mono3D, depth AE pretraining]
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [Notes] ICLR 2021 [Jifeng Dai, DETR]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Notes] ICLR 2021
- BYOL: Bootstrap your own latent: A new approach to self-supervised Learning [self-supervised]
- SDFLabel: Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors [Notes] CVPR 2020 oral [TRI, differentiable rendering]
- DensePose: Dense Human Pose Estimation In The Wild [Notes] CVPR 2018 oral [FAIR]
- NOCS: Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation CVPR 2019
- monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [Notes] ECCV 2020 [TRI, mono3D]
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [Notes] ECCV 2020 [BEV-Net, Utoronto, Sanja Fidler]
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting ECCV 2020 [Uber ATG, Rachel Urtasun]
- FISHING Net: Future Inference of Semantic Heatmaps In Grids [Notes] CVPRW 2020 [BEV-Net, Mapping, Zoox]
- VPN: Cross-view Semantic Segmentation for Sensing Surroundings [Notes] RAL 2020 [Bolei Zhou, BEV-Net]
- VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks [Notes] ICRA 2019 [BEV-Net]
- Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [Notes] ITSC 2020 [BEV-Net]
- Learning to Look around Objects for Top-View Representations of Outdoor Scenes [Notes] ECCV 2018 [BEV-Net, UCSD, Manmohan Chandraker]
- A Parametric Top-View Representation of Complex Road Scenes CVPR 2019 [BEV-Net, UCSD, Manmohan Chandraker]
- FTM: Understanding Road Layout from Videos as a Whole CVPR 2020 [BEV-Net, UCSD, Manmohan Chandraker]
- KM3D-Net: Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [Notes] RAL 2021 [RTM3D, Peixuan Li]
- InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving [Notes] IROS 2020 [motion segmentation]
- MPV-Nets: Monocular Plan View Networks for Autonomous Driving [Notes] IROS 2019 [BEV-Net]
- Class-Balanced Loss Based on Effective Number of Samples [Notes] CVPR 2019 [Focal loss authors]
- Geometric Pretraining for Monocular Depth Estimation [Notes] ICRA 2020
- Robust Traffic Light and Arrow Detection Using Digital Map with Spatial Prior Information for Automated Driving [Notes] Sensors 2020 [traffic light, Kanazawa]
- Feature-metric Loss for Self-supervised Learning of Depth and Egomotion [Notes] ECCV 2020 [feature-metric, local minima, monodepth]
- Depth-VO-Feat: Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction CVPR 2018 [feature-metric, monodepth]
- MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge [Notes] CVPR 2019 [monodepth, local minima, cheap stereo GT]
- SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance [Notes] ECCV 2020 [Moving objects]
- Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding ECCV 2018 [dynamic objects, rigid and dynamic motion]
- Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding TPAMI 2018
- CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation [Notes] CVPR 2019
- ObjMotionNet: Self-supervised Object Motion and Depth Estimation from Video [Notes] CVPRW 2020 [object motion prediction, velocity prediction]
- Instance-wise Depth and Motion Learning from Monocular Videos
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation
- Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
- DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency ECCV 2018
- LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments [mapping]
- Road-SLAM: Road Marking based SLAM with Lane-level Accuracy [Notes] [HD mapping]
- AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot [Notes] IROS 2020 [Huawei, HD mapping, Tong Qin, VINS author, autonomous valet parking]
- AVP-SLAM-Late-Fusion: Mapping and Localization using Semantic Road Marking with Centimeter-level Accuracy in Indoor Parking Lots [Notes] ITSC 2019
- Lane markings-based relocalization on highway ITSC 2019
- DeepRoadMapper: Extracting Road Topology from Aerial Images [Notes] ICCV 2017 [Uber ATG, NOT HD maps]
- RoadTracer: Automatic Extraction of Road Networks from Aerial Images CVPR 2018 [NOT HD maps]
- PolyMapper: Topological Map Extraction From Overhead Images [Notes] ICCV 2019 [mapping, polygon, NOT HD maps]
- HRAN: Hierarchical Recurrent Attention Networks for Structured Online Maps [Notes] CVPR 2018 [HD mapping, highway, polyline loss, Chamfer distance]
- Deep Structured Crosswalk: End-to-End Deep Structured Models for Drawing Crosswalks [Notes] ECCV 2018
- DeepBoundaryExtractor: Convolutional Recurrent Network for Road Boundary Extraction [Notes] CVPR 2019 [HD mapping, boundary, polyline loss]
- DAGMapper: Learning to Map by Discovering Lane Topology [Notes] ICCV 2019 [HD mapping, highway, forks and merges, polyline loss]
- Sparse-HD-Maps: Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization [Notes] IROS 2019 oral [Uber ATG, metadata, mapping, localization]
- Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks IEEE TGRS 2018
- Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs Sensors 2020 [Tsinghua, 3D HD maps]
- PatchNet: Rethinking Pseudo-LiDAR Representation [Notes] ECCV 2020 [SenseTime, Wanli Ouyang]
- D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection [Notes] CVPR 2020 [mono3D]
- MfS: Learning Stereo from Single Images [Notes] ECCV 2020 [mono for stereo, learn stereo matching with mono]
- BorderDet: Border Feature for Dense Object Detection ECCV 2020 oral [Megvii]
- Scale-Aware Trident Networks for Object Detection ICCV 2019 [different heads for different scales]
- Learning Depth from Monocular Videos using Direct Methods
- Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018 [Google]
- NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
- Supervising the new with the old: learning SFM from SFM [Notes] ECCV 2018
- Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera CVPR 2019 [multi-frame monodepth]
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [multi-frame monodepth, RNN]
- Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth [multi-frame monodepth, RNN]
- Exploiting temporal consistency for real-time video depth estimation ICCV 2019 [multi-frame monodepth, RNN, indoor]
- SfM-Net: Learning of Structure and Motion from Video [dynamic object, SfM]
- MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection [Notes] IV 2018 [mono3D, Daimler]
- BS3D: Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB Images [Notes] IV 2019 [mono3D, Daimler]
- 3D-GCK: Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometrically Constrained Keypoints in Real-Time [Notes] IV 2020 [mono3D, Daimler]
- UR3D: Distance-Normalized Unified Representation for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- DA-3Det: Monocular 3D Object Detection via Feature Domain Adaptation [Notes] ECCV 2020 [mono3D]
- RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection [Notes] ECCV 2020 [mono3D]
- CenterTrack: Tracking Objects as Points [Notes] ECCV 2020 spotlight [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection, Philipp Krähenbühl]
- CenterPoint: Center-based 3D Object Detection and Tracking [Notes] CVPR 2021 [lidar based 3D MOD, CenterNet]
- Tracktor: Tracking without bells and whistles [Notes] ICCV 2019 [Tracktor/Tracktor++, Laura Leal-Taixe@TUM]
- FairMOT: A Simple Baseline for Multi-Object Tracking [Notes]
- DeepMOT: A Differentiable Framework for Training Multiple Object Trackers [Notes] CVPR 2020 [trainable Hungarian, Laura Leal-Taixe@TUM]
- MPNTracker: Learning a Neural Solver for Multiple Object Tracking CVPR 2020 oral [trainable Hungarian, Laura Leal-Taixe@TUM]
- nuScenes: A multimodal dataset for autonomous driving [Notes] CVPR 2020 [dataset, point cloud, radar]
- CBGS: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection [Notes] CVPRW 2019 [Megvii, lidar, WAD challenge winner]
- AFDet: Anchor Free One Stage 3D Object Detection and Competition solution [Notes] CVPRW 2020 [Horizon Robotics, lidar, Waymo challenge winner]
- Review of MOT and SOT [Notes]
- CrowdHuman: A Benchmark for Detecting Human in a Crowd [Notes] [Megvii, pedestrian, dataset]
- WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild [Notes] TMM 2019 [dataset, pedestrian]
- Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection [Notes] IV 2016 [dataset, cyclist detection]
- Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB Camera [Notes] IV 2019 [Extension to KITTI]
- PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [Notes] ECCV 2020 oral [MOTS]
- PointTrack++ for Effective Online Multi-Object Tracking and Segmentation [Notes] CVPR 2020 workshop [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]
- SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth [Notes] ICCV 2019 [one-stage, instance segmentation]
- BA-Net: Dense Bundle Adjustment Networks [Notes] ICLR 2019 [Bundle adjustment, multi-frame monodepth, feature-metric]
- DeepSFM: Structure From Motion Via Deep Bundle Adjustment ECCV 2020 oral [multi-frame monodepth, indoor scene]
- CVD: Consistent Video Depth Estimation [Notes] SIGGRAPH 2020 [multi-frame monodepth, online finetune]
- DeepV2D: Video to Depth with Differentiable Structure from Motion [Notes] ICLR 2020 [multi-frame monodepth, Jia Deng]
- GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Notes] CVPR 2018 [residual optical flow, monodepth, rigid and dynamic motion]
- GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera [Notes] ICCV 2019 [online finetune, rigid and dynamic motion]
- Depth Hints: Self-Supervised Monocular Depth Hints [Notes] ICCV 2019 [monodepth, local minima, cheap stereo GT]
- MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation [Notes] CVPR 2020 [depth uncertainty]
- Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment [Notes] [Bundle adjustment, xmotors.ai, multi-frame monodepth]
- Kinematic 3D Object Detection in Monocular Video [Notes] ECCV 2020 [multi-frame mono3D, Xiaoming Liu]
- VelocityNet: Camera-based vehicle velocity estimation from monocular video [Notes] CVPR 2017 workshop [monocular velocity estimation, CVPR 2017 challenge winner]
- Vehicle Centric VelocityNet: End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera [Notes] [monocular velocity estimation, monocular distance, SOTA]
- LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain [Notes] IROS 2018 [lidar, mapping]
- PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction [Notes] ICCV 2019
- JAAD: Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior ICCV 2017
- Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs BMVC 2019
- Is the Pedestrian going to Cross? Answering by 2D Pose Estimation IV 2018
- Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation ITSC 2019 [skeleton, pedestrian, cyclist intention]
- Attentive Single-Tasking of Multiple Tasks CVPR 2019
- DETR: End-to-End Object Detection with Transformers [Notes] ECCV 2020 oral [FAIR]
- Transformer: Attention Is All You Need [Notes] NIPS 2017
- SpeedNet: Learning the Speediness in Videos [Notes] CVPR 2020 oral
- MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships [Notes] CVPR 2020 [Mono3D, pairwise relationship]
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [Notes] CVPRW 2020 [Mono3D, Zongmu]
- Vehicle Re-ID for Surround-view Camera System [Notes] CVPRW 2020 [tireline, vehicle ReID, Zongmu]
- End-to-End Lane Marker Detection via Row-wise Classification [Notes] [Qualcomm Korea, LLD as cls]
- Reliable multilane detection and classification by utilizing CNN as a regression network ECCV 2018 [LLD as reg]
- SUPER: A Novel Lane Detection System [Notes]
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation ICCV 2019
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation BMVC 2015
- StixelNetV2: Real-time category-based and general obstacle detection for autonomous driving [Notes] ICCV 2017 [DS]
- Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [Notes] CVPR 2016 [channel-to-pixel]
- Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints [mono3D]
- Self-Mono-SF: Self-Supervised Monocular Scene Flow Estimation [Notes] CVPR 2020 oral [scene-flow, Stereo input]
- MEBOW: Monocular Estimation of Body Orientation In the Wild [Notes] CVPR 2020
- VG-NMS: Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes [Notes] NeurIPS 2019 workshop [Crowded scene, NMS, Daimler]
- WYSIWYG: What You See is What You Get: Exploiting Visibility for 3D Object Detection [Notes] CVPR 2020 oral [occupancy grid]
- Real-Time Panoptic Segmentation From Dense Detections [Notes] CVPR 2020 oral [bbox + semantic segmentation = panoptic segmentation, Toyota]
- Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving [Notes] CVPRW 2020 [efficient annotation]
- SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving [Notes] CVPR 2020 oral [Waymo, auto data generation, surfel]
- LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World [Notes] CVPR 2020 oral [Uber ATG, auto data generation, surfel]
- SuMa++: Efficient LiDAR-based Semantic SLAM IROS 2019 [semantic segmentation, lidar, SLAM]
- PON/PyrOccNet: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [Notes] CVPR 2020 oral [BEV-Net, OFT]
- MonoLayout: Amodal scene layout from a single image [Notes] WACV 2020 [BEV-Net]
- BEV-Seg: Bird’s Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [Notes] CVPR 2020 workshop [BEV-Net, Mapping]
- A Geometric Approach to Obtain a Bird's Eye View from an Image ICCVW 2019 [mapping, geometry, Andrew Zisserman]
- FrozenDepth: Learning the Depths of Moving People by Watching Frozen People [Notes] CVPR 2019 oral
- ORB-SLAM: a Versatile and Accurate Monocular SLAM System TRO 2015
- ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras TRO 2016
- CubeSLAM: Monocular 3D Object SLAM [Notes] TRO 2019 [dynamic SLAM, orb slam + mono3D]
- ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [Notes] CVPR 2020 [general dynamic SLAM]
- S3DOT: Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving [Notes] ECCV 2018 [Peiliang Li]
- Multi-object Monocular SLAM for Dynamic Environments [Notes] IV 2020 [MonoLayout authors]
- PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume [Notes] CVPR 2018 oral [Optical flow]
- LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation CVPR 2018 [Optical flow]
- FlowNet: Learning Optical Flow With Convolutional Networks ICCV 2015 [Optical flow]
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks CVPR 2017 [Optical flow]
- ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network CVPR 2019 [semantic segmentation, lightweight]
- Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes ICCV 2019 [depth uncertainty]
- Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems [Notes] [Honda] ICRA 2019
- PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation [Notes] CVPR 2020 oral [Scale aware depth]
- PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [Notes] ICLR 2020 [TRI, infinite-depth problem]
- TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [Notes] CVPR 2020 [Scale aware]
- Understanding the Limitations of CNN-based Absolute Camera Pose Regression [Notes] CVPR 2019 [Drawbacks of PoseNet, MapNet, Laura Leal-Taixe@TUM]
- To Learn or Not to Learn: Visual Localization from Essential Matrices [Notes] ICRA 2020 [SIFT + 5 pt solver >> others for VO, Laura Leal-Taixe@TUM]
- DF-VO: Visual Odometry Revisited: What Should Be Learnt? [Notes] ICRA 2020 [Depth and Flow for accurate VO]
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [Notes] CVPR 2020 oral [Daniel Cremers, TUM, depth uncertainty]
- Network Slimming: Learning Efficient Convolutional Networks through Network Slimming [Notes] ICCV 2017
- BatchNorm Pruning: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [Notes] ICLR 2018
- Direct Sparse Odometry PAMI 2018
- Train in Germany, Test in The USA: Making 3D Object Detectors Generalize [Notes] CVPR 2020
- PseudoLidarV3: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [Notes] CVPR 2020
- ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [Notes] CVPR 2020 oral
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020
- Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [Journal version]
- YOLOv4: Optimal Speed and Accuracy of Object Detection [Notes]
- CBN: Cross-Iteration Batch Normalization [Notes]
- Stitcher: Feedback-driven Data Provider for Object Detection [Notes]
- SKNet: Selective Kernel Networks [Notes] CVPR 2019
- CBAM: Convolutional Block Attention Module [Notes] ECCV 2018
- ResNeSt: Split-Attention Networks [Notes]
- ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst [Notes] RSS 2019 [Waymo]
- IntentNet: Learning to Predict Intention from Raw Sensor Data [Notes] CoRL 2018 [Uber ATG, perception and prediction, Lidar+Map]
- RoR: Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions [Notes] CVPR 2019 [Zoox]
- MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction [Notes] CoRL 2019 [Waymo, authors from RoR and ChauffeurNet]
- NMP: End-to-end Interpretable Neural Motion Planner [Notes] CVPR 2019 oral [Uber ATG]
- Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks [Notes] ICRA 2019 [Henggang Cui, Multimodal, Uber ATG Pittsburgh]
- Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving WACV 2020 [Uber ATG Pittsburgh]
- TensorMask: A Foundation for Dense Object Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [Notes] CVPR 2020 oral
- Mask Encoding for Single Shot Instance Segmentation [Notes] CVPR 2020 oral [single-stage instance seg, Chunhua Shen]
- PolarMask: Single Shot Instance Segmentation with Polar Representation [Notes] CVPR 2020 oral [single-stage instance seg]
- SOLO: Segmenting Objects by Locations [Notes] ECCV 2020 [single-stage instance seg, Chunhua Shen]
- SOLOv2: Dynamic, Faster and Stronger [Notes] [single-stage instance seg, Chunhua Shen]
- CondInst: Conditional Convolutions for Instance Segmentation [Notes] ECCV 2020 oral [single-stage instance seg, Chunhua Shen]
- CenterMask: Single Shot Instance Segmentation With Point Representation [Notes] CVPR 2020
- VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition [Notes] ICCV 2017
- Which Tasks Should Be Learned Together in Multi-task Learning? [Notes] [Stanford, MTL] ICML 2020
- MGDA: Multi-Task Learning as Multi-Objective Optimization NeurIPS 2018
- Taskonomy: Disentangling Task Transfer Learning [Notes] CVPR 2018
- Rethinking ImageNet Pre-training [Notes] ICCV 2019 [Kaiming He]
- UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor [Notes] [superpoint]
- KP2D: Neural Outlier Rejection for Self-Supervised Keypoint Learning [Notes] ICLR 2020 (pointNet)
- KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation [Notes] CoRL 2020 [Toyota, superpoint]
- NG-RANSAC: Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses [Notes] ICCV 2019 [pointNet]
- Learning to Find Good Correspondences [Notes] CVPR 2018 Oral (pointNet)
- RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving [Notes] [Huawei, Mono3D]
- DSP: Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [Notes] AAAI 2020 (SenseTime, Mono3D)
- Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks (LLD, LSTM)
- LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach [Notes] IV 2018 (LaneNet)
- 3D-LaneNet: End-to-End 3D Multiple Lane Detection [Notes] ICCV 2019
- Semi-Local 3D Lane Detection and Uncertainty Estimation [Notes] [GM Israel, 3D LLD]
- Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection [Notes] ECCV 2020 [Apollo, 3D LLD]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [Egocentric prediction]
- It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection ECCV 2018 [pedestrian]
- Associative Embedding: End-to-End Learning for Joint Detection and Grouping [Notes] NIPS 2017
- Pixels to Graphs by Associative Embedding [Notes] NIPS 2017
- Social LSTM: Human Trajectory Prediction in Crowded Spaces [Notes] CVPR 2017
- Online Video Object Detection using Association LSTM [Notes] [single stage, recurrent]
- SuperPoint: Self-Supervised Interest Point Detection and Description [Notes] CVPR 2018 (channel-to-pixel, deep SLAM, Magic Leap)
- PointRend: Image Segmentation as Rendering [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- Multigrid: A Multigrid Method for Efficiently Training Video Models [Notes] CVPR 2020 Oral [Kaiming He, FAIR]
- GhostNet: More Features from Cheap Operations [Notes] CVPR 2020
- FixRes: Fixing the train-test resolution discrepancy [Notes] NIPS 2019 [FAIR]
- MoVi-3D: Towards Generalization Across Depth for Monocular 3D Object Detection [Notes] ECCV 2020 [Virtual Cam, viewport, Mapillary/Facebook, Mono3D]
- Amodal Completion and Size Constancy in Natural Scenes [Notes] ICCV 2015 (Amodal completion)
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning [Notes] CVPR 2020 Oral [FAIR, Kaiming He]
- Double Descent: Reconciling modern machine learning practice and the bias-variance trade-off [Notes] PNAS 2019
- Deep Double Descent: Where Bigger Models and More Data Hurt [Notes]
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- The ApolloScape Open Dataset for Autonomous Driving and its Application CVPR 2018 (dataset)
- ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving [Notes] CVPR 2019
- Part-level Car Parsing and Reconstruction from a Single Street View [Notes] [Baidu]
- 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images [Notes] CVPR 2019
- RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving [Notes] ECCV 2020 spotlight
- DORN: Deep Ordinal Regression Network for Monocular Depth Estimation [Notes] CVPR 2018 [monodepth, supervised]
- D&T: Detect to Track and Track to Detect [Notes] ICCV 2017 (from Feichtenhofer)
- CRF-Net: A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection [Notes] SDF 2019 (radar detection)
- RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-based Obstacle Detection in Challenging Environments [Notes] PSIVT 2019
- RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles [Notes] ICIP 2019
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [Notes] ISCAS 2016
- Recurrent SSD: Recurrent Multi-frame Single Shot Detector for Video Object Detection [Notes] BMVC 2018 (Mitsubishi)
- Recurrent RetinaNet: A Video Object Detection Model Based on Focal Loss [Notes] ICONIP 2018 (single stage, recurrent)
- Actions as Moving Points [Notes] [not suitable for online]
- The PREVENTION dataset: a novel benchmark for PREdiction of VEhicles iNTentIONs [Notes] ITSC 2019 [dataset, cut-in]
- Semi-Automatic High-Accuracy Labelling Tool for Multi-Modal Long-Range Sensor Dataset [Notes] IV 2018
- Astyx dataset: Automotive Radar Dataset for Deep Learning Based 3D Object Detection [Notes] EuRAD 2019 (Astyx)
- Astyx camera radar: Deep Learning Based 3D Object Detection for Automotive Radar and Camera [Notes] EuRAD 2019 (Astyx)
- How Do Neural Networks See Depth in Single Images? [Notes] ICCV 2019
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019 (depth completion)
- DC: Depth Coefficients for Depth Completion [Notes] CVPR 2019 [Xiaoming Liu, Multimodal]
- Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation [Notes] ICRA 2017
- VO-Monodepth: Enhancing self-supervised monocular depth estimation with traditional visual odometry [Notes] 3DV 2019 (sparse to dense)
- Probabilistic Object Detection: Definition and Evaluation [Notes]
- The Fishyscapes Benchmark: Measuring Blind Spots in Semantic Segmentation [Notes] ICCV 2019
- On Calibration of Modern Neural Networks [Notes] ICML 2017 (Weinberger)
- Extreme clicking for efficient object annotation [Notes] ICCV 2017
- Radar and Camera Early Fusion for Vehicle Detection in Advanced Driver Assistance Systems [Notes] NeurIPS 2019 (radar)
- Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector [Notes] IV 2019
- C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion [Notes] ICCV 2019
- YOLACT: Real-time Instance Segmentation [Notes] ICCV 2019 [single-stage instance seg]
- YOLACT++: Better Real-time Instance Segmentation [single-stage instance seg]
- Review of Image and Feature Descriptors
- Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors [Notes] ICCV 2019
- GPP: Ground Plane Polling for 6DoF Pose Estimation of Objects on the Road [Notes] IV 2020 [UCSD, Trivedi, mono 3DOD]
- MVRA: Multi-View Reprojection Architecture for Orientation Estimation [Notes] ICCV 2019
- YOLOv3: An Incremental Improvement
- Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving [Notes] ICCV 2019 (Detection with Uncertainty)
- Bayesian YOLOv3: Uncertainty Estimation in One-Stage Object Detection [Notes] [DriveU]
- Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [Notes] ITSC 2018 (DriveU)
- Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [Notes] IV 2019 (DriveU)
- Can We Trust You? On Calibration of a Probabilistic Object Detector for Autonomous Driving [Notes] IROS 2019 (DriveU)
- LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [Notes] CVPR 2019 (uncertainty)
- LaserNet KL: Learning an Uncertainty-Aware Object Detector for Autonomous Driving [Notes] [LaserNet with KL divergence]
- IoUNet: Acquisition of Localization Confidence for Accurate Object Detection [Notes] ECCV 2018
- gIoU: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression [Notes] CVPR 2019
- The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks CVPR 2018 [IoU as loss]
- KL Loss: Bounding Box Regression with Uncertainty for Accurate Object Detection [Notes] CVPR 2019
- CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth [Notes] CVPR 2019
- BayesOD: A Bayesian Approach for Uncertainty Estimation in Deep Object Detectors [Notes]
- TW-SMNet: Deep Multitask Learning of Tele-Wide Stereo Matching [Notes] ICIP 2019
- Accurate Uncertainties for Deep Learning Using Calibrated Regression [Notes] ICML 2018
- Calibrating Uncertainties in Object Localization Task [Notes] NIPS 2018
- SMWA: On the Over-Smoothing Problem of CNN Based Disparity Estimation [Notes] ICCV 2019 [Multimodal, depth estimation]
- Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image [Notes] ICRA 2018 (depth completion)
- Review of monocular object detection
- Review of 2D 3D constraints in Mono 3DOD
- MonoGRNet 2: Monocular 3D Object Detection via Geometric Reasoning on Keypoints [Notes] [estimates depth from keypoints]
- Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [Notes] CVPR 2017
- SS3D: Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss [Notes] [regress distance from images, CenterNet-like]
- GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving [Notes] CVPR 2019
- M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [Notes] ICCV 2019 oral [3D anchors, cyclists, Xiaoming Liu]
- TLNet: Triangulation Learning Network: from Monocular to Stereo 3D Object Detection [Notes] CVPR 2019
- A Survey on 3D Object Detection Methods for Autonomous Driving Applications [Notes] TITS 2019 [Review]
- BEV-IPM: Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image [Notes] IV 2019
- ForeSeE: Task-Aware Monocular Depth Estimation for 3D Object Detection [Notes] AAAI 2020 oral [successor to pseudo-lidar, mono 3DOD SOTA]
- Obj-dist: Learning Object-specific Distance from a Monocular Image [Notes] ICCV 2019 (xmotors.ai + NYU) [monocular distance]
- DisNet: A novel method for distance estimation from monocular camera [Notes] IROS 2018 [monocular distance]
- BirdGAN: Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles [Notes] IROS 2019
- Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints [Notes] ICIP 2019
- 3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare [Notes] CVPR 2018
- Deep Optics for Monocular Depth Estimation and 3D Object Detection [Notes] ICCV 2019
- MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation [Notes] ICCV 2019
- Joint Monocular 3D Vehicle Detection and Tracking [Notes] ICCV 2019 (Berkeley DeepDrive)
- CasGeo: 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results [Notes]
- Slimmable Neural Networks [Notes] ICLR 2019
- Universally Slimmable Networks and Improved Training Techniques [Notes] ICCV 2019
- AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
- Once for All: Train One Network and Specialize it for Efficient Deployment
- DOTA: A Large-scale Dataset for Object Detection in Aerial Images [Notes] CVPR 2018 (rotated bbox)
- RoiTransformer: Learning RoI Transformer for Oriented Object Detection in Aerial Images [Notes] CVPR 2019 (rotated bbox)
- RRPN: Arbitrary-Oriented Scene Text Detection via Rotation Proposals TMM 2018
- R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection (rotated bbox)
- TI white paper: Webinar: mmWave Radar for Automotive and Industrial applications [Notes] [TI, radar]
- Federated Learning: Strategies for Improving Communication Efficiency [Notes] NIPS 2016
- sort: Simple Online and Realtime Tracking [Notes] ICIP 2016
- deep-sort: Simple Online and Realtime Tracking with a Deep Association Metric [Notes]
- MT-CNN: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks [Notes] SPL 2016 (real time, facial landmark)
- RetinaFace: Single-stage Dense Face Localisation in the Wild [Notes] CVPR 2020 [joint object and landmark detection]
- SC-SfM-Learner: Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video [Notes] NIPS 2019
- SiamMask: Fast Online Object Tracking and Segmentation: A Unifying Approach CVPR 2019 (tracking, segmentation, label propagation)
- Review of Kálmán Filter (from Tim Babb, Pixar Animation) [Notes]
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [Notes] NIPS 2016
- Guided backprop: Striving for Simplicity: The All Convolutional Net [Notes] ICLR 2015
- Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks [Notes] CVPR 2019
- Boxy Vehicle Detection in Large Images [Notes] ICCV 2019
- FQNet: Deep Fitting Degree Scoring Network for Monocular 3D Object Detection [Notes] CVPR 2019 [Mono 3DOD, Jiwen Lu]
- Mono3D: Monocular 3D Object Detection for Autonomous Driving [Notes] CVPR 2016
- MonoDIS: Disentangling Monocular 3D Object Detection [Notes] ICCV 2019
- Pseudo lidar-e2e: Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud [Notes] ICCV 2019 (pseudo-lidar with 2d and 3d consistency loss, better than PL and worse than PL++, SOTA for pure mono3D)
- MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization [Notes] AAAI 2019 (SOTA of Mono3DOD, MLF < MonoGRNet < Pseudo-lidar)
- MLF: Multi-Level Fusion based 3D Object Detection from Monocular Images [Notes] CVPR 2018 (precursor to pseudo-lidar)
- ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape [Notes] CVPR 2019
- AM3D: Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving [Notes] ICCV 2019 [similar to pseudo-lidar, color-enhanced]
- Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors [Notes] (from Stefano Soatto) AAAI 2019
- Deep Metadata Fusion for Traffic Light to Lane Assignment [Notes] IEEE RA-L 2019 (traffic lights association)
- Automatic Traffic Light to Ego Vehicle Lane Association at Complex Intersections ITSC 2019 (traffic lights association)
- Distant Vehicle Detection Using Radar and Vision [Notes] ICRA 2019 [radar, vision, radar tracklets fusion]
- Distance Estimation of Monocular Based on Vehicle Pose Information [Notes]
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics [Notes] CVPR 2018 (Alex Kendall)
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks [Notes] ICML 2018 (multitask)
- DTP: Dynamic Task Prioritization for Multitask Learning [Notes] ECCV 2018 [multitask, Stanford]
- Will this car change the lane? - Turn signal recognition in the frequency domain [Notes] IV 2014
- Complex-YOLO: Real-time 3D Object Detection on Point Clouds [Notes] (BEV detection only)
- Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds CVPR 2019 (sensor fusion and tracking)
- An intriguing failing of convolutional neural networks and the CoordConv solution [Notes] NIPS 2018
- Deep Parametric Continuous Convolutional Neural Networks [Notes] CVPR 2018 (@Uber, sensor fusion)
- ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection [Notes] ECCV 2018 [Uber ATG, sensor fusion, BEV]
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [Notes] CVPR 2018 oral [lidar only, perception and prediction]
- LearnK: Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras [Notes] ICCV 2019 [monocular depth estimation, intrinsic estimation, SOTA]
- monodepth: Unsupervised Monocular Depth Estimation with Left-Right Consistency [Notes] CVPR 2017 oral (monocular depth estimation, stereo for training)
- Struct2depth: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos [Notes] AAAI 2019 [monocular depth estimation, estimating movement of dynamic object, infinite depth problem, online finetune]
- Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency [Notes] AAAI 2018 (monocular depth estimation, static assumption, surface normal)
- LEGO Learning Edge with Geometry all at Once by Watching Videos [Notes] CVPR 2018 spotlight (monocular depth estimation, static assumption, surface normal)
- Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network [Notes] (radar, RD map, OD, Arxiv 201902)
- A study on Radar Target Detection Based on Deep Neural Networks [Notes] (radar, RD map, OD)
- 2D Car Detection in Radar Data with PointNets [Notes] (from Ulm Univ, radar, point cloud, OD, Arxiv 201904)
- Learning Confidence for Out-of-Distribution Detection in Neural Networks [Notes] (budget to cheat)
- A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification [Notes] ICRA 2017 (Bosch, traffic lights)
- How hard can it be? Estimating the difficulty of visual search in an image [Notes] CVPR 2016
- Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [Notes] (review from Bosch)
- Review of monocular 3D object detection (blog from Zhihu)
- Deep3dBox: 3D Bounding Box Estimation Using Deep Learning and Geometry [Notes] CVPR 2017 [Zoox]
- MonoPSR: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction [Notes] CVPR 2019
- OFT: Orthographic Feature Transform for Monocular 3D Object Detection [Notes] BMVC 2019 [Convert camera to BEV, Alex Kendall]
- MixMatch: A Holistic Approach to Semi-Supervised Learning [Notes]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Notes] ICML 2019
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? [Notes] NIPS 2017
- Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding [Notes] BMVC 2017
- TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents [Notes] AAAI 2019 oral
- Deep Depth Completion of a Single RGB-D Image [Notes] CVPR 2018 (indoor)
- DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image [Notes] CVPR 2019 (outdoor)
- SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video [Notes] CVPR 2017
- Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation [Notes] ICCV 2019 [Niantic]
- DeepSignals: Predicting Intent of Drivers Through Visual Signals [Notes] ICRA 2019 (@Uber, turn signal detection)
- FCOS: Fully Convolutional One-Stage Object Detection [Notes] ICCV 2019 [Chunhua Shen]
- Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving [Notes] ICLR 2020
- MMF: Multi-Task Multi-Sensor Fusion for 3D Object Detection [Notes] CVPR 2019 (@Uber, sensor fusion)
- CenterNet: Objects as points (from ExtremeNet authors) [Notes]
- CenterNet: Object Detection with Keypoint Triplets [Notes]
- Object Detection based on Region Decomposition and Assembly [Notes] AAAI 2019
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [Notes] ICLR 2019
- M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [Notes] AAAI 2019
- Deep Radar Detector [Notes] RadarCon 2019
- Semantic Segmentation on Radar Point Clouds [Notes] (from Daimler AG) FUSION 2018
- Pruning Filters for Efficient ConvNets [Notes] ICLR 2017
- Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks [Notes] NIPS 2018 talk
- LeGR: Filter Pruning via Learned Global Ranking [Notes] CVPR 2020 oral
- NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection [Notes] CVPR 2019
- AutoAugment: Learning Augmentation Policies from Data [Notes] CVPR 2019
- Path Aggregation Network for Instance Segmentation [Notes] CVPR 2018
- Channel Pruning for Accelerating Very Deep Neural Networks ICCV 2017 (Face++, Yihui He) [Notes]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices ECCV 2018 (Song Han, Yihui He)
- MobileNetV3: Searching for MobileNetV3 [Notes] ICCV 2019
- MnasNet: Platform-Aware Neural Architecture Search for Mobile [Notes] CVPR 2019
- Rethinking the Value of Network Pruning ICLR 2019
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (MobileNets v2) [Notes] CVPR 2018
- A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms [Notes] ITSC 2013
- MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving [Notes]
- Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction (Very nice illustration of 1 and 2 stage object detection)
- Light-Head R-CNN: In Defense of Two-Stage Object Detector [Notes] (from Megvii)
- CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection [Notes] CVPR 2019 [center and scale prediction, anchor-free, near SOTA pedestrian]
- Review of anchor-free methods (Zhihu blog): "Object Detection: The Anchor-Free Era", "Anchor-Free Object Detection Methods in Deep Learning", My Slides on CSP
- DenseBox: Unifying Landmark Localization with End to End Object Detection
- CornerNet: Detecting Objects as Paired Keypoints [Notes] ECCV 2018
- ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points [Notes] CVPR 2019
- FSAF: Feature Selective Anchor-Free Module for Single-Shot Object Detection [Notes] CVPR 2019
- FoveaBox: Beyond Anchor-based Object Detector (anchor-free) [Notes]
- Bag of Freebies for Training Object Detection Neural Networks [Notes]
- mixup: Beyond Empirical Risk Minimization [Notes] ICLR 2018
- Multi-view Convolutional Neural Networks for 3D Shape Recognition (MVCNN) [Notes] ICCV 2015
- 3D ShapeNets: A Deep Representation for Volumetric Shapes [Notes] CVPR 2015
- Volumetric and Multi-View CNNs for Object Classification on 3D Data [Notes] CVPR 2016
- Group Normalization [Notes] ECCV 2018
- Spatial Transformer Networks [Notes] NIPS 2015
- Frustum PointNets for 3D Object Detection from RGB-D Data (F-PointNet) [Notes] CVPR 2018
- Dynamic Graph CNN for Learning on Point Clouds [Notes]
- PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud (SOTA for 3D object detection) [Notes] CVPR 2019
- MV3D: Multi-View 3D Object Detection Network for Autonomous Driving [Notes] CVPR 2017 (Baidu, sensor fusion, BV proposal)
- AVOD: Joint 3D Proposal Generation and Object Detection from View Aggregation [Notes] IROS 2018 (sensor fusion, multiview proposal)
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [Notes]
- Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving [Notes] CVPR 2019
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection CVPR 2018 (Apple, first end-to-end point cloud encoding to grid)
- SECOND: Sparsely Embedded Convolutional Detection Sensors 2018 (builds on VoxelNet)
- PointPillars: Fast Encoders for Object Detection from Point Clouds [Notes] CVPR 2019 (builds on SECOND)
- Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite [Notes] CVPR 2012
- Vision meets Robotics: The KITTI Dataset [Notes] IJRR 2013
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) [Notes] Video CVPR 2017
- Initialization Strategies of Spatio-Temporal Convolutional Neural Networks [Notes] Video
- Detect-and-Track: Efficient Pose Estimation in Videos [Notes] ICCV 2017 Video
- Deep Learning Based Rib Centerline Extraction and Labeling [Notes] MI MICCAI 2018
- SlowFast Networks for Video Recognition [Notes] ICCV 2019 Oral
- Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) [Notes] CVPR 2017
- Beyond the pixel plane: sensing and learning in 3D (blog, Chinese version)
- VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition (VoxNet) [Notes]
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation CVPR 2017 [Notes]
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space NIPS 2017 [Notes]
- Review of Geometric deep learning: "Frontiers of Geometric Deep Learning" (from Zhihu) (Up to CVPR 2018)
- DQN: Human-level control through deep reinforcement learning (Nature DQN paper) [Notes] DRL
- Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection [Notes] MI
- Panoptic Segmentation [Notes] PanSeg
- Panoptic Feature Pyramid Networks [Notes] PanSeg
- Attention-guided Unified Network for Panoptic Segmentation [Notes] PanSeg
- Bag of Tricks for Image Classification with Convolutional Neural Networks [Notes] CLS
- Deep Reinforcement Learning for Vessel Centerline Tracing in Multi-modality 3D Volumes [Notes] DRL MI
- Deep Reinforcement Learning for Flappy Bird [Notes] DRL
- Long-Term Feature Banks for Detailed Video Understanding [Notes] Video
- Non-local Neural Networks [Notes] Video CVPR 2018
- Mask R-CNN
- Cascade R-CNN: Delving into High Quality Object Detection
- Focal Loss for Dense Object Detection (RetinaNet) [Notes]
- Squeeze-and-Excitation Networks (SENet)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- Deformable Convolutional Networks ICCV 2017 [builds on R-FCN]
- Learning Region Features for Object Detection
- Learning notes on Deep Learning
- List of Papers on Machine Learning
- Notes of Literature Review on CNN in CV (these are the notes for all the papers in the recommended list here)
- Notes of Literature Review (Others)
- Notes on how to set up DL/ML environment
- Useful setup notes
Here is the list of papers waiting to be read.
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness ICML 2019
- Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (BagNet) blog ICML 2019
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
- Gradient Reversal: Unsupervised Domain Adaptation by Backpropagation ICML 2015
- Rethinking Pre-training and Self-training NeurIPS 2020 [Quoc Le]
- Mask Scoring R-CNN CVPR 2019
- Training Region-based Object Detectors with Online Hard Example Mining
- Gliding vertex on the horizontal bounding box for multi-oriented object detection
- ONCE: Incremental Few-Shot Object Detection CVPR 2020
- Domain Adaptive Faster R-CNN for Object Detection in the Wild CVPR 2018
- Foggy Cityscapes: Semantic Foggy Scene Understanding with Synthetic Data IJCV 2018
- Foggy Cityscapes ECCV: Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding ECCV 2018
- Dropout Sampling for Robust Object Detection in Open-Set Conditions ICRA 2018 (Niko Sünderhauf)
- Hybrid Task Cascade for Instance Segmentation CVPR 2019 (cascaded mask RCNN)
- Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection ICRA 2019 (Niko Sünderhauf)
- A Unified Panoptic Segmentation Network CVPR 2019 PanSeg
- Model Vulnerability to Distributional Shifts over Image Transformation Sets (CVPR workshop) tl;dr
- Automatic adaptation of object detectors to new domains using self-training CVPR 2019 (find corner case and boost)
- Missing Labels in Object Detection CVPR 2019
- Circular Object Detection in Polar Coordinates for 2D LIDAR Data CCPR 2016
- LFFD: A Light and Fast Face Detector for Edge Devices [Lightweight, face detection, car detection]
- UnitBox: An Advanced Object Detection Network ACM MM 2016 [Ln IoU loss, Thomas Huang]
- Learning Spatiotemporal Features with 3D Convolutional Networks (C3D) Video ICCV 2015
- AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
- Spatiotemporal Residual Networks for Video Action Recognition (decouple spatiotemporal) NIPS 2016
- Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks (P3D, decouple spatiotemporal) ICCV 2017
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (decouple spatiotemporal) CVPR 2018
- Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification (decouple spatiotemporal) ECCV 2018
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? CVPR 2018
- AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation ICCV 2019
- One-Shot Video Object Segmentation CVPR 2017
- Looking Fast and Slow: Memory-Guided Mobile Video Object Detection CVPR 2018
- Towards High Performance Video Object Detection [Notes] CVPR 2018
- Towards High Performance Video Object Detection for Mobiles [Notes]
- Temporally Distributed Networks for Fast Video Semantic Segmentation CVPR 2020 [efficient video segmentation]
- Memory Enhanced Global-Local Aggregation for Video Object Detection CVPR 2020 [efficient video object detection]
- Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation IJCAI 2018 oral [video skeleton]
- RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving NeurIPS 2019 workshop
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description CVPR 2015 oral
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition ECCV 2016
- TRN: Temporal Relational Reasoning in Videos ECCV 2018
- X3D: Expanding Architectures for Efficient Video Recognition CVPR 2020 oral [FAIR]
- Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians CVPR 2020 oral [pedestrian, video]
- Flow-guided feature aggregation for video object detection ICCV 2017 [video, object detection]
- 3D human pose estimation in video with temporal convolutions and semi-supervised training CVPR 2019 [mono3D pose estimation from video]
- OmegaNet: Distilled Semantics for Comprehensive Scene Understanding from Videos CVPR 2020
- Object Detection in Videos with Tubelet Proposal Networks CVPR 2017 [video object detection]
- T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos [video object detection]
- Flow-Guided Feature Aggregation for Video Object Detection ICCV 2017 [Jifeng Dai]
- Efficient Deep Learning Inference based on Model Compression (Model Compression)
- Neural Network Distiller [Intel]
- Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks
- CBAM: Convolutional Block Attention Module
- Playing Atari with Deep Reinforcement Learning NIPS 2013
- Multi-Scale Deep Reinforcement Learning for Real-Time 3D-Landmark Detection in CT Scan
- An Artificial Agent for Robust Image Registration
- 3D-CNN: 3D Convolutional Neural Networks for Landing Zone Detection from LiDAR
- Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
- Orientation-boosted Voxel Nets for 3D Object Recognition (ORION) BMVC 2017
- GIFT: A Real-time and Scalable 3D Shape Search Engine CVPR 2016
- 3D Shape Segmentation with Projective Convolutional Networks (ShapePFCN) CVPR 2017
- Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
- Open3D: A Modern Library for 3D Data Processing
- Multimodal Deep Learning for Robust RGB-D Object Recognition IROS 2015
- FlowNet3D: Learning Scene Flow in 3D Point Clouds CVPR 2019
- Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling CVPR 2018 (Neighbors Do Help: Deeply Exploiting Local Structures of Point Clouds)
- PU-Net: Point Cloud Upsampling Network CVPR 2018
- Recurrent Slice Networks for 3D Segmentation of Point Clouds CVPR 2018
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing CVPR 2018
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering NIPS 2016
- Semi-Supervised Classification with Graph Convolutional Networks ICLR 2017
- Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks NIPS 2017
- Graph Attention Networks ICLR 2018
- 3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection (3D SSD)
- Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models ICCV 2017
- Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis CVPR 2017
- IPOD: Intensive Point-based Object Detector for Point Cloud
- Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images CVPR 2017
- 2D-Driven 3D Object Detection in RGB-D Images
- Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection [classify occluded object]
- PSMNet: Pyramid Stereo Matching Network CVPR 2018
- Stereo R-CNN based 3D Object Detection for Autonomous Driving CVPR 2019
- Deep Rigid Instance Scene Flow CVPR 2019
- Upgrading Optical Flow to 3D Scene Flow through Optical Expansion CVPR 2020
- Learning Multi-Object Tracking and Segmentation from Automatic Annotations CVPR 2020 [automatic MOTS annotation]
- Traffic-Sign Detection and Classification in the Wild CVPR 2016 [Tsinghua, Tencent, traffic signs]
- A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection IEEE CRV 2018 [U Toronto]
- Detecting Traffic Lights by Single Shot Detection ITSC 2018
- DeepTLR: A single Deep Convolutional Network for Detection and Classification of Traffic Lights IV 2016
- Evaluating State-of-the-art Object Detector on Challenging Traffic Light Data CVPR 2017 workshop
- Traffic light recognition in varying illumination using deep learning and saliency map ITSC 2014 [traffic light]
- Traffic light recognition using high-definition map features RAS 2019
- Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives TITS 2015
- The DriveU Traffic Light Dataset: Introduction and Comparison with Existing Datasets ICRA 2018
- The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset
- Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives (traffic light survey, UCSD LISA)
- Review of Graph Spectrum Theory (WIP)
- 3D Deep Learning Tutorial at CVPR 2017 [Notes] - (WIP)
- A Survey on Neural Architecture Search
- Network pruning tutorial (blog)
- GNN tutorial at CVPR 2019
- Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset [Waymo, prediction dataset]
- PANDA: A Gigapixel-level Human-centric Video Dataset CVPR 2020
- WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving ICCV 2019 [Valeo]
- Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation 3DV 2018
- Depth Map Prediction from a Single Image using a Multi-Scale Deep Network NIPS 2014 (Eigen et al)
- Learning Depth from Monocular Videos using Direct Methods CVPR 2018 (monocular depth estimation)
- Virtual-Normal: Enforcing geometric constraints of virtual normal for depth prediction [Notes] ICCV 2019 (better generation of PL)
- Spatial Correspondence with Generative Adversarial Network: Learning Depth from Monocular Videos ICCV 2019
- Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM ICCV 2019
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019
- Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation ICCV 2019 workshop [indoor]
- Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation ECCV 2020 [indoor depth]
- Disambiguating Monocular Depth Estimation with a Single Transient ECCV 2020 [additional laser sensor, indoor depth]
- Guiding Monocular Depth Estimation Using Depth-Attention Volume ECCV 2020 [indoor depth]
- Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets ECCV 2020 [indoor depth]
- CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss ECCV 2020 [indoor depth]
- PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation (pointnet alternative, backbone)
- Vehicle Detection from 3D Lidar Using Fully Convolutional Network (VeloFCN) RSS 2016
- KPConv: Flexible and Deformable Convolution for Point Clouds (from the authors of PointNet)
- PointCNN: Convolution On X-Transformed Points NIPS 2018
- L3-Net: Towards Learning based LiDAR Localization for Autonomous Driving CVPR 2019
- RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement (sensor fusion, 3D mono proposal, refined in point cloud)
- DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map CVPR 2018
- Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection IROS 2019
- PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing
- Gated2Depth: Real-time Dense Lidar from Gated Images ICCV 2019 oral
- A Multi-Sensor Fusion System for Moving Object Detection and Tracking in Urban Driving Environments ICRA 2014
- PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation CVPR 2018 [sensor fusion, Zoox]
- Deep Hough Voting for 3D Object Detection in Point Clouds ICCV 2019 [Charles Qi]
- StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation
- Depth Sensing Beyond LiDAR Range CVPR 2020 [wide baseline stereo with trifocal]
- Probabilistic Semantic Mapping for Urban Autonomous Driving Applications IROS 2020 [lidar mapping]
- RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds CVPR 2020 oral [lidar segmentation]
- PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation CVPR 2020 [lidar segmentation]
- OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression CVPR 2020 oral [lidar compression]
- MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models NeurIPS 2020 oral [lidar compression]
- Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty CVPR 2018 [on-board bbox prediction]
- Unsupervised Traffic Accident Detection in First-Person Videos IROS 2019 (Honda)
- NEMO: Future Object Localization Using Noisy Ego Priors (Honda)
- Robust Aleatoric Modeling for Future Vehicle Localization (perspective)
- Multiple Object Forecasting: Predicting Future Object Locations in Diverse Environments WACV 2020 (perspective bbox, pedestrian)
- Using panoramic videos for multi-person localization and tracking in a 3D panoramic coordinate
- End-to-end Lane Detection through Differentiable Least-Squares Fitting ICCV 2019
- Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit TITS 2019 [object-like proposals]
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers [3D LLD]
- Ultra Fast Structure-aware Deep Lane Detection ECCV 2020 [lane detection]
- A Novel Approach for Detecting Road Based on Two-Stream Fusion Fully Convolutional Network (convert camera to BEV)
- FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network
- RetinaTrack: Online Single Stage Joint Detection and Tracking CVPR 2020
- Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art (latest update in Dec 2019)
- Simultaneous Identification and Tracking of Multiple People Using Video and IMUs CVPR 2019
- TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis
- Video Action Transformer Network CVPR 2019 oral
- Online Real-time Multiple Spatiotemporal Action Localisation and Prediction ICCV 2017
- Multi-object tracking: a collection of recent papers and open-source code
- GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning CVPR 2020 oral [3DMOT, CMU, Kris Kitani]
- Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking ECCV 2020 spotlight [MOT, Tencent]
- Towards Real-Time Multi-Object Tracking ECCV 2020 [MOT]
- Probabilistic 3D Multi-Object Tracking for Autonomous Driving [TRI]
- Probabilistic Face Embeddings ICCV 2019
- Data Uncertainty Learning in Face Recognition CVPR 2020
- Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos CVPR 2020 oral [VGG, self-supervised, interpretable, discriminator]
- Revisiting Small Batch Training for Deep Neural Networks
- ICML2019 workshop: Adaptive and Multitask Learning: Algorithms & Systems ICML 2019
- Adaptive Scheduling for Multi-Task Learning NIPS 2018 (NMT)
- Polar Transformer Networks ICLR 2018
- Measuring Calibration in Deep Learning CVPR 2019
- Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation ICCV 2019 (epistemic uncertainty)
- Making Convolutional Networks Shift-Invariant Again ICML 2019
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty NeurIPS 2019
- Understanding deep learning requires rethinking generalization ICLR 2017 [ICLR best paper]
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks ICLR 2017 (NLL score as anomaly score)
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination CVPR 2018 spotlight (Stella Yu)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks TIP 2018
- The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning ICML 2018
- Designing Network Design Spaces CVPR 2020
- Moco2: Improved Baselines with Momentum Contrastive Learning
- SGD on Neural Networks Learns Functions of Increasing Complexity NIPS 2019 (SGD learns a linear classifier first)
- Pay attention to the activations: a modular attention mechanism for fine-grained image recognition
- A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images BMVC 2018 (multi-bin, what's new?)
- In-Place Activated BatchNorm for Memory-Optimized Training of DNNs CVPR 2018 (optimized BatchNorm + ReLU)
- FCNN: Fourier Convolutional Neural Networks (FFT as CNN)
- Visualizing the Loss Landscape of Neural Nets NIPS 2018
- Xception: Deep Learning with Depthwise Separable Convolutions (Xception)
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (uncertainty)
- Learning to Drive from Simulation without Real World Labels ICRA 2019 (domain adaptation, sim2real)
- Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks CVPR 2020 oral
- Switchable Whitening for Deep Representation Learning ICCV 2019 [domain adaptation]
- Visual Chirality CVPR 2020 oral [best paper nominee]
- Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data CVPR 2020
- Self-training with Noisy Student improves ImageNet classification CVPR 2020 [distillation]
- Keep it Simple: Image Statistics Matching for Domain Adaptation CVPRW 2020 [Domain adaptation for 2D mod bbox]
- Epipolar Transformers CVPR 2020 [Yihui He]
- Scalable Uncertainty for Computer Vision With Functional Variational Inference CVPR 2020 [epistemic uncertainty with one fwd pass]
- 3DOP: 3D Object Proposals for Accurate Object Class Detection NIPS 2015
- DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
- Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery ECCV 2018 (Monocular 3D object detection and depth estimation)
- Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-aware Representation CVPR 2019 [unified conditional decoder]
- DDP: Dense Depth Posterior from Single Image and Sparse Range CVPR 2019
- Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes IJCV 2018 (data augmentation with AR, Toyota)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data IITS
- Towards Scene Understanding with Detailed 3D Object Representations IJCV 2014 (keypoint, 3D bbox annotation)
- Deep Cuboid Detection: Beyond 2D Bounding Boxes (Magic Leap)
- Viewpoints and Keypoints (Malik)
- Lifting Object Detection Datasets into 3D (PASCAL)
- 3D Object Class Detection in the Wild (keypoint based)
- Fast Single Shot Detection and Pose Estimation 3DV 2016 (SSD + pose, Wei Liu)
- Virtual KITTI 2
- Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing CVPR 2017
- Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views ICCV 2015 Oral
- Real-Time Seamless Single Shot 6D Object Pose Prediction CVPR 2018
- Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching NIPS 2018 [disparity estimation]
- Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera ICRA 2019
- Learning Depth with Convolutional Spatial Propagation Network (Baidu, depth from SPN) ECCV 2018
- Just Go with the Flow: Self-Supervised Scene Flow Estimation CVPR 2020 oral [Scene flow, Lidar]
- Self-Supervised Deep Visual Odometry with Online Adaptation CVPR 2020 oral [DF-VO, TrianFlow, meta-learning]
- Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume CVPR 2020
- Online Depth Learning against Forgetting in Monocular Videos CVPR 2020 [monodepth, online learning]
- SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation CVPR 2020 [monodepth, semantic]
- Inferring Distributions Over Depth from a Single Image TRO [Depth confidence, stitching them together]
- Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths CVPR 2020
- The Edge of Depth: Explicit Constraints between Segmentation and Depth CVPR 2020 [Xiaoming Liu, multimodal, depth bleeding]
- MV-RSS: Multi-View Radar Semantic Segmentation ICCV 2021
- Classification of Objects in Polarimetric Radar Images Using CNNs at 77 GHz (Radar, polar)
- CNNs for Interference Mitigation and Denoising in Automotive Radar Using Real-World Data NeurIPS 2019 (radar)
- Road Scene Understanding by Occupancy Grid Learning from Sparse Radar Clusters using Semantic Segmentation ICCV 2019 (radar)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects ECCV 2020 [Uber ATG]
- Depth Estimation from Monocular Images and Sparse Radar Data IROS 2020 [Camera + Radar for monodepth, nuscenes]
- RPR: Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles IROS 2020 [radar proposal refinement]
- Warping of Radar Data into Camera Image for Cross-Modal Supervision in Automotive Applications
- PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization [Notes] ICCV 2015
- PoseNet2: Modelling Uncertainty in Deep Learning for Camera Relocalization ICRA 2016
- PoseNet3: Geometric Loss Functions for Camera Pose Regression with Deep Learning CVPR 2017
- EssNet: Convolutional neural network architecture for geometric matching CVPR 2017
- NC-EssNet: Neighbourhood Consensus Networks NeurIPS 2018
- Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task CVPR 2020 oral [Eric Brachmann, ngransac]
- Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints CVPR 2018
- DynSLAM: Robust Dense Mapping for Large-Scale Dynamic Environments [dynamic SLAM, Andreas Geiger] ICRA 2018
- GCNv2: Efficient Correspondence Prediction for Real-Time SLAM LRA 2019 [Superpoint + orb slam]
- Real-time Scalable Dense Surfel Mapping ICRA 2019 [dense reconstruction, monodepth]
- Dynamic SLAM: The Need For Speed
- GSLAM: A General SLAM Framework and Benchmark ICCV 2019
- Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar CVPR 2020 [Daimler]
- Radar+RGB Attentive Fusion for Robust Object Detection in Autonomous Vehicles ICIP 2020
- Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor Sensors 2020 [radar, camera, early fusion]
- A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence
- Monocular Depth Estimation Based On Deep Learning: An Overview
- Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining CVPR 2019
- Learn to Combine Modalities in Multimodal Deep Learning (sensor fusion, general DL)
- Safe Trajectory Generation For Complex Urban Environments Using Spatio-temporal Semantic Corridor LRA 2019 [Motion planning]
- DAgger: Driving Policy Transfer via Modularity and Abstraction CoRL 2018 [DAgger, Imitation Learning]
- Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching ICRA 2020 [Motion planning]
- Calibration of Heterogeneous Sensor Systems
- Intro: Sensor Fusion for ADAS: Data Fusion in Autonomous Driving (from Zhihu) (Up to CVPR 2018)
- YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving CVPR 2019 (Real Time, Low Power)
- Deep Fusion of Heterogeneous Sensor Modalities for the Advancements of ADAS to Autonomous Vehicles
- Temporal Coherence for Active Learning in Videos ICCVW 2019 [active learning, temporal coherence]
- R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving RTSS 2020 [perception system design]
- Learning Lane Graph Representations for Motion Forecasting ECCV 2020 [Uber ATG]
- DSDNet: Deep Structured self-Driving Network ECCV 2020 [Uber ATG]
- Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation ITSC 2018 [UToronto, autolabeling]
- Learning Multi-Object Tracking and Segmentation From Automatic Annotations CVPR 2020 [Autolabeling]
- Canonical Surface Mapping via Geometric Cycle Consistency ICCV 2019
- TIDE: A General Toolbox for Identifying Object Detection Errors ECCV 2020 [tools]
- Self-Supervised Camera Self-Calibration from Video [TRI, intrinsic calibration, fisheye/pinhole]
- A Convolutional Neural Network for Modelling Sentences ACL 2014
- FastText: Bag of Tricks for Efficient Text Classification ACL 2017
- Siamese recurrent architectures for learning sentence similarity AAAI 2016
- Efficient Estimation of Word Representations in Vector Space ICLR 2013
- Neural Machine Translation by Jointly Learning to Align and Translate ICLR 2015
- Transformers: Attention Is All You Need NIPS 2017
- Ad Recommendation Systems: A Collection of Articles
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction [Notes] (dimension reduction, better than t-SNE)
- Review Notes of Classical Key Points and Descriptors
- CRF
- Visual SLAM and Visual Odometry
- ORB SLAM
- Bundle Adjustment
- 3D vision
- SLAM/VIO Study Summary
- Design Patterns
- Capturing Omni-Range Context for Omnidirectional Segmentation CVPR 2021
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers CVPR 2021 [transformers]
- DCL: Dense Label Encoding for Boundary Discontinuity Free Rotation Detection CVPR 2021
- 4D Panoptic LiDAR Segmentation CVPR 2021 [TUM]
- CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild CVPR 2021
- Fast and Accurate Model Scaling CVPR 2021 [FAIR]
- Cylinder3D: Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation CVPR 2021 [lidar semantic segmentation]
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector CVPR 2021 [TuSimple, Lidar]
- PREDATOR: Registration of 3D Point Clouds with Low Overlap CVPR 2021 oral
- DBB: Diverse Branch Block: Building a Convolution as an Inception-like Unit CVPR 2021 [RepVGG, ACNet, Xiaohan Ding, Megvii]
- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection CVPR 2021 [mono3D]
- DDMP: Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection CVPR 2021 [mono3D]
- M3DSSD: Monocular 3D Single Stage Object Detector CVPR 2021 [mono3D]
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation CVPR 2021 [mono3D]
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection CVPR 2021 [Lidar]
- PLUME: Efficient 3D Object Detection from Stereo Images [Yan Wang, Uber ATG]
- V2F-Net: Explicit Decomposition of Occluded Pedestrian Detection [crowded, pedestrian, megvii]
- IP-basic: In Defense of Classical Image Processing: Fast Depth Completion on the CPU CRV 2018
- Revisiting Feature Alignment for One-stage Object Detection [cls+reg]
- Per-frame mAP Prediction for Continuous Performance Monitoring of Object Detection During Deployment WACV 2021 [SafetyNet]
- TSD: Revisiting the Sibling Head in Object Detector CVPR 2020 [sensetime, cls+reg]
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [sensetime, cls+reg, 1st place OpenImage2019]
- Enabling spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation ICRA 2021
- End-to-end Lane Detection through Differentiable Least-Squares Fitting ICCV workshop 2019
- Revisiting ResNets: Improved Training and Scaling Strategies
- Multi-Modality Cut and Paste for 3D Object Detection
- LD: Localization Distillation for Object Detection
- PolyTransform: Deep Polygon Transformer for Instance Segmentation CVPR 2020 [single stage instance segmentation]
- ROAD: The ROad event Awareness Dataset for Autonomous Driving
- LidarMTL: A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [lidar MTL]
- High-Performance Large-Scale Image Recognition Without Normalization ICLR 2021
- Ground-aware Monocular 3D Object Detection for Autonomous Driving RA-L [mono3D]
- Demystifying Pseudo-LiDAR for Monocular 3D Object Detection [mono3D]
- Pseudo-labeling for Scalable 3D Object Detection [Waymo]
- LLA: Loss-aware Label Assignment for Dense Pedestrian Detection [Megvii]
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation CVPR 2020 [Waymo]
- CoverNet: Multimodal Behavior Prediction using Trajectory Sets CVPR 2020 [prediction, nuScenes]
- SplitNet: Divide and Co-training
- VoVNet: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection CVPR 2019 workshop
- Isometric Neural Networks: Non-discriminative data or weak model? On the relative importance of data and model resolution ICCV 2019 workshop [spatial2channel]
- TResNet WACV 2021 [spatial2channel]
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression AAAI 2020 [DIOU, NMS]
- RegNet: Designing Network Design Spaces CVPR 2020 [FAIR]
- On Network Design Spaces for Visual Recognition [FAIR]
- Lane Endpoint Detection and Position Accuracy Evaluation for Sensor Fusion-Based Vehicle Localization on Highways Sensors 2018 [lane endpoints]
- Map-Matching-Based Cascade Landmark Detection and Vehicle Localization IEEE Access 2019 [lane endpoints]
- GCNet: End-to-End Learning of Geometry and Context for Deep Stereo Regression ICCV 2017 [disparity estimation, Alex Kendall, cost volume]
- Traffic Control Gesture Recognition for Autonomous Vehicles IROS 2020 [Daimler]
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild ECCV 2020
- OrcVIO: Object residual constrained Visual-Inertial Odometry [dynamic SLAM, very mathematical]
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling ECCV 2020
- DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving ECCV 2020
- Towards Lightweight Lane Detection by Optimizing Spatial Embedding ECCV 2020 workshop [LLD]
- Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection ECCV 2020 workshop [lidar]
- DeepIM: Deep iterative matching for 6d pose estimation ECCV 2018 [pose estimation]
- Monocular Depth Prediction through Continuous 3D Loss IROS 2020
- Multi-Task Learning for Dense Prediction Tasks: A Survey [MTL, Luc Van Gool]
- Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems ITSC 2020 oral [MTL]
- NeurAll: Towards a Unified Model for Visual Perception in Automated Driving ITSC 2019 oral [MTL]
- Deep Evidential Regression NeurIPS 2020 [one-pass aleatoric/epistemic uncertainty]
- Estimating Drivable Collision-Free Space from Monocular Video WACV 2015 [Drivable space]
- Visualization of Convolutional Neural Networks for Monocular Depth Estimation ICCV 2019 [monodepth]
- Differentiable Rendering: A Survey [differentiable rendering, TRI]
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [monodepth, semantics, Naver labs]
- Toward Interactive Self-Annotation For Video Object Bounding Box: Recurrent Self-Learning And Hierarchical Annotation Based Framework WACV 2020
- Towards Good Practice for CNN-Based Monocular Depth Estimation WACV 2020
- Self-Supervised Scene De-occlusion CVPR 2020 oral
- TP-LSD: Tri-Points Based Line Segment Detector
- Data Distillation: Towards Omni-Supervised Learning CVPR 2018 [Kaiming He, FAIR]
- MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer [monodepth, dynamic object, synthetic dataset]
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation [monodepth]
- Synthetic-to-Real Domain Adaptation for Lane Detection [GM Israel, LLD]
- PolyLaneNet: Lane Estimation via Deep Polynomial Regression ICPR 2020 [polynomial, LLD]
- Learning Universal Shape Dictionary for Realtime Instance Segmentation
- End-to-End Video Instance Segmentation with Transformers [DETR, transformers]
- Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks CVPR 2020 workshop
- When and Why Test-Time Augmentation Works
- Footprints and Free Space from a Single Color Image CVPR 2020 oral [Parking use, footprint]
- Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning [BEV, only predict footprint]
- Rethinking Classification and Localization for Object Detection CVPR 2020
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [mono3D]
- Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- MVSNet: Depth Inference for Unstructured Multi-view Stereo ECCV 2018
- Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference CVPR 2019 [Deep learning + MVS, Vidar, same author MVSNet]
- Artificial Dummies for Urban Dataset Augmentation AAAI 2021
- DETR for Pedestrian Detection [transformer, pedestrian detection]
- Multi-Modality Cut and Paste for 3D Object Detection [SenseTime]
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [transformer, semantic segmentation]
- TransPose: Towards Explainable Human Pose Estimation by Transformer [transformer, pose estimation]
- Seesaw Loss for Long-Tailed Instance Segmentation
- SWA Object Detection [Stochastic Weights Averaging (SWA)]
- 3D Object Detection with Pointformer
- Toward Transformer-Based Object Detection [DETR-like]
- Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion [dense SfM]
- Vision Global Localization with Semantic Segmentation and Interest Feature Points
- Transformer Interpretability Beyond Attention Visualization [transformers]
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
- DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
- Empirical Upper Bound in Object Detection and More
- Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline [Fisheye, Senthil Yogamani]
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [Jiwen Lu, monodepth]
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [TRI]
- Linformer: Self-Attention with Linear Complexity
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks ICML 2019
- PCT: Point cloud transformer Computational Visual Media 2021
- DDT: Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming IJCAI 2017
- Hierarchical Road Topology Learning for Urban Map-less Driving [Mercedes]
- Probabilistic Future Prediction for Video Scene Understanding ECCV 2020 [Alex Kendall]
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [VRU, MTL]
- Cascaded deep monocular 3D human pose estimation with evolutionary training data CVPR 2020 oral
- MonoGeo: Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [mono3D]
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [mono3D]
- Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting [mono3D]
- Lite-FPN for Keypoint-based Monocular 3D Object Detection [mono3D]
- Lidar Point Cloud Guided Monocular 3D Object Detection
- Vision Transformers for Dense Prediction [Vladlen Koltun, Intel]
- Efficient Transformers: A Survey
- Do Vision Transformers See Like Convolutional Neural Networks?
- Progressive Coordinate Transforms for Monocular 3D Object Detection [mono3D]
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection ICCV 2021 [mono3D]
- BlazePose: On-device Real-time Body Pose tracking
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [Andy Zeng]
- Large Language Models as General Pattern Machines [Embodied AI]
- RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer
- PlaNet: Learning Latent Dynamics for Planning from Pixels ICML 2019
- Dreamer: Dream to Control: Learning Behaviors by Latent Imagination ICLR 2020 oral
- DreamerV2: Mastering Atari with Discrete World Models ICLR 2021 [World models]
- DreamerV3: Mastering Diverse Domains through World Models
- DayDreamer: World Models for Physical Robot Learning CoRL 2022
- JEPA: A Path Towards Autonomous Machine Intelligence
- I-JEPA: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture CVPR 2023
- Runway Gen-1: Structure and Content-Guided Video Synthesis with Diffusion Models
- IL Difficulty Model: Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula CoRL 2022 [Waymo]
- Decision Transformer: Reinforcement Learning via Sequence Modeling NeurIPS 2021 [LLM for planning]
- LID: Pre-Trained Language Models for Interactive Decision-Making NeurIPS 2022 [LLM for planning]
- Planning with Large Language Models via Corrective Re-prompting NeurIPS 2022 Workshop
- Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability ICCV 2023 [TuSimple]
- Speculative Sampling: Accelerating Large Language Model Decoding with Speculative Sampling [Accelerated LLM, DeepMind]
- Inference with Reference: Lossless Acceleration of Large Language Models [Accelerated LLM, Microsoft]
- EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments T-RO 2021
- StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
- SSCNet: Semantic Scene Completion from a Single Depth Image CVPR 2017
- SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences ICCV 2019
- PixPro: Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [self-supervised]
- Pixel-Wise Contrastive Distillation [self-supervised]
- VICRegL: Self-Supervised Learning of Local Visual Features NeurIPS 2022
- ImageBind: One Embedding Space To Bind Them All CVPR 2023
- KEMP: Keyframe-Based Hierarchical End-to-End Deep Model for Long-Term Trajectory Prediction ICRA 2022 [Planning]
- Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models L4DC [Planning]
- GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving [Planning]
- LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [Planning, Raquel]
- DIPP: Differentiable Integrated Motion Prediction and Planning with Learnable Cost Function for Autonomous Driving [Planning]
- Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios [Planning, Waymo]
- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving IROS 2022 [Planning, Waymo]
- Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation ICRA 2022 [Planning, Waymo]
- JFP: Joint Future Prediction with Interactive Multi-Agent Modeling for Autonomous Driving [Planning, Waymo]
- MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation NeurIPS 2021
- 3D Semantic Scene Completion: a Survey IJCV 2022
- DETIC: Detecting Twenty-thousand Classes using Image-level Supervision ECCV 2022
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images ECCV 2020
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers NeurIPS 2021
- SimpleOccupancy: A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving [Occupancy Network]
- OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion [Occupancy Network, stereo]
- Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception NeurIPS 2022
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline
- ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals CVPR 2023 [Qcraft, prediction]
- Motion Transformer with Global Intention Localization and Local Movement Refinement NeurIPS 2022 Oral
- P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving
- MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction
- ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries
- SAM: Segment Anything [FAIR]
- GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
- Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge [Encode Road requirement to prediction]
- Transformer Feed-Forward Layers Are Key-Value Memories EMNLP 2021
- BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline CVPR 2023 [BEVNet]
- Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception [BEVNet, megvii]
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving [Horizon]
- BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment [BEVDet, PhiGent]
- NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving
- GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping CVPR 2020 [Cewu Lu]
- AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains [Cewu Lu]
- Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
- HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
- MTR: Motion Transformer with Global Intention Localization and Local Movement Refinement NeurIPS 2022
- UVTR: Unifying Voxel-based Representation with Transformer for 3D Object Detection [BEVFusion, Megvii, BEVNet, camera + lidar]
- Don't Use Large Mini-Batches, Use Local SGD ICLR 2020
- Grokking: Generalization beyond Overfitting on small algorithmic datasets
- Progress measures for grokking via mechanistic interpretability
- Unifying Grokking and Double Descent
- Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models L4DC 2022
- Interactive Prediction and Planning for Autonomous Driving: from Algorithms to Fundamental Aspects [PhD thesis of Wei Zhan, 2019]
- Lyft1001: One Thousand and One Hours: Self-driving Motion Prediction Dataset [Lyft Level 5, prediction dataset]
- PCAccumulation: Dynamic 3D Scene Analysis by Point Cloud Accumulation ECCV 2022
- UniSim: A Neural Closed-Loop Sensor Simulator CVPR 2023 [simulation, Raquel]
- GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving CVPR 2023
- Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills [Driving Skill]
- Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors RSS 2023 [Driving Skill]
- Neural Map Prior for Autonomous Driving CVPR 2023
- Track Anything: Segment Anything Meets Videos
- Self-Supervised Camera Self-Calibration from Video ICRA 2022 [TRI, calibration]
- Real-time Online Video Detection with Temporal Smoothing Transformers ECCV 2022 [ConvLSTM-style cross-attention]
- NeRF-Supervised Deep Stereo CVPR 2023
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images NeurIPS 2022
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation CVPR 2023
- Ego-Body Pose Estimation via Ego-Head Pose Estimation CVPR 2023
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Visual Instruction Tuning
- VideoChat: Chat-Centric Video Understanding
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers CoRL 2022
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision [BEVNet, Jifeng Dai]
- Traj++: Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TITS 2021
- Data Driven Prediction Architecture for Autonomous Driving and its Application on Apollo Platform IV 2020 [Baidu]
- THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling ICLR 2022
- Learning Lane Graph Representations for Motion Forecasting ECCV 2020 oral
- Identifying Driver Interactions via Conditional Behavior Prediction ICRA 2021 [Waymo]
- Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data ECCV 2020
- TPNet: Trajectory Proposal Network for Motion Prediction CVPR 2020
- GOHOME: Graph-Oriented Heatmap Output for future Motion Estimation
- PECNet: It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction ECCV 2020 oral
- From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting ICCV 2019
- PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings ICCV 2019
- PiP: Planning-informed Trajectory Prediction for Autonomous Driving ECCV 2020
- MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction CoRL 2019
- LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents CVPR 2021
- PRIME: Learning to Predict Vehicle Trajectories with Model-based Planning CoRL 2021
- A Flexible and Explainable Vehicle Motion Prediction and Inference Framework Combining Semi-Supervised AOG and ST-LSTM TITS 2020
- Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs IV 2018 [Trivedi]
- HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling ICRA 2022
- Trajectory Prediction with Linguistic Representations ICRA 2022
- What-If Motion Prediction for Autonomous Driving
- End-to-end Contextual Perception and Prediction with Interaction Transformer IROS 2020 [Auxiliary collision loss, scene compliant pred]
- SafeCritic: Collision-Aware Trajectory Prediction BMVC 2019 [IRL, scene compliant pred]
- Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset ICCV 2021 [Waymo]
- Interaction-Based Trajectory Prediction Over a Hybrid Traffic Graph IROS 2020
- Joint Interaction and Trajectory Prediction for Autonomous Driving using Graph Neural Networks NeurIPS 2019 workshop
- Fast Risk Assessment for Autonomous Vehicles Using Learned Models of Agent Futures Robotics: science and systems 2020
- Monocular 3D Object Detection: An Extrinsic Parameter Free Approach CVPR 2021 [PJLab]
- UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View [BEVFormer, BEVNet, Temporal]
- GitNet: Geometric Prior-based Transformation for Bird's-Eye-View Segmentation
- WBF: Weighted Boxes Fusion: Ensembling Boxes from Different Object Detection Models
- NNI: auto parameter finding algorithm
- BEVFormer++: Improving BEVFormer for 3D Camera-only Object Detection [Waymo open dataset challenge 1st place in mono3d]
- LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection [Waymo open dataset challenge official metric]
- High-Level Interpretation of Urban Road Maps Fusing Deep Learning-Based Pixelwise Scene Segmentation and Digital Navigation Maps Journal of Advanced Transportation 2018
- A Hybrid Vision-Map Method for Urban Road Detection Journal of Advanced Transportation 2017
- Terminology and Analysis of Map Deviations in Urban Domains: Towards Dependability for HD Maps in Automated Vehicles IV 2020
- Time Will Tell: New Outlooks and a Baseline for Temporal Multi-View 3D Object Detection
- Conditional DETR for Fast Training Convergence ICCV 2021
- DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR ICLR 2022
- DN-DETR: Accelerate DETR Training by Introducing Query DeNoising CVPR 2022
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- Trajectory Forecasting from Detection with Uncertainty-Aware Motion Encoding [Ouyang Wanli]
- Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation [BEVNet, polar]
- MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries [BEVNet, tracking] CVPR 2022 workshop [Hang Zhao]
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning ECCV 2022 [Hongyang Li]
- GKT: Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer [BEVNet, Horizon]
- SiamRPN: High Performance Visual Tracking with Siamese Region Proposal Network CVPR 2018
- TPLR: Topology Preserving Local Road Network Estimation from Single Onboard Camera Image CVPR 2022 [STSU, Luc Van Gool]
- LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation [Valeo, BEVNet, polar]
- PolarDETR: Polar Parametrization for Vision-based Surround-View 3D Detection [BEVNet]
- Exploring Geometric Consistency for Monocular 3D Object Detection CVPR 2022
- ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection WACV 2022 [mono3D]
- Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints AAAI 2022
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers ICICN 2021 [BEVNet, lane line]
- Unsupervised Labeled Lane Markers Using Maps ICCV 2019 workshop [Bosch, 2D lane line]
- M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [Lidar detection, Waymo open dataset] WACV 2022
- K-Lane: Lidar Lane Dataset and Benchmark for Urban Roads and Highways [lane line dataset]
- Robust Monocular 3D Lane Detection With Dual Attention ICIP 2021
- OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction CVPR 2022
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer ICLR 2022 [lightweight Transformers]
- XFormer: Lightweight Vision Transformer with Cross Feature Attention [Samsung]
- CenterFormer: Center-based Transformer for 3D Object Detection ECCV 2022 oral [TuSimple]
- LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception [2022 Waymo Open Dataset, TuSimple]
- MTR-A: 1st Place Solution for 2022 Waymo Open Dataset Challenge - Motion Prediction [Waymo open dataset challenge 1st place in motion prediction]
- BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs [BEVNet]
- Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers CVPR 2022 [nVidia]
- Efficiently Identifying Task Groupings for Multi-Task Learning NeurIPS 2021 spotlight [MTL]
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [Google, Golden Backbone]
- "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping CVPR 2022
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [BEVNet, Baidu]
- FUTR3D: A Unified Sensor Fusion Framework for 3D Detection [Hang Zhao]
- MonoFormer: Towards Generalization of self-supervised monocular depth estimation with Transformers [monodepth]
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving
- cosFormer: Rethinking Softmax in Attention ICLR 2022
- StretchBEV: Stretching Future Instance Prediction Spatially and Temporally [BEVNet, prediction]
- Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers [BEVNet, LLD] CVPR 2022 workshop
- Multi-Frame Self-Supervised Depth with Transformers CVPR 2022
- It's About Time: Analog Clock Reading in the Wild CVPR 2022 [Andrew Zisserman]
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation CoRL 2022 [Jiwen Lu]
- ONCE-3DLanes: Building Monocular 3D Lane Detection CVPR 2022
- K-Lane: Lidar Lane Dataset and Benchmark for Urban Roads and Highways CVPR 2022 workshop [3D LLD]
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving CVPR 2022 workshop
- A Simple Baseline for BEV Perception Without LiDAR [TRI, BEVNet, vision+radar]
- Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior CVPR 2022 workshop
- RIDDLE: Lidar Data Compression with Range Image Deep Delta Encoding CVPR 2022 [Waymo, Charles Qi]
- Occupancy Flow Fields for Motion Forecasting in Autonomous Driving RAL 2022 [Waymo occupancy flow challenge]
- Safe Local Motion Planning with Self-Supervised Freespace Forecasting CVPR 2021
- The Core of the Data Closed Loop - Sharing Auto-labeling Solutions
- LETR: Line Segment Detection Using Transformers without Edges CVPR 2021 oral
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps CVPR 2021 [HD mapping]
- SketchRNN: A Neural Representation of Sketch Drawings [David Ha]
- PolyGen: An Autoregressive Generative Model of 3D Meshes ICML 2020
- SOLQ: Segmenting Objects by Learning Queries NeurIPS 2021 [Megvii, end-to-end, instance segmentation]
- MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer 3DV 2022
- MVSTER: Epipolar Transformer for Efficient Multi-View Stereo ECCV 2022
- MOVEDepth: Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning [MVS + monodepth]
- Scene Transformer: A unified architecture for predicting multiple agent trajectories [prediction, Waymo] ICLR 2022
- SSIA: Monocular Depth Estimation with Self-supervised Instance Adaptation [VGG team, TTR, test time refinement, CVD]
- CoMoDA: Continuous Monocular Depth Adaptation Using Past Experiences WACV 2021
- MonoRec: Semi-supervised dense reconstruction in dynamic environments from a single moving camera CVPR 2021 [Daniel Cremers]
- Plenoxels: Radiance Fields without Neural Networks
- Lidar with Velocity: Motion Distortion Correction of Point Clouds from Oscillating Scanning Lidars [Livox, ISEE]
- NWD: A Normalized Gaussian Wasserstein Distance for Tiny Object Detection
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation NeurIPS 2021 [Sanja Fidler]
- Insta-DM: Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency AAAI 2021
- Instance-wise Depth and Motion Learning from Monocular Videos NeurIPS 2020 workshop [website]
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis ECCV 2020 oral
- BARF: Bundle-Adjusting Neural Radiance Fields ICCV 2021 oral
- NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo ICCV 2021 oral
- YOLinO: Generic Single Shot Polyline Detection in Real Time ICCV 2021 workshop [lld]
- MonoRCNN: Geometry-based Distance Decomposition for Monocular 3D Object Detection ICCV 2021
- MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation ICCV 2021 workshop
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection CVPR 2020 [Waymo challenge 2nd place]
- Geometry-based Distance Decomposition for Monocular 3D Object Detection ICCV 2021 [mono3D]
- Offboard 3D Object Detection from Point Cloud Sequences CVPR 2021 [Charles Qi]
- FreeAnchor: Learning to Match Anchors for Visual Object Detection NeurIPS 2019
- AutoAssign: Differentiable Label Assignment for Dense Object Detection
- Probabilistic Anchor Assignment with IoU Prediction for Object Detection ECCV 2020
- FOVEA: Foveated Image Magnification for Autonomous Navigation ICCV 2021 [Argo]
- PifPaf: Composite Fields for Human Pose Estimation CVPR 2019
- Monocular 3D Localization of Vehicles in Road Scenes ICCV 2021 workshop [mono3D, tracking]
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Anchor DETR: Query Design for Transformer-Based Detector [megvii]
- PGD: Probabilistic and Geometric Depth: Detecting Objects in Perspective CoRL 2021
- Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
- What Makes for End-to-End Object Detection? PMLR 2021
- Instances as Queries ICCV 2021 [instance segmentation]
- One Million Scenes for Autonomous Driving: ONCE Dataset [Huawei]
- NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis 3DV 2021
- Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?
- Topology Preserving Local Road Network Estimation from Single Onboard Camera Image [BEVNet, Luc Van Gool]
- Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine [Small LLM prompting, Microsoft]
- CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models NeurIPS 2022
- ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models [Notes] NeurIPS 2023 Oral
- Cumulative Reasoning with Large Language Models
- A Survey of Techniques for Maximizing LLM Performance [OpenAI]
- Drive AGI
- Harnessing the Power of Multi-Modal LLMs for Autonomy [Ghost Autonomy]
- Language to Rewards for Robotic Skill Synthesis
- ALOHA: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
- LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent [UM]
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [Sergey Levine]
- A Survey of Embodied AI: From Simulators to Research Tasks IEEE TETCI 2021
- Habitat Challenge 2021
- Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
- DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment [Jianyu Chen]
- The Power of Scale for Parameter-Efficient Prompt Tuning EMNLP 2021
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ICML 2022
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models ICRA 2023
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation CoRL 2022
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale NeurIPS 2022 [LLM Quant]
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [Song Han, LLM Quant]
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- CoDi: Any-to-Any Generation via Composable Diffusion NeurIPS 2023
- What if a Vacuum Robot has an Arm? UR 2023
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- GPT in 60 Lines of NumPy
- Speeding up the GPT - KV cache
- LLM Parameter Counting
- Transformer Inference Arithmetic
- ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation NeurIPS 2021 [Junnan Li]
- CLIP: Learning Transferable Visual Models From Natural Language Supervision ICML 2021 [OpenAI]
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation ICML 2022 [Junnan Li]
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [Junnan Li]
- MOO: Open-World Object Manipulation using Pre-trained Vision-Language Models [Google Robotics, end-to-end visuomotor]
- VC-1: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
- CLIPort: What and Where Pathways for Robotic Manipulation CoRL 2021 [Nvidia, end-to-end visuomotor]
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers ICLR 2023
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ICML 2023 [Song Han, LLM Quant]
- SAPIEN: A SimulAted Part-based Interactive ENvironment CVPR 2020
- FiLM: Visual Reasoning with a General Conditioning Layer AAAI 2018
- TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? NeurIPS 2021
- QLoRA: Efficient Finetuning of Quantized LLMs
- OVO: Open-Vocabulary Occupancy
- Code Llama: Open Foundation Models for Code
- Chinchilla: Training Compute-Optimal Large Language Models [DeepMind]
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot
- VIMA: General Robot Manipulation with Multimodal Prompts
- An Attention Free Transformer [Apple]
- PDDL Planning with Pretrained Large Language Models [MIT, Leslie Kaelbling]
- Task and Motion Planning with Large Language Models for Object Rearrangement IROS 2023