A collection of 0-5 milestone deep learning papers per year (1986-now), actively updated.
- [2025] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-R1). [paper]
- [2024] The Llama 3 Herd of Models (Llama 3). [paper]
- [2023] GPT-4 Technical Report (GPT-4). [paper]
- [2023] Mamba: Linear-time sequence modeling with selective state spaces (Mamba). [paper]
- [2022] Training language models to follow instructions with human feedback (InstructGPT, GPT-3.5, NeurIPS 2022). [paper]
- [2022] Masked Autoencoders Are Scalable Vision Learners (MAE, CVPR 2022). [paper]
- [2022] High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion, CVPR 2022). [paper]
- [2022] LoRA: Low-Rank Adaptation of Large Language Models (ICLR 2022). [paper]
- [2021] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (Swin, ICCV 2021). [paper]
- [2021] An image is worth 16x16 words: Transformers for image recognition at scale (ViT, ICLR 2021). [paper]
- [2021] Learning Transferable Visual Models From Natural Language Supervision (CLIP, ICML 2021). [paper]
- [2020] Language Models are Few-Shot Learners (GPT-3, NeurIPS 2020). [paper]
- [2020] Denoising Diffusion Probabilistic Models (Diffusion, NeurIPS 2020). [paper]
- [2019] Decoupled Weight Decay Regularization (AdamW, ICLR 2019). [paper]
- [2018] Improving language understanding by generative pre-training (GPT-1). [paper]
- [2018] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT). [paper]
- [2017] Mastering the game of Go without human knowledge (AlphaGo Zero, Nature 2017). [paper]
- [2017] Attention Is All You Need (Transformer, NeurIPS 2017). [paper]
- [2017] PointNet: Deep learning on point sets for 3D classification and segmentation (PointNet, CVPR 2017). [paper]
- [2017] Mask R-CNN (ICCV 2017). [paper]
- [2016] Neural Architecture Search with Reinforcement Learning (NAS). [paper]
- [2016] Mastering the game of Go with deep neural networks and tree search (AlphaGo, Nature 2016). [paper]
- [2016] Deep Residual Learning for Image Recognition (ResNet, CVPR 2016). [paper]
- [2016] You only look once: Unified, real-time object detection (YOLO, CVPR 2016). [paper]
- [2015] Deep learning (Nature 2015). [paper]
- [2015] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (BN, ICML 2015). [paper]
- [2015] Adam: A Method for Stochastic Optimization (Adam). [paper]
- [2015] U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net). [paper]
- [2015] Very deep convolutional networks for large-scale image recognition (VGG, ICLR 2015). [paper]
- [2014] Generative Adversarial Nets (GAN, NeurIPS 2014). [paper]
- [2014] Neural Machine Translation by Jointly Learning to Align and Translate (Attention). [paper]
- [2014] Dropout: a simple way to prevent neural networks from overfitting (Dropout). [paper]
- [2014] Sequence to Sequence Learning with Neural Networks (Seq2seq, NeurIPS 2014). [paper]
- [2014] Distilling the Knowledge in a Neural Network (Knowledge Distillation). [paper]
- [2013] Distributed Representations of Words and Phrases and their Compositionality (word2vec, NeurIPS 2013). [paper]
- [2013] Playing Atari with Deep Reinforcement Learning (DQN). [paper]
- [2013] Auto-Encoding Variational Bayes (VAE). [paper]
- [2012] ImageNet classification with deep convolutional neural networks (AlexNet, NeurIPS 2012). [paper]
- [2011] Deep Sparse Rectifier Neural Networks (ReLU). [paper]
- [2009] ImageNet: A large-scale hierarchical image database (ImageNet, CVPR 2009). [paper]
- [1998] Gradient-based learning applied to document recognition (CNN, Proceedings of the IEEE 1998). [paper]
- [1997] Long Short-Term Memory (LSTM, Neural Computation 1997). [paper]
- [1990] Finding Structure in Time (RNN). [paper]
- [1986] Learning internal representations by error propagation (BP, Parallel Distributed Processing 1986). [paper]
- This selection is somewhat subjective and limited by the curator's knowledge. Apologies for any omissions.
- This list was preceded by [another awesome deep learning list].