| layout | mathjax | title |
|---|---|---|
| page | true | Machine Learning |
- Michael Nielsen: Neural Networks and Deep Learning
- Artificial Intelligence: A Modern Approach, by Russell and Norvig (4th edition, 2020)
- Deep learning theory lecture notes, Matus Telgarsky
- Fundamentals of Machine Learning for Predictive Data Analytics, J.D. Kelleher et al (2020)
- Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, B. Scholkopf, A. Smola (2001)
- Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies, D. Floreano, C. Mattiussi (2008)
- Deep Learning, I. Goodfellow, Y. Bengio, A. Courville (2016)
- Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale Production, Andres Rodriguez (2020)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurelien Geron (3rd ed, 2022) (Jupyter notebooks)
- Python Machine Learning Cookbook, Chris Albon (2018)
- Learning Deep Learning, M. Ekman (2021), github
- fast.ai, Deep Learning for Coders with Fastai and PyTorch, J. Howard and S. Gugger (2020)
- Advanced Applied Deep Learning, CNNs and Object Detection, by U. Michelucci
- Python Deep Learning, Exploring deep learning techniques and neural network architectures with PyTorch, Keras, and TensorFlow, I. Vasiliev et al (2019), github
- M. Kochenderfer et al: Algorithms for Decision Making (2022)
- IJCAI keynote talk: Automated Decision Making for Safety Critical Applications (2021)
- A. Zheng, A. Casari: Feature Engineering (2018)
- S. Raschka et al: Machine Learning with PyTorch and Scikit-Learn, github, Andrei's fork
- Jake VanderPlas: Python Data Science Handbook, colab
- Pattern Recognition and Machine Learning, C. Bishop, pdf
- Kevin P. Murphy
- Machine Learning: A Probabilistic Perspective, Kevin Murphy (2012)
- Probabilistic Machine Learning: An Introduction (2022)
- Probabilistic Machine Learning: Advanced Topics (2023)
- Statistical Learning Theory, Vladimir Vapnik (1998)
- The Nature of Statistical Learning Theory, V. Vapnik (1998)
- Bayesian Networks and Decision Graphs, T.D. Nielsen, F.V. Jensen (2007)
- P. Abbeel: Foundations of Deep RL in 6 Lectures (2021), slides
- University of Amsterdam: UVA Deep Learning Course, UVA Deep Learning Tutorials
- Yann LeCun
- Deep Learning course at Collège de France (2016)
- Deep Learning Course at CDS, Andrei's notes
- CMU
- Advanced NLP 2022
- 10-704: Information Processing and Learning (Spring 2012)
- G. Hinton: Neural Networks for Machine Learning (2012)
- G. Hulten: Machine Learning Course (2021). Andrei's notes.
- Berkeley
- CS287-FA19: Advanced Robotics (2020), youtube, P. Abbeel
- CS294-158-SP20: Deep Unsupervised Learning (Spring 2020), youtube (Spring 2019)
- Caltech
- Jeremy Bernstein: Neural Architecture Design (Spring 2021)
- Cornell
- Volodymyr Kuleshov: CS 5787: Applied Machine Learning, github (2020)
- DeepMind: David Silver: Introduction to Reinforcement Learning
- FastAI: Practical Deep Learning for Coders, Jeremy Howard et al.
- Andrej Karpathy: Neural Networks: Zero to Hero
- Hugo Larochelle: math-heavy Neural Networks class, Université de Sherbrooke
- MIT
- Oxford
- Deep Learning for Natural Language Processing (2016-2017)
- Sebastian Raschka:
- Introduction to Machine Learning - Tree-based Methods, Model Evaluation, and Feature Selection
- Introduction to Deep Learning, 170 Video Lectures from Adaptive Linear Neurons to Zero-shot Classification with Transformers
- Github: stat453-deep-learning-ss21, deeplearning-models, Andrei's fork
- Stanford
- CS221: Artificial Intelligence: Principles and Techniques (Autumn 2019), syllabus, video
- CS229: Machine Learning, Autumn 2018 video, slides, Summer 2021 video, slides, Andrei's notes
- CS231n: Convolutional Neural Networks for Visual Recognition (Spring 2017), syllabus, 2016 video, 2017 videos, 2017 slides, 2021 slides
- Karpathy's ConvNetJS CIFAR-10 demo
- CS236: Deep Generative Models (slides only)
- CS224n: Natural Language Processing with Deep Learning, Winter 2017 video, Winter 2019 video, Winter 2021 video
- CS224W: Machine Learning with Graphs, Spring 2021 video
- C. Huyen: A survivor's guide to AI courses at Stanford (2020)
- In-Person AI Events in Greater Boston #BostonAIevents
- The list is curated by Dan Elton and Paul Baier.
- Jax
- PyTorch
- scikit-learn
- Spark.ml and Apache Ignite ML
- TensorFlow
- Others: Caffe (Berkeley), Caffe2 (Facebook), MXNet (Amazon), CNTK (Microsoft), Paddle (Baidu)
- FBLearner Flow
- Gradio, demos on Colab. Can be embedded in Huggingface.
- MosaicML
See GPUs page
- Robbie Allen: Over 200 of the Best Machine Learning, NLP, and Python Tutorials (2018)
- 3Blue1Brown Series
- sentdex: Deep Learning and Neural Networks with Python and Pytorch
- Welch Labs
- MIT 6.S191 Introduction to Deep Learning (2020)
- Convolutional Neural Networks, Alexander Amini
- Deep Generative Modeling, Ava Soleimany
- Deep Reinforcement Learning, Alexander Amini
- J. Frankle, M. Carbin: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (2019)
- G. Hinton: What is wrong with convolutional neural nets? (2017)
- G. Hinton: Artificial Intelligence: Turning our understanding of the mind right side up (2017)
- Y. LeCun: The Epistemology of Deep Learning, IAS (2019)
- Normalized Nerd
- Lex Fridman
- Whisper captions by Andrej Karpathy
- Ian Goodfellow: Generative Adversarial Networks (GANs), Lex Fridman Podcast #19 (2019)
- Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning, Lex Fridman Podcast #36 (2019)
- Stephen Wolfram: Cellular Automata, Computation, and Physics, Lex Fridman Podcast #89 (2020)
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning, Lex Fridman Podcast #86 (2020)
- Andrew Ng: Deep Learning, Education, and Real-World AI, Lex Fridman Podcast #73 (2020)
- Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning, Lex Fridman Podcast #258 (2022)
- Demis Hassabis: DeepMind - AI, Superintelligence & the Future of Humanity (2022)
- Chris Lattner: Future of Programming and AI (2023)
- Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet (2024)
- swyx: Bringing ML to the data, and Minimum Viable DevRel — Montana Low, PostgresML (2023)
- Elad Gil:
- Fireside Chat: Emad Mostaque, CEO of Stability AI (2022)
- Andrej Karpathy:
- Stephanie Zhan, Sequoia Capital: Making AI accessible with Andrej Karpathy (2024)
- G. Chen: Math 253: Mathematical Methods for Data Visualization: Lec 11, LDA, explains the math behind LDA
- B. Ghojogh: Eigenvalue and Generalized Eigenvalue Problems: Tutorial (2022)
- V. Lavrenko: Mixture Models (2014)
- Andrew Ng, Stanford CS229: L14: Expectation-Maximization (2019)
- Medium: Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups, Thomas Wolf (2018)
- Towards Data Science: Distributed Neural Network Training In Pytorch, Nilesh Vijayrania (2020)
- Serge-Paul Carrasco: Distributed and Declarative Deep Learning Systems (2021)
- Ludwig: A type-based declarative deep learning toolbox
- Horovod
- Docs
- Horovod: Multi-GPU and multi-node data parallelism
- Deep Learning at Scale with Horovod feat. Travis Addair, Stanford MLSys Seminar Episode 10 (2021)
- determined.ai
- DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster, C. Lin et al (2019)
- Analysis and Comparison of Distributed Training Techniques for Deep Neural Networks in a Dynamic Environment, E. Gebremeskel (2018)
- Bringing HPC Techniques to Deep Learning, Andrew Gibiansky (2017)
- Fast Multi-GPU collectives with NCCL, Nathan Luehr, NVidia (2016)
- Fully Sharded Data Parallel: faster AI training with fewer GPUs, M. Ott et al (2021)
- CS231n Lecture 7 (2017)
- Practical Recommendations for Gradient-Based Training of Deep Architectures, Y. Bengio (2012)
- On the importance of initialization and momentum in deep learning, I. Sutskever et al (2013)
- Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Y. Dauphin et al (2014)
- optim.Adam vs optim.SGD. Let’s dive in
- fast.ai: AdamW and Super-convergence is now the fastest way to train neural nets, by S. Gugger and J. Howard (2018)
- S. Raschka L12: Learning rates and advanced optimization algorithms (2020)
- P. Wirth: Which Optimizer should I use for my ML Project? (2020)
- How to Avoid Overfitting in Deep Learning Neural Networks, J. Brownlee (2018)
- D.P. Kingma, M. Welling: Auto-Encoding Variational Bayes (2013)
- Sebastian Raschka: Introduction to Deep Learning (2021)
- Valerio Velardo: The Sound of AI
- A. Geron: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Chap 17)
- CS229:
- L20 - Variational Autoencoders (Summer 2019)
- The EM algorithm
- M2L:
- D. Blei et al: Variational Inference: A Review for Statisticians
- London Machine Learning Meetup: Max Welling - Make VAEs Great Again: Unifying VAEs and Flows (2020)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurelien Geron (3rd ed, 2022), Chap 17
- S. Reed et al: Generative Adversarial Text to Image Synthesis (2016)
- T. Karras et al: A Style-Based Generator Architecture for Generative Adversarial Networks (2018)
- M2L:
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurelien Geron (3rd ed, 2022), Chap 17
- Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications (3.5 hrs)
- MIT 6.S192 - Lecture 20: Generative art using diffusion, Prafulla Dhariwal, OpenAI
- Stable Diffusion: DALL-E 2 For Free, For Everyone! (2022)
- Huggingface: Stable Diffusion 2.1 Demo
- OpenAI's A. Nichol et al: Point·E: A System for Generating 3D Point Clouds from Complex Prompts (2022)
- Lil'Log: What are Diffusion Models? (2021)
- M. Welling, Y.W. Teh: Bayesian Learning via Stochastic Gradient Langevin Dynamics (2011). Compared to standard SGD, stochastic gradient Langevin dynamics injects Gaussian noise into the parameter updates to avoid collapses into local minima.
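  A quick note on the update (my notation, not necessarily the paper's): with $N$ data points, a minibatch of size $n$, and step size $\varepsilon_t$, SGLD takes the usual stochastic gradient step on the log posterior and adds Gaussian noise whose variance matches the step size:

  $$\Delta\theta_t = \frac{\varepsilon_t}{2}\left(\nabla\log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n}\nabla\log p(x_{t_i}\mid\theta_t)\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0,\varepsilon_t)$$

  As $\varepsilon_t \to 0$ the iterates shift from noisy optimization to approximate posterior sampling.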
- Yannic Kilcher: OpenAI CLIP: Connecting Text and Images (Paper Explained) (2022)
- Meta: R. Girdhar et al: ImageBind: a new way to ‘link’ AI across the senses (2023)
- OpenAI: Heewoo Jun, Alex Nichol: Shap-E: Generating Conditional 3D Implicit Functions (2023), openai/shap-e
- Rethinking the Inception Architecture for Computer Vision, C. Szegedy et al (2015)
- Blog: Speeding up Convolutional Neural Networks, Alex Burlacu (2018)
- M. Tan et al: EfficientNetV2: Smaller Models and Faster Training (2021), github
- Texture Synthesis Using Convolutional Neural Networks, L.A. Gatys et al (2015), code
- Image Style Transfer Using Convolutional Neural Networks, L.A. Gatys et al (2016), torch models by J.C. Johnson
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution, J. Johnson et al (2016)
- Michael Bronstein: Geometric Deep Learning: the Erlangen Programme of ML (2021)
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, M. Bronstein et al (2021)
- ML Street Talk #60: Geometric Deep Learning Blueprint (2021)
- Max Welling
- ML Street Talk #36: Max Welling: Quantum, Manifolds & Symmetries in ML (2021)
- IAS Seminar on Theoretical ML: Graph Nets: The Next Generation (2020)
- Machine Learning Street Talk #75: Emergence with Danielle Grattarola (2022)
- Bruno Gavranovic et al: Categorical Deep Learning: An Algebraic Theory of Architectures (2024)
- R. Sutton, A. Barto: Reinforcement Learning, second edition: An Introduction (2018)
- OpenAI Spinning Up
- L. Ouyang et al: Training language models to follow instructions with human feedback (2022)
- TalkRL: John Schulman interview (2022)
- AIXI
- Marcus Hutter: Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability (2005)
- Shane Legg: Machine Super Intelligence (2008)
- AI Channel: DeepMind's Shane Legg - Machine Super Intelligence (2009)
- J. Bach: When Artificial Intelligence Becomes General Enough to Understand Itself. Commentary on Pei Wang’s paper "On Defining Artificial Intelligence" (2020)
- A. Franz et al: A theory of incremental compression (2020)
- Y. LeCun: A Path Towards Autonomous Machine Intelligence, draft (2022), tweet
- Silver, Singh, Precup, Sutton: Reward is enough (2021)
- D. Ha, J. Schmidhuber: World Models (2018)
- Causality for Machine Learning, Bernhard Schölkopf (2019)
- Y. Bengio talk: Deep Learning Cognition (2020)
- torchtext Release Notes, examples
- Tutorial: Migrate torchtext from the legacy API to the new API
- J. Geiping, T. Goldstein: Cramming: Training a Language Model on a Single GPU in One Day (2022), github, review by Lucas Beyer
- Towards Data Science: Animated RNN, LSTM and GRU, by R. Karim (2018)
- Towards Data Science: Counting No. of Parameters in Deep Learning Models by Hand, by R. Karim (2019)
- MIT 6.S191: Recurrent Neural Networks and Transformers (2022)
- Leo Dirac: LSTM is dead. Long Live Transformers! (2019)
- Sebastian Raschka: L19.5.1 The Transformer Architecture
- Towards Data Science: How to code The Transformer in Pytorch, by S. Lynn-Evans (2018)
- Lucas Beyer: Transformers, Mediterranean ML Summer School 2022 seminar
- Lil'Log: Large Transformer Model Inference Optimization (2023)
- Papers
- J. von Oswald et al: Transformers learn in-context by gradient descent (2022)
- R. Pope et al: Efficiently scaling transformer inference (2022)
- A Tutorial on Energy-Based Learning, Y. LeCun et al (2006)
- Y. LeCun: Energy-Based Self-Supervised Learning, IPAM (2019), slides
- The Physics of Energy-Based Models, P. Huembeli et al (2021)
- A. Dawid et al: Modern applications of machine learning in quantum sciences (2022)
- M.A. Carreira-Perpinan, G.E. Hinton: On Contrastive Divergence Learning (2005)
- B.A. Cipra: An Introduction to the Ising Model (1987), AMM Monthly. Finally I can understand what the Ising Model is about.
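  For reference, in the usual convention the model assigns spins $s_i \in \{-1,+1\}$ on a lattice the energy

  $$E(s) = -J\sum_{\langle i,j\rangle} s_i s_j - h\sum_i s_i,$$

  with coupling $J$, external field $h$, and the first sum over neighboring pairs $\langle i,j\rangle$; configurations are weighted by the Boltzmann distribution $p(s) \propto e^{-E(s)/T}$, which is exactly the form that energy-based models like Boltzmann machines generalize.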
- E. Aurell, M. Ekberg: Inverse Ising inference using all the data (2012)
- Surya Ganguli: Statistical mechanics of neural networks (2022), 2nd part
- Jonathan Frankle: Neural Network Pruning and Training (2023)
- G. Hinton: The Forward-Forward Algorithm: Some Preliminary Investigations, talk (2022)
- S. Alshammari et al: I-CON: A Unifying Framework for Representation Learning (2025), openreview
- X. Ni et al: Prioritizing Original News on Facebook (2021)
- Facebook: Harmful content can evolve quickly. Our new AI system adapts to tackle it. (2021)
- S. Wang et al: Entailment as Few-Shot Learner (2021)
- Facebook: How AI is getting better at detecting hate speech (2020)
- Lil'Log: How to Train Really Large Models on Many GPUs? (2021)
- Lilian Weng, Greg Brockman: Techniques for Training Large Neural Networks (2022)
- S. Li et al: PyTorch Distributed: Experiences on Accelerating Data Parallel Training (2020)
- How neural networks learn from experience, G. Hinton (1992)
- Neural networks and physical systems with emergent collective computational abilities, J. J. Hopfield (1982)
- Reducing the Dimensionality of Data with Neural Networks, G. E. Hinton and R. R. Salakhutdinov (2006), using Boltzmann machines to initialize weights close to a good solution
- ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton (2012), describes the AlexNet Conv network.
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, C. Finn, P. Abbeel, S. Levine (2017)
- S. Khodadadeh: Model Agnostic Meta Learning (2018)
- Deep Learning Explainer: Toward Efficient Learning: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2020)
- Meta-Learning with Implicit Gradients, A. Rajeswaran et al (2019), video
- The Mechanics of n-Player Differentiable Games, D. Balduzzi et al (2018)
- Simple, Distributed, and Accelerated Probabilistic Programming, D. Tran et al (2018)
- Machine Theory of Mind, N.C. Rabinowitz et al (2018)
- Recent Advances in Deep Learning for Object Detection, X. Wu (2019)
- Online Bayesian Goal Inference for Boundedly-Rational Planning Agents, T. Zhi-Xuan et al (2020)
- Open Problems in Cooperative AI, A. Dafoe et al (2020)
- Rethinking the maturity of artificial intelligence in safety-critical settings, M.L. Cummings, 2019
- sentdex tutorials at https://pythonprogramming.net
- Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, Ben-Nun, Hoefler (2018), video
- Medium: RegNet or How to methodologically design effective networks, Chris Ha (2020)
- Hands-on Bayesian Neural Networks - a Tutorial for Deep Learning Users, L. V. Jospin et al (2021)
- R. Ghugare et al: Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (2022), talk, code
- J. Tenenbaum et al: 3DP3: 3D Scene Perception via Probabilistic Programming (2021)
- Why do tree-based models still outperform deep learning on tabular data?, Léo Grinsztajn et al (2022)
- M. Richardson, P. Domingos: Building Large Knowledge Bases by Mass Collaboration (2003)
- S. Raschka: Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning (2020)
- Colah's blog
- distill.pub
- MuZero and The Evolution of AlphaGo to MuZero
- McGill COMP-424 Intro to AI Lecture Notes (Doina Precup, 2013), Lecture 16, Why does a finite MDP optimal policy exist?
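  The gist of the argument, as standard textbook reasoning rather than a quote from the notes: for a discounted finite MDP, the Bellman optimality operator

  $$(TV)(s) = \max_a \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \Big]$$

  is a $\gamma$-contraction in the sup norm, so by the Banach fixed-point theorem it has a unique fixed point $V^*$; any policy greedy with respect to $V^*$ is optimal, and since the action set is finite such a deterministic greedy policy always exists.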
- OpenAI: [Safety Gym](https://openai.com/blog/safety-gym/) (2019)
- OpenAI Baselines, a set of high-quality implementations of reinforcement learning algorithms
- ConvnetJS demo: Toy 2d classification with 2-layer neural network, A. Karpathy
- Adrian Rosebrock tutorials
- The Important Definitions section of Rafael Padilla's repo is a very good introduction to the relationships between IoU, precision/recall, the PR curve, and average precision; a minimal IoU sketch follows.
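  A minimal illustration of the IoU half of that picture (my own sketch, not code from the repo; boxes are assumed to be axis-aligned (x1, y1, x2, y2) corners):

  ```python
  # Hypothetical helper, not from Padilla's repo: IoU of two axis-aligned
  # boxes given as (x1, y1, x2, y2) corner coordinates.
  def iou(box_a, box_b):
      # Intersection rectangle: overlap of the x-ranges and of the y-ranges.
      ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
      ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
      inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
      area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
      area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
      return inter / (area_a + area_b - inter)

  # A detection counts as a true positive when its IoU with a ground-truth box
  # clears a threshold (e.g. 0.5); sweeping the detector's confidence threshold
  # then traces the precision/recall curve, whose area is the average precision.
  print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
  ```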
- Ben Dickson: The challenges of applied machine learning (2021)
- Jonathan Hui: How to start a Deep Learning project? (2018)
- Georgii Evtushenko: Multi-GPU Programming
- Mihail Eric: MLOps Is a Mess But That's to be Expected (2022)
- Matt Turck: Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape
- A. Kumar, I. Kostrikov, S. Levine: Should I Use Offline RL or Imitation Learning? (2022)
- Giuliano Giacaglia: How Transformers Work (2019)
- Sebastian Raschka Ahead of AI (2022)
- Simon Willison Weblog: Large language models are having their Stable Diffusion moment
- Google leak: "We have no moat, and neither does OpenAI" (2023)
- There's an AI for that
- Derrick Harris, Matt Bornstein, Guido Appenzeller: The AI Canon (2023)
- NeurIPS 2023
- A Guide to NeurIPS 2023 — 7 Research Areas & 10 Spotlight Papers to See, blog post
- Tim Dettmers et al: QLoRA: Efficient Finetuning of Quantized LLMs
- R. Rafailov et al: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- S. Malladi et al: Fine-Tuning Language Models with Just Forward Passes, ML in 2 summary
- Niklas Muennighoff et al: Scaling Data-Constrained Language Models
- Kingma and Gao: Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
- D. Sculley et al: Hidden Technical Debt in Machine Learning Systems (2015)
- Berkeley: Full Stack Deep Learning: Lecture 6: MLOps Infrastructure & Tooling
- P. Barham, A. Chowdhery, J. Dean et al: Pathways: Asynchronous Distributed Dataflow for ML (2022)
- std::bodun::blog: Pathways: Google's New ML System (2022)
- NVIDIA NeMo Megatron
- Microsoft DeepSpeed
- Nathan Benaich: State of AI 2024