Lecturer: Hossein Hajiabolhassan
Data Science Center, Shahid Beheshti University

Teaching Assistants:
- Zahra Taheri
- Ali Hojatnia
- Yavar T. Yeganeh
- Fatemeh Amanian
- Mahdis Hosseini
- Sohrab Faridi
Lectures:
- Lecture 1: Toolkit Lab (Part 1)
- Lecture 2: Introduction
- Lecture 3: Empirical Risk Minimization
- Lecture 4: PAC Learning
- Lecture 5: The Bias-Complexity Tradeoff
- Lecture 6: Learning via Uniform Convergence
- Lecture 7: The VC-Dimension
- Lecture 8: Toolkit Lab (Part 2)
- Lecture 9: Linear Predictors
- Lecture 10: Decision Trees
- Lecture 11: Nearest Neighbor
- Lecture 12: Ensemble Methods
- Lecture 13: Model Selection and Validation
- Lecture 14: Neural Networks
- Lecture 15: Convex Learning Problems
- Lecture 16: Regularization and Stability
- Lecture 17: Support Vector Machines
- Lecture 18: Multiclass Classification
Miscellaneous
Machine learning is an area of artificial intelligence that gives systems the ability to learn automatically from data. It allows machines to handle new situations through analysis, self-training, observation, and experience. The remarkable success of machine learning has made it the default method of choice for artificial intelligence experts. In this course, we review the fundamentals and algorithms of machine learning.
Main TextBooks:
- Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David
- An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
- Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd Edition) by Aurelien Geron
Additional TextBooks:
- Pattern Recognition and Machine Learning by Christopher Bishop
- Hands-On Machine Learning with R by Bradley Boehmke and Brandon Greenwell
Recommended Slides & Papers:
Required Reading:
Anaconda, Jupyter Lab, Markdown, Git, GitHub, and Google Colab:
- Blog: Managing Environments
- Blog: Kernels for Different Environments
- Slide: Practical Data Science: Jupyter NoteBook Lab by Zico Kolter
- Awesome JupyterLab by Hai Nguyen Mau
- Blog: Learn Markdown Online
- Slide: An Introduction to Git by Politecnico di Torino
- Blog: Google Colab Free GPU Tutorial by Fuat
Teaching Assistant Class:

Python continues to take leading positions in solving data science tasks and challenges. Here are three of its most important libraries (a short sketch combining all three follows the resources below):
- NumPy is the fundamental package for scientific computing with Python.
- Pandas provides easy-to-use data structures and data analysis tools.
- Matplotlib is a Python 2D plotting library.

Resources:
- Scipy Lecture Notes
- Data Science iPython Notebooks
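The following minimal sketch (with made-up synthetic data) shows how the three libraries fit together; everything in it is illustrative, not part of any assignment:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: generate synthetic data y = 2x + noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 100)
y = 2 * x + rng.normal(scale=2.0, size=x.shape)

# Pandas: wrap the arrays in a DataFrame and summarize them.
df = pd.DataFrame({"x": x, "y": y})
print(df.describe())

# Matplotlib: visualize the relationship.
plt.scatter(df["x"], df["y"], s=10, label="data")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```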
- Homeworks: Python Libraries for Data Science
Suggested Reading:
- Tools in Data Science
- 28 Jupyter Notebook Tips, Tricks, and Shortcuts by Josh Devlin
- Cheat Sheet: Markdown Syntax
- Git Cheat Sheet
- R Tutorial for Beginners: Learning R Programming
Additional Resources:
- PDF: Conda Cheat Sheet
- Blog: Conda Commands (Create Virtual Environments for Python with Conda) by LipingY
- Blog: Colab Tricks by Rohit Midha
- Paper: Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence by Sebastian Raschka, Joshua Patterson, and Corey Nolet
- The following table was adapted from Applied Machine Learning and Deep Learning created by Cuixian Chen
- Python: Overview [Word], Python Tutorial [PDF] [Code]
- NumPy: [PDF] [Code], User Guide [Link], Quickstart [Link], Reference [Link], Practice Numpy in LabEx [Link], Cheatsheet [Link]
- Matplotlib: [PDF] [Code], Example [Link], Tutorials [Link], Reference [Link], Practice Matplotlib in LabEx [Link], Cheatsheet [Link]
- Pandas: [Code], 10 Min to Pandas [Link], Cookbook [Link], Tutorials [Link], Reference [Link], Practice Pandas in LabEx [Link], Cheatsheet [Link]
- Seaborn (Statistical Data Visualization): [Link], Example [Link], Tutorials [Link], Reference [Link], Cheatsheet [Link]
- Scikit-Learn: [Link], Scikit-Image [Link], Scikit Tutorial #1 [Code], Scikit Tutorial #2 [Code], Cheatsheet [Link]
Introduction

Required Reading:
- Chapter 1 of Understanding Machine Learning: From Theory to Algorithms
A Formal Model – The Statistical Learning Framework & Empirical Risk Minimization

Required Reading:
- Chapter 2 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 2.1, 2.2, and 2.3
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 1 by Shai Shalev-Shwartz
- Blog: Some Key Machine Learning Definitions by Joydeep Bhattacharjee
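For reference, the central definitions from this chapter in the book's notation (a summary of the reading, not new material):

```latex
% Empirical risk of h on a sample S = ((x_1, y_1), ..., (x_m, y_m)):
L_S(h) = \frac{1}{m}\,\bigl|\{\, i \in [m] : h(x_i) \neq y_i \,\}\bigr|
% The ERM rule returns a hypothesis in H that minimizes the empirical risk:
\mathrm{ERM}_{\mathcal{H}}(S) \in \operatorname*{argmin}_{h \in \mathcal{H}} L_S(h)
```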
PAC Learning

Required Reading:
- Chapter 3 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 2 by Shai Shalev-Shwartz
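For orientation, the chapter's core definition, paraphrased: a hypothesis class H is PAC learnable if there exist a sample-complexity function m_H and a learner A such that, for every accuracy eps and confidence delta, every distribution D, and every labeling f satisfying the realizability assumption,

```latex
\Pr_{S \sim \mathcal{D}^m}\bigl[\, L_{(\mathcal{D}, f)}(A(S)) \leq \varepsilon \,\bigr] \;\geq\; 1 - \delta
\qquad \text{whenever } m \geq m_{\mathcal{H}}(\varepsilon, \delta).
```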
Learning via Uniform Convergence

Required Reading:
- Chapter 4 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
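The key definition of this chapter, for reference: a sample S is called eps-representative if the empirical risk is uniformly close to the true risk over the whole class,

```latex
\forall h \in \mathcal{H}: \quad \bigl| L_S(h) - L_{\mathcal{D}}(h) \bigr| \;\leq\; \varepsilon .
```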
The Bias-Complexity Tradeoff

Required Reading:
- Chapter 5 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 5.2
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 3 by Shai Shalev-Shwartz
- Paper: The Bias-Variance Dilemma by Raul Rojas
Suggested Reading:
- Paper: A Unified Bias-Variance Decomposition by Pedro Domingos
Additional Reading:
- NoteBook: Exploring the Bias-Variance Tradeoff by Kevin Markham
- Blog: Bias-Variance Decomposition by Sebastian Raschka
- Slide: Bias-Variance Theory by Thomas G. Dietterich
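The decomposition at the heart of this lecture, in the book's notation: the risk of an ERM hypothesis h_S splits into approximation and estimation errors,

```latex
L_{\mathcal{D}}(h_S)
= \underbrace{\min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)}_{\varepsilon_{\mathrm{app}}}
+ \underbrace{L_{\mathcal{D}}(h_S) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)}_{\varepsilon_{\mathrm{est}}} .
```

Enlarging the class H decreases the approximation error but tends to increase the estimation error, which is the tradeoff the readings analyze.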
The VC-Dimension

Required Reading:
- Chapter 6 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 6.2, 6.4, 6.6, 6.9, 6.10, and 6.11
- Slide: Machine Learning by Roland Kwitt
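For reference, the chapter's key notions: a set C = {c_1, ..., c_m} is shattered by H if the restriction of H to C realizes all possible labelings,

```latex
\mathcal{H}_C = \{\, (h(c_1), \ldots, h(c_m)) : h \in \mathcal{H} \,\}, \qquad
C \text{ is shattered} \iff |\mathcal{H}_C| = 2^{|C|},
```

and the VC-dimension of H is the size of the largest set that H shatters.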
Toolkit Lab (Part 2)

Required Reading:
- Machine Learning Mastery With Python by Jason Brownlee
- Data Exploration:
- NoteBook: Titanic 1 – Data Exploration by John Stamford
- NoteBook: Kaggle Titanic Supervised Learning Tutorial
- NoteBook: An Example Machine Learning Notebook by Randal S. Olson
- Homework: Take Kaggle's 7-Day Machine Learning Challenge. Machine learning is the hottest field in data science, and this track will get you started quickly.
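In the spirit of the data-exploration notebooks above, a minimal pandas sketch; the file name "titanic.csv" and the "Sex" column are placeholders for whatever dataset you actually use:

```python
import pandas as pd

# Load a dataset; "titanic.csv" is a placeholder file name.
df = pd.read_csv("titanic.csv")

# First look at the data.
print(df.head())       # first rows
df.info()              # column types and missing values (prints directly)
print(df.describe())   # summary statistics for numeric columns

# Count missing values per column, then inspect a categorical feature.
print(df.isnull().sum())
print(df["Sex"].value_counts())  # assumes a Titanic-style "Sex" column
```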
Linear Predictors

Required Reading:
- Chapter 9 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 9.1, 9.3, 9.4, and 9.6
- Slide: Machine Learning by Roland Kwitt
- Slide: Tutorial 3: Consistent linear predictors and Linear regression by Nir Ailon
- NoteBook: Perceptron in Scikit by Chris Albon
- Blog: Why Linear Regression is not Suitable for Classification by Hong Jing
- Slide: Logistic Regression by Jeff Howbert
Additional Reading:
- Blog: MAE, MSE, RMSE, Coefficient of Determination, Adjusted R Squared — Which Metric is Better? by Akshita Chugh
- Blog: Key Difference between R-squared and Adjusted R-squared for Regression Analysis by Aniruddha Bhandari
- NoteBook: Linear Regression by Kevin Markham
- Paper: Matrix Differentiation by Randal J. Barnes
- Lecture: Logistic Regression by Cosma Shalizi
- Lecture: Multiclass Classification by Yossi Keshet
- NoteBook: Logistic Regression-Analysis by Nitin Borwankar
- NoteBook: Logistic Regression by Kevin Markham
- Infographic and Code: Simple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Multiple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Logistic Regression (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Linear Regression by UC Business Analytics R Programming Guide
- Blog: Linear Regression with lm() by Nathaniel D. Phillips
- Blog: Logistic Regression by UC Business Analytics R Programming Guide
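A minimal scikit-learn sketch of the two predictors discussed above, on made-up synthetic data (not a course assignment):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Linear regression on noisy data generated as y = 3x + 1 + noise.
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 1 + rng.normal(scale=0.5, size=100)
reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# Logistic regression for binary classification: label is 1 when x > 5.
labels = (X.ravel() > 5).astype(int)
clf = LogisticRegression().fit(X, labels)
print("accuracy:", clf.score(X, labels))
```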
Decision Trees

Required Reading:
- Chapter 18 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 18.2
- Slide: Decision Trees by Nicholas Ruozzi
- Slide: Representation of Boolean Functions by Troels Bjerre Sørensen
- Slide: Overfitting in Decision Trees by Reid Johnson
- NoteBook: Decision Trees
Additional Reading:
- Paper: Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? by Manuel Fernandez-Delgado, Eva Cernadas, Senen Barro, and Dinani Amorim
- Blog: Introduction to Random Forest and its Hyper-parameters
- Blog: Random Forest Classifier Example by Chris Albon. This tutorial is based on Yhat’s 2013 tutorial on Random Forests in Python.
- NoteBook
- NoteBook: Titanic Competition with Random Forest by Chris Albon
- Infographic and Code: Decision Trees (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Decision Tree Classifier Implementation in R by Rahul Saxena
- Blog: Regression Trees by UC Business Analytics R Programming Guide
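A minimal scikit-learn sketch of a decision tree classifier; the depth limit is one simple guard against the overfitting discussed in the slides above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limit the depth so the tree cannot memorize the training set.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```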
Nearest Neighbor

Required Reading:
- Chapter 19 (Section 1) of Understanding Machine Learning: From Theory to Algorithms
- Slide: Nearest Neighbor Classification by Vivek Srikumar
- NoteBook: k-Nearest Neighbors
Additional Reading:
- Blog: When to perform a Feature Scaling? by Raghav Vashisht
- Blog: Voronoi Tessellations
- Blog: Mahalanobis Distance by Chris McCormick
- NoteBook: Training a Machine Learning Model with Scikit-Learn by Kevin Markham
- NoteBook: Comparing Machine Learning Models in Scikit-Learn by Kevin Markham
- Infographic: K-Nearest Neighbours (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Knn Classifier Implementation in R with Caret Package by Rahul Saxena
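A minimal k-NN sketch in scikit-learn; since k-NN is distance-based, features are standardized first (cf. the feature-scaling post above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then classify each point by its 5 nearest neighbors.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```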
Ensemble Methods

Required Reading:
- Chapter 10 of Understanding Machine Learning: From Theory to Algorithms and Chapter 8 of An Introduction to Statistical Learning: with Applications in R
- Exercises: 10.1 and 10.4 from Understanding Machine Learning: From Theory to Algorithms
- Slide: Ensemble Learning through Diversity Management: Theory, Algorithms, and Applications by Huanhuan Chen and Xin Yao
- Slide: Ensemble Learning, Model selection, Statistical validation by João Mendes Moreira and José Luís Borges
- Slide: Bagging and Random Forests by David Rosenberg
- Slide: Machine Learning by Roland Kwitt
- Slide: Introduction to Machine Learning (Boosting) by Shai Shalev-Shwartz
- Paper: Ensemble Methods in Machine Learning by Thomas G. Dietterich
- NoteBook: AdaBoost
- Question: Adaboost with a Weak Versus a Strong Learner
Additional Reading:
- Blog: Ensemble Methods by Rai Kapil
- Blog: Ensemble Methods (Part 3): Meta-learning, Stacking and Mixture of Experts by Marta Enesco and Keshav Dhandhania
- Blog: Consensus Clustering by Lance Fernando
- Blog: Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens by Robert R.F. DeFilippi
- NoteBook: Introduction to Python Ensembles by Sebastian Flennerhag
- Library (ML-Ensemble): Graph handles for deep computational graphs and ready-made ensemble classes for ensemble networks by Sebastian Flennerhag
- NoteBook: Ensemble Methods by Vadim Smolyakov
- Paper: On Agnostic Boosting and Parity Learning by A. T. Kalai, Y. Mansour, and E. Verbin
- Paper: Faster Face Detection Using Convolutional Neural Networks & the Viola-Jones Algorithm by Karina Enriquez
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Random Forests by UC Business Analytics R Programming Guide
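A minimal scikit-learn sketch contrasting the two ensemble families covered above, bagging (random forests) and boosting (AdaBoost):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging-style ensemble: a random forest of 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())

# Boosting: AdaBoost over weak learners (decision stumps by default).
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(boost, X, y, cv=5).mean())
```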
Model Selection and Validation

Required Reading:
- Chapter 11 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 11.1 and 11.2 from Understanding Machine Learning: From Theory to Algorithms
- Blog: What is the Difference Between a Parameter and a Hyperparameter? by Jason Brownlee
- Blog: A “short” introduction to model selection by David Schönleber
- Blog: K-Fold and Other Cross-Validation Techniques by Renu Khandelwal
- Tutorial: Learning Curves for Machine Learning in Python by Alex Olteanu
Suggested Reading:
- NoteBook: Split the Dataset Using Stratified K-Folds Cross-Validator
- Blog: Hyperparameter Tuning the Random Forest in Python by Will Koehrsen
- Blog: Hyperparameter Optimization: Explanation of Automatized Algorithms by Dawid Kopczyk
- Blog: How to Use Random Seeds Effectively by Jai Bansal
Additional Reading:
- Blog: Nested Cross Validation Explained by Weina Jin
- NoteBook: Cross Validation by Ritchie Ng
- NoteBook: Cross Validation With Parameter Tuning Using Grid Search by Chris Albon
- Blog: Random Test/Train Split is not Always Enough by Win-Vector
- Slide: Cross-Validation: What, How and Which? by Pradeep Reddy Raamana
- Paper: Algorithms for Hyper-Parameter Optimization (NIPS 2011) by J. Bergstra, R. Bardenet,Y. Bengio, and B. Kégl
- Library: Yellowbrick (Machine Learning Visualization)
  - Learning Curve
  - Validation Curve
- Double Descent by Arjun Ahuja
- Deep Double Descent by Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Resampling Methods by UC Business Analytics R Programming Guide
- Blog: Linear Model Selection by UC Business Analytics R Programming Guide
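A minimal sketch of cross-validation and grid search in scikit-learn (the hyperparameter grid is illustrative, not a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation estimates generalization performance.
print("CV accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())

# Grid search tunes hyperparameters, running an inner CV for each setting.
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_, "best CV score:", grid.best_score_)
```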
Neural Networks

Required Reading:
- Chapter 20 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Neural Networks by Shai Shalev-Shwartz
- Blog: 7 Types of Neural Network Activation Functions: How to Choose?
- Blog: Activation Functions
- Blog: Back-Propagation, an Introduction by Sanjeev Arora and Tengyu Ma
Additional Reading:
- Blog: The Gradient by Khanacademy
- Blog: Activation Functions by Dhaval Dholakia
- Paper: Why Does Deep & Cheap Learning Work So Well? by Henry W. Lin, Max Tegmark, and David Rolnick
- Slide: Basics of Neural Networks by Connelly Barnes
R (Programming Language):
- Blog: Classification Artificial Neural Network by UC Business Analytics R Programming Guide
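A tiny NumPy sketch of the activation functions discussed in the readings above:

```python
import numpy as np

def sigmoid(z):
    # Squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Zero for negative inputs, identity for positive ones.
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", np.tanh(z))
```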
Convex Learning Problems

Required Reading:
- Chapter 12 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
Additional Reading:
- Blog: Escaping from Saddle Points by Rong Ge
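The defining inequality of this chapter, for reference: a function f is convex if, for all u, v in its domain and all alpha in [0, 1],

```latex
f\bigl(\alpha u + (1 - \alpha)\, v\bigr) \;\leq\; \alpha f(u) + (1 - \alpha)\, f(v).
```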
Regularization and Stability

Required Reading:
- Chapter 13 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
- Blog: L1 and L2 Regularization by Renu Khandelwal
- Blog: L1 Norm Regularization and Sparsity Explained for Dummies by Shi Yan
Additional Resources:
- NoteBook: Regularization by Ethen
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Regularized Regression by UC Business Analytics R Programming Guide
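A minimal scikit-learn sketch contrasting L2 (Ridge) and L1 (Lasso) regularization on made-up data where only two of ten features matter; Lasso drives the irrelevant coefficients exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features carry signal; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

print("ridge:", np.round(Ridge(alpha=1.0).fit(X, y).coef_, 2))
print("lasso:", np.round(Lasso(alpha=0.1).fit(X, y).coef_, 2))
```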
Support Vector Machines

Required Reading:
- Chapter 15 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Support Vector Machines and Kernel Methods by Shai Shalev-Shwartz
- Blog: Understanding the Mathematics behind Support Vector Machines by Nikita Sharma
Additional Reading:
- Infographic: Support Vector Machines (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Support Vector Machine Classifier Implementation in R with Caret Package by Rahul Saxena
- Blog: Support Vector Machine by UC Business Analytics R Programming Guide
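A minimal soft-margin SVM sketch in scikit-learn; the parameter C trades margin width against margin violations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features, then fit an RBF-kernel SVM.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```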
Multiclass Classification

Required Reading:
- Chapter 17 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning Basics Lecture 7: Multiclass Classification by Yingyu Liang
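A minimal sketch of the One-vs-Rest reduction covered in this lecture, turning a 10-class digits problem into 10 binary problems:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train one binary logistic-regression classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)
print("test accuracy:", ovr.score(X_test, y_test))
```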
Additional Resources:
- Course: Foundations of Machine Learning by David S. Rosenberg
- Python Machine Learning Book Code Repository
- Dive into Machine Learning
- Python code for "An Introduction to Statistical Learning with Applications in R" by Jordi Warmenhoven
- iPython-NoteBooks by John Wittenauer
- Scikit-Learn Tutorial by Jake Vanderplas
- Data Science Roadmap by Javier Estraviz
Saturday, Monday, and Wednesday, 10:30–12:00
Monday and Wednesday, 17:00–18:30
Refer to the following link to check the assignments.
I also recommend studying the recitations and assignments of the 2020 machine learning course at the linked page.
Projects are programming assignments that cover the topics of this course. Every project is written as a Jupyter Notebook. Projects will require the use of Python 3.7, as well as the following additional Python libraries.
- Python 3.7: An interactive, object-oriented, extensible programming language.
- NumPy: A Python package for scientific computing.
- Pandas: A Python package for high-performance, easy-to-use data structures and data analysis tools.
- Scikit-Learn: A Python package for machine learning.
- Matplotlib: A Python package for 2D plotting.
- SciPy: A Python package for mathematics, science, and engineering.
- IPython: An architecture for interactive computing with Python.
- Slide: Practical Advice for Building Machine Learning Applications by Vivek Srikumar
- Blog: Comparison of Machine Learning Models by Kevin Markham
- Technical Notes On Using Data Science & Artificial Intelligence: To Fight For Something That Matters by Chris Albon
Google Colab is a free cloud service, and it provides free GPU access!
- How to Use Google Colab by Souvik Mandal
- Primer for Learning Google Colab
- Deep Learning Development with Google Colab, TensorFlow, Keras & PyTorch
Students can include mathematical notation within markdown cells of their Jupyter Notebooks using LaTeX, as in the example below.
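A generic illustration of LaTeX in a markdown cell (the formula itself is just an example):

```latex
The empirical risk is defined as
$$ L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\bigl[h(x_i) \neq y_i\bigr] $$
and inline math works too: $\hat{h} \in \mathrm{ERM}_{\mathcal{H}}(S)$.
```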
- Preparing and Cleaning Data for Machine Learning by Josh Devlin
- Getting Started with Kaggle: House Prices Competition by Adam Massachi
- Scikit-learn Tutorial: Machine Learning in Python by Satyabrata Pal
- Homework – 30%: will consist of mathematical problems and/or programming assignments.
- Midterm – 20%
- Endterm – 50%
Midterm Examination: Saturday 1400/09/27, 10:30-12:00
Final Examination: Thursday 1400/11/07, 14:00-16:00
General mathematical sophistication, and a solid understanding of Algorithms, Linear Algebra, and Probability Theory at the advanced undergraduate or beginning graduate level, or equivalent.
- Video: Professor Gilbert Strang's Video Lectures on linear algebra.
- Learn Probability and Statistics Through Interactive Visualizations: Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
- Statistics and Probability: This website provides training and tools to help you solve statistics problems quickly, easily, and accurately - without having to ask anyone for help.
- Jupyter NoteBooks: Introduction to Statistics by Bargava
- Video: Professor John Tsitsiklis's Video Lectures on Applied Probability.
- Video: Professor Krishna Jagannathan's Video Lectures on Probability Theory.
Course (Videos, Lectures, Assignments): MIT OpenCourseWare (Discrete Mathematics)
Have a look at some reports of Kaggle or Stanford students (CS224N, CS224D) to get some general inspiration.
It is necessary to have a GitHub account to share your projects. GitHub offers free accounts as well as plans with private repositories. GitHub is like the hammer in your toolbox; therefore, you need to have it!
Honesty and integrity are vital elements of academic work. All your submitted assignments must be entirely your own (or your own group's).
We will follow the standard approach of the Department of Mathematical Sciences:
- You can get help, but you MUST acknowledge the help on the work you hand in
- Failure to acknowledge your sources is a violation of the Honor Code
- You can talk to others about the algorithm(s) to be used to solve a homework problem; as long as you then mention their name(s) on the work you submit
- You should not use or look at others' code when you write your own: you can talk to people, but you have to write your own solution/code
I will hold office hours for this course on Mondays (09:30–12:00). If this is not convenient, email me at hhaji@sbu.ac.ir or talk to me after class.