Lecturer: Hossein Hajiabolhassan
The Webpage of the Course: Applied Machine Learning 2019
Data Science Center, Shahid Beheshti University
-
- Lecture 1: Toolkit Lab (Part 1)
- Lecture 2: Introduction
- Lecture 3: Empirical Risk Minimization
- Lecture 4: PAC Learning
- Lecture 5: The Bias-Complexity Tradeoff
- Lecture 6: The VC-Dimension
- Lecture 7: Toolkit Lab (Part 2)
- Lecture 8: Linear Predictors
- Lecture 9: Decision Trees
- Lecture 10: Nearest Neighbor
- Lecture 11: Ensemble Methods
- Lecture 12: Model Selection and Validation
- Lecture 13: Neural Networks
- Lecture 14: Convex Learning Problems
- Lecture 15: Regularization and Stability
- Lecture 16: Support Vector Machines
-
Course Overview
Machine learning is an area of artificial intelligence that gives systems the ability to learn automatically. It allows machines to handle new situations through analysis, self-training, observation, and experience. The remarkable success of machine learning has made it the method of choice for artificial intelligence experts. In this course, we review the fundamentals and algorithms of machine learning.
Main Textbooks:
- Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David
- An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Additional Textbooks:
- Machine Learning Mastery With Python by Jason Brownlee
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas Mueller and Sarah Guido
- Pattern Recognition and Machine Learning by Christopher Bishop
Recommended Slides & Papers:
Lecture 1: Toolkit Lab (Part 1)
Required Reading:
Additional Reading:
Lecture 2: Introduction
Required Reading:
- Slide: Machine Learning: Types of Machine Learning I by Javier Bejar
- Slide: Machine Learning: Types of Machine Learning II by Javier Bejar
Lecture 3: Empirical Risk Minimization
Required Reading:
- Chapter 2 of Understanding Machine Learning: From Theory to Algorithms (A Formal Model – The Statistical Learning Framework & Empirical Risk Minimization)
- Exercises: 2.1, 2.2, and 2.3
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 1 by Shai Shalev-Shwartz
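To make the ERM rule concrete, here is a minimal NumPy sketch that picks the threshold classifier minimizing the empirical (0-1) risk over a tiny finite hypothesis class; the toy data and the class of thresholds are illustrative assumptions, not part of the textbook.

```python
import numpy as np

# Toy 1-D training sample (illustrative only).
X = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])
y = np.array([0, 0, 1, 1, 1])

# A tiny finite hypothesis class: threshold predictors h_t(x) = 1[x >= t].
thresholds = np.linspace(-3, 3, 61)

def empirical_risk(t):
    """Average 0-1 loss of the threshold predictor h_t on the sample."""
    return np.mean((X >= t).astype(int) != y)

# ERM: return a hypothesis minimizing the empirical risk over the class.
best_t = min(thresholds, key=empirical_risk)
print(f"ERM threshold: {best_t:.2f}, empirical risk: {empirical_risk(best_t):.2f}")
```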
Lecture 4: PAC Learning
Required Reading:
- Chapter 3 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 2 by Shai Shalev-Shwartz
Lecture 5: The Bias-Complexity Tradeoff
Required Reading:
- Chapter 5 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 5.2
- Slide: Machine Learning by Roland Kwitt
- Slide: Lecture 3 by Shai Shalev-Shwartz
- Paper: The Bias-Variance Dilemma by Raul Rojas
Additional Reading:
- NoteBook: Exploring the Bias-Variance Tradeoff by Kevin Markham
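As a complement to the readings, here is a minimal scikit-learn sketch of the bias-complexity tradeoff, fitting polynomials of increasing degree to synthetic data; the data-generating process and the chosen degrees are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a sine curve plus noise (an assumption for illustration).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Low degree -> high bias (underfits); high degree -> high variance (overfits).
for degree in [1, 3, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```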
Lecture 6: The VC-Dimension
Required Reading:
- Chapter 6 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 6.2, 6.4, 6.6, 6.9, 6.10, and 6.11
- Slide: Machine Learning by Roland Kwitt
Lecture 7: Toolkit Lab (Part 2)
Required Reading:
- Machine Learning Mastery With Python by Jason Brownlee
- Data Exploration:
- NoteBook: Titanic 1 – Data Exploration by John Stamford
- NoteBook: Kaggle Titanic Supervised Learning Tutorial
- NoteBook: An Example Machine Learning Notebook by Randal S. Olson
- Homework: Take Kaggle's 7-Day Machine Learning Challenge; this track will get you started with hands-on machine learning quickly.
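In the spirit of the Titanic notebooks listed above, here is a minimal pandas data-exploration sketch; the file name titanic.csv and its columns are assumptions, so substitute your own dataset.

```python
import pandas as pd

# The file name is hypothetical; point this at your own copy of the data.
df = pd.read_csv("titanic.csv")

print(df.shape)           # rows and columns
print(df.head())          # first few records
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # missing values per column
```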
Lecture 8: Linear Predictors
Required Reading:
- Chapter 9 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 9.1, 9.3, 9.4, and 9.6
- Slide: Machine Learning by Roland Kwitt
- Slide: Tutorial 3: Consistent linear predictors and Linear regression by Nir Ailon
- NoteBook: Perceptron in Scikit by Chris Albon
- Paper: Perceptron for Imbalanced Classes and Multiclass Classification by Piyush Rai
Additional Reading:
- NoteBook: Linear Regression by Kevin Markham
- Paper: Matrix Differentiation by Randal J. Barnes
- Lecture: Logistic Regression by Cosma Shalizi
- NoteBook: Logistic Regression-Analysis by Nitin Borwankar
- NoteBook: Logistic Regression by Kevin Markham
- Infographic and Code: Simple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Multiple Linear Regression (100 Days Of ML Code) by Avik Jain
- Infographic and Code: Logistic Regression (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Linear Regression by UC Business Analytics R Programming Guide
- Blog: Linear Regression with lm() by Nathaniel D. Phillips
- Blog: Logistic Regression by UC Business Analytics R Programming Guide
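A minimal sketch of fitting a linear predictor with scikit-learn's Perceptron, in the spirit of the notebook listed above; the bundled Iris data is just a convenient stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing the features usually helps the perceptron converge.
scaler = StandardScaler().fit(X_train)
clf = Perceptron(max_iter=1000, tol=1e-3, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
print("Test accuracy:", clf.score(scaler.transform(X_test), y_test))
```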
Lecture 9: Decision Trees
Required Reading:
- Chapter 18 of Understanding Machine Learning: From Theory to Algorithms
- Exercise: 18.2
- Slide: Decision Trees by Nicholas Ruozzi
- Slide: Representation of Boolean Functions by Troels Bjerre Sørensen
- Slide: Overfitting in Decision Trees by Reid Johnson
- NoteBook: Decision Trees
Additional Reading:
- Paper: Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? by Manuel Fernandez-Delgado, Eva Cernadas, Senen Barro, and Dinani Amorim
- Blog: Random Forest Classifier Example by Chris Albon. This tutorial is based on Yhat’s 2013 tutorial on Random Forests in Python.
- NoteBook: Titanic Competition with Random Forest by Chris Albon
- Infographic and Code: Decision Trees (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Decision Tree Classifier Implementation in R by Rahul Saxena
- Blog: Regression Trees by UC Business Analytics R Programming Guide
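A minimal scikit-learn decision-tree sketch; the Iris data and the depth limit (a simple guard against the overfitting discussed in the slides) are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the depth is a simple guard against overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
```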
Lecture 10: Nearest Neighbor
Required Reading:
- Chapter 19 (Section 1) of Understanding Machine Learning: From Theory to Algorithms
- Slide: Nearest Neighbor Classification by Vivek Srikumar
- NoteBook: k-Nearest Neighbors
Additional Reading:
- NoteBook: Training a Machine Learning Model with Scikit-Learn by Kevin Markham
- NoteBook: Comparing Machine Learning Models in Scikit-Learn by Kevin Markham
- Infographic: K-Nearest Neighbours (100 Days Of ML Code) by Avik Jain
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Knn Classifier Implementation in R with Caret Package by Rahul Saxena
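A minimal scikit-learn k-nearest-neighbors sketch; the Iris data and k = 5 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each point is labeled by a majority vote among its k nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```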
Lecture 11: Ensemble Methods
Required Reading:
- Chapter 10 of Understanding Machine Learning: From Theory to Algorithms and Chapter 8 of An Introduction to Statistical Learning: with Applications in R
- Exercises: 10.1, 10.3, 10.4, and 10.5 from Understanding Machine Learning: From Theory to Algorithms
- Slide: Bagging and Random Forests by David Rosenberg
- Slide: Ensemble Learning through Diversity Management: Theory, Algorithms, and Applications by Huanhuan Chen and Xin Yao
- Slide: Machine Learning by Roland Kwitt
- Slide: Introduction to Machine Learning (Boosting) by Shai Shalev-Shwartz
- Paper: Ensemble Methods in Machine Learning by Thomas G. Dietterich
- NoteBook: AdaBoost
Additional Reading:
- Blog: Ensemble Methods by Rai Kapil
- Blog: Boosting, Bagging, and Stacking — Ensemble Methods with sklearn and mlens by Robert R.F. DeFilippi
- NoteBook: Introduction to Python Ensembles by Sebastian Flennerhag
- Library (ML-Ensemble): Graph handles for deep computational graphs and ready-made ensemble classes for ensemble networks by Sebastian Flennerhag
- NoteBook: Ensemble Methods by Vadim Smolyakov
- Paper: On Agnostic Boosting and Parity Learning by A. T. Kalai, Y. Mansour, and E. Verbin
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Random Forests by UC Business Analytics R Programming Guide
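A minimal scikit-learn AdaBoost sketch to accompany the boosting readings; the breast-cancer data and the number of boosting rounds are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost combines many weak learners (by default, depth-1 decision stumps),
# reweighting the training examples after each round.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```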
Lecture 12: Model Selection and Validation
Required Reading:
- Chapter 11 of Understanding Machine Learning: From Theory to Algorithms
- Exercises: 11.1 and 11.2 from Understanding Machine Learning: From Theory to Algorithms
- Tutorial: Learning Curves for Machine Learning in Python by Alex Olteanu
- Blog: K-Fold and Other Cross-Validation Techniques by Renu Khandelwal
- NoteBook: Split the Dataset Using Stratified K-Folds Cross-Validator
- Blog: Hyperparameter Tuning the Random Forest in Python by Will Koehrsen
- Blog: Hyperparameter Optimization: Explanation of Automatized Algorithms by Dawid Kopczyk
Additional Reading:
- NoteBook: Cross Validation by Ritchie Ng
- NoteBook: Cross Validation With Parameter Tuning Using Grid Search by Chris Albon
- Blog: Random Test/Train Split is not Always Enough by Win-Vector
- Slide: Cross-Validation: What, How and Which? by Pradeep Reddy Raamana
- Paper: Algorithms for Hyper-Parameter Optimization (NIPS 2011) by J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl
- Library: Yellowbrick (Machine Learning Visualization)
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Resampling Methods by UC Business Analytics R Programming Guide
- Blog: Linear Model Selection by UC Business Analytics R Programming Guide
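A minimal sketch combining stratified k-fold cross-validation with grid search for hyperparameter tuning, as covered in the readings above; the model and parameter grid are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Stratified folds preserve the class proportions in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=cv,
)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Cross-validated accuracy:", grid.best_score_)
```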
Lecture 13: Neural Networks
Required Reading:
- Chapter 20 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Neural Networks by Shai Shalev-Shwartz
- Blog: 7 Types of Neural Network Activation Functions: How to Choose?
- Blog: Activation Functions
- Blog: Back-Propagation, an Introduction by Sanjeev Arora and Tengyu Ma
Additional Reading:
- Blog: The Gradient by Khanacademy
- Blog: Activation Functions by Dhaval Dholakia
- Paper: Why Does Deep & Cheap Learning Work So Well? by Henry W. Lin, Max Tegmark, and David Rolnick
- Slide: Basics of Neural Networks by Connelly Barnes
R (Programming Language):
- Blog: Classification Artificial Neural Network by UC Business Analytics R Programming Guide
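A minimal NumPy sketch of the activation functions discussed in the blogs above; the sample inputs are arbitrary.

```python
import numpy as np

# Three common activation functions; their shapes drive the choice in practice.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary sample inputs
print("sigmoid:", sigmoid(z))
print("tanh:   ", np.tanh(z))
print("relu:   ", relu(z))
```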
Lecture 14: Convex Learning Problems
Required Reading:
- Chapter 12 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
Additional Reading:
- Blog: Escaping from Saddle Points by Rong Ge
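A minimal NumPy sketch of gradient descent on a convex objective (least squares); the synthetic data and step size are assumptions chosen so the iteration converges.

```python
import numpy as np

# Synthetic linear data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(100)

# Gradient descent on the convex least-squares objective
# f(w) = (1/2m) * ||Xw - y||^2, whose gradient is X^T (Xw - y) / m.
w = np.zeros(3)
eta = 0.1  # step size, assumed small enough for convergence
for _ in range(500):
    w -= eta * (X.T @ (X @ w - y)) / len(y)
print("Recovered weights:", np.round(w, 3))
```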
Lecture 15: Regularization and Stability
Required Reading:
- Chapter 13 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Machine Learning by Roland Kwitt
- Blog: L1 and L2 Regularization by Renu Khandelwal
- Blog: L1 Norm Regularization and Sparsity Explained for Dummies by Shi Yan
Additional Reading:
- NoteBook: Regularization by Ethen
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Regularized Regression by UC Business Analytics R Programming Guide
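A minimal scikit-learn sketch contrasting L2 (ridge) and L1 (lasso) regularization on synthetic data where only two features matter; note how the L1 penalty drives the irrelevant coefficients to exactly zero, the sparsity effect explained in the readings above.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data in which only the first two of ten features are relevant.
rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: zeroes out irrelevant ones
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```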
Lecture 16: Support Vector Machines
Required Reading:
- Chapter 15 of Understanding Machine Learning: From Theory to Algorithms
- Slide: Support Vector Machines and Kernel Methods by Shai Shalev-Shwartz
- Blog: Support Vector Machine (SVM) by Ajay Yadav
- Blog: Support Vector Machine vs Logistic Regression by Georgios Drakos
Additional Reading:
- Infographic: Support Vector Machines (100 Days Of ML Code) by Avik Jain
- Markdown (NoteBook)
R (Programming Language):
- Book: Machine Learning Mastery With R by Jason Brownlee
- Blog: Support Vector Machine Classifier Implementation in R with Caret Package by Rahul Saxena
- Blog: Support Vector Machine by UC Business Analytics R Programming Guide
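A minimal scikit-learn soft-margin SVM sketch; the breast-cancer data, RBF kernel, and C = 1.0 are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C trades margin width against training errors (soft-margin SVM).
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```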
-
- Course: Foundations of Machine Learning by David S. Rosenberg
- Python Machine Learning Book Code Repository
- Dive into Machine Learning
- Python code for "An Introduction to Statistical Learning with Applications in R" by Jordi Warmenhoven
- iPython-NoteBooks by John Wittenauer
- Scikit-Learn Tutorial by Jake Vanderplas
- Data Science Roadmap by Javier Estraviz
Class Time and Location: Saturday and Monday, 08:00-09:30 AM (Spring 2019), Room 204/1.
Projects are programming assignments that cover the topics of this course. Each project is written in a Jupyter Notebook. Projects will require the use of Python 3.7, as well as the additional Python libraries listed below.
- Python 3.7: An interactive, object-oriented, extensible programming language.
- NumPy: A Python package for scientific computing.
- Pandas: A Python package for high-performance, easy-to-use data structures and data analysis tools.
- Scikit-Learn: A Python package for machine learning.
- Matplotlib: A Python package for 2D plotting.
- SciPy: A Python package for mathematics, science, and engineering.
- IPython: An architecture for interactive computing with Python.
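A minimal sketch for verifying that the required libraries are installed (the printed version numbers will vary with your environment):

```python
import sys

import IPython
import matplotlib
import numpy
import pandas
import scipy
import sklearn

# Print interpreter and library versions to verify the environment.
print("Python:", sys.version.split()[0])
for module in (numpy, pandas, sklearn, matplotlib, scipy, IPython):
    print(f"{module.__name__}: {module.__version__}")
```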
- Slide: Practical Advice for Building Machine Learning Applications by Vivek Srikumar
- Blog: Comparison of Machine Learning Models by Kevin Markham
- Technical Notes On Using Data Science & Artificial Intelligence: To Fight For Something That Matters by Chris Albon
Google Colab is a free cloud service that provides free GPU support!
- How to Use Google Colab by Souvik Mandal
- Primer for Learning Google Colab
- Deep Learning Development with Google Colab, TensorFlow, Keras & PyTorch
Students can include mathematical notation within markdown cells of their Jupyter Notebooks using LaTeX.
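For example, a markdown cell like the following renders the empirical risk formula from Chapter 2 of the main textbook:

```markdown
The ERM rule returns $h_S \in \arg\min_{h \in \mathcal{H}} L_S(h)$, where the
empirical risk is $L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}[h(x_i) \neq y_i]$.
```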
- Preparing and Cleaning Data for Machine Learning by Josh Devlin
- Getting Started with Kaggle: House Prices Competition by Adam Massachi
- Scikit-learn Tutorial: Machine Learning in Python by Satyabrata Pal
Grading:
- Projects and Midterm – 50%
- Endterm – 50%
Final Examination: Saturday 1398/03/25, 08:30-10:30
Prerequisites: General mathematical sophistication and a solid understanding of Algorithms, Linear Algebra, and Probability Theory at the advanced undergraduate or beginning graduate level, or equivalent.
- Video: Professor Gilbert Strang's Video Lectures on linear algebra.
- Learn Probability and Statistics Through Interactive Visualizations: Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
- Statistics and Probability: This website provides training and tools to help you solve statistics problems quickly, easily, and accurately - without having to ask anyone for help.
- Jupyter NoteBooks: Introduction to Statistics by Bargava
- Video: Professor John Tsitsiklis's Video Lectures on Applied Probability.
- Video: Professor Krishna Jagannathan's Video Lectures on Probability Theory.
Course (Videos, Lectures, Assignments): MIT OpenCourseWare (Discrete Mathematics)
Have a look at some reports by Kaggle or Stanford students (CS224N, CS224D) for general inspiration.
It is necessary to have a GitHub account to share your projects. GitHub offers free accounts as well as plans with private repositories. GitHub is like the hammer in your toolbox; you need to have it!
Honesty and integrity are vital elements of academic work. All your submitted assignments must be entirely your own (or your own group's).
We will follow the standard approach of the Department of Mathematical Sciences:
- You can get help, but you MUST acknowledge the help on the work you hand in
- Failure to acknowledge your sources is a violation of the Honor Code
- You can talk to others about the algorithm(s) to be used to solve a homework problem; as long as you then mention their name(s) on the work you submit
- You should not use or look at others' code when you write your own: you can talk to people, but you must write your own solution/code
I will hold office hours for this course on Mondays (09:30 AM–12:00 PM). If this is not convenient, email me at hhaji@sbu.ac.ir or talk to me after class.