This is a collection of Jupyter notebooks I prepared while attending the Machine Learning Practice course from the IIT Madras Online Degree Programme in Data Science and Programming (Diploma Level).
The course focuses on the practical implementation of machine learning algorithms using scikit-learn APIs.
Running Examples
- California housing prediction for regression tasks
- MNIST digit recognition for classification tasks

Topics covered
- Data loading
  - Basic data loading/generation utilities (load_*, fetch_*, make_*)
- Data preprocessing
  - Data cleaning
    - Feature Extraction - DictVectorizer
    - Data Imputation - SimpleImputer, KNNImputer
    - Feature Scaling - MaxAbsScaler, MinMaxScaler, StandardScaler
  - Feature transformations
    - Polynomial Features - PolynomialFeatures
    - Discretization - KBinsDiscretizer
    - Handling categorical variables - OrdinalEncoder, OneHotEncoder, LabelEncoder, MultiLabelBinarizer, pandas.get_dummies, add_dummy_feature
    - Custom Transformers - FunctionTransformer
    - Composite Transformers - ColumnTransformer, TransformedTargetRegressor
  - Feature Selection
    - Filter-based methods - VarianceThreshold, SelectKBest, SelectPercentile, GenericUnivariateSelect
    - Wrapper-based methods - RFE, SelectFromModel, SequentialFeatureSelector
  - Feature extraction
    - Principal Component Analysis - PCA
  - Pipeline - Pipeline, make_pipeline, FeatureUnion
  - Hyperparameter tuning and cross-validation - GridSearchCV, RandomizedSearchCV
  - Handling imbalance (imblearn) - RandomUnderSampler, RandomOverSampler, SMOTE
- Baseline models
  - How to build simple baseline models
- Linear Regression
  - Normal equation method (LinearRegression)
  - Iterative optimisation method (SGDRegressor)
- California Housing Prediction
  - Exploratory data analysis
  - Regularised linear regression and hyperparameter tuning
- Perceptron
  - Binary classification
  - Multiclass classification
- Logistic regression
- Naive Bayes models
- K-Nearest Neighbours model
  - Classification
  - Regression
- Training large-scale ML models
  - Learning in batches (partial_fit())
- Softmax Regression
- Support Vector Machines
- Decision trees
- Bagging and Boosting
- RandomForest
- Clustering
  - K-Means
  - Hierarchical Agglomerative Clustering
- Multi-layer Perceptron
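The Data Imputation and Feature Scaling items in the list above can be illustrated with a minimal sketch on a small hypothetical matrix (the values are made up and are not taken from the notebooks). It fills the missing entries with SimpleImputer and KNNImputer, then applies the three scalers to the mean-imputed result.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical 4x2 matrix with missing entries marked as np.nan.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 8.0],
])

# SimpleImputer: replace each NaN with the mean of its column.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# KNNImputer: replace each NaN with the average of that feature over the
# 2 nearest rows (distances use a NaN-aware Euclidean metric).
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# Scalers, applied column by column to the mean-imputed matrix.
X_maxabs = MaxAbsScaler().fit_transform(X_mean)  # divide by the max absolute value
X_minmax = MinMaxScaler().fit_transform(X_mean)  # map each column onto [0, 1]
X_std = StandardScaler().fit_transform(X_mean)   # zero mean, unit variance

print(X_mean)
print(X_knn)
```

In a real workflow the imputer and scaler would be fitted on the training split only and then applied to the test split with transform.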
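The Feature Selection entries above distinguish filter-based and wrapper-based methods. A minimal sketch on synthetic data from make_regression (an assumption; the notebooks may use other datasets) contrasts VarianceThreshold and SelectKBest (filters) with RFE (wrapper).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_regression
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 features, only 3 of which influence y.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Filter: VarianceThreshold (default threshold 0.0) drops constant features,
# so the appended all-ones column is removed.
X_const = np.hstack([X, np.ones((X.shape[0], 1))])
X_var = VarianceThreshold().fit_transform(X_const)

# Filter: keep the 3 features with the highest univariate F-statistic w.r.t. y.
kbest = SelectKBest(score_func=f_regression, k=3).fit(X, y)

# Wrapper: RFE repeatedly refits the model and drops the feature with the
# smallest absolute coefficient until 3 remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3).fit(X, y)

print("SelectKBest keeps:", kbest.get_support(indices=True))
print("RFE keeps:", rfe.get_support(indices=True))
```

Filters score each feature independently of any model, while RFE relies on a fitted estimator, which is why it is grouped with the wrapper methods.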
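The PCA, Pipeline and Hyperparameter tuning items fit together naturally. The sketch below chains StandardScaler, PCA and LogisticRegression in a Pipeline and tunes it with GridSearchCV. It uses load_digits, scikit-learn's bundled 8x8 digits dataset, as a small offline stand-in for MNIST; the grid values are arbitrary assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=2000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {
    "pca__n_components": [20, 40],
    "clf__C": [0.1, 1.0],
}

# 3-fold cross-validation over the 4 parameter combinations,
# followed by a refit of the best one on the whole training split.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))
```

Because scaling and PCA sit inside the pipeline, they are refitted on each training fold, so no information from the validation fold leaks into preprocessing.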
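For the Baseline models item, one common approach in scikit-learn is DummyRegressor (or DummyClassifier), which ignores the features entirely; whether the notebooks use exactly this class is an assumption. The sketch compares a mean-predicting baseline against LinearRegression on synthetic data.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# A real model should clearly beat the baseline on held-out data.
model = LinearRegression().fit(X_train, y_train)

print("baseline R^2:", baseline.score(X_test, y_test))  # at most 0 on test data
print("linear R^2:", model.score(X_test, y_test))
```

A baseline's score gives the floor that any candidate model has to exceed before its extra complexity is worth keeping.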
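The Linear Regression item lists a normal equation method and an iterative optimisation method. The sketch below writes the normal equation out with NumPy, checks it against LinearRegression, and fits SGDRegressor as the iterative alternative. The synthetic data and SGD settings are assumptions; note that LinearRegression itself solves the least-squares problem with a least-squares solver rather than by explicitly inverting the matrix.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=42)

# Normal equation: solve (Xb^T Xb) w = Xb^T y, where Xb has a leading
# column of ones so that w[0] is the intercept.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# LinearRegression solves the same least-squares problem.
lin = LinearRegression().fit(X, y)

# Iterative alternative: stochastic gradient descent on the squared loss
# (with scikit-learn's default small L2 penalty). SGD is sensitive to
# feature scale, so standardise inside a pipeline.
sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-4, random_state=42),
).fit(X, y)

print(w[0], lin.intercept_)
print(lin.score(X, y), sgd.score(X, y))
```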
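Several classifiers from the list above (Perceptron, Logistic regression, Naive Bayes, K-Nearest Neighbours, Support Vector Machines and Multi-layer Perceptron) share the same fit/score API, so they can be compared in one loop. This sketch uses load_digits as a stand-in for MNIST and picks GaussianNB as the Naive Bayes variant; both choices, and all hyperparameters, are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small bundled 8x8 digits dataset, used as an offline stand-in for MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "perceptron": Perceptron(random_state=0),
    # With the default lbfgs solver, recent scikit-learn versions fit a
    # multinomial (softmax) model for multiclass targets.
    "logistic_regression": LogisticRegression(max_iter=2000),
    "gaussian_nb": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
    "mlp": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

accuracy = {}
for name, clf in classifiers.items():
    # Same preprocessing for every model: standardise, then classify.
    model = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```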
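For the Learning in batches (partial_fit()) item, the sketch below simulates a data stream by slicing load_digits into mini-batches. The batch size of 200, the five epochs and the two-pass scheme (first accumulate scaling statistics, then train) are assumptions for illustration. SGDClassifier.partial_fit needs the full set of class labels on its first call, because a single batch may not contain every class.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
classes = np.unique(y)  # all labels, required by the first partial_fit call
batch_size = 200

scaler = StandardScaler()
clf = SGDClassifier(random_state=0)

# Pass 1: accumulate the running mean and variance batch by batch.
for start in range(0, len(X), batch_size):
    scaler.partial_fit(X[start:start + batch_size])

# Pass 2: update the classifier one mini-batch at a time, for a few epochs.
for epoch in range(5):
    for start in range(0, len(X), batch_size):
        X_batch = scaler.transform(X[start:start + batch_size])
        y_batch = y[start:start + batch_size]
        clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.score(scaler.transform(X), y))
```

Only one mini-batch has to be held in memory at a time, which is what makes this pattern useful for datasets that do not fit in RAM.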
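The Decision trees, Bagging and Boosting, and RandomForest items can be compared with cross-validation. This sketch uses the bundled breast cancer dataset and GradientBoostingClassifier as the boosting example; both are assumptions, since the notebooks may use other datasets or boosting variants.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    # A single, fully grown tree.
    "tree": DecisionTreeClassifier(random_state=0),
    # Bagging: trees (the default base estimator) trained on bootstrap samples.
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    # Random forest: bagging plus a random feature subset at each split.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Boosting: shallow trees fitted sequentially to the previous errors.
    "boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```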
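For the Clustering items, a minimal sketch runs K-Means and hierarchical agglomerative clustering on the same synthetic blobs. The make_blobs data and the choice of three clusters with Ward linkage are assumptions for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data drawn around 3 centres.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# K-Means: iteratively reassign points to the nearest centroid;
# n_init=10 restarts from different initial centroids and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Agglomerative: start with every point as its own cluster and repeatedly
# merge the pair of clusters that least increases within-cluster variance (Ward).
agglo = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)

print(kmeans.cluster_centers_)
print("K-Means inertia:", kmeans.inertia_)
print("Agglomerative labels:", agglo.labels_[:10])
```

K-Means needs the number of clusters up front and produces centroids, while the agglomerative model builds a full merge hierarchy that is cut at the requested number of clusters.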