Title | Author | Date |
---|---|---|
Day 12 of 30 Day Of ML Code Event |
Rajendrasinh Parmar |
August 14, 2021 |
This is a lesson to provide overview about the course.
Follow the notebook Introduction for lesson details.
As part of the exercise for the first lesson we've created 5 Random Forest Regressor models using different values for parameters.
We changes values for n_estimators
, criterion
, min_samples_split
, max_depth
.
After that we evaluated all the models and found the best model that have minimum value of MAE(Mean Absolute Error).
As a part of the exercise for the first lesson we have predicted the values of house prices using the best model we decided from the 5 models defined in previous steps. Data used for the exercise is available under the folder home-data-for-ml-course
The associated exercise with the first lesson of the course is provided in Exercise: Introduction
Submission for the exercise is available in submission_introduction.csv
In this tutorial, you will learn three approaches to dealing with missing values. Then you'll compare the effectiveness of these approaches on a real-world dataset.
There are three approach to handle missing values
- Drop columns with missing values
- Imputation
- An Extension to Imputation
Follow the notebook Missing Values for lesson details.
As a part of the exercise for the second lesson I used different approaches to handle missing values.
Then for the final exercise I've used An Extension to Imputation
approach to train my model and predict the values.
The associated exercise with the second lesson of the course is provided in Exercise: Missing Values
Submission for the exercise is available in submission_missing_values.csv
In this tutorial, you will learn what a categorical variable is, along with three approaches for handling this type of data.
Categorical data is any kind of data that takes limited number of values.
As most of the models will not directly support categorical data we need to process them first. There are three approaches to handle categorical data.
- Drop Categorical Variables
- Ordinal Encoding - The categorical variables that has clear ordering are called
Ordinal variables
. - One-Hot Encoding - The categorical variables that does not have intrinsic ranking are called
nominal variables
.
Follow the notebook Categorical Variables for lesson details.
As an Exercise of the second lesson I have worked on all three methods to handle categorical data.
After the completion of the exercise, I worked on the optional exercise to clean up the data for the test data set X_test
using all the methods I've learned till now.
The associated exercise with the second lesson of the course is provided in Exercise: Categorical Variables