Using decision tree and random forest algorithms to solve a real-world data analysis problem. "sklearn_decision_trees_random_forests"

pradipece/Weather_forecast_data_analysis

Weather_forecast_data_analysis

Problem Statement

This project takes a coding-focused approach to using decision trees and random forests to solve a real-world problem from Kaggle:

QUESTION: The dataset contains about 10 years of daily weather observations from numerous Australian weather stations.

As a data scientist at the Bureau of Meteorology, you are tasked with creating a fully automated system that can use today's weather data for a given location to predict whether it will rain at that location tomorrow.

Overview

Perform the following steps to prepare the dataset for training:

  1. Create a train/test/validation split
  2. Identify input and target columns
  3. Identify numeric and categorical columns
  4. Impute (fill) missing numeric values
  5. Scale numeric values to the $[0, 1]$ range
  6. Encode categorical columns to one-hot vectors
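The six steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's actual notebook: the toy data is randomly generated, the column names are borrowed hypothetically to resemble weather features, and a random 60/20/20 split is assumed (a time-based split may suit the real dataset better).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the weather dataset (not real observations).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "MinTemp": rng.normal(12, 5, n),
    "Humidity3pm": rng.uniform(10, 100, n),
    "WindGustDir": rng.choice(["N", "S", "E", "W"], n),
    "RainTomorrow": rng.choice(["No", "Yes"], n, p=[0.8, 0.2]),
})
# Inject some missing values so that imputation has something to do.
df.loc[df.sample(frac=0.1, random_state=0).index, "MinTemp"] = np.nan

# 1. Train/validation/test split (60/20/20, assumed proportions).
train_val_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df, val_df = train_test_split(train_val_df, test_size=0.25, random_state=42)

# 2. Input and target columns.
target_col = "RainTomorrow"
input_cols = [c for c in df.columns if c != target_col]

# 3. Numeric and categorical columns.
numeric_cols = train_df[input_cols].select_dtypes(include=np.number).columns.tolist()
categorical_cols = [c for c in input_cols if c not in numeric_cols]

# 4-6. Fit every transformer on the training split only, to avoid leakage.
imputer = SimpleImputer(strategy="mean").fit(train_df[numeric_cols])
scaler = MinMaxScaler().fit(imputer.transform(train_df[numeric_cols]))
encoder = OneHotEncoder(handle_unknown="ignore").fit(train_df[categorical_cols])

def preprocess(split_df):
    """Impute, scale, and one-hot encode one split into a single feature matrix."""
    numeric = scaler.transform(imputer.transform(split_df[numeric_cols]))
    categorical = encoder.transform(split_df[categorical_cols]).toarray()
    return np.hstack([numeric, categorical])

X_train, X_val, X_test = (preprocess(d) for d in (train_df, val_df, test_df))
y_train, y_val, y_test = (d[target_col] for d in (train_df, val_df, test_df))
print(X_train.shape, X_val.shape, X_test.shape)
```

Because the scaler is fitted on the training split, validation and test values can fall slightly outside $[0, 1]$; that is expected and usually harmless for tree-based models.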

Training and Visualizing Decision Trees

A decision tree in general parlance represents a hierarchical series of binary decisions.

A decision tree in machine learning works the same way, except that the computer determines the optimal structure and hierarchy of decisions from the data according to a splitting criterion, rather than having the criteria specified by hand.
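As a rough sketch of how such a tree can be trained and inspected with scikit-learn, the example below fits a shallow `DecisionTreeClassifier` and prints its learned decisions as text. The data is synthetic (from `make_classification`) and the feature names are placeholders, so the printed rules illustrate the structure only, not anything about the weather dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the real project would use the preprocessed weather inputs.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0, random_state=42
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical placeholder names
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# criterion="gini" is scikit-learn's default split criterion;
# max_depth=3 keeps the tree small enough to read.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Each indented line is one binary decision on a single feature threshold.
print(export_text(tree, feature_names=feature_names))

train_acc = tree.score(X_train, y_train)
val_acc = tree.score(X_val, y_val)
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
```

For a graphical view, `sklearn.tree.plot_tree` draws the same structure with matplotlib.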

Summary

The following topics were covered in this tutorial:

  • Downloading a real-world dataset
  • Preparing a dataset for training
  • Training and interpreting decision trees
  • Training and interpreting random forests
  • Overfitting, hyperparameter tuning & regularization
  • Making predictions on single inputs
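The last three topics can be illustrated together in a short sketch. It assumes synthetic data and an arbitrary small grid of `max_depth` values: it trains a random forest at each depth, compares training and validation accuracy to expose overfitting, and then predicts on a single input row.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the weather dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameter tuning: try a few depths (None means unlimited depth).
results = {}
for max_depth in [2, 5, None]:
    forest = RandomForestClassifier(
        n_estimators=50, max_depth=max_depth, random_state=0
    )
    forest.fit(X_train, y_train)
    results[max_depth] = (forest.score(X_train, y_train), forest.score(X_val, y_val))
    print(max_depth, results[max_depth])

# A large gap between training and validation accuracy suggests overfitting;
# limiting max_depth is one form of regularization.
best_depth = max(results, key=lambda d: results[d][1])
model = RandomForestClassifier(
    n_estimators=50, max_depth=best_depth, random_state=0
).fit(X_train, y_train)

# Prediction on a single input: scikit-learn expects a 2D array with one row.
single_input = X_val[:1]
pred = model.predict(single_input)[0]
prob = model.predict_proba(single_input)[0]
print(f"predicted class: {pred}, class probabilities: {prob}")
```

Selecting the depth by validation accuracy, and keeping the test set untouched until the end, gives a fairer estimate of how well the final model generalizes.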

The following terms were introduced:

  • Decision tree
  • Random forest
  • Overfitting
  • Hyperparameter
  • Hyperparameter tuning
  • Regularization
  • Ensembling
  • Generalization
  • Bootstrapping
