
Decision Tree

Ishani Kathuria edited this page May 28, 2023 · 2 revisions

Overview

A decision tree is a predictive modelling algorithm that uses a tree-like structure to make decisions or predictions. It partitions the data based on different features, building a tree of decisions that classifies or predicts the outcome, much like asking a series of yes-or-no questions to reach a final decision. It is an interpretable and widely used algorithm for both classification and regression tasks.

A decision tree consists of three main parts:

  1. Root node: initial question
  2. Internal nodes: intermediate features/questions
  3. Leaf nodes: final outcomes/predictions

The root node starts the decision-making process, internal nodes guide the flow based on features, and leaf nodes provide the final predictions.
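The three parts can be sketched as a chain of questions. Below is a minimal, hand-written illustration in Python for the Iris data described later; the thresholds (2.5 cm and 1.8 cm) are illustrative assumptions, not values learned from the dataset:

```python
def classify_iris(petal_length, petal_width):
    """Tiny hand-built decision tree for the Iris species.

    Thresholds are illustrative, not fitted to the real data.
    """
    # Root node: the initial question
    if petal_length < 2.5:
        return "Iris Setosa"        # leaf node: final prediction
    # Internal node: a follow-up question on another feature
    if petal_width < 1.8:
        return "Iris Versicolour"   # leaf node
    return "Iris Virginica"         # leaf node


print(classify_iris(1.4, 0.2))  # follows the left branch from the root
```

Making a prediction is just following one root-to-leaf path; only the questions along that path are ever evaluated.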

How to make a decision tree

  1. Start with a question or feature that divides the data into two or more subsets.
  2. Choose the best question or feature that provides the most useful information for making predictions.
  3. Divide the data based on the selected question or feature, creating branches or paths in the tree.
  4. Repeat the process for each subset, considering different questions or features at each step.
  5. Keep splitting the data until reaching a point where further divisions do not provide significant improvement.
  6. Assign the final outcomes or predictions to the leaf nodes of the tree.
  7. To make a prediction, follow the path from the root to a leaf node based on the answers to the questions.
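Step 2 ("choose the best question") is usually done by scoring candidate splits with an impurity measure. A common choice is Gini impurity; a minimal sketch of exhaustive split search (the approach used by CART-style trees, shown here on hypothetical toy data) looks like this:

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    # 0.0 means the subset is pure (one class only).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    # Try every (feature, threshold) pair and keep the one whose
    # two resulting subsets have the lowest weighted impurity.
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for x, y in zip(rows, labels) if x[f] < t]
            right = [y for x, y in zip(rows, labels) if x[f] >= t]
            if not left or not right:
                continue  # the split must actually divide the data
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score, best = score, (f, t)
    return best


# Toy data: two features, two classes (purely illustrative).
X = [[2.0, 1.0], [1.5, 0.5], [6.0, 2.0], [5.5, 1.8]]
y = ["A", "A", "B", "B"]
print(best_split(X, y))  # a (feature, threshold) pair separating the classes
```

Recursively applying `best_split` to each subset, and stopping when a subset is pure or no split improves the impurity (step 5), yields the full tree.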

Step-by-Step Implementation

The Iris dataset (from the UCI Machine Learning Repository) was used, with the following columns:

  • sepal length
  • sepal width
  • petal length
  • petal width
  • class (y – dependent variable)

The dataset can be used for classification, predicting the iris species (Iris Setosa, Iris Versicolour, Iris Virginica), or for regression, predicting the value of any of the other features, since they are continuous.
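As a sketch of the classification task, the tree can be fitted with scikit-learn's `DecisionTreeClassifier` (assuming scikit-learn is available; the notebook linked below may differ in details such as the split ratio and random seed, which are chosen here for illustration):

```python
# Fit a decision tree to the Iris dataset with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 4 features (sepal/petal length and width), 3 species classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# criterion="gini" selects splits by Gini impurity (the default)
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```

For regression on one of the continuous columns, `DecisionTreeRegressor` follows the same fit/predict pattern.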

See implementation in Jupyter Notebook