Entropy-Based Random Forest Classifier

Table of Contents

  • Introduction
  • How It Works
  • Project Structure
  • Installation & Usage
  • Example Usage
  • Results & Performance
  • Future Improvements
  • Contributing
  • License

Introduction

This project implements a Random Forest Classifier from scratch using Entropy-based Decision Trees. It includes:

  • A Decision Tree classifier built using entropy and information gain.
  • A Random Forest that combines multiple decision trees via bootstrap aggregation (bagging).

Why? This project helps you understand how a Random Forest works internally, beyond just using sklearn!


How It Works

Decision Tree Implementation

  • Splits data on the feature that maximizes information gain (see the sketch below).
  • Recursively builds the tree until the maximum depth is reached.
  • Handles categorical features directly, with no encoding required.
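
The split criterion fits in a few lines. The following is a minimal sketch of entropy and information gain, not the repository's exact API; the function names are illustrative:

import numpy as np

def entropy(y):
    # Shannon entropy of a label array: -sum(p_i * log2(p_i)).
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_column, y):
    # Entropy of the parent minus the size-weighted entropy of each
    # child produced by splitting on the feature's distinct values.
    children = sum(
        (np.sum(feature_column == v) / len(y)) * entropy(y[feature_column == v])
        for v in np.unique(feature_column)
    )
    return entropy(y) - children

At each node, the tree picks the feature whose split yields the largest information gain.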

Random Forest Implementation

  • Bootstrap Sampling: Each tree is trained on a random sample of the dataset, drawn with replacement.
  • Multiple decision trees are trained, controlled by the num_trees and max_depth parameters.
  • Majority Voting: The final class prediction is the most common output across all trees (see the sketch below).
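
The two ensemble ingredients are small enough to sketch here; again, these helper names are illustrative rather than the repository's exact API:

import numpy as np
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Sample n rows with replacement (bagging), so each tree sees a
    # slightly different view of the training data.
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

def majority_vote(all_predictions):
    # all_predictions has shape (num_trees, n_samples); return the
    # most common label in each column.
    return np.array(
        [Counter(column).most_common(1)[0][0] for column in all_predictions.T]
    )

# Example: X_boot, y_boot = bootstrap_sample(X, y, np.random.default_rng(42))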

Project Structure

EntropyBased-RandomForest/
├── notebooks/                          
│   ├── random_forest.ipynb             # Main notebook for the Random Forest implementation
├── src/                            
│   ├── decision_tree.py                # Decision Tree implementation
│   ├── random_forest.py                # Random Forest implementation                              
│   ├── test.py                         # Tests for the Decision Tree and Random Forest
├── README.md                            
├── requirements.txt                    # Python dependencies
└── LICENSE                             # License file

Installation & Usage

1. Clone the Repository

git clone https://github.com/Oneiben/EntropyBased-RandomForest.git
cd EntropyBased-RandomForest

2. Install Dependencies

pip install -r requirements.txt

3. Run the Test Script

python src/test.py

Example Usage

The example below uses a toy dataset with 4 features (A, B, C, D) and a binary classification target (result).

import pandas as pd
import numpy as np
from decision_tree import DecisionTree
from random_forest import RandomForest

# Example Data
details = {
    "A": [1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
    "B": [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0],
    "C": [0, 1, 0, 2, 0, 0, 1, 1, 1, 2, 1, 2, 1, 0],
    "D": [1, 0, 0, 0, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2],
    "result": [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
}

df = pd.DataFrame(details)
X_train = df.drop(columns=["result"]).values
y_train = df["result"].values
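
# For reference: y_train has 9 ones and 5 zeros, so the root entropy is
# -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940 bits.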


print("------------------------ Decision Tree --------------------------")

# Train the decision tree
model = DecisionTree(max_depth=3)
model.fit(X_train, y_train)

# Print the decision tree structure
model.print_tree()

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Predicted class:", prediction[0])

print("------------------------ Random Forest --------------------------")

# Train the Random Forest
model = RandomForest(num_trees=14, max_depth=3)
model.fit(X_train, y_train)

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Random Forest Predicted class:", prediction[0])

Expected Output:

------------------------ Decision Tree --------------------------
Feature 3:
        Value 0:
                Feature 1:
                        Value 0:
                                Leaf: 1
                        Value 1:
                                Leaf: 0
        Value 1:
                Leaf: 1
        Value 2:
                Feature 0:
                        Value 0:
                                Leaf: 1
                        Value 1:
                                Leaf: 0
Decision Tree Predicted class: 1
------------------------ Random Forest --------------------------
Random Forest Predicted class: 1


Results & Performance

  • Robust Classification: Bagging and majority voting reduce variance, making the forest less prone to overfitting than a single deep tree.
  • Customizable: Tune num_trees and max_depth to trade off accuracy and speed.
  • Lightweight: Only numpy and pandas are required; no ML frameworks.

Future Improvements

  • Support for continuous numerical features (currently categorical only).
  • Implement Gini impurity as an alternative split criterion to entropy (see the sketch below).
  • Optimize tree construction for better performance on large datasets.
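
Gini impurity is not in this repository yet; as a rough sketch of what that improvement would involve, a function like the following could replace entropy in the split-selection step:

import numpy as np

def gini_impurity(y):
    # Gini impurity: 1 - sum(p_i^2). Cheaper than entropy (no logarithm)
    # and usually ranks candidate splits very similarly.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)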

Contributing

Contributions are welcome! Follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-name
  3. Make your changes and commit:
    git commit -m "Description of changes"
  4. Push the changes and open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
