# Entropy-Based Random Forest Classifier
This project implements a Random Forest Classifier from scratch using Entropy-based Decision Trees. It includes:
- A Decision Tree classifier built using entropy and information gain.
- A Random Forest that ensembles multiple decision trees using bootstrap aggregation (bagging).
**Why?** This project helps you understand how a Random Forest works internally, beyond just calling sklearn!

## Features

### Decision Tree
- Splits data using the feature that maximizes information gain (see the sketch after this list).
- Recursively builds a tree until max depth is reached.
- Handles categorical features without needing encoding.
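
For intuition, here is a minimal sketch of how entropy and information gain for a categorical split can be computed with numpy. The function names `entropy` and `information_gain` are illustrative and may not match the ones in `src/decision_tree.py`:

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a label array: H(y) = -sum_k p_k * log2(p_k)
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(feature_column, y):
    # Parent entropy minus the weighted entropy of the child nodes
    # produced by splitting on each distinct categorical value.
    children = sum(
        (np.sum(feature_column == v) / len(y)) * entropy(y[feature_column == v])
        for v in np.unique(feature_column)
    )
    return entropy(y) - children
```

The tree builder picks the feature with the highest information gain at each node and recurses on the resulting subsets.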
### Random Forest
- Bootstrap Sampling: Each tree is trained on a random subset of the dataset, drawn with replacement.
- Multiple Trees: Several decision trees are trained, controlled by the `num_trees` and `max_depth` parameters.
- Majority Voting: The final class prediction is the most common output across all trees (see the sketch after this list).
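
To make the two mechanisms above concrete, here is a minimal sketch of bagging and majority voting. The helper names are hypothetical and not necessarily those used in `src/random_forest.py`; it assumes non-negative integer class labels, as in the toy data below:

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_sample(X, y):
    # Draw len(y) row indices with replacement, so each tree
    # trains on a slightly different view of the data.
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

def majority_vote(tree_predictions):
    # tree_predictions: (num_trees, num_samples) array of integer labels.
    # For each sample, return the label predicted by the most trees.
    votes = np.asarray(tree_predictions)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```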
## Project Structure

```
EntropyBased-RandomForest/
├── notebooks/
│   ├── random_forest.ipynb   # Main notebook for the Random Forest implementation
├── src/
│   ├── decision_tree.py      # Decision Tree implementation
│   ├── random_forest.py      # Random Forest implementation
│   ├── test.py               # Tests for the Decision Tree and Random Forest
├── README.md
├── requirements.txt          # Python dependencies
└── LICENSE                   # License file
```
## Installation

```bash
git clone https://github.com/Oneiben/EntropyBased-RandomForest.git
cd EntropyBased-RandomForest
pip install -r requirements.txt
```

Run the tests:

```bash
python src/test.py
```
## Example Usage

The example below uses a toy dataset with four categorical features (`A`, `B`, `C`, `D`) and a binary classification target (`result`).
```python
import pandas as pd
import numpy as np
from decision_tree import DecisionTree   # module names follow src/decision_tree.py
from random_forest import RandomForest   # and src/random_forest.py

# Example data: four categorical features and a binary target
details = {
    "A": [1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
    "B": [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0],
    "C": [0, 1, 0, 2, 0, 0, 1, 1, 1, 2, 1, 2, 1, 0],
    "D": [1, 0, 0, 0, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2],
    "result": [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1],
}
df = pd.DataFrame(details)

X_train = df.drop(columns=["result"]).values
y_train = df["result"].values

print("------------------------ Decision Tree --------------------------")

# Train the decision tree
model = DecisionTree(max_depth=3)
model.fit(X_train, y_train)

# Print the decision tree structure
model.print_tree()

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Decision Tree Predicted class:", prediction[0])

print("------------------------ Random Forest --------------------------")

# Train the Random Forest
model = RandomForest(num_trees=14, max_depth=3)
model.fit(X_train, y_train)

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Random Forest Predicted class:", prediction[0])
```
## Example Output

```
------------------------ Decision Tree --------------------------
Feature 3:
  Value 0:
    Feature 1:
      Value 0:
        Leaf: 1
      Value 1:
        Leaf: 0
  Value 1:
    Leaf: 1
  Value 2:
    Feature 0:
      Value 0:
        Leaf: 1
      Value 1:
        Leaf: 0
Decision Tree Predicted class: 1
------------------------ Random Forest --------------------------
Random Forest Predicted class: 1
```
## Advantages
- Robust Classification: Bagging across many trees reduces overfitting, even on small datasets.
- Customizable: Adjust `num_trees` and `max_depth` to optimize performance.
- Lightweight: No external ML libraries required (only `numpy` and `pandas`).
## Future Improvements
- Support for continuous numerical features (currently categorical only).
- Implement Gini impurity as an alternative split criterion to entropy (sketched below).
- Optimize tree-building for better performance on large datasets.
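
For reference on the Gini item, Gini impurity would be a drop-in replacement for entropy as the split criterion. A minimal sketch, not part of the current codebase:

```python
import numpy as np

def gini_impurity(y):
    # Gini impurity: 1 - sum_k p_k^2. Like entropy, it is zero for a
    # pure node and largest for an even class mix, but needs no log.
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)
```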
## Contributing

Contributions are welcome! Follow these steps to contribute:

- Fork the repository.
- Create a new branch: `git checkout -b feature-name`
- Make your changes and commit: `git commit -m "Description of changes"`
- Push the changes and open a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for more details.