# Entropy-Based Random Forest Classifier
This project implements a Random Forest Classifier from scratch using Entropy-based Decision Trees. It includes:
- A Decision Tree classifier built using entropy and information gain.
- A Random Forest that ensembles multiple decision trees using bootstrap aggregation (bagging).
**Why?** This project helps you understand how a Random Forest works internally, beyond just calling sklearn!

## Features

### Decision Tree
- Splits data using the feature that maximizes information gain (see the sketch after this list).
- Recursively builds a tree until max depth is reached.
- Handles categorical features without needing encoding.
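
For intuition, here is a minimal sketch of how entropy and information gain for a categorical split can be computed with numpy. The function names `entropy` and `information_gain` are illustrative and may not match the ones in `src/decision_tree.py`:

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a label array: H(y) = -sum_k p_k * log2(p_k)
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(feature_column, y):
    # Parent entropy minus the weighted entropy of the child nodes
    # produced by splitting on each distinct categorical value.
    children = sum(
        (np.sum(feature_column == v) / len(y)) * entropy(y[feature_column == v])
        for v in np.unique(feature_column)
    )
    return entropy(y) - children
```

The tree builder picks the feature with the highest information gain at each node and recurses on the resulting subsets.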
### Random Forest
- Bootstrap Sampling: Each tree is trained on a random subset of the dataset, drawn with replacement.
- Multiple Trees: Several decision trees are trained, controlled by the `num_trees` and `max_depth` parameters.
- Majority Voting: The final class prediction is the most common output across all trees (see the sketch after this list).
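
To make the two mechanisms above concrete, here is a minimal sketch of bagging and majority voting. The helper names are hypothetical and not necessarily those used in `src/random_forest.py`; it assumes non-negative integer class labels, as in the toy data below:

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_sample(X, y):
    # Draw len(y) row indices with replacement, so each tree
    # trains on a slightly different view of the data.
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

def majority_vote(tree_predictions):
    # tree_predictions: (num_trees, num_samples) array of integer labels.
    # For each sample, return the label predicted by the most trees.
    votes = np.asarray(tree_predictions)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```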
## Project Structure

```
EntropyBased-RandomForest/
├── notebooks/
│   ├── random_forest.ipynb   # Main notebook for the Random Forest implementation
├── src/
│   ├── decision_tree.py      # Decision Tree implementation
│   ├── random_forest.py      # Random Forest implementation
│   ├── test.py               # Tests for the Decision Tree and Random Forest
├── README.md
├── requirements.txt          # Python dependencies
└── LICENSE                   # License file
```
## Installation

```bash
git clone https://github.com/Oneiben/EntropyBased-RandomForest.git
cd EntropyBased-RandomForest
pip install -r requirements.txt
```

Run the tests:

```bash
python src/test.py
```
## Example Usage

The example below uses a toy dataset with four categorical features (`A`, `B`, `C`, `D`) and a binary classification target (`result`).
```python
import pandas as pd
import numpy as np
from decision_tree import DecisionTree   # module names follow src/decision_tree.py
from random_forest import RandomForest   # and src/random_forest.py

# Example data: four categorical features and a binary target
details = {
    "A": [1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
    "B": [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0],
    "C": [0, 1, 0, 2, 0, 0, 1, 1, 1, 2, 1, 2, 1, 0],
    "D": [1, 0, 0, 0, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2],
    "result": [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1],
}
df = pd.DataFrame(details)

X_train = df.drop(columns=["result"]).values
y_train = df["result"].values

print("------------------------ Decision Tree --------------------------")

# Train the decision tree
model = DecisionTree(max_depth=3)
model.fit(X_train, y_train)

# Print the decision tree structure
model.print_tree()

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Decision Tree Predicted class:", prediction[0])

print("------------------------ Random Forest --------------------------")

# Train the Random Forest
model = RandomForest(num_trees=14, max_depth=3)
model.fit(X_train, y_train)

# Predict an example
example = np.array([[1, 0, 1, 0]])
prediction = model.predict(example)
print("Random Forest Predicted class:", prediction[0])
```
## Example Output

```
------------------------ Decision Tree --------------------------
Feature 3:
  Value 0:
    Feature 1:
      Value 0:
        Leaf: 1
      Value 1:
        Leaf: 0
  Value 1:
    Leaf: 1
  Value 2:
    Feature 0:
      Value 0:
        Leaf: 1
      Value 1:
        Leaf: 0
Decision Tree Predicted class: 1
------------------------ Random Forest --------------------------
Random Forest Predicted class: 1
```
## Advantages
- Robust Classification: Bagging across many trees reduces overfitting, even on small datasets.
- Customizable: Adjust `num_trees` and `max_depth` to optimize performance.
- Lightweight: No external ML libraries required (only `numpy` and `pandas`).
## Future Improvements
- Support for continuous numerical features (currently categorical only).
- Implement Gini impurity as an alternative split criterion to entropy (sketched below).
- Optimize tree-building for better performance on large datasets.
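
For reference on the Gini item, Gini impurity would be a drop-in replacement for entropy as the split criterion. A minimal sketch, not part of the current codebase:

```python
import numpy as np

def gini_impurity(y):
    # Gini impurity: 1 - sum_k p_k^2. Like entropy, it is zero for a
    # pure node and largest for an even class mix, but needs no log.
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)
```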
## Contributing

Contributions are welcome! Follow these steps to contribute:

- Fork the repository.
- Create a new branch: `git checkout -b feature-name`
- Make your changes and commit: `git commit -m "Description of changes"`
- Push the changes and open a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for more details.