decision-trees
This repository contains implementations of a decision tree classifier, a random forest, and boosted trees (with AdaBoost) for binary categorical data.
- numpy 1.17.3
- pandas 0.25.2
import pandas as pd
from models.decision_tree import DecisionTreeClassifier
train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')
tree = DecisionTreeClassifier(train=train_set, validation=validation_set, test=test_set,
                              label='class', max_depth=2)
results = tree.train()
import pandas as pd
from models.random_forest import RandomForestClassifier
train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')
rf = RandomForestClassifier(train=train_set, validation=validation_set, test=test_set,
                            label='class', n_trees=5, n_features=5, seed=1, max_depth=2)
results = rf.train()
import pandas as pd
from models.adaboost import AdaBoostClassifier
train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')
boosted_trees = AdaBoostClassifier(train=train_set, validation=validation_set, test=test_set,
                                   label='class', n_classifiers=5, max_depth=2)
results = boosted_trees.train()
The data/ folder contains .csv files with training, validation, and test sets.
run_part1.py creates decision trees with varied depths.
run_part2.py creates random forests with varied parameters.
run_part3.py creates boosted trees with varied parameters.
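As a rough illustration of what "varied depths" could look like for run_part1.py, the sketch below trains one model per depth and collects whatever train() returns. The helper name sweep_depths and the stand-in class are hypothetical and not taken from this repository; the real scripts may be organized differently.

```python
def sweep_depths(make_model, depths):
    """Train one model per depth and collect each train() result.

    make_model is any callable that accepts max_depth and returns an
    object with a train() method (hypothetical helper, not repo code).
    """
    results = {}
    for depth in depths:
        model = make_model(max_depth=depth)
        results[depth] = model.train()
    return results


class StandInModel:
    """Placeholder used only so this sketch runs without the repo's data."""

    def __init__(self, max_depth):
        self.max_depth = max_depth

    def train(self):
        # A real classifier would fit a tree here; this just echoes the depth.
        return {"max_depth": self.max_depth}


if __name__ == "__main__":
    for depth, result in sweep_depths(StandInModel, range(1, 4)).items():
        print(depth, result)
```

To use it with the classifier from this repository, functools.partial can bind the data arguments first, for example partial(DecisionTreeClassifier, train=train_set, validation=validation_set, test=test_set, label='class'), and the resulting callable can be passed as make_model.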
python main.py will run all three parts in order. Output will be saved in the model_output folder.
- Refactor the AdaBoostClassifier and RandomForestClassifier classes to inherit attributes from the DecisionTreeClassifier class.
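One possible shape for that refactor is sketched below. It only shows how the shared constructor arguments could live in the base class. The constructor parameters mirror the usage examples above, but the attribute names (train_set, validation_set, test_set) are assumptions, and the training logic is omitted entirely.

```python
class DecisionTreeClassifier:
    """Base class holding the arguments shared by all three models."""

    def __init__(self, train, validation, test, label, max_depth):
        # Stored under *_set names (an assumption) so the attribute
        # does not shadow the train() method used in the examples.
        self.train_set = train
        self.validation_set = validation
        self.test_set = test
        self.label = label
        self.max_depth = max_depth


class RandomForestClassifier(DecisionTreeClassifier):
    """Adds only the forest-specific settings on top of the base class."""

    def __init__(self, train, validation, test, label, max_depth,
                 n_trees, n_features, seed):
        super().__init__(train, validation, test, label, max_depth)
        self.n_trees = n_trees
        self.n_features = n_features
        self.seed = seed


class AdaBoostClassifier(DecisionTreeClassifier):
    """Adds only the boosting-specific setting on top of the base class."""

    def __init__(self, train, validation, test, label, max_depth,
                 n_classifiers):
        super().__init__(train, validation, test, label, max_depth)
        self.n_classifiers = n_classifiers
```

With this layout the existing keyword-argument calls shown earlier would keep working unchanged, since each subclass accepts the same names it does today.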