Implementation of three decision tree-based algorithms in C++: ID3, Random Forest, and AdaBoost (with ID3 as the base learner).
This implementation can handle continuous attributes and attributes with missing values.
- Compile the code:
  `$ g++ -w -o decision_tree decision_tree.cpp`
- Run one of the algorithms:
  `$ ./decision_tree <ALGORITHM>`
  where `<ALGORITHM>` is one of `id3`, `random_forest`, or `adaboost`.
The Adult dataset was used.
I split the data beforehand: `datafiles/data.txt` contains the training instances, and `testfiles/test.txt` contains the test instances.

NOTE: The code was written specifically for this dataset and will require significant changes before it can be run on another dataset.

Algorithm | Accuracy (%) | Runtime |
---|---|---|
ID3 | 80.5356 | ~1 min |
Random Forest | 82.4335 | ~3 min |
AdaBoost | 81.3218 | ~3 min |
I did this project to get a better understanding of these algorithms, so performance optimization was not a priority. Coming back to the code after several years, I can see a number of opportunities for improvement.