Predicting Breast Cancer in a patient
In this project we applied three different ensemble models to our Breast Cancer Diagnosis dataset. Results are reported for a 75:25 train/test split and for 10-fold cross-validation.
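Both evaluation protocols can be illustrated using sample indices alone. The plain-Python sketch below is an assumption about the procedure, not the project's code. `holdout_split` and `k_fold_splits` are hypothetical helpers. Each one shuffles once with a fixed seed. The first holds out 25% of the samples. The second partitions the data into ten folds so that every sample is validated exactly once. In scikit-learn these steps are usually done with `train_test_split` and `KFold` or `StratifiedKFold`. The stratified variant also keeps the benign/malignant ratio equal across folds, which this sketch does not attempt.

```python
import random

def holdout_split(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out test_fraction of them (75:25 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size

train, test = holdout_split(100)
print(len(train), len(test))        # 75 25
folds = list(k_fold_splits(100, k=10))
print([len(v) for _, v in folds])   # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

For cross-validation, each model is trained on `train` and scored on `val` for every fold, and the ten scores are then averaged.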
- Our three ensemble models achieved F1 scores of 94%, 90% and 92%.
- The Voting Classifier performs better than XGBoost, with an F1 score of 94%.
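F1 and accuracy are different metrics, so it is worth stating which one a percentage refers to. The sketch below computes both from confusion-matrix counts. The function names are hypothetical, and the counts are invented for illustration; they are not the project's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical test set: 50 actual positives, 90 actual negatives.
p, r, f1 = precision_recall_f1(tp=45, fp=10, fn=5)
acc = accuracy(tp=45, fp=10, tn=80, fn=5)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} accuracy={acc:.3f}")
# precision=0.818 recall=0.900 F1=0.857 accuracy=0.893
```

Here the same predictions score 85.7% on F1 but 89.3% on accuracy. A single "94%" therefore needs its metric named.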
Because our project is a medical diagnosis task, we give the highest priority to Type II errors (false negatives). A false negative occurs when the true condition is positive but the test predicts negative: the person is sick, but the test incorrectly reports that they are not. To evaluate each model's false-negative behavior, we compare our ensemble models using confusion matrices, ROC curves and DET curves.
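The false negative rate can be read directly off the confusion matrix as FNR = FN / (TP + FN), the share of truly sick patients the model misses. Below is a minimal plain-Python sketch. It assumes that labels are encoded so that `positive` marks the malignant class; the actual encoding in the project's dataset is not stated. The function names are hypothetical. For comparison, scikit-learn's `confusion_matrix` returns the same four counts laid out as `[[TN, FP], [FN, TP]]` for labels `[0, 1]`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN), treating `positive` as the disease class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # Type I error: healthy patient flagged as sick
        else:
            if truth == positive:
                fn += 1  # Type II error: sick patient reported as healthy
            else:
                tn += 1
    return tp, fp, tn, fn

def false_negative_rate(tp, fn):
    """Fraction of actual positives the model misses (equals 1 - recall)."""
    return fn / (tp + fn) if tp + fn else 0.0

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)               # 3 1 1 1
print(false_negative_rate(tp, fn))  # 0.25
```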
- The Voting Classifier produces fewer false negatives than XGBoost and the Bagging Classifier.
When comparing ROC curves, we found that the Voting Classifier's curve lies closest to the ideal point (the top-left corner) and has the largest area under the curve (AUC): 0.99, compared with 0.98 for XGBoost and 0.94 for the Bagging Classifier.
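An ROC curve is traced by sweeping the decision threshold over a model's predicted probabilities. At each step it records the pair (false positive rate, true positive rate). AUC is the area under the resulting points. The plain-Python sketch below uses the trapezoidal rule, which is roughly what scikit-learn's `roc_curve` and `roc_auc_score` compute. The helper names are hypothetical, and the scores are a toy example rather than output from the project's models. Both classes must be present in `y_true`.

```python
def roc_points(y_true, scores, positive=1):
    """Return (FPR, TPR) points, lowering the threshold from the top score down."""
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score so ties form a single step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

A perfect classifier passes through (0.0, 1.0), the top-left corner, and has an AUC of 1.0.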
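The DET (Detection Error Tradeoff) curve has distinct advantages over the standard ROC curve for presenting performance results when the tradeoff between two error types matters. It plots the false negative rate directly against the false positive rate on normal-deviate scales. Here we observe that the Voting Classifier has a lower error tradeoff than XGBoost.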
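The DET axes can be computed in plain Python. The sketch below uses a hypothetical `det_points` helper, not the project's code. For each distinct threshold it returns the false positive and false negative rates along with their probit (standard-normal quantile) transforms from `statistics.NormalDist`. Rates of exactly 0 or 1 are clipped by an assumed `eps`, because the quantile function is undefined there. scikit-learn provides `sklearn.metrics.det_curve` for the raw rates.

```python
from statistics import NormalDist

def det_points(y_true, scores, positive=1, eps=1e-6):
    """For each distinct threshold t (predict positive when score >= t), return
    (FPR, FNR, probit(FPR), probit(FNR)); the probit values are the DET axes."""
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    probit = NormalDist().inv_cdf  # standard-normal quantile function

    def clip(p):
        # inv_cdf rejects exactly 0 and 1, so nudge extreme rates inward.
        return min(max(p, eps), 1 - eps)

    points = []
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y != positive)
        fn = sum(1 for s, y in zip(scores, y_true) if s < t and y == positive)
        fpr, fnr = fp / n_neg, fn / n_pos
        points.append((fpr, fnr, probit(clip(fpr)), probit(clip(fnr))))
    return points

for fpr, fnr, x, y in det_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
    print(f"FPR={fpr:.2f} FNR={fnr:.2f} -> ({x:+.2f}, {y:+.2f})")
```

On a DET plot, a curve closer to the bottom-left corner makes fewer errors of both kinds. At a fixed false positive rate, a lower point means fewer missed cancers.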
The experimental results show that the Voting Classifier (soft voting) was the most powerful prediction model among the ensemble machine learning techniques we evaluated on our Breast Cancer dataset.
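scikit-learn's `VotingClassifier(voting="soft")` predicts the class with the largest (optionally weighted) average of its estimators' `predict_proba` outputs. `voting="hard"` instead takes a majority vote of the predicted labels. The README does not say which implementation the project used. Assuming it was that class, the plain-Python sketch below reproduces both rules for a binary problem. The helper names are hypothetical. The probabilities are invented to show a case where the two rules disagree.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of each model's P(positive) per sample (soft voting)."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]

def hard_vote(prob_lists, threshold=0.5):
    """Majority vote over each model's thresholded prediction (hard voting)."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    preds = []
    for i in range(n_samples):
        votes = sum(1 for probs in prob_lists if probs[i] >= threshold)
        preds.append(1 if votes > n_models / 2 else 0)
    return preds

# Hypothetical P(malignant) from three models for two patients.
probs = [
    [0.90, 0.20],  # model A
    [0.40, 0.30],  # model B
    [0.45, 0.10],  # model C
]
soft = soft_vote(probs)
print([round(p, 3) for p in soft])           # [0.583, 0.2]
print([1 if p >= 0.5 else 0 for p in soft])  # [1, 0]
print(hard_vote(probs))                      # [0, 0]
```

For the first patient, hard voting returns a negative because two models are mildly below 0.5. Soft voting keeps model A's confident 0.90 and flags the case. This is one plausible way soft voting can reduce false negatives, although the results above do not isolate that effect.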