Description
-
Dataset used: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Why?
- letsfcancer
- Recommended dataset to use (447 scholarly articles cite this dataset)
-
Methodology
- Data preprocessing: Analyze the features with histograms and heat maps. Since the dataset is small (fewer than 600 rows), I do not plan to use PCA. I will follow most of the recommendations in this notebook: https://www.kaggle.com/kanncaa1/feature-selection-and-data-visualization.
- Machine learning model: The objective of this project is to predict whether a tumor is benign or malignant. From my research, random forest, decision tree, and SVM classifiers have reportedly reached around 99% accuracy on this dataset.
- Final conceptualization: A simple web page where the user manually enters the features of a cell and the page predicts whether it is malignant or benign.
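
The steps above could be sketched roughly as follows. This is a minimal illustration, not the final implementation, and it rests on several assumptions: scikit-learn's bundled copy of the Wisconsin diagnostic data stands in for the Kaggle CSV, the heat map is taken to be a correlation heat map, the 0.95 correlation cutoff and the random forest settings are arbitrary starting points, and `predict_diagnosis` is a hypothetical helper that a web page could call.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled scikit-learn copy of the dataset replaces the
# Kaggle CSV so that this sketch runs without a download.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target  # in this copy, 0 = malignant and 1 = benign

# Absolute correlation matrix (the values a correlation heat map would show).
corr = X.corr().abs()

# Keep only the upper triangle so each feature pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Assumption: a feature is treated as redundant if its correlation with an
# earlier feature exceeds 0.95 (an arbitrary cutoff chosen for illustration).
redundant = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_reduced = X.drop(columns=redundant)

# Hold out 30% of the rows for testing, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.3, stratify=y, random_state=42
)

# Random forest is one of the three candidate models named above.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.3f}")


def predict_diagnosis(features):
    """Hypothetical helper: return 'malignant' or 'benign' for one sample.

    `features` is a dict (or ordered list) of values for the retained columns.
    """
    row = pd.DataFrame([features], columns=X_reduced.columns)
    return str(data.target_names[model.predict(row)[0]])


# Example call using the first row of the dataset as the manual input.
print(predict_diagnosis(X_reduced.iloc[0].to_dict()))
```

A web form would collect the same retained columns and pass them to `predict_diagnosis`. The accuracy printed here depends on the split and settings, so it should not be read as confirming the 99% figure.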