We use the breast cancer dataset collected at the University of Wisconsin. It contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass; these features describe characteristics of the cell nuclei present in the image. We want to classify each mass as either benign or malignant.
Breast cancer is a common type of cancer, affecting more than a million people annually in India alone. Tumors associated with breast cancer are of 2 types: benign and malignant. With machine learning classification algorithms, it may be possible to detect a malignant growth before it becomes dangerous, which could save a person’s life. In this project, we aim to visualise the data given to us, evaluate the model's efficiency and effectiveness on a set of parameters, and compare the results achieved with 3 different train/test ratios to see which performs best.
The parameters (evaluation metrics) on which we compare each model are: accuracy, precision, recall, F1-measure, sensitivity, specificity, false positive rate, false negative rate, negative predictive value, false discovery rate, Matthews correlation coefficient, confusion matrix, loss vs epoch graph, accuracy vs epoch graph, mean squared error and ROC curve.
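Most of the scalar metrics listed above follow directly from the four counts of a binary confusion matrix. The sketch below shows one way to derive them, assuming malignant is treated as the positive class. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not results from this project.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall and sensitivity are the same quantity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "negative_predictive_value": ratio(tn, tn + fn),
        "false_discovery_rate": ratio(fp, fp + tp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only (made up for this example).
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Note that several of these metrics are complements of each other: the false positive rate equals 1 minus specificity, the false negative rate equals 1 minus sensitivity, and the false discovery rate equals 1 minus precision.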
- To clean and pre-process the dataset to improve the quality of the data.
- To use standard data visualisation techniques to represent the data and find outliers, patterns, correlations, etc.
- To implement the Gaussian Naive Bayes classifier on 3 different train/test ratios and compare the results on the basis of various parameters.
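The last objective can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it loads the copy of the Wisconsin diagnostic dataset bundled with scikit-learn via `load_breast_cancer` rather than a cleaned CSV, the three test fractions (0.2, 0.3, 0.5) are placeholders because the exact ratios are not stated here, and the helper name `evaluate_split_ratios` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_split_ratios(test_sizes=(0.2, 0.3, 0.5), seed=0):
    """Train Gaussian Naive Bayes once per test fraction and return test accuracy.

    The test fractions are assumed values for illustration only.
    """
    X, y = load_breast_cancer(return_X_y=True)
    results = {}
    for test_size in test_sizes:
        # Stratify so both classes keep the same proportion in each split.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        model = GaussianNB().fit(X_train, y_train)
        results[test_size] = accuracy_score(y_test, model.predict(X_test))
    return results


results = evaluate_split_ratios()
for test_size, accuracy in results.items():
    print(f"test fraction {test_size:.1f}: accuracy {accuracy:.3f}")
```

Fixing `random_state` keeps the splits reproducible, so differences between ratios are not confounded by a different shuffle on each run. The other metrics can be computed from the same predictions, for example by passing them to `sklearn.metrics.confusion_matrix`.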
The loss vs epoch and accuracy vs epoch graphs were produced using a 3-layer, sequential artificial neural network.