
Classifying breast cancer scans as malignant or benign with various ML classification algos


Anamay23/Mammography_Masses_Classification


Mammography_Masses_Classification

This project is a case study comparing the accuracy of different supervised classification algorithms on the Mammographic Masses dataset provided by UC Irvine.

Problem

Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in recent years. These systems help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up examination instead.

Data set

The dataset contains 961 instances of masses detected in mammograms, each described by the following attributes:

  • BI-RADS assessment: 1 to 5 (ordinal)
  • Age: Patient's age in years (integer)
  • Shape: Mass shape (round = 1, oval = 2, lobular = 3, irregular = 4) (nominal)
  • Margin: Mass margin (circumscribed = 1, microlobulated = 2, obscured = 3, ill-defined = 4, spiculated = 5) (nominal)
  • Density: Mass density (high = 1, iso = 2, low = 3, fat-containing = 4) (ordinal)
  • Severity: benign=0 or malignant=1 (binominal)

The BI-RADS assessment is ignored in this study because it is not a predictive attribute.
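A minimal sketch of how the attributes above might be loaded and cleaned with pandas. It assumes the raw file is comma-separated, has no header row, lists the columns in the order given above, and marks missing values with `?`. The sample rows, the column names and the `load_and_clean` helper are illustrative only and are not taken from this repository. Dropping incomplete rows is one possible strategy; imputation is another.

```python
import io

import pandas as pd

# Hypothetical sample rows in the assumed file layout; "?" marks a missing value.
SAMPLE = """5,67,3,5,3,1
4,43,1,1,?,1
5,58,4,5,3,1
4,28,1,1,3,0
5,74,1,5,?,1
4,65,1,?,3,0
"""

# Column names chosen for this sketch, in the order of the attribute list.
COLUMNS = ["BI-RADS", "Age", "Shape", "Margin", "Density", "Severity"]


def load_and_clean(source):
    """Read the raw data, drop BI-RADS, and discard rows with missing values."""
    df = pd.read_csv(source, names=COLUMNS, na_values="?")
    df = df.drop(columns=["BI-RADS"])  # not a predictive attribute
    df = df.dropna()                   # assumption: drop incomplete rows
    return df.astype(int)


clean = load_and_clean(io.StringIO(SAMPLE))
features = clean[["Age", "Shape", "Margin", "Density"]].to_numpy()
labels = clean["Severity"].to_numpy()
print(clean)
```

To run this against the real data, pass the path of the downloaded data file to `load_and_clean` instead of the in-memory sample.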

Objective

  1. To clean and pre-process the data for any missing/spurious values.
  2. To predict whether the mass is malignant or benign using different supervised classification techniques.
  3. To see which one yields the HIGHEST accuracy as measured with K-Fold cross validation. (K = best value found after trial and error)
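The comparison described in these objectives can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code used in this project: the data is a synthetic stand-in generated with `make_classification`, `K = 10` is an arbitrary choice (the project picked K by trial and error), only a few of the classifiers are shown, and the feature scaling step is an assumption that usually helps SVM, KNN and logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four cleaned features (age, shape, margin, density).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# A subset of the classifiers compared in this study. Scaling is done inside
# the pipeline so it is fitted on the training folds only.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "K Nearest Neighbours (k = 7)": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=7)
    ),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
}


def compare(models, X, y, n_splits=10):
    """Return the mean K-fold cross-validated accuracy of each model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare(models, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Using the same `StratifiedKFold` splitter for every model keeps the folds identical across classifiers, so the accuracies are compared on the same train/test partitions.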

Results

After running various classifiers, the accuracies are as follows:

  • Decision Trees - 76.12%
  • Random Forest - 77.25%
  • SVM (linear kernel) - 79.87%
  • SVM (polynomial kernel) - 79.03%
  • SVM (RBF kernel) - 80.34%
  • K Nearest Neighbours - 80.02% (for k = 7; tested up to k = 50)
  • Naive Bayes - 78.31%
  • Logistic Regression - 80.61%
  • Artificial Neural Network (Keras) - 80.36% (for 10 epochs)

Most of the classifiers score between roughly 78% and 81%, with Decision Trees and Random Forest as the low outliers. The classifier with the highest accuracy is Logistic Regression (80.61%).
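As a quick check on this summary, the reported accuracies can be ranked in a few lines of Python. The numbers are copied from the list above; the variable names and the 78% cut-off used to flag the low outliers are choices made for this sketch.

```python
# Accuracies (in percent) as reported in the results list.
accuracies = {
    "Decision Trees": 76.12,
    "Random Forest": 77.25,
    "SVM (linear kernel)": 79.87,
    "SVM (polynomial kernel)": 79.03,
    "SVM (RBF kernel)": 80.34,
    "K Nearest Neighbours": 80.02,
    "Naive Bayes": 78.31,
    "Logistic Regression": 80.61,
    "Artificial Neural Network (Keras)": 80.36,
}

# Rank from best to worst.
ranked = sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_acc = ranked[0]

# Gap between the best and worst classifier, in percentage points.
spread = ranked[0][1] - ranked[-1][1]

# Classifiers that fall below the 78% mark.
below_78 = [name for name, acc in accuracies.items() if acc < 78.0]

for name, acc in ranked:
    print(f"{name}: {acc:.2f}%")
```

The gap between the best and worst classifier is under five percentage points, so the ranking among the top models may not be robust to a different choice of K or random seed.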

References

This case study is part of the final project for this Udemy course.
