
Comparing logistic regression, decision tree, random forest, k-nearest neighbors, and SVMs on binary classification performance metrics.


connormcmanigal/Diabetes-Supervised-Machine-Learning-Analysis-And-Prediction

 
 


COGS118A Final Project

Binary Classification: Diabetes Prediction

Comparing logistic regression, decision tree, random forest, k-nearest neighbors, and SVMs

Abstract:

This project addresses the difficulty of accurately diagnosing diabetes in patients. An incorrect diagnosis can have dire consequences, such as additional health issues or even death. Our goal is to train machine learning models that accurately predict whether a patient has diabetes. Our data comprises eight features (age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level) along with each patient's diabetes status: positive or negative. These electronic health records are collected from individuals by healthcare providers in hospitals or clinics through surveys, medical records, and laboratory tests. With this data, we will train multiple binary classification algorithms and select the one that provides the highest sensitivity. We will compare the performance of logistic regression, decision tree, random forest, k-nearest neighbors, and support vector machines (SVMs) to see which algorithm best suits our needs. We will measure performance using sensitivity (recall), specificity, precision, ROC-AUC, and precision-recall curves, with a heavy emphasis on recall, because it is important to detect every positive diabetes case so that treatment can begin immediately.
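To make the evaluation criteria concrete, here is a minimal sketch of how sensitivity, specificity, precision, and ROC-AUC could be computed from binary labels, and how a model might be picked by highest sensitivity. It is not the project's notebook code: the function names (`confusion_counts`, `binary_metrics`, `roc_auc`), the model names, and the toy labels and predictions are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/TN/FN, assuming labels are 0 (negative) or 1 (diabetic)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0: a missed diabetes case
            counts["fn"] += 1
    return counts


def _safe_div(num: float, den: float) -> float:
    """Return 0.0 instead of raising when a metric's denominator is empty."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity (recall), specificity, and precision from hard predictions."""
    c = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _safe_div(c["tp"], c["tp"] + c["fn"]),
        "specificity": _safe_div(c["tn"], c["tn"] + c["fp"]),
        "precision": _safe_div(c["tp"], c["tp"] + c["fp"]),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and hypothetical model outputs (not results from this project).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "model_b": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}
results = {name: binary_metrics(y_true, pred) for name, pred in predictions.items()}

# Select the model with the highest sensitivity, as the abstract describes.
best_model = max(results, key=lambda name: results[name]["sensitivity"])

# Hypothetical predicted probabilities, used only to demonstrate roc_auc.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]
toy_auc = roc_auc(y_true, toy_scores)
```

In this toy setup `model_b` catches every positive case but produces more false positives than `model_a`, which is the trade-off that a recall-first selection rule accepts. In practice, library routines such as those in scikit-learn's `sklearn.metrics` module would likely replace these hand-written helpers.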

