This is my learning project from the Machine Learning 0-Master course on Udemy. The goal of this project is to predict whether or not someone has heart disease based on their medical attributes, using three machine learning models: Logistic Regression, KNN Classifier, and Random Forest Classifier.
Given clinical parameters about a patient, can we predict whether or not they have heart disease?
The original data is the Cleveland data from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/heart+disease
There is also a version available on Kaggle: https://www.kaggle.com/ronitf/heart-disease-uci?select=heart.csv
Python version: 3.7
Packages: pandas, numpy, matplotlib, seaborn, sklearn
I looked into the medical attributes: what they are and whether they show any correlation with heart disease. Here are some examples:
- Heart disease frequency according to sex
- Max heart rate and heart disease
- Heart disease frequency per chest pain type
- Correlation matrix of medical attributes
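The comparisons listed above can be sketched with pandas. This is a minimal illustration, not the project's actual notebook code. It assumes the column names used in the Kaggle `heart.csv` file (`sex`, `cp`, `thalach`, `target`) and builds a small synthetic table so the snippet runs without the data file; with the real data you would load it with `pd.read_csv("heart.csv")` instead.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic stand-in for heart.csv. The column names follow the
# Kaggle file (sex, cp, thalach, target), but the values are random and
# carry no medical meaning.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "sex": rng.integers(0, 2, size=n),         # 0 = female, 1 = male
    "cp": rng.integers(0, 4, size=n),          # chest pain type (4 categories)
    "thalach": rng.integers(90, 200, size=n),  # max heart rate achieved
    "target": rng.integers(0, 2, size=n),      # 1 = heart disease, 0 = none
})

# Heart disease frequency according to sex
sex_counts = pd.crosstab(df["target"], df["sex"])

# Heart disease frequency per chest pain type
cp_counts = pd.crosstab(df["cp"], df["target"])

# Max heart rate versus heart disease (mean per class)
thalach_by_target = df.groupby("target")["thalach"].mean()

# Correlation matrix of all attributes
corr_matrix = df.corr()

print(sex_counts)
print(cp_counts)
print(thalach_by_target)
print(corr_matrix.round(2))
```

Each crosstab can be drawn as a bar chart with `.plot(kind="bar")`, and the correlation matrix is commonly shown with `seaborn.heatmap(corr_matrix, annot=True)`.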
There is no need to preprocess the data for this project because the data set is already processed.
- Split the data into train and test sets.
- Wrote a function to loop through Logistic Regression, KNeighbors Classifier, and Random Forest Classifier.
- Compared the accuracies of the three models.
- Tuned hyperparameters for the three models: n_neighbors for the KNN model, RandomizedSearchCV for Logistic Regression and Random Forest, and GridSearchCV for Logistic Regression.
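The split, fit, and compare steps above could look roughly like the sketch below. The helper name `fit_and_score`, the 80/20 split, and the random seeds are assumptions for illustration. The features come from `make_classification` as a synthetic stand-in for the heart disease data, so the scores say nothing about the real data set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# ASSUMPTION: synthetic binary classification data with 13 features,
# standing in for the 13 medical attributes plus the target column.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Split the data into train and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three models to compare
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training set and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores


model_scores = fit_and_score(models, X_train, X_test, y_train, y_test)
for name, score in model_scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping the models in a dictionary makes it easy to add or remove a classifier without touching the loop.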
Result after fine tuning KNN
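The three tuning approaches named above (a manual loop over `n_neighbors`, `RandomizedSearchCV`, and `GridSearchCV`) can be sketched as follows. The search ranges, `n_iter`, and `cv=5` are illustrative assumptions, not the values used in the project, and the data is again a synthetic stand-in generated with `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.neighbors import KNeighborsClassifier

# ASSUMPTION: synthetic stand-in for the heart disease data.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 1) KNN: try a range of n_neighbors values by hand
knn_scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    knn_scores[k] = knn.score(X_test, y_test)
best_k = max(knn_scores, key=knn_scores.get)
print(f"Best n_neighbors: {best_k} (accuracy {knn_scores[best_k]:.3f})")

# 2) RandomizedSearchCV: sample a few combinations from each grid
log_reg_grid = {"C": np.logspace(-4, 4, 20)}
rs_log_reg = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=log_reg_grid,
    n_iter=10,
    cv=5,
    random_state=42,
)
rs_log_reg.fit(X_train, y_train)

rf_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 6, 10],
    "min_samples_leaf": [1, 5, 9],
}
rs_rf = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=rf_grid,
    n_iter=5,
    cv=5,
    random_state=42,
)
rs_rf.fit(X_train, y_train)

# 3) GridSearchCV: try every combination in the Logistic Regression grid
gs_log_reg = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid=log_reg_grid, cv=5
)
gs_log_reg.fit(X_train, y_train)

print("Randomized search (Logistic):", rs_log_reg.best_params_)
print("Randomized search (Random Forest):", rs_rf.best_params_)
print("Grid search (Logistic):", gs_log_reg.best_params_)
print(f"Grid search test accuracy: {gs_log_reg.score(X_test, y_test):.3f}")
```

Randomized search only samples part of the grid, so it is cheaper for the larger Random Forest grid, while the exhaustive grid search is affordable for the single-parameter Logistic Regression grid.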
6.2 Evaluating the tuned machine learning classifier beyond accuracy (using Logistic Regression as the example model)
- ROC curve and AUC score
- Confusion matrix
- Classification report
- Precision
- Recall
- F1-score
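The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them for a Logistic Regression model. It is a minimal example on synthetic stand-in data from `make_classification`, and it uses a default model rather than the tuned one, so the numbers are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the heart disease data.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# ASSUMPTION: default hyperparameters; in the project this would be the
# tuned Logistic Regression model.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

y_preds = clf.predict(X_test)              # hard class labels
y_probs = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# ROC curve and AUC score (both use probabilities, not hard labels)
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc = roc_auc_score(y_test, y_probs)

# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_preds)

# Precision, recall, and F1-score for the positive class
precision = precision_score(y_test, y_preds)
recall = recall_score(y_test, y_preds)
f1 = f1_score(y_test, y_preds)

print(f"AUC: {auc:.3f}")
print(cm)
print(classification_report(y_test, y_preds))
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")
```

The `fpr` and `tpr` arrays can be passed to `matplotlib.pyplot.plot(fpr, tpr)` to draw the ROC curve, and `cross_val_score` with `scoring="precision"`, `"recall"`, or `"f1"` gives cross-validated versions of the same metrics.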