Heart Disease Risk Prediction System

This project is an end-to-end machine learning system that predicts the risk of heart disease using clinical features. It is built using the UCI Cleveland Heart Disease dataset and deployed as an interactive Streamlit web application.

⚠️ This project is a research prototype and not a medical diagnostic tool.

Problem Statement

Early detection of heart disease is critical for preventive healthcare. This project aims to estimate the probability of heart disease based on patient clinical attributes using machine learning.

Dataset

Source: UCI Cleveland Heart Disease Dataset
Samples: 304 patients
Target:
- 0 → No heart disease
- 1 → Heart disease present

Only the Cleveland subset was used, to avoid the data leakage and corrupted labels present in other variants of the dataset.
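The Cleveland-only filter and the 0/1 target described above could be sketched as follows. The column names `dataset` and `num` and the toy rows are assumptions for illustration (they follow a common combined CSV layout of the UCI data, where `num` ranges from 0 to 4); the project's actual file may differ.

```python
import pandas as pd

# Toy rows standing in for a combined UCI heart disease file (hypothetical values).
raw = pd.DataFrame(
    {
        "dataset": ["Cleveland", "Cleveland", "Hungary", "Cleveland", "VA Long Beach"],
        "age": [63, 41, 54, 57, 60],
        "num": [0, 2, 1, 4, 3],  # assumed original target: 0 = no disease, 1-4 = severity
    }
)

def prepare_cleveland(df: pd.DataFrame) -> pd.DataFrame:
    """Keep Cleveland rows only and collapse the severity score into a 0/1 target."""
    cleveland = df[df["dataset"] == "Cleveland"].copy()
    cleveland["target"] = (cleveland["num"] > 0).astype(int)
    return cleveland.drop(columns=["dataset", "num"]).reset_index(drop=True)

prepared = prepare_cleveland(raw)
print(prepared)
```

Filtering before any modelling step keeps rows from the other sources out of both the training and the test data.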

Machine Learning Pipeline

Data filtering (Cleveland-only)
Target binarization
Feature selection & cleanup
Encoding:
- Binary: sex, fbs, exang
- One-hot: chest pain (cp), restecg
Train/test split (stratified)
Models:
- Logistic Regression (baseline)
- XGBoost (final model)
Probability calibration (Isotonic Regression)
Model explainability using SHAP
Deployment using Streamlit
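
The encoding, splitting, and baseline steps above can be sketched with pandas and scikit-learn. This is a minimal illustration on random synthetic rows, not the project's code: the category labels, the 80/20 split ratio, `random_state=42`, and `max_iter=5000` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the cleaned features (random values, not real patients).
df = pd.DataFrame(
    {
        "age": rng.integers(30, 78, size=n),
        "sex": rng.choice(["Male", "Female"], size=n),
        "fbs": rng.choice([True, False], size=n),
        "exang": rng.choice([True, False], size=n),
        "cp": rng.choice(["typical", "atypical", "non-anginal", "asymptomatic"], size=n),
        "restecg": rng.choice(["normal", "st-t", "lv hypertrophy"], size=n),
        "target": rng.integers(0, 2, size=n),
    }
)

# Binary features become 0/1 columns.
df["sex"] = (df["sex"] == "Male").astype(int)
df["fbs"] = df["fbs"].astype(int)
df["exang"] = df["exang"].astype(int)

# Multi-category features become one-hot indicator columns.
df = pd.get_dummies(df, columns=["cp", "restecg"], dtype=int)

X = df.drop(columns=["target"])
y = df["target"]

# Stratifying on y keeps the class ratio approximately equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline model; max_iter is raised because the features are unscaled.
baseline = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("baseline accuracy on random data:", baseline.score(X_test, y_test))
```

Because the labels here are random, the printed accuracy is meaningless; the point is only the shape of the preprocessing and split.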

Model Performance (Test Set)

Model	Accuracy	F1 Score	ROC-AUC
Logistic Regression	0.87	0.85	0.92
XGBoost (Calibrated)	0.89	0.88	0.92

Isotonic calibration improved the reliability of the predicted probabilities (Brier score: 0.094; lower is better).
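
For reference, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome. The sketch below shows isotonic calibration and the Brier computation on synthetic data. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that it depends only on scikit-learn, and the 5-fold setting is an assumption, not the project's configuration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary problem (not the Cleveland data).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Uncalibrated gradient-boosted trees (stand-in for XGBoost).
raw_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Isotonic calibration fitted with 5-fold cross-validation on the training set.
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="isotonic", cv=5
).fit(X_train, y_train)

raw_prob = raw_model.predict_proba(X_test)[:, 1]
cal_prob = calibrated.predict_proba(X_test)[:, 1]

# Brier score: mean of (predicted probability - actual outcome)^2.
brier_raw = brier_score_loss(y_test, raw_prob)
brier_cal = brier_score_loss(y_test, cal_prob)
manual = float(((cal_prob - y_test) ** 2).mean())

print(f"Brier raw: {brier_raw:.3f}  calibrated: {brier_cal:.3f}")
```

Comparing the two scores on held-out data is one way to check whether calibration actually helped for a given model.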

Key Insights (SHAP)

Top predictive features:

Sex
Oldpeak (ST depression)
Maximum heart rate (thalach)
Age
Chest pain type
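
A ranking like the one above is commonly obtained by averaging the absolute SHAP value of each feature across samples. The sketch below shows only that aggregation step with NumPy. The SHAP matrix is made up for illustration and the helper name `rank_by_mean_abs_shap` is hypothetical; in practice the matrix would come from an explainer such as `shap.TreeExplainer(model).shap_values(X)`.

```python
import numpy as np

def rank_by_mean_abs_shap(shap_values, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least influential."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.shape[1] != len(feature_names):
        raise ValueError("one column of SHAP values is required per feature")
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]

# Made-up SHAP values: rows are patients, columns are features.
toy_shap = [
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, 0.25],
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, -0.25],
]
ranking = rank_by_mean_abs_shap(toy_shap, ["sex", "oldpeak", "age"])
print(ranking)
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down would otherwise average out to near zero.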

Tech Stack

Python
scikit-learn
XGBoost
SHAP
Streamlit
pandas, numpy
joblib

How to Run Locally

pip install -r requirements.txt
streamlit run app/app.py
