This training project develops a model that predicts a patient's risk of heart disease. Potential use cases include automated first-pass screening of incoming hospital patients for heart disease risk, or use of the resulting tree structure as a heuristic for medical professionals.
This analysis is a slightly modified version of an existing StatQuest Jupyter Notebook, applied to a different dataset.
- Python (scikit-learn, matplotlib)
- Jupyter Notebook (Static Version)
- Data formatting for classification (based on initial exploratory analysis)
- Classification Tree (nodes are split using the default criterion: Gini impurity)
- Tree validation: Cross-validation
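The tree-building and validation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification`, not the heart disease dataset), the `gini_impurity` helper is a hypothetical name added only to show what the split criterion measures, and the choice of 5 folds is an assumption.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Synthetic stand-in data (assumption: NOT the heart disease dataset).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)

# criterion="gini" is scikit-learn's default; it is spelled out for clarity.
tree = DecisionTreeClassifier(criterion="gini", random_state=0)

# 5-fold cross-validation (fold count is an assumption); returns one
# accuracy score per fold.
scores = cross_val_score(tree, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```

A node holding only one class has impurity 0, while an evenly mixed two-class node has impurity 0.5, so the tree prefers splits that push child nodes toward 0.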
- Attributes: 14 patient data attributes (e.g. age, sex, cholesterol, chest pain, ...)
- Scale levels: categorical and continuous (float)
- The dataset can be found at the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/heart+disease
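Because the attributes mix categorical and float values, categorical columns with more than two levels are typically one-hot encoded before fitting a scikit-learn tree, which otherwise treats the category codes as ordered numbers. The sketch below shows the idea on made-up values; the variable name `chest_pain` and its codes are illustrative assumptions, not rows from the dataset, and the notebook itself may format the data differently.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up codes for a single categorical attribute (assumption: four
# possible categories, encoded as 1 to 4). One row per patient.
chest_pain = np.array([[1], [3], [4], [3], [2]])

# One-hot encoding creates one 0/1 column per category, so the tree does
# not interpret the codes as an ordered numeric scale.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(chest_pain).toarray()

print(encoder.categories_)
print(encoded)
```

Float attributes such as age or cholesterol can be passed to the tree unchanged, since decision trees split on thresholds and do not require scaling.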
Please feel free to contact the author at dominikjung[at]gmx[dot]de