
JARVIS843/Stroke_Prediction_ML_Project


About:

This project aims to predict whether people across all age groups are likely to have a stroke, based on eleven distinct features such as gender, age, and existing diseases. It trains various binary classification models with different methods in Jupyter Notebook.


Installation Instructions:

Setup Conda Environment

To clone this project onto your machine and set up its environment, run the following commands:

conda update conda
git clone https://github.com/JARVIS843/Stroke_Prediction_ML_Project.git
cd Stroke_Prediction_ML_Project
conda env create -f environment.yml
conda activate StrokePredictionML

Add the environment (StrokePredictionML) to Jupyter:

python -m ipykernel install --user --name=StrokePredictionML --display-name "Python (StrokePredictionML)"

(Optional): Confirm the kernel was added successfully:

jupyter kernelspec list

You then need to manually select the kernel in the Jupyter interface.
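If you prefer to check the kernel registration from Python rather than by reading the list by eye, the following is a minimal sketch. It assumes the `jupyter` command is on your PATH and uses its `--json` output; the helper names `kernel_is_registered` and `read_kernel_listing` are made up for this example and are not part of the project.

```python
import json
import subprocess

KERNEL_NAME = "StrokePredictionML"  # the --name value used when installing the kernel


def kernel_is_registered(listing_json, name=KERNEL_NAME):
    """Return True if `name` appears in the output of `jupyter kernelspec list --json`."""
    specs = json.loads(listing_json).get("kernelspecs", {})
    # Compare case-insensitively, since Jupyter may store kernel names in lowercase.
    return name.lower() in {key.lower() for key in specs}


def read_kernel_listing():
    """Run `jupyter kernelspec list --json` and return its raw text output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `kernel_is_registered(read_kernel_listing())` inside the activated environment should return `True` once the kernel has been added.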

Setup Tensorflow with CUDA (Only for Neural Network Models)

It took me 2 hours to set everything up correctly, so I decided to put the instructions here.

This project relies on TensorFlow 2.18.0, so native Windows WILL NOT WORK for GPU training (TensorFlow dropped GPU support on native Windows starting with 2.11). WSL2, however, still works.

Before anything else, make sure you have installed the newest driver. If you are using an NVIDIA graphics card that supports CUDA 12.5, your driver version must be at least 555.42.02 on Linux (NOT WSL) or 555.85 on Windows. You can manually download a specific driver from NVIDIA's website, but I recommend using NVIDIA GeForce Experience. If you are using WSL2, you only have to install the newest driver on the Windows side. ABSOLUTELY DO NOT INSTALL IT ON WSL LINUX, as it will mess everything up.

Then, according to this, install CUDA 12.5. Note: if you are using WSL2 Ubuntu, choose WSL-Ubuntu in the Distribution section (the regular Ubuntu version includes a driver and may mess up the driver installed on Windows). After installation, check it with:

nvcc --version

If it's not found, then you have to add it to PATH with:

export PATH=/usr/local/cuda-12.5/bin${PATH:+:${PATH}}
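Note that an `export` like this only lasts for the current shell session. As a rough way to see whether `nvcc` is already reachable, or whether it only exists in the CUDA directory and still needs the PATH change, here is a small sketch using only the standard library. The helper name `find_nvcc` is hypothetical, and the directory is simply the one from the export line above.

```python
import os
import shutil

CUDA_BIN = "/usr/local/cuda-12.5/bin"  # same directory as in the export line


def find_nvcc(extra_dir=CUDA_BIN):
    """Return the path to nvcc, or None if it cannot be found.

    Looks on the current PATH first, then additionally in `extra_dir`.
    """
    found = shutil.which("nvcc")
    if found is None:
        # Retry with the CUDA bin directory placed in front of PATH.
        search_path = extra_dir + os.pathsep + os.environ.get("PATH", "")
        found = shutil.which("nvcc", path=search_path)
    return found
```

If `shutil.which("nvcc")` returns `None` but `find_nvcc()` returns a path, CUDA is installed and only the PATH entry is missing.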

Finally, install cuDNN 9.3, following the instructions on its webpage.

To verify that you have done everything correctly, run the following code (not a command) in your Jupyter notebook, using the previously established environment and kernel (StrokePredictionML). If everything is set up correctly, the number printed should not be zero (unless your GPU does not support CUDA 12.5 to begin with).

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

And you should be done. If you ever encounter any problems, please consult the following links, as they helped me a bunch when I was doing this myself:


Project Structure:

The project is divided into two parts: Non-Neural-Network models and Neural Network models.

Non-Neural-Network Models:

This part of the project is in ML_project.ipynb. It is responsible for data cleanup, data analysis, data preprocessing, and training a variety of models (Extra Trees Classifier, Gradient Boosting, Random Forest, and XGBoost). The accuracy of each model is displayed alongside a visualized confusion matrix.
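For readers unfamiliar with confusion matrices, here is a minimal, dependency-free sketch of how the four cells of a binary confusion matrix and the accuracy relate. It is not the notebook's code; the labels below are made-up toy values (1 = stroke, 0 = no stroke), and the function names are only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that match the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Toy example, not real project data.
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 0]
cells = confusion_counts(y_true, y_pred)
print(cells)            # {'tp': 1, 'fp': 1, 'tn': 5, 'fn': 1}
print(accuracy(cells))  # 0.75
```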

Neural Network Models:

This is the other part of the project, and it can be found in Neural_Network_Model.ipynb. To stay consistent, it uses the same data cleanup and preprocessing procedures. It employs a six-layer TensorFlow neural network (5 hidden layers and 1 output layer; the specific specs can be found in Models) and cross-tests its accuracy with various optimizers (RMSprop, Nadam, and Adam), learning rates (0.01, 0.001, and 0.0001), batch sizes (32 and 64), and numbers of epochs (50 and 100). Adaptive learning rate strategies are also attempted.
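To give a sense of the size of this search, the sketch below enumerates every combination of the listed settings. It assumes a full Cartesian product of all four settings, which the notebook may or may not actually run; the function name `hyperparameter_grid` and the dictionary keys are chosen for this example only.

```python
from itertools import product

OPTIMIZERS = ("RMSprop", "Nadam", "Adam")
LEARNING_RATES = (0.01, 0.001, 0.0001)
BATCH_SIZES = (32, 64)
EPOCHS = (50, 100)


def hyperparameter_grid():
    """Return one dict per combination of optimizer, learning rate, batch size, and epochs."""
    keys = ("optimizer", "learning_rate", "batch_size", "epochs")
    combos = product(OPTIMIZERS, LEARNING_RATES, BATCH_SIZES, EPOCHS)
    return [dict(zip(keys, combo)) for combo in combos]


grid = hyperparameter_grid()
print(len(grid))  # 36 combinations (3 * 3 * 2 * 2)
print(grid[0])    # {'optimizer': 'RMSprop', 'learning_rate': 0.01, 'batch_size': 32, 'epochs': 50}
```

Each dictionary could then be passed to whatever training routine builds and fits the network, with the resulting accuracy recorded per entry.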


Models:

If you would like to use our pre-trained models, or to see their performance, all of them can be found here.

*Note: The models are serialized and exported using Pickle
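As a minimal sketch of what saving and loading with Pickle looks like, the helpers below round-trip an object through a file. The names `save_model` and `load_model` and the file name in the usage note are hypothetical; the actual model file names are not listed here. Loading a real model generally also requires the library that defined it (for example scikit-learn or TensorFlow) to be installed in the environment.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize `model` to `path` with pickle."""
    with Path(path).open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a pickled object from `path`.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, `model = load_model("some_model.pkl")` (placeholder file name) would return the stored object, which can then be used for prediction in the usual way for its library.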


Dataset Used:

All of the models for this project are trained using the Kaggle Stroke Prediction Dataset.

A full pre-downloaded copy of the dataset (downloaded on 12/13/2024) is included in the Dataset folder to save you the time of re-downloading it from Kaggle. The beginning of ML_project.ipynb automatically downloads the newest dataset from Kaggle, whilst Neural_Network_Model.ipynb relies on the pre-downloaded dataset.
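For a quick sanity check of the pre-downloaded file before opening the notebooks, a sketch like the following counts how many rows fall into each class. It assumes the CSV has a label column named `stroke`, and the path in the usage note is a guess at the file name inside the Dataset folder, so adjust both to match what is actually there.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="stroke"):
    """Count how many rows carry each value of `label_column` in a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)
```

For example, `count_labels("Dataset/healthcare-dataset-stroke-data.csv")` (assumed path) returns a `Counter` keyed by the label values as strings, such as `"0"` and `"1"`.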


Authors & Background:

This project was co-developed by Jarvis Yang (responsible for the Neural Network Model) and Jegyeong An (responsible for the Non-Neural-Network Models).

The project was created as the final project for the Introduction to Machine Learning course taught by Professor Sundeep Rangan.


License (MIT):

See License File
