STROKE-PREDICTION-PROJECT

DATA SCIENCE PROJECT: PERFORMING EDA, HYPOTHESIS TESTING, AND VISUALIZATION, THEN BUILDING A MODEL TO PREDICT STROKE OCCURRENCE FROM DATA.

This project was assigned by a professor in one of our university courses, Introduction to Data Science. We are sophomores majoring in AI Engineering. The code contains EDA, many visualizations, and an SVM model that predicts stroke occurrence. The data come from two Kaggle sources; more detail on the sources is given later on.

Please use this link to view the interactive plots: https://nbviewer.org/github/PURPLEWATER00/STROKE-PREDICTION-PROJECT/blob/main/final%20final.ipynb

The model is deployed on Streamlit: https://purplewater00-stroke-prediction-project-main-vbxln1.streamlit.app/

  • EDA
  • SVM model
  • Stroke prediction
  • Synthetically generated data
  • Synthetically generated data and real life data

PROJECT STRUCTURE

The following map shows the flow of the project: LAST

PROJECT USAGE:

If you want to use the code, there are a few things to set up before running it:

  • The train set and test set are constructed as follows:
    • train set: built from two sources, one synthetically generated and the other a real-life data set, both from Kaggle (links are given at the end). Assign the two file names to the train1_df and train2_df variables; either file can go in either variable.
    • test set: the test set from the synthetically generated data competition.

The rest of the code will run successfully if the train set and test set are specified as described above.
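As a rough illustration of the setup described above, here is a minimal sketch of how two training sources could be stacked into a single train set with pandas. The helper name build_train_set and the file names in the comments are hypothetical, and keeping only the columns that both files share is an assumption, not necessarily what the notebook does.

```python
import pandas as pd


def build_train_set(train1_df: pd.DataFrame, train2_df: pd.DataFrame) -> pd.DataFrame:
    """Stack two training sources row-wise, keeping only the columns they share.

    Assumption: the two Kaggle files may not have identical columns, so this
    sketch drops anything that is not present in both.
    """
    shared = [col for col in train1_df.columns if col in train2_df.columns]
    return pd.concat([train1_df[shared], train2_df[shared]], ignore_index=True)


# Hypothetical usage; replace the placeholder file names with your own downloads:
# train1_df = pd.read_csv("synthetic_train.csv")
# train2_df = pd.read_csv("real_life_train.csv")
# train_df = build_train_set(train1_df, train2_df)
```

Because the function only keeps shared columns, it does not matter which file is assigned to train1_df and which to train2_df, apart from the row order of the result.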

Found a bug?

We would love some feedback in the comments. Please be as ruthless as possible; we want to learn from anyone willing to point out an issue. (Professor, if you are reading this, please give us feedback :) )

links: