This project trains a predictive machine learning model to diagnose diabetes in female patients. The training dataset, sourced from Kaggle, contains over 800 records. The workflow covers data cleaning, exploratory data analysis, model training, evaluation, and visualization of performance metrics.
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA) using Pandas
- Implementation of multiple machine learning techniques using Scikit-Learn
- Model evaluation and hyperparameter tuning
- Performance visualization using Seaborn and Matplotlib
- Achieved an accuracy score of up to 80%
- Python
- Pandas, NumPy for data manipulation
- Scikit-Learn for machine learning models
- Seaborn, Matplotlib for visualization
- Python 3.x installed
- Jupyter Notebook or a Python IDE (VS Code, PyCharm, etc.)
- Virtual environment (optional but recommended)
- Clone the repository:
    git clone https://github.com/TheVinh-Ha-1710/Diabetes-Predictive-Model.git
    cd Diabetes-Predictive-Model
- Create and activate a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Run the Jupyter Notebook:
    jupyter notebook
- Load and preprocess the dataset.
- Perform exploratory data analysis to understand data insights.
- Train and evaluate various machine learning models.
- Optimize the best-performing model through hyperparameter tuning.
- Visualize model performance with accuracy, confusion matrix, and ROC curve.
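The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses a synthetic stand-in for `diabetes.csv` so that it runs on its own, and the column names (`feature_0` ... `feature_7`, `Outcome`), the two candidate models, and the parameter grids are all assumptions chosen for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for diabetes.csv (assumption: 8 numeric features and a
# binary "Outcome" column; the real dataset's columns may differ).
X_raw, y_raw = make_classification(
    n_samples=800, n_features=8, n_informative=5, random_state=42
)
df = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(8)])
df["Outcome"] = y_raw

# 1. Preprocess: drop duplicate rows, fill missing values with column medians.
df = df.drop_duplicates()
df = df.fillna(df.median(numeric_only=True))

# 2. EDA: summary statistics and class balance.
summary = df.describe()
class_balance = df["Outcome"].value_counts(normalize=True)

# 3. Train and compare candidate models with 5-fold cross-validation.
X = df.drop(columns="Outcome")
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(random_state=42),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# 4. Tune the best-performing candidate with a small grid search.
param_grids = {
    "logistic_regression": {"logisticregression__C": [0.1, 1.0, 10.0]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 5]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# 5. Evaluate on the held-out test set.
y_pred = search.predict(X_test)
y_proba = search.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba)

print(f"Best model: {best_name}, params: {search.best_params_}")
print(f"Test accuracy: {accuracy:.3f}, ROC AUC: {auc:.3f}")
print("Confusion matrix:")
print(cm)
```

To work with the real data, the synthetic block would be replaced by `pd.read_csv("diabetes.csv")`, with the target column name adjusted to match the file. The confusion matrix `cm` can then be drawn with `seaborn.heatmap`, and the ROC curve with `sklearn.metrics.RocCurveDisplay`.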
📂 Diabetes-Predictive-Model
├── 📜 README.md # Project documentation
├── 📜 diabetes.csv # Dataset
├── 📜 model_training.ipynb # Model training notebook
├── 📜 model_training.pdf # PDF version of the notebook
└── 📜 requirements.txt # Dependencies