Comparison of ensemble learning methods on diabetes disease classification with various datasets
- This project compares various ensemble learning techniques for the classification of diabetes disease. Ensemble methods combine multiple machine learning models to improve predictive performance and robustness.
- In this project, we explore and compare the effectiveness of popular ensemble algorithms, including Random Forest, AdaBoost, and Gradient Boosting, in diagnosing diabetes using three different datasets of relevant features.
- Key Features:
- Implementation of different ensemble methods for classification
- Evaluation and comparison of model performance using metrics like accuracy, precision, recall, and F1-score
- Jupyter notebooks with detailed explanations and visualizations
- Datasets used for experimentation
- Code for preprocessing, model training, and evaluation
- This project has been published in JMASIF (Jurnal Masyarakat Informatika) under the title "Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes" (Comparison of Ensemble Learning Methods on Diabetes Disease Classification).
- Python
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- xgboost
- lightgbm
- catboost
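The stack above can be installed with pip. This is a minimal sketch; the exact package versions used in the notebooks are not stated in the repository, so pin versions as needed:

```shell
# Install the libraries listed above (versions unpinned; adjust as needed)
pip install pandas matplotlib seaborn scikit-learn xgboost lightgbm catboost
```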
Diabetes is a medical condition characterized by elevated blood sugar levels. According to the World Health Organization (WHO), the number of diabetes cases increased from 108 million to 422 million between 1980 and 2014. Machine Learning offers methods like Ensemble Learning for diabetes classification. This study compares three Ensemble Learning techniques, namely Bagging, Boosting, and Stacking, using three datasets: Pima Indians Diabetes, Frankfurt Hospital Diabetes, and Sylhet Hospital Diabetes.
- Pima Indians Diabetes Database by UCI Machine Learning
- Frankfurt Hospital Diabetes Dataset by John
- Sylhet Hospital Diabetes Dataset by Ishan Dutta
- Data Preprocessing
- MinMaxScaler for each dataset (rescales each feature to the range [0, 1])
- Data Exploration
- Feature Engineering
- Data Splitting
- 80% Training data
- 20% Testing data
- Model Building
- Model Training & Testing
- Model Evaluation
- Accuracy
- Precision
- Recall
- F1-score
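The workflow above can be sketched end to end with scikit-learn. This is a minimal illustration, not the project's actual code: a synthetic dataset stands in for the three diabetes datasets, and Random Forest stands in for the full set of models compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a diabetes dataset (8 features, binary label)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Preprocessing: MinMaxScaler rescales every feature to [0, 1]
X = MinMaxScaler().fit_transform(X)

# Data splitting: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model building, training, and testing
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Model evaluation with the four metrics listed above
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```

The same loop is repeated per dataset and per ensemble method, so the metrics can be compared in a single table.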
The methods compared, grouped by ensemble family:

| Bagging | Boosting | Stacking |
|---|---|---|
| Bagging | Adaptive Boosting (AdaBoost) | Stacked Generalization |
| Random Forest | Gradient Boosting | |
| Extra Trees | Extreme Gradient Boosting (XGBoost) | |
| | Light Gradient Boosting (LightGBM) | |
| | CatBoost | |
- Overall, the Boosting methods give the best results on all three datasets; in particular, Light Gradient Boosting (LightGBM) performs best on most of them (Dataset 2 and Dataset 3).
- L. M. Cendani and A. Wibowo, "Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes," Jurnal Masyarakat Informatika, vol. 13, no. 1, pp. 33–44, May 2022. https://doi.org/10.14710/jmasif.13.1.42912
- Linggar Maretva Cendani - linggarmc@gmail.com
- Adi Wibowo - bowo.adi@live.undip.ac.id
This project is licensed under the MIT License - see the LICENSE file for details