This project involves an in-depth analysis of a diabetes dataset, followed by the implementation of predictive modeling techniques to classify and predict the likelihood of diabetes occurrence. The dataset was carefully examined and preprocessed to ensure data quality and feature relevance.
Three different machine learning classifiers were utilized for predictive modeling:
-
Random Forest Classifier: An ensemble learning method that operates by constructing multiple decision trees and outputting the class that is the mode of the classes of the individual trees. Fine-tuning involved optimizing the number of trees, maximum depth, and other hyperparameters to achieve the best performance.
-
Support Vector Machine (SVM): A powerful linear classifier that aims to find the optimal hyperplane that maximizes the margin between different classes. The SVM model was fine-tuned by adjusting the kernel type, regularization parameter (C), and gamma values to enhance the classification accuracy.
-
Logistic Regression:A widely used statistical method for binary classification problems. The model was fine-tuned by selecting appropriate regularization techniques and adjusting hyperparameters such as the regularization strength.
Each model was fine-tuned using cross-validation techniques to avoid overfitting and to ensure robustness in performance. After fine-tuning, the models' performance was evaluated and compared based on accuracy, precision, recall, and F1-score metrics. The results provide insights into the most effective model for predicting diabetes in the given dataset.