This repository collects a series of projects and assignments in data mining and machine learning, centered on the Pima Indian Diabetes dataset. It covers data cleaning, visualization techniques, and the development of machine learning models that predict diabetes from the data. The work reflects my skills in applying data science methods to real-world datasets.
- Description: This project applies data cleaning and visualization techniques to prepare the dataset for predictive modeling.
- Tools & Technologies: Python, Pandas, Matplotlib, Seaborn
- Description: Building on the cleaned dataset, this project develops a machine learning model to predict the likelihood of diabetes in the Pima Indian population.
- Tools & Technologies: Python, Scikit-learn, Jupyter Notebook
- In-depth analysis of the Pima Indian Diabetes dataset, focusing on identifying key patterns and relationships.
- Comprehensive data cleaning and visualization to prepare the dataset for predictive modeling.
- Application of various data preprocessing techniques, including Principal Component Analysis (PCA) for dimensionality reduction.
- Exploration of multiple machine learning models for classification, including KNN and decision tree models.
- Use of model evaluation techniques such as stratified sampling for the KNN models and 10-fold cross-validation for the decision tree models.
- Implementation of supervised feature selection techniques and the filter method to enhance model accuracy and efficiency.
- Evaluation of model performance using accuracy, precision, recall, and F1-score to assess robustness and reliability.
- Tuning of model parameters and methodology to improve prediction accuracy.
- Explore the `Data Cleaning and Visualising` folder for notebooks and data files covering the initial stages of the data science pipeline: data collection, visualization, and analysis.
- Visit the `Developing a Machine Learning model` folder for the machine learning techniques applied to the processed dataset and the models used to predict diabetes and assess prediction accuracy.
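
As a rough illustration of the techniques listed above (filter-based feature selection, PCA, KNN with a stratified split, a decision tree with 10-fold cross-validation, and the usual classification metrics), here is a minimal scikit-learn sketch. It is not the code from the notebooks: the data is a synthetic stand-in generated with `make_classification`, and every parameter value (`k=5`, `n_components=3`, `n_neighbors=5`, `max_depth=4`, the 25% test split) is an assumption chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 768 rows, 8 numeric features,
# imbalanced binary target. Replace with the cleaned dataset in practice.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=2,
    weights=[0.65, 0.35], random_state=0,
)

# Stratified hold-out split: class proportions are preserved in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# KNN pipeline: scale, keep the 5 best features by ANOVA F-score
# (a filter method), reduce to 3 principal components, then classify.
knn = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),
    PCA(n_components=3),
    KNeighborsClassifier(n_neighbors=5),
)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
print(f"KNN  accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Decision tree evaluated with stratified 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree_scores = cross_val_score(tree, X, y, cv=cv, scoring="accuracy")
print(f"Tree 10-fold accuracy={tree_scores.mean():.3f} "
      f"(std {tree_scores.std():.3f})")
```

Putting the scaler, feature selector, and PCA inside the pipeline means they are fitted on the training portion only, which keeps information from the test split out of the preprocessing steps.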
(I want to acknowledge my tutor and colleagues for the experience of working with them throughout the year, which helped me unlock my full potential in completing these projects. All the best to them.)