Data-Mining-and-Machine-Learning

Overview

This repository showcases a series of projects and assignments in the field of data mining and machine learning, with a particular focus on analyzing the Pima Indian Diabetes dataset. It includes detailed explorations of data cleaning, visualization techniques, and the development of machine learning models aimed at providing insightful analyses and predictions. This collection of work is a reflection of my skills and understanding in applying data science methodologies to real-world datasets.

Projects

1. Data Cleaning and Visualization

Description: This project involves extensive data cleaning and visualization techniques to prepare the dataset for predictive modeling.
Tools & Technologies: Python, Pandas, Matplotlib, Seaborn
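
As an illustration of the kind of cleaning step this project covers, here is a minimal sketch in Pandas. It assumes, as is common practice with the Pima data, that zeros in physiological columns such as Glucose, BloodPressure and BMI stand for missing measurements; the README does not say which cleaning rules the notebooks apply. The sample rows and the helper name `clean_zero_as_missing` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the Pima data; not taken from the repository.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137],
    "BloodPressure": [72, 66, 64, 0, 40],
    "BMI": [33.6, 26.6, 23.3, 28.1, 0.0],
    "Outcome": [1, 0, 1, 0, 1],
})


def clean_zero_as_missing(frame, columns):
    """Return a copy where zeros in `columns` are replaced by the column median.

    Assumption: a zero in these columns means "not recorded".
    """
    cleaned = frame.copy()
    # Mark zeros as missing so they do not distort the median.
    cleaned[columns] = cleaned[columns].replace(0, np.nan)
    # Fill each column's gaps with the median of its observed values.
    cleaned[columns] = cleaned[columns].fillna(cleaned[columns].median())
    return cleaned


columns_to_clean = ["Glucose", "BloodPressure", "BMI"]
cleaned = clean_zero_as_missing(df, columns_to_clean)
print(cleaned)
```

Median imputation is only one option; dropping the affected rows or imputing per outcome class are equally plausible choices for this dataset.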

2. Developing a Machine Learning Model

Description: Building on the cleaned dataset, this project develops a machine learning model to predict the likelihood of diabetes in the Pima Indian population.
Tools & Technologies: Python, Scikit-learn, Jupyter Notebook
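
A rough sketch of what such a model could look like in Scikit-learn, using a KNN classifier with a stratified train/test split. The data here is a synthetic stand-in generated with `make_classification`, and the values of `n_neighbors`, `test_size` and `random_state` are illustrative assumptions rather than the settings used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the Pima data (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# stratify=y keeps the class ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# KNN is distance based, so features are standardised before fitting.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.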

Key Features

In-depth analysis of the Pima Indian Diabetes dataset, focusing on identifying key patterns and relationships.
Comprehensive data cleaning and visualization to prepare the dataset for predictive modeling.
Application of various data preprocessing techniques, including Principal Component Analysis (PCA) for dimensionality reduction.
Exploration of multiple machine learning models for classification, including KNN and decision tree models.
Use of validation techniques such as stratified sampling for the KNN models and 10-fold cross-validation for the decision tree models.
Implementation of supervised feature selection techniques, including the filter method, to improve model accuracy and efficiency.
Extensive evaluation of model performance using metrics like accuracy, precision, recall, and F1-score to ensure robustness and reliability.
Dedication to optimizing model parameters and methodology to achieve the highest possible prediction accuracy.
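
Several of the techniques listed above can be combined in one Scikit-learn pipeline. The sketch below chains a filter-style feature selector, PCA and a decision tree, then scores the result with 10-fold cross-validation. It runs on synthetic data, and the choices of `f_classif` as the filter score, `k=6`, `n_components=3` and `max_depth=4` are assumptions made for illustration, not the repository's settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the Pima data (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # Filter method: rank features by ANOVA F-score against the label, keep the top 6.
    ("filter", SelectKBest(score_func=f_classif, k=6)),
    # Project the selected features onto 3 principal components.
    ("pca", PCA(n_components=3)),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
])

# 10 folds, each preserving the overall class ratio.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Because selection and PCA sit inside the pipeline, they are refitted on the training portion of every fold, so the cross-validated score is not inflated by information from the held-out fold.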

How to Navigate this Repository

Explore the Data Cleaning and Visualising folder for notebooks and data files covering the initial stages of the data science pipeline, from data collection through visualisation and analysis.
Visit the Developing a Machine Learning model folder for the machine learning techniques applied to the processed dataset, including the models used to predict diabetes and the evaluation of their accuracy.

Acknowledgments

I want to acknowledge my tutor and colleagues for the experience of working with them throughout the year, which helped me unlock my full potential in completing these projects. All the best to them.

GeorgeNich/Data-Mining-and-Machine-learning
