Welcome to my IBM ML Repository 😄

Python · Jupyter Notebook · Pandas · NumPy · scikit-learn · Keras

This repository aims to build highly interpretable and accurate machine learning models that balance bias, variance, and time complexity. scikit-learn is used for the machine learning models and Keras for the deep learning models 💡

Courses

The repository also contains hands-on labs from 6 machine learning courses created by IBM, which cover numerous ML concepts in depth and breadth.

Hands-on Labs: SQL, Hypothesis Testing, Feature Transformation, Scaling, Skewness & Importance.

Hands-on Labs: Cross-Validation, Ridge, Lasso, ElasticNet, Pipelines.

Hands-on Labs: Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, Ensemble, Bagging, Boosting, Stacking, Model-Agnostic Methods, Resampling Techniques.

Hands-on Labs: Principal Component Analysis, Distance Metrics, Inertia & Distortion, K-Means, Hierarchical, DBSCAN, Mean Shift Clustering.

Hands-on Labs: Gradient Descent, Backpropagation, Artificial NN, Convolutional NN, Recurrent NN.

Hands-on Labs: Bag of Words, User-Profile Recommendation, Similarity-Index Recommendation.

Capstone Projects

You are welcome to explore my findings in the personal capstone projects I created during my learning journey.

• Aim: Predict the cost of medical treatments based on six features: age, sex, BMI, number of children, smoking status, and region.

• Procedure: In-depth EDA via pair, bar, box, violin, and regression plots to examine the effect of smoking on charges, followed by hypothesis testing on the relationship between treatment costs and smoking status (a minimal test is sketched below).

• Findings: The test indicates that a person with a charge of $35K or more is likely a smoker (p-value = 0.023, confidence level = 0.977).
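
A minimal sketch of this kind of test, where placeholder arrays stand in for the real insurance data and all names are illustrative:

```python
# Welch's two-sample t-test on charges grouped by smoking status.
# The placeholder arrays below stand in for the real insurance dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
smoker_charges = rng.normal(32_000, 11_000, 250)      # placeholder: smokers' charges
non_smoker_charges = rng.normal(8_500, 6_000, 1000)   # placeholder: non-smokers' charges

# Welch's t-test does not assume equal variances between the two groups
t_stat, p_value = stats.ttest_ind(smoker_charges, non_smoker_charges, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```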

• Aim: Create a regression model that predicts the power generated by PV panels to facilitate energy management in power plants.

• Procedure: Build a pipeline comprising polynomial transformation, standard scaling, and a regressor. Then apply GridSearchCV for hyper-parameter tuning and benchmark plain Linear, Lasso, Ridge, Elastic Net & Gradient Boosting regressors (see the sketch below).

• Findings: The winner is the Gradient Boosting Regressor model with an R² score of ~0.79.
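
A minimal sketch of the pipeline-plus-grid-search setup, shown with one regressor (Ridge); synthetic data stands in for the PV dataset and the parameter grid is an illustrative assumption:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("poly", PolynomialFeatures()),   # polynomial transformation
    ("scale", StandardScaler()),      # standard scaling
    ("reg", Ridge()),                 # regressor to benchmark
])

param_grid = {"poly__degree": [1, 2, 3], "reg__alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, scoring="r2", cv=5)
search.fit(X_train, y_train)
print("Best CV R2:", round(search.best_score_, 3))
print("Test  R2:", round(search.score(X_test, y_test), 3))
```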

• Aim: Classify the faults that might occur in photovoltaic panels, namely, Short-Circuit, Open-Circuit, Degradation, and Shadowing.

• Procedure: Stratified data split, feature scaling, and re-weighting of the imbalanced classes. Then apply GridSearchCV for hyper-parameter tuning and benchmark Logistic Regression, Decision Tree, and Random Forest classifiers (see the sketch below).

• Findings: The winner is the Decision Tree algorithm with an accuracy and a weighted F1-score of ~ 97%.
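
A minimal sketch of the stratified-split / class re-weighting / grid-search flow, shown with one classifier (Decision Tree); synthetic data stands in for the PV fault dataset and the grid values are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Imbalanced 4-class problem standing in for the four PV fault types
X, y = make_classification(n_samples=1000, n_features=8, n_informative=6,
                           n_classes=4, weights=[0.55, 0.25, 0.12, 0.08],
                           random_state=42)

# Stratified split preserves the class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # class_weight="balanced" re-weights the imbalanced classes
    ("clf", DecisionTreeClassifier(class_weight="balanced", random_state=42)),
])

param_grid = {"clf__max_depth": [3, 5, 10, None]}
search = GridSearchCV(pipe, param_grid, scoring="f1_weighted", cv=5)
search.fit(X_train, y_train)
print(classification_report(y_test, search.predict(X_test)))
```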

• Aim: Cluster date fruits based on their physical features.

• Procedure: Check multicollinearity, scale the data, and reduce the number of features via PCA. Then apply a comparative analysis of K-Means, Agglomerative, Mean Shift & DBSCAN clustering (see the sketch below).

• Findings: The winner is the K-Means++ technique, which scored an accuracy of 76% with only two principal components.
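
A minimal sketch of scaling + PCA + k-means; synthetic blobs stand in for the date-fruit measurements, and the number of clusters is an illustrative assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=600, n_features=10, centers=4, random_state=42)

X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)   # keep two principal components

# k-means++ is scikit-learn's default initialization scheme
kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)
print("Inertia:", round(kmeans.inertia_, 2))
```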

• Aim: Detect whether a patient has a brain tumor or not.

• Procedure: Convert the images to a NumPy array and scale them. Build a convolutional network and train the CNN model to classify brain tumors (see the sketch below). Then deploy the deep learning model as a Flask app.

• Findings: The CNN model accuracy is 97%.
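
A minimal sketch of a binary-classification CNN in Keras; the input shape, layer sizes, and training settings are illustrative assumptions, and the Flask deployment step is omitted:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),          # scaled RGB image array (assumed size)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # tumor / no tumor
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then be: model.fit(X_scaled, y, epochs=..., validation_split=0.2)
```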

• Aim: Build a recommendation system that suggests the most suitable courses for learners on educational platforms.

• Procedure: The recommendation system is built with several techniques, listed under the findings below.

• Findings: The recommender system is built via eight approaches. The first three are content-based (a minimal sketch of the course-similarity idea follows after this list).

Approach 1 - Content-Based Recommender Using User Profile and Course Genres

Approach 2 - Content-Based Recommender Using Course Similarities

Approach 3 - Content-Based Recommender Using PCA Clustering

• Findings (continued): The remaining five approaches are collaborative-filtering-based; they are compared using RMSE.

Approach 4 - Collaborative-Filtering Recommender Using K Nearest Neighbor

Approach 5 - Collaborative-Filtering Recommender Using Non-negative Matrix Factorization

Approach 6 - Collaborative-Filtering Recommender Using Neural Networks

Approach 7 - Collaborative-Filtering Recommender Using Embedding Features Regression

Approach 8 - Collaborative-Filtering Recommender Using Embedding Features Classification
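
A minimal sketch of the course-similarity idea behind Approach 2, with made-up course titles and descriptions standing in for the real catalog:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

courses = pd.DataFrame({
    "title": ["Intro to ML", "Deep Learning Basics", "SQL for Data Science"],
    "description": [
        "supervised learning regression classification scikit-learn",
        "neural networks keras backpropagation convolution",
        "sql queries databases joins aggregation",
    ],
})

# Bag-of-words representation of each course description
bow = CountVectorizer().fit_transform(courses["description"])
similarity = cosine_similarity(bow)

# Recommend the course most similar to the first one (excluding itself)
most_similar = similarity[0].argsort()[::-1][1]
print("Most similar to", courses["title"][0], "->", courses["title"][most_similar])
```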

Acknowledgment

My friend Mohamad Osman's ML-Repo has been a great source of inspiration. I encourage you to have a look at his remarkable work.