Welcome to my IBM ML Repository 😄

Python · Jupyter Notebook · Pandas · NumPy · scikit-learn · Keras

This repository aims to build highly interpretable and accurate machine learning models that balance bias, variance, and time complexity. scikit-learn is used for the machine learning models and Keras for the deep learning models 💡

Courses

The repository also contains hands-on labs from 6 machine learning courses created by IBM, which cover numerous ML concepts in depth and breadth.

Hands-on Labs: SQL, Hypothesis Testing, Feature Transformation, Scaling, Skewness & Importance.

Hands-on Labs: Cross-Validation, Ridge, Lasso, ElasticNet, Pipelines.

Hands-on Labs: Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, Ensemble, Bagging, Boosting, Stacking, Model-Agnostic Methods, Resampling Techniques.

Hands-on Labs: Principal Component Analysis, Distance Metrics, Inertia & Distortion, K-Means, Hierarchical, DBSCAN, Mean Shift Clustering.

Hands-on Labs: Gradient Descent, Backpropagation, Artificial NN, Convolutional NN, Recurrent NN.

Hands-on Labs: Bag of Words, User-Profile Recommendation, Similarity-Index Recommendation.

Capstone Projects

You are welcome to explore my findings in the personal capstone projects I created during my learning journey.

• Aim: Predict the cost of medical treatments based on six features: age, sex, BMI, number of children, smoking status, and region.

• Procedure: In-depth EDA via pair, bar, box, violin, and regression plots to examine the effect of smoking on charges, followed by hypothesis testing on the relationship between treatment costs and smoking status (a minimal test is sketched below).

• Findings: The test indicates that a person with a charge of $35K or more is likely a smoker (p-value = 0.023, confidence level = 0.977).
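
A minimal sketch of this kind of test, where placeholder arrays stand in for the real insurance data and all names are illustrative:

```python
# Welch's two-sample t-test on charges grouped by smoking status.
# The placeholder arrays below stand in for the real insurance dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
smoker_charges = rng.normal(32_000, 11_000, 250)      # placeholder: smokers' charges
non_smoker_charges = rng.normal(8_500, 6_000, 1000)   # placeholder: non-smokers' charges

# Welch's t-test does not assume equal variances between the two groups
t_stat, p_value = stats.ttest_ind(smoker_charges, non_smoker_charges, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```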

• Aim: Create a regression model that predicts the power generated by PV panels to facilitate energy management in power plants.

• Procedure: Build a pipeline comprising polynomial transformation, standard scaling, and a regressor. Then apply GridSearchCV for hyper-parameter tuning and benchmark plain Linear, Lasso, Ridge, Elastic Net & Gradient Boosting regressors (see the sketch below).

• Findings: The winner is the Gradient Boosting Regressor model with an R² score of ~0.79.
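
A minimal sketch of the pipeline-plus-grid-search setup, shown with one regressor (Ridge); synthetic data stands in for the PV dataset and the parameter grid is an illustrative assumption:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("poly", PolynomialFeatures()),   # polynomial transformation
    ("scale", StandardScaler()),      # standard scaling
    ("reg", Ridge()),                 # regressor to benchmark
])

param_grid = {"poly__degree": [1, 2, 3], "reg__alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, scoring="r2", cv=5)
search.fit(X_train, y_train)
print("Best CV R2:", round(search.best_score_, 3))
print("Test  R2:", round(search.score(X_test, y_test), 3))
```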

• Aim: Classify the faults that might occur in photovoltaic panels, namely, Short-Circuit, Open-Circuit, Degradation, and Shadowing.

• Procedure: Stratified data split, feature scaling, and re-weighting of the imbalanced classes. Then apply GridSearchCV for hyper-parameter tuning and benchmark Logistic Regression, Decision Tree, and Random Forest classifiers (see the sketch below).

• Findings: The winner is the Decision Tree algorithm with an accuracy and a weighted F1-score of ~ 97%.
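
A minimal sketch of the stratified-split / class re-weighting / grid-search flow, shown with one classifier (Decision Tree); synthetic data stands in for the PV fault dataset and the grid values are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Imbalanced 4-class problem standing in for the four PV fault types
X, y = make_classification(n_samples=1000, n_features=8, n_informative=6,
                           n_classes=4, weights=[0.55, 0.25, 0.12, 0.08],
                           random_state=42)

# Stratified split preserves the class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # class_weight="balanced" re-weights the imbalanced classes
    ("clf", DecisionTreeClassifier(class_weight="balanced", random_state=42)),
])

param_grid = {"clf__max_depth": [3, 5, 10, None]}
search = GridSearchCV(pipe, param_grid, scoring="f1_weighted", cv=5)
search.fit(X_train, y_train)
print(classification_report(y_test, search.predict(X_test)))
```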

• Aim: Cluster date fruits based on their physical features.

• Procedure: Check multicollinearity, scale the data, and reduce the number of features via PCA. Then apply a comparative analysis of K-Means, Agglomerative, Mean Shift & DBSCAN clustering (see the sketch below).

• Findings: The winner is the K-Means++ technique, which scored an accuracy of 76% with only two principal components.
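
A minimal sketch of scaling + PCA + k-means; synthetic blobs stand in for the date-fruit measurements, and the number of clusters is an illustrative assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=600, n_features=10, centers=4, random_state=42)

X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)   # keep two principal components

# k-means++ is scikit-learn's default initialization scheme
kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)
print("Inertia:", round(kmeans.inertia_, 2))
```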

• Aim: Detect whether a patient has a brain tumor or not.

• Procedure: Convert the images to a NumPy array and scale them. Build a convolutional network and train the CNN model to classify brain tumors (see the sketch below). Then deploy the deep learning model as a Flask app.

• Findings: The CNN model accuracy is 97%.
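
A minimal sketch of a binary-classification CNN in Keras; the input shape, layer sizes, and training settings are illustrative assumptions, and the Flask deployment step is omitted:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),          # scaled RGB image array (assumed size)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # tumor / no tumor
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then be: model.fit(X_scaled, y, epochs=..., validation_split=0.2)
```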

• Aim: Build a recommendation system that suggests the most suitable courses for learners on educational platforms.

• Procedure: The recommendation system is built with several techniques, listed under the findings below.

• Findings: The recommender system is built via eight approaches. The first three are content-based (a minimal sketch of the course-similarity idea follows after this list).

Approach 1 - Content-Based Recommender Using User Profile and Course Genres

Approach 2 - Content-Based Recommender Using Course Similarities

Approach 3 - Content-Based Recommender Using PCA Clustering

• Findings (continued): The remaining five approaches are collaborative-filtering-based; they are compared using RMSE.

Approach 4 - Collaborative-Filtering Recommender Using K Nearest Neighbor

Approach 5 - Collaborative-Filtering Recommender Using Non-negative Matrix Factorization

Approach 6 - Collaborative-Filtering Recommender Using Neural Networks

Approach 7 - Collaborative-Filtering Recommender Using Embedding Features Regression

Approach 8 - Collaborative-Filtering Recommender Using Embedding Features Classification
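
A minimal sketch of the course-similarity idea behind Approach 2, with made-up course titles and descriptions standing in for the real catalog:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

courses = pd.DataFrame({
    "title": ["Intro to ML", "Deep Learning Basics", "SQL for Data Science"],
    "description": [
        "supervised learning regression classification scikit-learn",
        "neural networks keras backpropagation convolution",
        "sql queries databases joins aggregation",
    ],
})

# Bag-of-words representation of each course description
bow = CountVectorizer().fit_transform(courses["description"])
similarity = cosine_similarity(bow)

# Recommend the course most similar to the first one (excluding itself)
most_similar = similarity[0].argsort()[::-1][1]
print("Most similar to", courses["title"][0], "->", courses["title"][most_similar])
```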

Acknowledgment

My friend Mohamad Osman's ML-Repo has been a great source of inspiration. I encourage you to have a look at his remarkable work.