This repository aims to build highly interpretable and accurate machine learning models that balance variance, bias, and time complexity. Scikit-Learn is used to build the machine learning models and Keras to build the deep learning models 💡
Moreover, the repository contains the hands-on labs of six machine learning courses created by IBM, which cover numerous ML concepts in both depth and breadth.
Hands-on Labs: SQL, Hypothesis Testing, Features Transformation, Scaling, Skewness & Importance.
Hands-on Labs: Cross-Validation, Ridge, Lasso, ElasticNet, Pipelines.
Hands-on Labs: Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, Ensemble, Bagging, Boosting, Stacking, Model-Agnostic, Resampling Techniques.
Hands-on Labs: Principal Component Analysis, Distance Metrics, Inertia & Distortion, K-means, Hierarchical, DBSCAN, Mean Shift Clustering.
Hands-on Labs: Gradient Descent, Backpropagation, Artificial NN, Convolutional NN, Recurrent NN.
Hands-on Labs: Bag of Words, User-Profile Recommendation, Similarity-Index Recommendation.
You are welcome to explore my findings in the personal capstone projects I created during my learning journey.
• Aim: Predict the cost of medical treatments based on six features: age, sex, BMI, children, smoking status, and region.
• Procedure: In-depth EDA via pair, bar, box, violin, and regression plots to see the effect of smoking on charges. Hypothesis testing on the relationship between treatment costs and smoking status.
• Findings: The test indicates that a person charged $35K or more is likely a smoker (p-value = 0.023, reported confidence level = 0.977).
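The smoker versus non-smoker comparison can be sketched as a two-sample test. The snippet below is only an illustration: the charges are synthetic, the names `smoker_charges` and `non_smoker_charges` are hypothetical, and Welch's t-test is an assumed choice, since the exact test used in the project is not stated here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for treatment charges (NOT the project's data).
smoker_charges = rng.normal(loc=32_000, scale=11_000, size=200)
non_smoker_charges = rng.normal(loc=8_500, scale=6_000, size=800)

# Welch's t-test: compares the two group means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(
    smoker_charges, non_smoker_charges, equal_var=False
)

alpha = 0.05  # significance level chosen for this sketch
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

A p-value below `alpha` means the difference in mean charges between the two groups is unlikely to be due to chance alone under the null hypothesis of equal means.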
• Aim: Create a regression model that predicts the power generated by PV panels to facilitate energy management in power plants.
• Procedure: Build a pipeline comprising polynomial transformation, standard scaling, and a regressor. Then tune the hyper-parameters with GridSearchCV and benchmark the plain Linear, Lasso, Ridge, Elastic Net, and Gradient Boosting regressors.
• Findings: The winner is the Gradient Boosting Regressor model with an R² score of ~0.79.
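A pipeline of this shape can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the data is synthetic, the two features are hypothetical, and only two of the benchmarked regressors are included to keep the grid small.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for PV data (two hypothetical features, NOT the project's data).
X = rng.uniform(0, 1, size=(300, 2))
y = 5 * X[:, 0] - 2 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("reg", Ridge()),  # placeholder; the grid below swaps the regressor
])

# Each dict is a separate sub-grid, so every regressor gets its own hyper-parameters.
param_grid = [
    {"poly__degree": [1, 2], "reg": [Ridge()], "reg__alpha": [0.1, 1.0, 10.0]},
    {
        "poly__degree": [1, 2],
        "reg": [GradientBoostingRegressor(random_state=0)],
        "reg__n_estimators": [50, 100],
    },
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# Score the best pipeline on data it never saw during the search.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, f"test R2 = {test_r2:.2f}")
```

Putting the scaler and polynomial step inside the pipeline means they are re-fitted on each training fold, so no information from the validation fold leaks into preprocessing.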
• Aim: Classify the faults that might occur in photovoltaic panels, namely, Short-Circuit, Open-Circuit, Degradation, and Shadowing.
• Procedure: Stratified data split, feature scaling, and re-weighting of the imbalanced classes. Then tune the hyper-parameters with GridSearchCV and benchmark Logistic Regression, Decision Tree, and Random Forest.
• Findings: The winner is the Decision Tree algorithm with an accuracy and a weighted F1-score of ~ 97%.
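The stratified split, scaling, and class re-weighting steps could look roughly like the sketch below. It assumes synthetic four-class data in place of the PV fault measurements and uses `class_weight="balanced"` as one possible way to re-weight classes; the project's actual settings may differ.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced 4-class data standing in for the fault classes.
X, y = make_classification(
    n_samples=1200, n_features=8, n_informative=6, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, weights=[0.55, 0.25, 0.15, 0.05],
    class_sep=2.0, random_state=0,
)

# stratify=y keeps the class proportions (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # class_weight="balanced" weights classes inversely to their frequency.
    ("clf", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
])

search = GridSearchCV(
    pipe,
    {"clf__max_depth": [3, 5, 10, None], "clf__min_samples_leaf": [1, 5]},
    cv=5,
    scoring="f1_weighted",
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
acc = accuracy_score(y_test, y_pred)
weighted_f1 = f1_score(y_test, y_pred, average="weighted")
print(search.best_params_, f"accuracy = {acc:.2f}, weighted F1 = {weighted_f1:.2f}")
```

Reporting the weighted F1-score alongside accuracy matters here because accuracy alone can look good on imbalanced data even when the rare classes are misclassified.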
• Aim: Cluster date fruits based on their physical features.
• Procedure: Check multicollinearity, scale data, and reduce the number of features via PCA. Then, apply a comparative analysis between K-means, Agglomerative, Mean Shift & DBSCAN clustering.
• Findings: The winner is the k-means++ technique. Also, an accuracy of 76% was scored with only two principal components.
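The scale, reduce, then cluster sequence can be sketched as below. The blobs are synthetic stand-ins for the date fruit features, and the silhouette score is used only as an illustrative quality measure; it is an assumption of this sketch, not a metric reported by the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 groups described by 10 features (NOT the project's data).
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Keep only the first two principal components.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)

# init="k-means++" spreads the initial centroids apart before iterating.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

silhouette = silhouette_score(X_2d, labels)
print("explained variance ratio:", pca.explained_variance_ratio_.sum())
print("inertia:", kmeans.inertia_)
print("silhouette:", silhouette)
```

The same `X_2d` array can be passed to `AgglomerativeClustering`, `MeanShift`, or `DBSCAN` from `sklearn.cluster` to compare the techniques on equal footing.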
• Aim: Detect whether a patient has a brain tumor or not.
• Procedure: Convert the images to NumPy arrays and scale them. Build and train a convolutional neural network (CNN) to classify brain tumors. Then, deploy the deep learning model using a Flask app.
• Findings: The CNN model accuracy is 97%.
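The image scaling and CNN steps might be sketched in Keras as follows. Everything specific here is an assumption made for illustration: the 64x64 RGB input size, the layer sizes, and the random pixels that stand in for real scans. The Flask deployment step is not shown.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 64  # hypothetical input resolution

# Random pixels standing in for loaded scans (NOT real MRI data).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(16, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=16)  # 1 = tumor, 0 = no tumor

# Scale pixel values from [0, 255] to [0, 1].
x = images.astype("float32") / 255.0

model = keras.Sequential([
    keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of the "tumor" class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One short epoch just to show the training call; real training needs real data.
model.fit(x, labels, epochs=1, batch_size=8, verbose=0)

probs = model.predict(x[:2], verbose=0)
print(probs.shape)
```

A single sigmoid output with binary cross-entropy fits the two-class tumor versus no-tumor setup; a softmax layer would be the usual alternative for more than two classes.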
• Aim: To build a recommendation system that recommends the most suitable courses for learners on educational platforms.
• Procedure: As listed in the findings, several techniques are used to build the recommendation system.
• Findings: The recommender system is created via eight approaches. Firstly, the content-based approaches.
Approach 1 - Content-Based Recommender Using User Profile and Course Genres
Approach 2 - Content-Based Recommender Using Course Similarities
Approach 3 - Content-Based Recommender Using PCA Clustering
• Findings: The remaining five approaches are based on collaborative filtering. They are compared using RMSE.
Approach 4 - Collaborative-Filtering Recommender Using K Nearest Neighbor
Approach 5 - Collaborative-Filtering Recommender Using Non-negative Matrix Factorization
Approach 6 - Collaborative-Filtering Recommender Using Neural Networks
Approach 7 - Collaborative-Filtering Recommender Using Embedding Features Regression
Approach 8 - Collaborative-Filtering Recommender using Embedding Features Classification
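As an illustration of how a collaborative-filtering approach can be scored with RMSE, here is a minimal sketch of non-negative matrix factorization on a synthetic rating matrix. The matrix sizes, the 20% hold-out, and the mean imputation of hidden entries are simplifying assumptions for this example, not the project's setup.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic user x course rating matrix built from 3 hidden factors.
n_users, n_courses, n_factors = 30, 20, 3
true_users = rng.uniform(0, 1, size=(n_users, n_factors))
true_courses = rng.uniform(0, 1, size=(n_factors, n_courses))
ratings = true_users @ true_courses

# Hide about 20% of the entries to act as a test set.
test_mask = rng.random(ratings.shape) < 0.2
train_mean = ratings[~test_mask].mean()
train = ratings.copy()
train[test_mask] = train_mean  # mean imputation: a simplification for this sketch

# Factorize: train ~= user_factors @ course_factors, all entries non-negative.
model = NMF(n_components=n_factors, init="nndsvda", max_iter=1000, random_state=0)
user_factors = model.fit_transform(train)
course_factors = model.components_
predicted = user_factors @ course_factors


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Evaluate only on the hidden entries, and compare with a predict-the-mean baseline.
test_rmse = rmse(ratings[test_mask], predicted[test_mask])
baseline_rmse = rmse(ratings[test_mask], np.full(test_mask.sum(), train_mean))
print(f"NMF RMSE = {test_rmse:.3f}, mean-baseline RMSE = {baseline_rmse:.3f}")
```

Computing the same `rmse` on the same hidden entries for each model is what makes the comparison between approaches fair.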
My friend Mohamad Osman's ML-Repo has been a great source of inspiration. I encourage you to have a look at his remarkable work.