This repository serves as a comprehensive portfolio for Machine Learning (ML) and Deep Learning (DL) projects, completed as part of the Codecademy AI/ML Engineering certification. It encapsulates a wide array of fundamental and advanced concepts, from traditional supervised and unsupervised learning to modern neural network architectures.
The projects are implemented primarily in Python using Jupyter Notebooks, leveraging popular libraries such as scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch. Each folder contains project code and analysis relevant to the topic.
The projects were completed for the Codecademy AI/ML Engineering + Data Science: Machine Learning Specialist Certification. The work in these folders demonstrates proficiency in:
- Machine Learning Fundamentals: Supervised (Regression and Classification), Unsupervised (Clustering), and Ensemble methods.
- Deep Learning: Implementing neural networks using TensorFlow/Keras and PyTorch.
- Data Science Workflow: Exploratory Data Analysis (EDA), Data Visualization, Feature Engineering, and Model Selection/Tuning.
- Core Libraries: Extensive use of scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch.
The structured organization reflects the curriculum's progression, moving from foundational statistics and data visualization through traditional machine learning algorithms to advanced deep learning architectures and deployment concepts.
| Project | Description |
|---|---|
| Boosting | Predict whether or not a person makes more than $50,000 using census data, demonstrating the power of sequential ensemble methods. |
A collection of diverse classification tasks:
| Project | Algorithm | Description |
|---|---|---|
| Classifying Tweets | Naive Bayes Classifier | Uses a Naive Bayes Classifier to find patterns in real tweets and predict their origin (New York, London, or Paris). |
| Classifying Viral Tweets | K-Nearest Neighbor (KNN) | Employs the K-Nearest Neighbor algorithm to predict whether a tweet will go viral based on its features. |
End-to-end data analysis and model building projects:
| Project | Description |
|---|---|
| Bio Diversity | Analyze biodiversity data from the National Park Service, focusing on the species observed across different national park locations. |
| OkCupid | Comprehensive project involving scoping, data preparation, analysis, and building a machine learning model using data from the OkCupid online dating application. |
Projects focused on creating informative and clear visualizations for data exploration:
| Project | Type | Focus |
|---|---|---|
| Categorical Data | EDA | Visual exploration of Mushroom datasets. |
| EDA | EDA | Airline Analysis data investigation. |
| Line Graph | Time Series | Tracking Online Lime Sales over time. |
| Portfolio Project | Multiple Visuals | Analyzing the relationship between Life Expectancy and GDP. |
| Project | Description |
|---|---|
| Find the flag! | Use Decision Trees to predict the continent of flags based on various features (colors, shapes, etc.), and explore feature importance. |
Advanced projects using TensorFlow/Keras for Deep Learning tasks:
| Type | Project | Description |
|---|---|---|
| Classification | Galaxies | Classifying different types of Galaxies using Convolutional Neural Networks (CNNs). |
| Classification | Heart Failure | Predict the survival of patients with heart failure. |
| Classification | X-Rays | Analyzing Lung Scans (X-Rays) to predict pneumonia, COVID-19, or no illness. |
| Regression | Chances of Admission | Predicting a student's chances of admission to a university. |
Detailed projects on initial data investigation and cleaning:
| Project | Focus |
|---|---|
| Diabetes | Analyzing health and risk factors associated with diabetes. |
| NBA Trends | Investigating trends and statistics within the National Basketball Association. |
| Stackoverflow | Exploring developer survey data from Stack Overflow. |
| Students | Analyzing student performance and demographic data. |
Projects focused on transforming raw data into features that best represent the underlying problem:
| Method | Project | Description |
|---|---|---|
| Filter Method | Customer Reviews | Applying filter methods (e.g., statistical tests) to select relevant features from a dataset of customer reviews on a clothing brand. |
| Wrapper Method | Obesity on lifestyle | Implementing wrapper methods (e.g., Recursive Feature Elimination) to determine the best subset of lifestyle factors for predicting obesity. |
| Project | Description |
|---|---|
| Classify Raisins | Classifying different types of raisins (Kecimen and Besni) by implementing and comparing two tuning techniques: Grid Search for a Decision Tree Classifier and Random Search for a Logistic Regression Classifier. |
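The two tuning techniques named above can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. This is a minimal illustration, not the project's actual notebook: the synthetic dataset from `make_classification`, the parameter ranges, and the fold count are all assumptions chosen for the example.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the real project uses the raisin dataset).
X, y = make_classification(n_samples=300, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search: exhaustively tries every combination in the grid.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 7], "min_samples_split": [2, 4]},
    cv=5,
)
grid.fit(X_train, y_train)

# Random search: samples a fixed number of candidates from a distribution.
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": uniform(loc=0.01, scale=100)},
    n_iter=8,
    cv=5,
    random_state=0,
)
random_search.fit(X_train, y_train)

print("Decision tree best params:", grid.best_params_)
print("Logistic regression best params:", random_search.best_params_)
print("Decision tree test accuracy:", grid.score(X_test, y_test))
print("Logistic regression test accuracy:", random_search.score(X_test, y_test))
```

Grid search cost grows with the product of the grid sizes, while random search cost is fixed by `n_iter`, which is why random search is often preferred for continuous parameters such as `C`.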
| Project | Algorithm | Description |
|---|---|---|
| Handwriting Recognition | K-Means Clustering | Using the unsupervised K-Means algorithm to cluster and recognize patterns in handwriting data. |
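For readers new to K-Means, the loop it runs (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched in NumPy. The function name `k_means`, the toy 2-D points, and the naive "first k points" initialization are assumptions for illustration; the project itself works on handwriting data and may rely on scikit-learn.

```python
import numpy as np

def k_means(points, k, iterations=10):
    """Plain Lloyd's algorithm; centroids start at the first k points (naive choice)."""
    centroids = points[:k].copy()
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids

# Two well-separated toy blobs (made-up data).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```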
| Project | Algorithm | Description |
|---|---|---|
| Breast Cancer Classifier | K-Nearest Neighbor (KNN) | Building a model to classify and predict the diagnosis of breast cancer based on medical features. |
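The idea behind K-Nearest Neighbor is a majority vote among the closest training points. The sketch below uses only the standard library; the helper `knn_predict`, the two-feature toy points, and the labels are hypothetical and are not taken from the project's dataset.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up, already-scaled feature pairs for illustration only.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["benign", "benign", "benign",
                "malignant", "malignant", "malignant"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```

Because the vote depends on raw distances, features are usually normalized first so that no single feature dominates.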
Projects demonstrating the fundamental Linear Regression model:
| Implementation | Description |
|---|---|
| Scratch | Implementation of Traditional Linear Regression from scratch, providing a deep understanding of the underlying mathematics. |
| Sklearn | Utilizing the scikit-learn library for efficient implementation of Linear Regression. |
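As a rough idea of what a from-scratch implementation involves, the sketch below fits a line by gradient descent on the mean squared error. The function name `fit_line`, the learning rate, the iteration count, and the toy data are assumptions for illustration and may differ from the notebook in this folder.

```python
def fit_line(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = m * x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Partial derivatives of (1/n) * sum((y - (m*x + b))**2) w.r.t. m and b.
        grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Toy data generated from y = 2x + 1 (made up for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

With scikit-learn, the equivalent fit is `LinearRegression().fit(X, y)`, which solves the least-squares problem directly instead of iterating.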
Projects on binary and multi-class classification using Logistic Regression:
| Project | Description |
|---|---|
| Credit Card Fraud | Building a Logistic Regression model to detect and classify instances of credit card fraud. |
| Income Classification | Classifying individuals based on demographic data to predict their income bracket (e.g., $50K+). |
| Project | Description |
|---|---|
| Classification Model | Creating a complete Machine Learning Pipeline to build a classification model for diagnosing hematologic diseases in pediatric patients. |
Projects extending Linear Regression to multiple predictor variables:
| Project | Description |
|---|---|
| Tennis Ace | Predicting the outcome (e.g., score, ranking) for a tennis player based on multiple playing habits and statistics. |
| Yelp Regression | Investigating factors that most affect a restaurant's Yelp rating and building a model to predict the rating. |
| Project | Algorithm | Description |
|---|---|---|
| Email Similarity | Naive Bayes Classifier | Implementing the Naive Bayes Classifier to measure and classify email similarity based on content. |
| Project | Description |
|---|---|
| Life Expectancy | Using TensorFlow/Keras to build a Neural Network model to predict the life expectancy of countries based on socio-economic and health factors. |
| Project | Description |
|---|---|
| Logic Gates | Modeling the fundamental building blocks of computers, logic gates (AND, OR, and XOR), using simple Perceptrons. |
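A minimal sketch of the idea, using the classic perceptron learning rule in plain Python. The function names, epoch count, and learning rate are assumptions for illustration. A single perceptron can learn AND and OR because they are linearly separable; XOR is not, so this single-unit sketch is expected to fail on it, and modeling XOR takes more than one unit.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train one perceptron (two inputs, step activation) with the perceptron rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, labels):
            prediction = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - prediction
            # Weights only change when the prediction is wrong.
            weights[0] += lr * error * x1
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias

def predict(weights, bias, samples):
    return [1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            for x1, x2 in samples]

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GATES = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

results = {}
for name, labels in GATES.items():
    weights, bias = train_perceptron(INPUTS, labels)
    results[name] = predict(weights, bias, INPUTS) == labels
    print(name, "learned" if results[name] else "not learned by a single perceptron")
```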
| File | Description |
|---|---|
| script_1.py | Classification task using PCA on the Telescope dataset to classify particles into gamma (signal) or hadrons (background). |
| script_2.py | Standalone implementation of the PCA algorithm for dimensionality reduction. |
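For orientation, a standalone PCA can be sketched in a few lines of NumPy: center the data, take the eigenvectors of the covariance matrix, and project onto the directions with the largest eigenvalues. This is an illustrative sketch under assumed names (`pca`) and made-up data, not a copy of `script_2.py`.

```python
import numpy as np

def pca(data, n_components):
    """Project data onto its top n_components principal directions."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    top = order[:n_components]
    components = eigenvectors[:, top]
    explained_ratio = eigenvalues[top] / eigenvalues.sum()
    return centered @ components, components, explained_ratio

# Made-up data: 6 samples, 3 features.
data = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 0.1],
    [2.3, 2.7, 0.6],
])

projected, components, explained_ratio = pca(data, n_components=2)
print(projected.shape)
print(explained_ratio)
```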
Projects leveraging the PyTorch deep learning framework:
| Project | Description |
|---|---|
| EV_Charging | Using Neural Networks built in PyTorch for predicting Residential EV Charging Loads. |
| Hotel_Cancellation | Building a PyTorch model for predicting Hotel Booking Cancellations. |
| Project | Description |
|---|---|
| Census Data | Using the Random Forest ensemble method to predict whether or not a person makes more than $50,000 using census data. |
| Project | Description |
|---|---|
| Book Recommender System | Building a system that suggests books to users based on collaborative filtering or content-based methods. |
| Project | Description |
|---|---|
| Predict Wine Quality | Applying Regularization techniques (L1/L2) to a regression model to improve generalization and predict Wine Quality. |
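To show what regularization does to a regression model, the sketch below implements only the L2 (ridge) case in closed form with NumPy and compares it to the unregularized fit. The function `ridge_fit`, the synthetic data, and the `alpha` values are assumptions for illustration; there is no intercept term, and L1 (lasso) has no closed form and is not shown.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data (assumption: stands in for the wine quality features).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # larger alpha shrinks the coefficients

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
```

The penalty trades a little bias for lower variance, which is the generalization benefit the project description refers to.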
Foundational projects covering core statistical concepts for data science:
| Category | Project | Focus |
|---|---|---|
| Hypothesis Testing | Blood Transfusion, Famburg, Fetchmaker, Heart Disease | Implementing statistical tests to analyze data and draw conclusions in various contexts. |
| Probability | Product Defects | Calculating and analyzing probabilities related to manufacturing defects. |
| Sampling | Dance Party | Exploring different sampling techniques in the context of event data. |
| Project | Description |
|---|---|
| Baseball Strike Zones | Using Support Vector Machines (SVMs) to classify and predict Baseball Strike Zones. |
| Project | Description |
|---|---|
| Cover Type Classification | Building a deep learning model using TensorFlow to predict the forest cover type from different cartographic variables. |
- Python
- Jupyter Notebook
- scikit-learn
- TensorFlow / Keras
- PyTorch
- Pandas & NumPy
- Matplotlib & Seaborn
- Clone the repository:
git clone https://github.com/ryantusi/AI-ML-Engineering.git
- Install dependencies: It's highly recommended to use a virtual environment.
pip install -r requirements.txt
- Navigate to any folder and open the files (e.g., .ipynb or script.py) to run the projects.
- Codecademy for the comprehensive certification curriculum and project inspiration.
- The open-source community for the powerful libraries that make these projects possible.
For any questions, suggestions, or collaborations, please feel free to reach out:


