🤖 AI/ML Engineering Portfolio

Overview

This repository is a portfolio of Machine Learning (ML) and Deep Learning (DL) projects completed as part of the Codecademy AI/ML Engineering certification. It covers concepts ranging from traditional supervised and unsupervised learning to modern neural network architectures.

The projects are implemented primarily in Python using Jupyter Notebooks, leveraging popular libraries such as scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch. Each folder contains project code and analysis relevant to the topic.

🏆 Codecademy Certification

The projects in these folders were completed for the Codecademy AI/ML Engineering + Data Science: Machine Learning Specialist Certification and demonstrate proficiency in:

Machine Learning Fundamentals: Supervised (Regression and Classification), Unsupervised (Clustering), and Ensemble methods.
Deep Learning: Implementing neural networks using TensorFlow/Keras and PyTorch.
Data Science Workflow: Exploratory Data Analysis (EDA), Data Visualization, Feature Engineering, and Model Selection/Tuning.
Core Libraries: Extensive use of scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch.

The structured organization reflects the curriculum's progression, moving from foundational statistics and data visualization through traditional machine learning algorithms to advanced deep learning architectures and deployment concepts.

📂 Repository Structure & Project Details

Boosting_Ensemble

Project	Description
Boosting	Predict whether a person earns more than $50,000 per year from census data, demonstrating boosting, a sequential ensemble method in which each new model concentrates on the errors of the models before it.
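Here is a minimal boosting sketch using scikit-learn's `AdaBoostClassifier`. It assumes a synthetic dataset from `make_classification` as a stand-in for the census data, so the project's actual data, features, and hyperparameters may differ. By default each weak learner is a depth-1 decision tree (a stump), and each round reweights the samples that earlier rounds misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the census dataset (assumption)
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Up to 100 boosting rounds, each fitting a decision stump on reweighted samples
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```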

Classification_Project

A collection of diverse classification tasks:

Project	Algorithm	Description
Classifying Tweets	Naive Bayes Classifier	Uses a Naive Bayes Classifier to find patterns in real tweets and predict their origin (New York, London, or Paris).
Classifying Viral Tweets	K-Nearest Neighbor (KNN)	Employs the K-Nearest Neighbor algorithm to predict whether a tweet will go viral based on its features.

Data_Science

End-to-end data analysis and model building projects:

Project	Description
Bio Diversity	Analyze biodiversity data from the National Park Service, focusing on various species observed across different national park locations.
OkCupid	Comprehensive project involving scoping, data preparation, analysis, and building a machine learning model using data from the OkCupid online dating application.

Data_Visualization

Projects focused on creating informative and clear visualizations for data exploration:

Project	Type	Focus
Categorical Data	EDA	Visual exploration of a mushroom dataset.
EDA	EDA	Investigation of airline data (Airline Analysis).
Line Graph	Time Series	Tracking Online Lime Sales over time.
Portfolio Project	Multiple Visuals	Analyzing the relationship between Life Expectancy and GDP.

Decision_Trees

Project	Description
Find the flag!	Use decision trees to predict a country's continent from features of its flag (colors, shapes, etc.) and explore feature importance.

Deep_Learning_TensorFlow

Advanced projects using TensorFlow/Keras for Deep Learning tasks:

Type	Project	Description
Classification	Galaxies	Classifying different types of Galaxies using Convolutional Neural Networks (CNNs).
Classification	Heart Failure	Predict the survival of patients with heart failure.
Classification	X-Rays	Analyzing Lung Scans (X-Rays) to predict pneumonia, Covid-19, or no illness.
Regression	Chances of Admission	Predicting a student's chances of admission to a university.

Exploratory_Data_Analysis

Detailed projects on initial data investigation and cleaning:

Project	Focus
Diabetes	Analyzing health and risk factors associated with diabetes.
NBA Trends	Investigating trends and statistics within the National Basketball Association.
Stackoverflow	Exploring developer survey data from Stack Overflow.
Students	Analyzing student performance and demographic data.

Feature_Engineering

Projects focused on transforming raw data into features that best represent the underlying problem:

Method	Project	Description
Filter Method	Customer Reviews	Applying filter methods (e.g., statistical tests) to select relevant features from a dataset of customer reviews for a clothing brand.
Wrapper Method	Obesity on lifestyle	Implementing wrapper methods (e.g., Recursive Feature Elimination) to determine the best subset of lifestyle factors for predicting obesity.
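A short scikit-learn sketch can contrast the two selection strategies. It uses synthetic data where, because `shuffle=False`, the three informative features are the first three columns. `SelectKBest` with the ANOVA F-test stands in for the filter method, and `RFE` wrapped around a logistic regression stands in for the wrapper method. The projects' actual tests and estimators may differ.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: with shuffle=False, columns 0-2 are informative, 3-7 are noise
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)

# Filter method: score each feature independently with an ANOVA F-test
filter_selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
filter_features = filter_selector.get_support(indices=True)

# Wrapper method: repeatedly refit the model and drop the lowest-weight feature
wrapper_selector = RFE(
    LogisticRegression(max_iter=1000), n_features_to_select=3
).fit(X, y)
wrapper_features = wrapper_selector.get_support(indices=True)

print("Filter picks:", filter_features)
print("Wrapper picks:", wrapper_features)
```

The filter method never trains the final model, so it is cheap. The wrapper method refits the estimator once per eliminated feature, so its cost grows with the number of features.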

Hyperparameter_Tuning

Project	Description
Classify Raisins	Classifying different types of raisins (Kecimen and Besni) by implementing and comparing two tuning techniques: Grid Search for a Decision Tree Classifier and Random Search for a Logistic Regression Classifier.
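The two tuning strategies described above can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The raisin data is not bundled with scikit-learn, so this sketch uses the built-in breast cancer dataset as a stand-in. The parameter grid and the distribution for `C` are illustrative, not the project's.

```python
from scipy.stats import uniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary dataset

# Grid search: exhaustively evaluates all 3 x 2 = 6 combinations with 5-fold CV
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6], "min_samples_split": [2, 10]},
    cv=5,
)
grid.fit(X, y)

# Random search: draws 8 values of C from a uniform distribution on [0.01, 10.01]
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
rand = RandomizedSearchCV(
    logreg,
    param_distributions={"logisticregression__C": uniform(loc=0.01, scale=10)},
    n_iter=8,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print("Tree:", grid.best_params_, round(grid.best_score_, 3))
print("LogReg:", rand.best_params_, round(rand.best_score_, 3))
```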

K_Means_Clustering

Project	Algorithm	Description
Handwriting Recognition	K-Means Clustering	Using the unsupervised K-Means algorithm to cluster and recognize patterns in handwriting data.
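Below is a minimal sketch of the clustering step. It assumes scikit-learn's bundled `load_digits` dataset (1,797 8x8 images flattened to 64 pixel features) as the handwriting data. K-Means is unsupervised, so the cluster indices it returns are arbitrary. You have to map them to digits by inspecting each cluster's center.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()  # digits.data has shape (1797, 64)

# One cluster per digit class; n_init restarts guard against poor initial centroids
model = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = model.fit_predict(digits.data)

# Each center is a 64-pixel "average image" that can be reshaped to 8x8 and plotted
print(model.cluster_centers_.shape)  # (10, 64)
```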

K_Nearest_Neighbors

Project	Algorithm	Description
Breast Cancer Classifier	K-Nearest Neighbor (KNN)	Building a model to classify and predict the diagnosis of breast cancer based on medical features.
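This sketch shows a typical workflow for this kind of classifier, using scikit-learn's bundled breast cancer dataset (which may or may not be the exact data the project uses). It holds out a validation split, sweeps the number of neighbors `k`, and keeps the best-scoring value.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.2, random_state=100
)

# Validation accuracy for k = 1..30 neighbors
scores = {}
for k in range(1, 31):
    classifier = KNeighborsClassifier(n_neighbors=k)
    classifier.fit(X_train, y_train)
    scores[k] = classifier.score(X_val, y_val)

best_k = max(scores, key=scores.get)
print(f"Best k = {best_k}, validation accuracy = {scores[best_k]:.3f}")
```

Choosing `k` on the same split used to report accuracy gives an optimistic estimate. A separate test set or cross-validation is fairer.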

Linear_Regression

Projects demonstrating the fundamental Linear Regression model:

Implementation	Description
Scratch	Traditional linear regression implemented from scratch to expose the underlying mathematics.
Sklearn	Utilizing the `scikit-learn` library for efficient implementation of Linear Regression.
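Here is a from-scratch sketch in NumPy. It assumes the scratch version uses batch gradient descent on mean squared error, a common approach that is not confirmed for this project. It fits a line to synthetic data and checks the result against NumPy's least-squares `np.polyfit`.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.01, num_iterations=5000):
    """Fit y = m * x + b by minimizing mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(num_iterations):
        residuals = y - (m * x + b)
        # Partial derivatives of MSE = mean((y - (m*x + b))**2)
        grad_m = (-2 / n) * np.sum(x * residuals)
        grad_b = (-2 / n) * np.sum(residuals)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Synthetic data scattered around the line y = 3x + 2
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)

m, b = gradient_descent(x, y)
m_ref, b_ref = np.polyfit(x, y, deg=1)  # closed-form least-squares reference
print(f"gradient descent: m={m:.3f}, b={b:.3f}")
print(f"np.polyfit:       m={m_ref:.3f}, b={b_ref:.3f}")
```

The learning rate has to stay small relative to the scale of `x`. With much larger values the updates overshoot and diverge.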

Logistic_Regression

Projects on binary and multi-class classification using Logistic Regression:

Project	Description
Credit Card Fraud	Building a Logistic Regression model to detect and classify instances of credit card fraud.
Income Classification	Classifying individuals based on demographic data to predict their income bracket (e.g., $50K+).

ML_Pipeline

Project	Description
Classification Model	Creating a complete Machine Learning Pipeline to build a classification model for diagnosing hematologic diseases in pediatric patients.
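Below is a minimal pipeline sketch. It uses scikit-learn's breast cancer dataset as a stand-in for the pediatric hematology data, with an illustrative choice of steps (scaling, PCA, logistic regression). Wrapping preprocessing in a `Pipeline` means `cross_val_score` refits the scaler and PCA inside each training fold. That way no information from the validation fold leaks into preprocessing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in medical dataset

pipeline = Pipeline([
    ("scale", StandardScaler()),       # zero mean, unit variance per feature
    ("reduce", PCA(n_components=10)),  # keep 10 principal components
    ("classify", LogisticRegression(max_iter=1000)),
])

# Every fold refits all three steps on its own training portion
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean 5-fold accuracy: {scores.mean():.3f}")
```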

Multiple_Linear_Regression

Projects extending Linear Regression to multiple predictor variables:

Project	Description
Tennis Ace	Predicting the outcome (e.g., score, ranking) for a tennis player based on multiple playing habits and statistics.
Yelp Regression	Investigating factors that most affect a restaurant's Yelp rating and building a model to predict the rating.
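Multiple linear regression fits one coefficient per predictor plus an intercept. This NumPy sketch solves the least-squares problem directly with `np.linalg.lstsq`, using synthetic data with three hypothetical predictors. It then reports R², the share of variance in the target that the model explains. scikit-learn's `LinearRegression` fits the same ordinary least-squares model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 200

# Three synthetic predictors with known effects (hypothetical data)
X = rng.normal(size=(n_samples, 3))
true_coef = np.array([4.0, -2.0, 0.5])
y = 10.0 + X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Prepend a column of ones so the first weight is the intercept
A = np.column_stack([np.ones(n_samples), X])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = weights[0], weights[1:]

# R^2 = 1 - SS_residual / SS_total
predictions = A @ weights
r2 = 1 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"intercept={intercept:.2f}, coef={np.round(coef, 2)}, R^2={r2:.3f}")
```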

Naive_Bayes_Classifier

Project	Description
Email Similarity	Implementing the Naive Bayes Classifier to measure and classify email similarity based on content.
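Here is a minimal sketch of Naive Bayes text classification with scikit-learn. `CountVectorizer` turns each email into word counts, and `MultinomialNB` learns per-class word frequencies. The four training sentences and their labels are invented for illustration; the project's actual corpus is not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical corpus with two topics
train_texts = [
    "the hockey game went to overtime",
    "great save by the goalie in the third period",
    "new graphics card drivers improve frame rates",
    "my gpu overheats when rendering graphics",
]
train_labels = ["hockey", "hockey", "hardware", "hardware"]

# Learn the vocabulary and convert each text to a vector of word counts
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)

# MultinomialNB applies Laplace smoothing (alpha=1.0) by default
classifier = MultinomialNB()
classifier.fit(train_counts, train_labels)

new_texts = ["the goalie made a save", "updated graphics drivers"]
new_counts = vectorizer.transform(new_texts)  # words outside the vocabulary are ignored
predictions = classifier.predict(new_counts)
print(predictions.tolist())  # ['hockey', 'hardware']
```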

Neural_Networks

Project	Description
Life Expectancy	Using TensorFlow/Keras to build a Neural Network model to predict the life expectancy of countries based on socio-economic and health factors.

Perceptrons

Project	Description
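Logic Gates	Modeling logic gates (AND, OR, and XOR), the fundamental building blocks of computers, with simple perceptrons. A single perceptron can learn only linearly separable gates such as AND and OR; XOR needs more than one layer.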
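This minimal sketch applies the classic perceptron learning rule in plain Python, training on the AND, OR, and XOR truth tables. The epoch count and learning rate are illustrative choices, not taken from the project. A single perceptron draws one straight decision boundary, so it learns AND and OR but cannot reproduce XOR.

```python
def predict(weights, bias, x1, x2):
    """Step activation: fire (1) if the weighted sum is positive."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


def train_perceptron(inputs, targets, epochs=20, learning_rate=1):
    """Classic perceptron learning rule with a step activation."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for (x1, x2), target in zip(inputs, targets):
            error = target - predict(weights, bias, x1, x2)  # -1, 0, or +1
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
gates = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}

# Record whether the trained perceptron reproduces each truth table exactly
results = {}
for name, targets in gates.items():
    weights, bias = train_perceptron(inputs, targets)
    outputs = [predict(weights, bias, x1, x2) for x1, x2 in inputs]
    results[name] = outputs == targets

print(results)  # {'AND': True, 'OR': True, 'XOR': False}
```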

Principal_Component_Analysis

File	Description
script_1.py	Classification task that applies PCA to the Telescope dataset to classify events as gamma rays (signal) or hadrons (background).
script_2.py	Standalone implementation of the PCA algorithm for dimensionality reduction.
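A standalone PCA sketch in NumPy follows (the second script's actual code is not reproduced). It centers the data, eigendecomposes the covariance matrix, and projects onto the eigenvectors with the largest eigenvalues. The synthetic 3-feature data is built from one latent variable, so the first component should capture nearly all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    covariance = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return X_centered @ components, explained_ratio


# Three correlated features driven by a single latent variable, plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent, 2 * latent, -latent]) + rng.normal(scale=0.1, size=(500, 3))

projected, explained_ratio = pca(X, n_components=2)
print(projected.shape, np.round(explained_ratio, 3))
```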

PyTorch 🌟

Projects leveraging the PyTorch deep learning framework:

Project	Description
EV_Charging	Using Neural Networks built in PyTorch for predicting Residential EV Charging Loads.
Hotel_Cancellation	Building a PyTorch model for predicting Hotel Booking Cancellations.

Random_Forests

Project	Description
Census Data	Using the Random Forest ensemble method to predict whether or not a person makes more than $50,000 using census data.

Recommender_System

Project	Description
Book Recommender System	Building a system that suggests books to users based on collaborative filtering or content-based methods.
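The source does not say which approach the book recommender uses. This NumPy sketch shows one of the two options named above: item-based collaborative filtering with cosine similarity, run on a small hypothetical user-by-book rating matrix. Each unrated book is scored by the similarity-weighted sum of the user's existing ratings.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are books, 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 4, 1, 0],
    [5, 4, 5, 0, 1],
    [0, 1, 1, 5, 4],
    [1, 0, 0, 4, 5],
], dtype=float)

# Cosine similarity between every pair of book (column) vectors
norms = np.linalg.norm(ratings, axis=0)
item_similarity = (ratings.T @ ratings) / np.outer(norms, norms)


def recommend(user_index, top_n=1):
    """Rank a user's unrated books by similarity to the books they rated."""
    user_ratings = ratings[user_index]
    scores = item_similarity @ user_ratings
    scores[user_ratings > 0] = -np.inf  # never recommend already-rated books
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]


print(recommend(0))  # [2]
```

User 0 rated books 0 and 1 highly. Users with similar tastes also liked book 2, so book 2 outranks book 3.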

Regularization

Project	Description
Predict Wine Quality	Applying Regularization techniques (L1/L2) to a regression model to improve generalization and predict Wine Quality.
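Here is a minimal sketch of L2 (ridge) regularization in NumPy, using its closed-form solution on synthetic data; the wine-quality features are not reproduced. As the penalty strength `alpha` grows, the coefficients shrink toward zero, trading a little bias for lower variance. L1 (lasso) has no closed form and is usually fit iteratively, for example with scikit-learn's `Lasso`.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X_centered, y_centered = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (X^T X + alpha * I) w = X^T y instead of forming an explicit inverse
    coef = np.linalg.solve(
        X_centered.T @ X_centered + alpha * np.eye(n_features),
        X_centered.T @ y_centered,
    )
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

coef_norms = []
for alpha in [0.0, 10.0, 100.0]:
    coef, _ = ridge_fit(X, y, alpha)
    coef_norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:>5}: ||coef|| = {coef_norms[-1]:.3f}")
```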

Statistics

Foundational projects covering core statistical concepts for data science:

Category	Project	Focus
Hypothesis Testing	Blood Transfusion, Famburg, Fetchmaker, Heart Disease	Implementing statistical tests to analyze data and draw conclusions in various contexts.
Probability	Product Defects	Calculating and analyzing probabilities related to manufacturing defects.
Sampling	Dance Party	Exploring different sampling techniques in the context of event data.
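For the probability project, defect counts are commonly modeled with a binomial distribution. This pure-Python sketch assumes a hypothetical batch of 20 items and a 5% defect rate. It computes the probability of exactly `k` defects and of more than 2 defects using `math.comb` (Python 3.8+). `scipy.stats.binom` offers the same quantities through its `pmf` and `cdf` methods.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n_products, defect_rate = 20, 0.05  # hypothetical batch size and defect rate

p_none = binomial_pmf(0, n_products, defect_rate)
p_more_than_two = 1 - binomial_cdf(2, n_products, defect_rate)
print(f"P(no defects) = {p_none:.4f}")
print(f"P(more than 2 defects) = {p_more_than_two:.4f}")
```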

Support_Vector_Machines

Project	Description
Baseball Strike Zones	Using Support Vector Machines (SVMs) to classify and predict Baseball Strike Zones.

Tensorflow_Portfolio

Project	Description
Cover Type Classification	Building a deep learning model using TensorFlow to predict the forest cover type from different cartographic variables.

🛠️ Technologies Used

Python
Jupyter Notebook
scikit-learn
TensorFlow / Keras
PyTorch
Pandas & NumPy
Matplotlib & Seaborn

🚀 Getting Started

Clone the repository:

```
git clone https://github.com/ryantusi/AI-ML-Engineering.git
```

Install the dependencies (using a virtual environment is highly recommended):
```
pip install -r requirements.txt
```
Navigate to any project folder and open its notebook (.ipynb) or Python script (.py) to run the project.

🛑 Conclusion

Acknowledgments

Codecademy for the comprehensive certification curriculum and project inspiration.
The open-source community for the powerful libraries that make these projects possible.

Contact

For any questions, suggestions, or collaborations, please feel free to reach out:

Ryan Tusi – LinkedIn
Portfolio - Website
