
46 projects covering the full spectrum of data science, AI, machine learning, and deep learning skills, including EDA, data visualization, and traditional ML fundamentals (regression, classification, clustering, ensemble methods), using TensorFlow/Keras, PyTorch, scikit-learn, pandas, NumPy, and more, implemented in Python scripts and Jupyter Notebooks.


ryantusi/AI-ML-Engineering


🤖 AI/ML Engineering Portfolio

Overview

This repository serves as a comprehensive portfolio for Machine Learning (ML) and Deep Learning (DL) projects, completed as part of the Codecademy AI/ML Engineering certification. It encapsulates a wide array of fundamental and advanced concepts, from traditional supervised and unsupervised learning to modern neural network architectures.

The projects are implemented primarily in Python using Jupyter Notebooks, leveraging popular libraries such as scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch. Each folder contains project code and analysis relevant to the topic.


🏆 Codecademy Certification

This repository documents the comprehensive portfolio of projects completed as part of the Codecademy AI/ML Engineering + Data Science: Machine Learning Specialist Certification. The work within these folders demonstrates proficiency in:

  • Machine Learning Fundamentals: Supervised (Regression and Classification), Unsupervised (Clustering), and Ensemble methods.
  • Deep Learning: Implementing neural networks using TensorFlow/Keras and PyTorch.
  • Data Science Workflow: Exploratory Data Analysis (EDA), Data Visualization, Feature Engineering, and Model Selection/Tuning.
  • Core Libraries: Extensive use of scikit-learn, pandas, NumPy, Matplotlib, TensorFlow, and PyTorch.

The structured organization reflects the curriculum's progression, moving from foundational statistics and data visualization through traditional machine learning algorithms to advanced deep learning architectures and deployment concepts.


📂 Repository Structure & Project Details

Boosting_Ensemble

| Project | Description |
| --- | --- |
| Boosting | Predict whether or not a person makes more than $50,000 using census data, demonstrating the power of sequential ensemble methods. |

Classification_Project

A collection of diverse classification tasks:

| Project | Algorithm | Description |
| --- | --- | --- |
| Classifying Tweets | Naive Bayes Classifier | Uses a Naive Bayes Classifier to find patterns in real tweets and predict their origin (New York, London, or Paris). |
| Classifying Viral Tweets | K-Nearest Neighbor (KNN) | Employs the K-Nearest Neighbor algorithm to predict whether a tweet will go viral based on its features. |
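As a rough illustration of the Naive Bayes idea behind the tweet-origin project, here is a minimal bag-of-words sketch with Laplace smoothing. The toy tweets, the labels, and the function names `train_naive_bayes` and `predict` are invented for this example and are not taken from the project notebooks, which may well use scikit-learn instead.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (invented for illustration only).
docs = [
    "the tube is delayed again",
    "lovely walk along the thames",
    "subway delays in brooklyn",
    "pizza in manhattan tonight",
    "croissant near the louvre",
    "sunset over the seine",
]
labels = ["London", "London", "New York", "New York", "Paris", "Paris"]

model = train_naive_bayes(docs, labels)
prediction = predict("stuck on the subway in manhattan", *model)
print(prediction)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.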

Data_Science

End-to-end data analysis and model building projects:

| Project | Description |
| --- | --- |
| Bio Diversity | Analyze biodiversity data from the National Park Service, focusing on various species observed across different national park locations. |
| OkCupid | Comprehensive project involving scoping, data preparation, analysis, and building a machine learning model using data from the OkCupid online dating application. |

Data_Visualization

Projects focused on creating informative and clear visualizations for data exploration:

| Project | Type | Focus |
| --- | --- | --- |
| Categorical Data | EDA | Visual exploration of Mushroom datasets. |
| EDA | EDA | Airline Analysis data investigation. |
| Line Graph | Time Series | Tracking Online Lime Sales over time. |
| Portfolio Project | Multiple Visuals | Analyzing the relationship between Life Expectancy and GDP. |

Decision_Trees

| Project | Description |
| --- | --- |
| Find the flag! | Use Decision Trees to predict the continent of flags based on various features (colors, shapes, etc.), and explore feature importance. |
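Decision trees pick splits that reduce class impurity, and feature importance is derived from those reductions. The sketch below computes the Gini gain of two binary features on a tiny, invented flag dataset; the feature names `has_red` and `has_stars` and the helper `gini_gain` are hypothetical and only illustrate the idea.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def gini_gain(rows, labels, feature):
    """Impurity reduction from splitting on a binary (0/1) feature."""
    left = [label for row, label in zip(rows, labels) if row[feature] == 0]
    right = [label for row, label in zip(rows, labels) if row[feature] == 1]
    total = len(labels)
    weighted = (len(left) / total) * gini(left) + (len(right) / total) * gini(right)
    return gini(labels) - weighted

# Toy flag data (invented for illustration only).
rows = [
    {"has_red": 1, "has_stars": 1},
    {"has_red": 1, "has_stars": 1},
    {"has_red": 1, "has_stars": 0},
    {"has_red": 0, "has_stars": 0},
]
labels = ["Americas", "Americas", "Europe", "Europe"]

gains = {feature: gini_gain(rows, labels, feature) for feature in rows[0]}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

In this toy data `has_stars` separates the two continents perfectly, so it has the larger gain and would be chosen as the first split.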

Deep_Learning_TensorFlow

Advanced projects using TensorFlow/Keras for Deep Learning tasks:

| Type | Project | Description |
| --- | --- | --- |
| Classification | Galaxies | Classifying different types of Galaxies using Convolutional Neural Networks (CNNs). |
| Classification | Heart Failure | Predict the survival of patients with heart failure. |
| Classification | X-Rays | Analyzing Lung Scans (X-Rays) to predict pneumonia, Covid-19, or no illness. |
| Regression | Chances of Admission | Predicting a student's chances of admission to a university. |

Exploratory_Data_Analysis

Detailed projects on initial data investigation and cleaning:

| Project | Focus |
| --- | --- |
| Diabetes | Analyzing health and risk factors associated with diabetes. |
| NBA Trends | Investigating trends and statistics within the National Basketball Association. |
| Stackoverflow | Exploring developer survey data from Stack Overflow. |
| Students | Analyzing student performance and demographic data. |

Feature_Engineering

Projects focused on transforming raw data into features that best represent the underlying problem:

| Method | Project | Description |
| --- | --- | --- |
| Filter Method | Customer Reviews | Applying filter methods (e.g., statistical tests) to select relevant features from a dataset of customer reviews on a clothing brand. |
| Wrapper Method | Obesity on lifestyle | Implementing wrapper methods (e.g., Recursive Feature Elimination) to determine the best subset of lifestyle factors for predicting obesity. |
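A filter method scores each feature independently of any model and keeps the highest-scoring ones. The sketch below uses the absolute Pearson correlation with the target as the score, which is one simple choice and not necessarily the statistic used in the project. The feature names and values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def rank_features(features, target):
    """Order feature names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy data (invented for illustration only).
features = {
    "exercise_hours": [1, 2, 3, 4, 5],
    "sugary_drinks": [5, 3, 4, 1, 2],
    "shoe_size": [7, 9, 8, 9, 7],
}
target = [32, 30, 27, 25, 22]

ranking, scores = rank_features(features, target)
print(ranking)
```

A wrapper method differs in that it repeatedly trains a model on candidate feature subsets, so it is more expensive but can capture interactions that a per-feature score misses.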

Hyperparameter_Tuning

| Project | Description |
| --- | --- |
| Classify Raisins | Classifying different types of raisins (Kecimen and Besni) by implementing and comparing two tuning techniques: Grid Search for a Decision Tree Classifier and Random Search for a Logistic Regression Classifier. |
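The difference between the two tuning strategies can be sketched without any ML library: grid search evaluates every combination in a fixed grid, while random search samples a fixed number of combinations from ranges. The function `validation_score` below is a hypothetical stand-in for a cross-validated model score, and the parameter names are only examples.

```python
import itertools
import random

def validation_score(max_depth, min_samples_leaf):
    """Hypothetical stand-in for a cross-validated accuracy (peaks at 5 and 4)."""
    return 0.9 - 0.01 * abs(max_depth - 5) - 0.005 * abs(min_samples_leaf - 4)

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(ranges, n_iter, seed=0):
    """Evaluate n_iter combinations sampled uniformly from integer ranges."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.randint(low, high) for name, (low, high) in ranges.items()}
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"max_depth": [1, 3, 5, 7], "min_samples_leaf": [1, 2, 4, 8]}
grid_best, grid_score = grid_search(grid)

ranges = {"max_depth": (1, 10), "min_samples_leaf": (1, 10)}
rand_best, rand_score = random_search(ranges, n_iter=8)

print(grid_best, grid_score)
print(rand_best, rand_score)
```

Grid search costs 16 evaluations here and is exhaustive over the grid; random search costs 8 and trades completeness for a fixed budget. In scikit-learn the corresponding classes are `GridSearchCV` and `RandomizedSearchCV`.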

K_Means_Clustering

| Project | Algorithm | Description |
| --- | --- | --- |
| Handwriting Recognition | K-Means Clustering | Using the unsupervised K-Means algorithm to cluster and recognize patterns in handwriting data. |
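K-Means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal sketch on invented 2-D points with fixed starting centroids; the project itself works on handwriting image data and may rely on scikit-learn's `KMeans`.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, n_iter=10):
    """Run a fixed number of assignment/update rounds from given starting centroids."""
    for _ in range(n_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels

# Toy 2-D points (invented for illustration only).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(centroids, labels)
```

Because K-Means is unsupervised, the cluster indices carry no meaning by themselves; recognizing a digit requires mapping each cluster to a label after inspecting its centroid.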

K_Nearest_Neighbors

| Project | Algorithm | Description |
| --- | --- | --- |
| Breast Cancer Classifier | K-Nearest Neighbor (KNN) | Building a model to classify and predict the diagnosis of breast cancer based on medical features. |
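KNN classifies a new sample by majority vote among the k closest training samples. Below is a minimal sketch on invented two-feature data; the labels and the function name `knn_predict` are assumptions for illustration, and the real project uses many more medical features.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Majority label among the k training points nearest to the query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy two-feature samples (invented for illustration only).
train_points = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

first = knn_predict(train_points, train_labels, (2.9, 3.1))
second = knn_predict(train_points, train_labels, (1.0, 1.0))
print(first, second)
```

Since the vote depends on raw distances, features on very different scales should normally be normalized before applying KNN.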

Linear_Regression

Projects demonstrating the fundamental Linear Regression model:

| Implementation | Description |
| --- | --- |
| Scratch | Implementation of Traditional Linear Regression from scratch, providing a deep understanding of the underlying mathematics. |
| Sklearn | Utilizing the scikit-learn library for efficient implementation of Linear Regression. |
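One common way to write linear regression from scratch is gradient descent on the mean squared error. The sketch below assumes that approach, which may differ from the repository's own implementation; the learning rate, iteration count, and data are chosen only for illustration.

```python
def fit_linear_regression(xs, ys, learning_rate=0.01, n_iter=5000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Partial derivatives of MSE = (1/n) * sum((y - (m*x + b))^2).
        grad_m = (-2.0 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2.0 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear_regression(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

With scikit-learn, `LinearRegression().fit(X, y)` solves the same least-squares problem directly rather than iteratively.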

Logistic_Regression

Projects on binary and multi-class classification using Logistic Regression:

| Project | Description |
| --- | --- |
| Credit Card Fraud | Building a Logistic Regression model to detect and classify instances of credit card fraud. |
| Income Classification | Classifying individuals based on demographic data to predict their income bracket (e.g., $50K+). |

ML_Pipeline

| Project | Description |
| --- | --- |
| Classification Model | Creating a complete Machine Learning Pipeline to build a classification model for diagnosing hematologic diseases in pediatric patients. |

Multiple_Linear_Regression

Projects extending Linear Regression to multiple predictor variables:

| Project | Description |
| --- | --- |
| Tennis Ace | Predicting the outcome (e.g., score, ranking) for a tennis player based on multiple playing habits and statistics. |
| Yelp Regression | Investigating factors that most affect a restaurant's Yelp rating and building a model to predict the rating. |

Naive_Bayes_Classifier

| Project | Description |
| --- | --- |
| Email Similarity | Implementing the Naive Bayes Classifier to measure and classify email similarity based on content. |

Neural_Networks

| Project | Description |
| --- | --- |
| Life Expectancy | Using TensorFlow/Keras to build a Neural Network model to predict the life expectancy of countries based on socio-economic and health factors. |

Perceptrons

| Project | Description |
| --- | --- |
| Logic Gates | Modeling the fundamental building blocks of computers, logic gates (AND, OR, and XOR), using simple Perceptrons. |
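A single perceptron is a linear classifier, so it can learn AND and OR but cannot represent XOR, which is not linearly separable. The sketch below trains a perceptron with the classic update rule on each gate; the function names and hyperparameters are illustrative assumptions, not the project's code.

```python
def predict(weights, bias, inputs):
    """Step activation over a weighted sum of two inputs."""
    total = weights[0] * inputs[0] + weights[1] * inputs[1] + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge weights by the signed prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights[0] += learning_rate * error * inputs[0]
            weights[1] += learning_rate * error * inputs[1]
            bias += learning_rate * error
    return weights, bias

def accuracy(weights, bias, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = sum(predict(weights, bias, inputs) == target for inputs, target in samples)
    return correct / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(*train_perceptron(AND), AND)
or_accuracy = accuracy(*train_perceptron(OR), OR)
xor_accuracy = accuracy(*train_perceptron(XOR), XOR)
print(and_accuracy, or_accuracy, xor_accuracy)
```

AND and OR reach perfect accuracy, while XOR never does with one perceptron; solving XOR requires combining several perceptrons in layers.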

Principal_Component_Analysis

| File | Description |
| --- | --- |
| script_1.py | Classification task using PCA on the Telescope dataset to classify particles into gamma (signal) or hadrons (background). |
| script_2.py | Standalone implementation of the PCA algorithm for dimensionality reduction. |
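PCA finds the directions of greatest variance, which are the leading eigenvectors of the data's covariance matrix. The sketch below recovers only the first component, using power iteration in plain Python instead of a full eigendecomposition. This is a simplification chosen for illustration and is not a description of how `script_2.py` works; the data is invented.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return the top principal direction and the 1-D projections of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    vector = [1.0] * d
    for _ in range(n_iter):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
    projections = [sum(c * v for c, v in zip(row, vector)) for row in centered]
    return vector, projections

# Toy data lying close to the line y = x (invented for illustration only).
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
component, projections = first_principal_component(rows)
print(component, projections)
```

For this data the component points roughly along the diagonal, and the projections reduce each 2-D point to a single coordinate along it.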

PyTorch 🌟

Projects leveraging the PyTorch deep learning framework:

| Project | Description |
| --- | --- |
| EV_Charging | Using Neural Networks built in PyTorch for predicting Residential EV Charging Loads. |
| Hotel_Cancellation | Building a PyTorch model for predicting Hotel Booking Cancellations. |

Random_Forests

| Project | Description |
| --- | --- |
| Census Data | Using the Random Forest ensemble method to predict whether or not a person makes more than $50,000 using census data. |

Recommender_System

| Project | Description |
| --- | --- |
| Book Recommender System | Building a system that suggests books to users based on collaborative filtering or content-based methods. |
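User-based collaborative filtering recommends items that similar users rated highly. The sketch below uses cosine similarity between users' rating vectors; the users, books, ratings, and function names are invented for illustration, and the actual project may use a different method or library.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts, treating unrated books as 0."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[book] * b[book] for book in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=1):
    """Rank books the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for book, rating in other_ratings.items():
            if book not in ratings[user]:
                scores[book] = scores.get(book, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy ratings on a 1-5 scale (invented for illustration only).
ratings = {
    "ana": {"Dune": 5, "Emma": 1},
    "ben": {"Dune": 5, "Emma": 1, "Neuromancer": 5},
    "cy": {"Dune": 1, "Emma": 5, "Persuasion": 5},
}

suggestions = recommend(ratings, "ana")
print(suggestions)
```

Here `ana` rates books much like `ben`, so the book `ben` liked and `ana` has not read ranks first. A content-based method would instead compare book attributes such as genre or author.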

Regularization

| Project | Description |
| --- | --- |
| Predict Wine Quality | Applying Regularization techniques (L1/L2) to a regression model to improve generalization and predict Wine Quality. |
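The effect of L1 and L2 penalties is easiest to see in a one-feature regression without an intercept, where both have closed-form solutions. The sketch below is a simplified illustration under that assumption, with invented data; the project itself fits multi-feature models.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def ridge_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2 (L2 penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + lam * |w| (L1 penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx

# Toy data generated from y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 30.0):
    print(lam, ridge_weight(xs, ys, lam), lasso_weight(xs, ys, lam))
```

Both penalties shrink the weight as `lam` grows, but only the L1 penalty drives it exactly to zero, which is why L1 regularization is also used for feature selection.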

Statistics

Foundational projects covering core statistical concepts for data science:

| Category | Project | Focus |
| --- | --- | --- |
| Hypothesis Testing | Blood Transfusion, Famburg, Fetchmaker, Heart Disease | Implementing statistical tests to analyze data and draw conclusions in various contexts. |
| Probability | Product Defects | Calculating and analyzing probabilities related to manufacturing defects. |
| Sampling | Dance Party | Exploring different sampling techniques in the context of event data. |

Support_Vector_Machines

| Project | Description |
| --- | --- |
| Baseball Strike Zones | Using Support Vector Machines (SVMs) to classify and predict Baseball Strike Zones. |

Tensorflow_Portfolio

| Project | Description |
| --- | --- |
| Cover Type Classification | Building a deep learning model using TensorFlow to predict the forest cover type from different cartographic variables. |

🛠️ Technologies Used

  • Python
  • Jupyter Notebook
  • scikit-learn
  • TensorFlow / Keras
  • PyTorch
  • Pandas & NumPy
  • Matplotlib & Seaborn

🚀 Getting Started

  1. Clone the repository:
    git clone https://github.com/ryantusi/AI-ML-Engineering.git
  2. Install dependencies: It's highly recommended to use a virtual environment.
    pip install -r requirements.txt
  3. Navigate to any folder and open the files (e.g., .ipynb or script.py) to run the projects.

🛑 Conclusion

Acknowledgments

  • Codecademy for the comprehensive certification curriculum and project inspiration.
  • The open-source community for the powerful libraries that make these projects possible.

Contact

For any questions, suggestions, or collaborations, please feel free to reach out:
