Projects

Multiple data analysis projects across various industries (retail, e-commerce, real estate, manufacturing, transportation, etc.)

Twitch API Analysis (Python)

Gathered the project manager's requirements for identifying important developers and API endpoints
Cleaned daily_logs from Nov 2017 to Feb 2018, extracted features, and mapped them to application_metadata
Analyzed and visualized daily_log data for trends and correlations between features
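A minimal pandas sketch of the mapping and trend steps: join the logs to the application metadata, rank developers or endpoints by request volume, and pivot to a per-day table for trend plots and correlations. The column names (`client_id`, `endpoint`, `requests`, `developer`) and the toy rows are hypothetical; the actual daily_logs and application_metadata schemas are not described here.

```python
import pandas as pd


def top_requests(daily_logs: pd.DataFrame, app_meta: pd.DataFrame,
                 by: str = "endpoint", n: int = 5) -> pd.DataFrame:
    """Join logs to app metadata and rank `by` (endpoint or developer) by total requests.

    Hypothetical schema: both frames share a `client_id` key.
    """
    # Left join keeps every log row even if an application has no metadata entry
    merged = daily_logs.merge(app_meta, on="client_id", how="left")
    return (
        merged.groupby(by, as_index=False)["requests"].sum()
        .sort_values("requests", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


def daily_trend(daily_logs: pd.DataFrame) -> pd.DataFrame:
    """One row per day, one column per endpoint; missing day/endpoint pairs become 0."""
    logs = daily_logs.assign(date=pd.to_datetime(daily_logs["date"]))
    return logs.pivot_table(index="date", columns="endpoint", values="requests",
                            aggfunc="sum", fill_value=0)


# Toy data for illustration only
logs = pd.DataFrame({
    "date": ["2017-11-01", "2017-11-01", "2017-11-02", "2017-11-02"],
    "client_id": ["a", "b", "a", "b"],
    "endpoint": ["/streams", "/users", "/streams", "/games"],
    "requests": [120, 40, 150, 10],
})
meta = pd.DataFrame({"client_id": ["a", "b"], "developer": ["dev_1", "dev_2"]})

print(top_requests(logs, meta, by="developer"))
trend = daily_trend(logs)
print(trend.corr().round(2))  # correlations between endpoints' daily volumes
```

With real logs spanning several months, the pivoted table is also a convenient input for rolling averages or seasonal plots.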

Zillow House Price Prediction (R/Python)

Exploratory data analysis; data mining; imputation of missing data; feature selection and generation
Built machine learning models (Linear Regression, Random Forest, XGBoost) to predict next season's Zillow price; tuned parameters and refined the models
Used R Shiny to visualize transaction and geographic data (how properties and their prices vary from city to city in CA)
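The imputation, modeling and tuning steps can be sketched as a single scikit-learn pipeline, so that imputation is refit inside each cross-validation fold. This is an illustrative Python sketch on synthetic data, not the project's code: the three features, the median imputation strategy and the small parameter grid are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
# Synthetic stand-in for property features (e.g. size, rooms, year built)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)
# Knock out roughly 10% of values to mimic missing data
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs with column medians
    ("model", RandomForestRegressor(random_state=0)),
])
# Small illustrative grid; a real search would cover more parameters
grid = {"model__n_estimators": [50, 100], "model__max_depth": [None, 8]}
search = GridSearchCV(pipe, grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_, round(search.score(X_test, y_test), 3))  # score is R^2
```

Swapping the `model` step for a linear regression or an XGBoost regressor keeps the same imputation and tuning scaffold.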

Instacart Online Grocery Store Customer Reorder Prediction (MySQL, R, Python)

Built a relational database and ERD to clarify the relationships between customers, retailers and products, and normalized the raw data (MySQL)
Performed EDA and customer segmentation (demographic, historical purchase behavior, and product-based segments)
Queried the data in MySQL and built visualizations in Tableau to provide insights and recommendations to the Instacart team
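A small sketch of a normalized schema and a reorder-rate query. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server; the tables, columns and toy rows are a hypothetical simplification, not the project's actual ERD. The composite primary key on `order_products` is what keeps each product to one line per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
-- Junction table: one row per product in an order
CREATE TABLE order_products (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    reordered INTEGER NOT NULL CHECK (reordered IN (0, 1)),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customers VALUES (1), (2);
INSERT INTO products VALUES (10, 'Bananas'), (20, 'Oat Milk');
INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
INSERT INTO order_products VALUES
    (100, 10, 0), (100, 20, 0),
    (101, 10, 1),
    (102, 10, 0), (102, 20, 0);
""")

# Reorder rate per product: share of order lines that were reorders
rows = conn.execute("""
SELECT p.product_name, COUNT(*) AS n_lines, ROUND(AVG(op.reordered), 3) AS reorder_rate
FROM order_products op
JOIN products p ON p.product_id = op.product_id
GROUP BY p.product_name
ORDER BY reorder_rate DESC, p.product_name
""").fetchall()
print(rows)
```

The same SQL runs on MySQL with minor changes; the result set is the kind of table that can be exported to Tableau.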

Implementation of Mixed Base-Learner AdaBoost, Modified by a Genetic Algorithm (R)

Self-written AdaBoost with mixed weak learners and genetic-algorithm feature selection, applied to real-world binary classification problems:

Selected 4 base weak learners out of 12 candidates through grid search, trials and evaluation
Implemented the AdaBoost algorithm, updating the weights of both the training samples and the learners in each iteration
Applied a GA to select the combination of weak learners, tuning GA parameters such as crossover rate, mutation rate and elitism to optimize final model performance
Reduced overall model complexity by 75% without decreasing model performance, while increasing the model's interpretability
Conducted parameter tuning and feature engineering, increasing prediction accuracy by 6%
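A minimal Python sketch (the project itself was written in R) of discrete AdaBoost over a mixed pool of base learners. Each round fits every candidate on the current sample weights, keeps the one with the lowest weighted error ε, gives it weight α = ½ ln((1 − ε)/ε), and up-weights the samples it misclassified. The three scikit-learn candidates and the synthetic data are assumptions for illustration, and the genetic-algorithm selection of learner combinations is not shown.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def fit_mixed_adaboost(X, y, candidates, n_rounds=20):
    """Discrete AdaBoost choosing, per round, the best of several learner types.

    y must be encoded as -1/+1. Returns a list of (alpha, fitted_learner).
    """
    n = len(y)
    w = np.full(n, 1.0 / n)  # sample weights, kept normalized to sum 1
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for proto in candidates:
            # Scale weights to mean 1 so regularized learners are not swamped by the penalty
            model = clone(proto).fit(X, y, sample_weight=w * n)
            pred = model.predict(X)
            err = w[pred != y].sum()  # weighted training error
            if best is None or err < best[0]:
                best = (err, model, pred)
        err, model, pred = best
        if err >= 0.5:  # no candidate beats chance on the current weights
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = 0.5 * np.log((1 - err) / err)  # learner weight
        w = w * np.exp(-alpha * y * pred)  # raise weights of misclassified samples
        w /= w.sum()
        ensemble.append((alpha, model))
    return ensemble


def predict_mixed_adaboost(ensemble, X):
    score = np.zeros(len(X))
    for alpha, model in ensemble:
        score += alpha * model.predict(X)  # weighted vote of -1/+1 predictions
    return np.where(score >= 0, 1, -1)


X, y01 = make_classification(n_samples=300, n_features=8, random_state=0)
y = np.where(y01 == 1, 1, -1)
candidates = [DecisionTreeClassifier(max_depth=1), LogisticRegression(max_iter=1000), GaussianNB()]
ens = fit_mixed_adaboost(X, y, candidates, n_rounds=10)
acc = np.mean(predict_mixed_adaboost(ens, X) == y)
print(len(ens), [type(m).__name__ for _, m in ens], round(acc, 3))
```

Restricting `candidates` to a chosen subset is where a combination search such as a GA would plug in.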

Uber Rider Behavior Analysis (Python)

Data cleaning, feature extraction and EDA of NYC Uber rider/driver behavior
Time series analysis, feature engineering
Defined the churn label according to different business requirements
Built and refined rider churn prediction models (Logistic Regression, Random Forest) using scikit-learn
Performed a cost-benefit analysis of methods for new user acquisition and for retaining users likely to churn
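One way to set a churn label and fit a baseline model, sketched under assumptions: the rule "churned if the last trip is more than 30 days before a cutoff date", the column names `last_trip_date`, `trips_first_30d` and `avg_rating`, and the synthetic data are all hypothetical, not the project's actual definitions. Changing `window_days` or `cutoff` is how a different business requirement would yield a different label.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def label_churn(last_trip_dates: pd.Series, cutoff: str, window_days: int = 30) -> pd.Series:
    """Hypothetical rule: 1 (churned) if the last trip was more than window_days before cutoff."""
    gap = pd.Timestamp(cutoff) - pd.to_datetime(last_trip_dates)
    return (gap.dt.days > window_days).astype(int)


riders = pd.DataFrame({"last_trip_date": ["2014-06-30", "2014-05-01", "2014-06-01"]})
labels = label_churn(riders["last_trip_date"], "2014-07-01")
print(labels.tolist())  # [0, 1, 0]

# Baseline logistic regression on synthetic rider features
rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame({
    "trips_first_30d": rng.poisson(3, n),
    "avg_rating": rng.uniform(3.5, 5.0, n),
})
# Synthetic labels: fewer early trips -> higher churn probability
p_churn = 1 / (1 + np.exp(0.8 * (X["trips_first_30d"] - 3)))
y = (rng.random(n) < p_churn).astype(int)
model = LogisticRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_[0].round(2))))
```

The predicted churn probabilities (`model.predict_proba`) are what a cost-benefit comparison of retention offers would typically be built on.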

Real Estate: Estimating the Best Investment Area in NYC (Python)

Conducted data mining and feature extraction on Zillow historical house price estimates and Airbnb short-term rental prices, driven by an ad hoc business goal: find the best investment area for short-term leasing
Performed data munging on features recorded in different units across datasets, and wrote functions that link the datasets in a scalable way so new data can be appended
Created specific metadata and metrics, such as cap rate and occupancy rate, to refine and better understand the business goal
Identified the best investment area in NYC based on the defined metrics and trend prediction
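A sketch of joining the two sources and ranking areas by a simplified cap rate (net operating income divided by property value). Assumptions, all hypothetical: zipcode is the join key, annual revenue is nightly price × 365 × occupancy rate, operating expenses are a flat `expense_ratio` (0 by default), and the numbers are made-up toy values.

```python
import pandas as pd


def add_cap_rate(zillow: pd.DataFrame, airbnb: pd.DataFrame,
                 expense_ratio: float = 0.0) -> pd.DataFrame:
    """Join per-zipcode home values to Airbnb listings and compute a simplified cap rate.

    Assumed columns: zillow[zipcode, median_value],
    airbnb[zipcode, nightly_price, occupancy_rate].
    """
    df = airbnb.merge(zillow, on="zipcode", how="inner")
    annual_revenue = df["nightly_price"] * 365 * df["occupancy_rate"]
    noi = annual_revenue * (1 - expense_ratio)  # net operating income
    return df.assign(cap_rate=noi / df["median_value"])


# Toy inputs for illustration only
zillow = pd.DataFrame({"zipcode": ["10003", "11211"], "median_value": [1_000_000, 800_000]})
airbnb = pd.DataFrame({
    "zipcode": ["10003", "10003", "11211"],
    "nightly_price": [200, 180, 150],
    "occupancy_rate": [0.75, 0.80, 0.70],
})

ranked = (
    add_cap_rate(zillow, airbnb)
    .groupby("zipcode")["cap_rate"].mean()
    .sort_values(ascending=False)
)
print(ranked)
```

Because the join is a function of two tables, new months of data can be concatenated onto either input and the ranking recomputed.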

Water Usage Capacity Analysis and Prediction (SAS)

Feature generation and transformation (Box-Cox)
Feature selection using various techniques and criteria (Mallows' C_p, stepwise selection) when building the linear regression model
Checked model assumptions and ran diagnostics using studentized residuals, Cook's D, hat matrix diagonals, tolerance, VIF, etc.
Made predictions based on the selected model
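The regression diagnostics listed above follow standard formulas, sketched here in NumPy rather than SAS so each quantity is explicit: leverage h_ii is the hat-matrix diagonal, the internally studentized residual is r_i = e_i / (s·sqrt(1 − h_ii)), Cook's D is r_i²/p · h_ii/(1 − h_ii), VIF_j = 1/(1 − R_j²), and tolerance = 1/VIF. The function name and the synthetic collinear data are illustrative assumptions; externally studentized (deleted) residuals are a common alternative not computed here.

```python
import numpy as np


def ols_diagnostics(X, y):
    """OLS fit plus leverage, studentized residuals, Cook's D, VIF and tolerance.

    X: (n, k) predictors WITHOUT an intercept column; one is added here.
    """
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = Xd.shape[1]
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate (MSE)
    # Hat-matrix diagonal: h_ii = x_i^T (X^T X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", Xd, np.linalg.inv(Xd.T @ Xd), Xd)
    r = resid / np.sqrt(s2 * (1 - h))  # internally studentized residuals
    cooks_d = r**2 / p * h / (1 - h)
    vif = np.empty(k)
    for j in range(k):
        # Regress predictor j on the remaining predictors (plus intercept)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        ss_res = np.sum((X[:, j] - others @ coef) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vif[j] = ss_tot / ss_res  # equals 1 / (1 - R_j^2)
    return {"beta": beta, "leverage": h, "studentized": r,
            "cooks_d": cooks_d, "vif": vif, "tolerance": 1.0 / vif}


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[:, 1] = X[:, 0] + rng.normal(scale=0.3, size=50)  # deliberately collinear predictors
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=50)
d = ols_diagnostics(X, y)
print(d["vif"].round(2), round(d["leverage"].sum(), 6))  # leverages sum to p = 3
```

High VIF (low tolerance) flags the collinear pair, while large Cook's D or leverage values point to individual observations worth inspecting.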

joyceft/Projects

Folders and files

Latest commit

History

Repository files navigation

Projects

Twitch API Analysis(Python):

Zillow House Price Prediction(R/Python):

Instacart Online Grocery Store Customer Reorder Prediction(MySQL, R, Python)

Implementation of Mixed base-learner Adaboost, modified by Genetic Algorithm(R)

Uber Rider Behavior Analysis(Python)

Real Estate Estimation of best investment area in NYC(Python)

Water Usage Capcity Analysis and Prediction(SAS)

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages