This repository covers essential tools for big data analytics with Python, including pandas, pandasql, Spark, AWS EMR clusters, sklearn, Spark ML, and MXNet for deep learning.

karenyxwang/Big_Data_Analytics

Big_Data_Analytics

This repository contains all project materials for the University of Pennsylvania course CIS 545, Big Data Analytics.

Project 1 - Data Wrangling.ipynb: uses pandas for data manipulation and analysis.

Project 2 - Database Manipulation.ipynb: uses pandasql and Spark to work with graph data and traverse relationships.

Project 3 - Spark SQL.ipynb: uses Spark on an AWS EMR cluster to manipulate LinkedIn and stock data.

Project 4 - Machine Leaning.ipynb: uses sklearn for machine learning and Spark ML for scalable machine learning.

Project 5 - Deep Learning.ipynb: uses MXNet to build neural networks for image recognition.
