- Authors: Akira Takihara Wang and Calvin Huang
- Tutorial and Tools Up-to-Date as of: July 2021
- Usage: For MAST30034 students only
The R stream is available here.
On Campus:
- Monday: 13:15 - 15:15 (R - Yue)
- Tuesday: 14:15 - 16:15 (Python - Akira)
- Wednesday: 11:00 - 13:00 (Python - Akira)
- Thursday: 10:00 - 12:00 (Python - Calvin)
Online:
- Tuesday: 16:15 - 18:15 (R - Yue)
- Wednesday: 14:15 - 16:15 (Python - Calvin)
- Thursday: 13:00 - 15:00 (Python - Akira), 15:15 - 17:15 (Python - Akira)
The first few tutorials will have content as outlined below, with the remainder of the semester treated as consultations or additional tutorials:
- Introduction and Project 1 Overview:
  - Using the JupyterHub server
  - Using GitHub Desktop vs Git CLI (Command Line Interface)
  - Project 1 Overview
  - Python Revision
  - Introduction to folium and bokeh
  - Data Serialization
  - Downloading Files using Python
  - Advanced: WSL2 Installation + PySpark Installation
- Geospatial Visualization and Analysis:
  - Map Clusters, GIS Heatmaps, HexBins (vs SquareBins), Choropleths
  - Using and installing geopandas
  - Descriptive statistics
  - Histograms and Binning
  - Advanced: PySpark
- Regression and Discussion:
  - Linear Regression
  - AIC vs MSE vs R-Squared
  - Stepwise Selection (backward and forward using AIC)
  - Penalized Regression (LASSO and Ridge)
  - Generalized Linear Model example (Poisson for count data)
  - Advanced: PySpark + Spark SQL
- Machine Learning and Working as a Team:
  - Discussion: Overfitting, Curse of Dimensionality, Feature Engineering, etc.
  - Dimensionality Reduction
  - Agile Methodology + Standups
  - Advanced: PySpark + Spark SQL
- Project 2 Overview:
  - Introduction of themes
  - Getting into teams
  - Assessment Overview
- Attendance is mandatory. Groups are excused one absence only.
- The last 2 weeks of tutorials will be presentations; all groups must attend a designated tutorial.
- The remainder of the tutorials will act as checkpoints, consultations, and a chance for your group to conduct standups at a fixed time slot.
- Statistical Modeling / Machine Learning: sklearn, statsmodels
- Data Engineering / End-to-End Pipelines: Pandas, PySpark, NumPy, GeoPandas, papermill, re (regex)
- Visualizations: Plotly, Folium, Bokeh, seaborn, matplotlib