Welcome to the MAST30034 Python Repo

The R stream is available here.

Dates and Times

On Campus:

Online:

The first few tutorials will cover new content. The remainder of the semester will be treated as consultations or additional tutorials, as outlined:

Introduction and Project 1 Overview:
- Using the JupyterHub server
- Using GitHub Desktop vs Git CLI (Command Line Interface)
- Project 1 Overview
- Python Revision
- Introduction to folium and bokeh
- Data Serialization
- Downloading Files using Python
- Advanced: WSL2 Installation + PySpark Installation
Geospatial Visualization and Analysis:
- Map Clusters, GIS Heatmaps, HexBins (vs SquareBins), Choropleths
- Using and installing geopandas
- Descriptive statistics
- Histograms and Binning
- Advanced: PySpark
Regression and Discussion:
- Linear Regression
- AIC vs MSE vs R-Squared
- Stepwise Selection (backward and forward using AIC)
- Penalized Regression (LASSO and Ridge)
- Generalized Linear Model example (Poisson for count data)
- Advanced: PySpark + Spark SQL
Machine Learning and Working as a Team:
- Discussion: Overfitting, Curse of Dimensionality, Feature Engineering, etc.
- Dimensionality Reduction
- Agile Methodology + Standups
- Advanced: PySpark + Spark SQL
Project 2 Overview:
- Introduction of themes
- Getting into teams
- Assessment Overview
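
As a small taste of the "AIC vs MSE vs R-Squared" topic in the regression tutorial, here is a minimal sketch that compares an intercept-only model with a simple linear regression on synthetic data. It is not taken from the tutorial notebooks: the data, the helper `fit_metrics`, and all variable names are made up for illustration, and it uses only the Python standard library. The AIC here assumes Gaussian errors and is only defined up to an additive constant.

```python
import math
import random


def fit_metrics(y, y_hat, n_params):
    """Return (mse, r_squared, aic) for in-sample predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    mse = rss / n
    r_squared = 1.0 - rss / tss
    # Gaussian-likelihood AIC up to an additive constant: n * ln(RSS / n) + 2k
    aic = n * math.log(mse) + 2 * n_params
    return mse, r_squared, aic


# Synthetic data (an assumption for illustration): y = 2 + 1.5x + noise
random.seed(0)
x = [i / 10 for i in range(50)]
y = [2.0 + 1.5 * xi + random.gauss(0, 1) for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Model A: intercept only (1 parameter), so every prediction is the mean of y
pred_a = [y_bar] * n

# Model B: simple linear regression via closed-form OLS (2 parameters)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
pred_b = [intercept + slope * xi for xi in x]

metrics_a = fit_metrics(y, pred_a, n_params=1)
metrics_b = fit_metrics(y, pred_b, n_params=2)

for name, (mse, r2, aic) in [("intercept only", metrics_a), ("linear", metrics_b)]:
    print(f"{name:>14}: MSE={mse:.3f}  R^2={r2:.3f}  AIC={aic:.2f}")
```

In-sample MSE and R-squared can only improve (or stay the same) when a parameter is added to a nested least-squares model, whereas AIC adds a penalty of 2 per parameter, which is why it is the criterion used for stepwise selection in the outline above.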

Attendance is mandatory. Groups are excused one absence only.
The last 2 weeks of tutorials will be presentations; all groups must attend a designated tutorial.
The remaining tutorials will act as checkpoints and consultations, and give your group a chance to conduct standups at a fixed time slot.

Statistical Modeling / Machine Learning:

Data Engineering / End-to-End Pipelines:

Visualizations: