Final project of CEBD 1260 - Big Data Analysis
By: Arwa Sheraky & Tiffany Eversley
This 2015 dataset summarizes US airline flight delay and cancellation information as collected and published by the DOT's Bureau of Transportation Statistics.
Airport and airline information from two additional datasets was merged into the original source file to add relevant attributes. The merged dataset has 28 features and over a million instances. Features include the origin airport, the flight date (year, month, day), scheduled and actual departure times, arrival times, flight number, and the cancellation and delay reasons.
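The merge described above might look roughly like the following pandas sketch. It is an illustration only: the tiny tables, every column name (`AIRLINE`, `IATA_CODE`, `ORIGIN_AIRPORT`, and so on), and the derived `FLIGHT_DATE` column are assumptions made for the example, not the project's actual schema or code.

```python
import pandas as pd

# Tiny stand-in tables. All column names and values are hypothetical.
flights = pd.DataFrame({
    "YEAR": [2015, 2015, 2015],
    "MONTH": [1, 1, 2],
    "DAY": [1, 2, 14],
    "AIRLINE": ["AA", "DL", "AA"],
    "ORIGIN_AIRPORT": ["JFK", "ATL", "LAX"],
    "DEPARTURE_DELAY": [12.0, -3.0, 45.0],
})
airlines = pd.DataFrame({
    "IATA_CODE": ["AA", "DL"],
    "AIRLINE_NAME": ["American Airlines", "Delta Air Lines"],
})
airports = pd.DataFrame({
    "IATA_CODE": ["JFK", "ATL", "LAX"],
    "CITY": ["New York", "Atlanta", "Los Angeles"],
    "STATE": ["NY", "GA", "CA"],
})

# Left joins keep every flight, even if a lookup code is missing.
merged = (
    flights
    .merge(airlines, left_on="AIRLINE", right_on="IATA_CODE", how="left")
    .drop(columns="IATA_CODE")
    .merge(airports, left_on="ORIGIN_AIRPORT", right_on="IATA_CODE", how="left")
    .drop(columns="IATA_CODE")
)

# Combine the year, month and day columns into a single date column.
date_parts = merged[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower)
merged["FLIGHT_DATE"] = pd.to_datetime(date_parts)

print(merged)
```

Left joins are used here so that flights with an unmatched airline or airport code are kept with missing values rather than silently dropped; whether the project did the same is not stated.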
- Code (Jupyter notebooks of the following):
  - Data Cleaning and Exploration.
  - Supervised Learning (Classification - Regression).
  - Unsupervised Learning (Clustering).
- Docs:
  - Data Blogpost.
  - Data Story.
- Data (not uploaded due to large size):
  - Flights, Airlines and Airports.
  - US 2015 Weather.
  - Cleaned Merged Data (ready for prediction).
- Presentation:
  - CEBD final presentation.pdf
The goal is to predict the average expected delay for a flight from specified features.
The model is a Gradient Boosting Regressor (100).
The model predicted average delays with RMSE = 20.3.
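As a rough illustration of this setup, the sketch below fits a gradient boosting regressor and reports RMSE on a held-out split. It rests on several assumptions: that scikit-learn is the library involved, that the "100" refers to the number of boosting stages (`n_estimators`), and that the features are numeric. The data is synthetic and the feature names are hypothetical, so the RMSE it prints has no relation to the 20.3 reported above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in features (hypothetical): month, scheduled hour, distance.
X = np.column_stack([
    rng.integers(1, 13, n),
    rng.integers(0, 24, n),
    rng.uniform(100, 3000, n),
])
# Synthetic delay target with noise; not real flight data.
y = 5 + 0.8 * X[:, 1] + 0.002 * X[:, 2] + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: "100" in the text means 100 boosting stages.
model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# RMSE is the square root of the mean squared error on held-out data.
pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"RMSE: {rmse:.2f}")
```

RMSE is expressed in the same units as the target, so it can be read directly as a typical prediction error in delay units.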
A simple UI application was implemented to predict the average delay for a flight from a few required user inputs. The application can be downloaded from the repository.