2023-11-06
- Fangyi Chen, fc2718
- Jiying Wang, jw4489
- Yumeng Qi, yq2378
- Si Chen, sc5126
- Yuxin Yin, yy3439
Weather-Induced Flight Delays: An Analysis of Patterns and Predictive Modeling
The airline industry is highly dependent on timely flight operations, and flight delays are a common concern for both passengers and airlines. Weather conditions often play a big role in causing delays. In this project, we aim to investigate the relationship between weather and flight delays using historical flight data from 2013. Specifically, we would assess which pattern might cause the longest delay time. By understanding the data patterns, we would build a prediction model to predict flight delays based on variables including but not limited to carriers, airports, weather, etc. We would also assess if the insights gained from 2013 data can be applied to 2023, which is ten years later.
- A website comprising the following pages:
- Homepage: Provides a brief introduction to the project.
- Exploratory Data Analysis Page: Offers in-depth exploration of the project’s data.
- Flight Delay Prediction Model Page: Presents details about the developed flight delay prediction model.
- Conclusions Page: Summarizes key findings and conclusions of the project.
- Interactive Dashboard Page: Utilizing R-Shiny, this page hosts an interactive dashboard for dynamic data exploration.
- A comprehensive report: A well-organized document ensuring clear and concise presentation of our project, covering all aspects from introduction to conclusions.
- A brief screencast: A concise video presentation showcasing the project’s key features and highlights.
The study intends to use two data sets. The first one on-time data for all flights that departed NYC in 2013. The second one is hourly meteorological data for LGA, JFK and EWR.
-
flights: RITA, Bureau of transportation statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
-
weather: ASOS download from Iowa Environmental Mesonet, https://mesonet.agron.iastate.edu/request/download.phtml.
We plan to perform preliminary descriptive analysis of the data, which entails the summarization of some basic characteristics (size, attributes), data distributions (mean, range, standard deviation) and number of missing values. The outcomes of the preliminary analysis will provide guidance for the future steps of data preprocessing and cleaning. Another analysis will be focused on the correlation between weather conditions and flight delays in minutes. We are also interested to investigate if there exists any disparity of flight delays among diverse airlines given the identical weather conditions. Additional analysis will be centered around the predictive model outcomes, where evaluation metrics include root mean square error (RMSE) and R squared. Performance variability across different temporal ranges (from earliest timepoint to the most recent data) will also be examined.
For data exploration, we plan to plot the distribution of important
attributes such as dep_delay
, arr_delay
, temp, dewep
, humid
,
wind_speed
, pressure
, visib
using box plot and histogram. Then we
will visualize the relationship between flight delays and
weather-related variables using scatter plot and fitted lines. We may
also visualize the relationship among weather variables as well as the
delay disparity among airlines. Finally, we will visualize our model and
predicted outcomes.
Given the original dataset flights
we will employ is a huge dataset
with 336,776 entries, one of the biggest coding challenges we will
encounter is how to, without bias, get the dataset down to a size that
is suitable for use but still representative. In addition, since we
would join two datasets that share similar variables, we might have to
deal with the many-to-many relationship when joining the datasets.
Milestone | Description | Expected Date |
---|---|---|
Project Logistics Setup | 1. Establish the general objective and scope of the project. 2. Pinpoint the required data sources | 11/10/2023 |
Project Review and Plan Outline | 1. Meet with the teaching team to discuss the project proposal. 2. Anticipate difficulties and develop a contingency plan. 3. Identify major tasks and assign tasks to group members. | 11/14/2023 |
Plan Execution | The key components of the projects are anticipated to be completed: data cleaning and preprocessing, visualizations, model training and deployment, outcome evaluation, and any statistical or follow-up analysis | 11/27/2023 |
Report Drafting and Compiling | 1. Collect and finalize written parts of the project including but not limited to the following: objective, motivation, EDA, analysis, and discussion. 2. Revise and edit to ensure coherence and clarity | 12/8/2023 |
Webpage and Screencast | 1. Set up and publish the web page. 2. Video demo | 12/9/2023 |
Peer Assessment | 1. Brief assessment of team members. 2. Contribution summary | 12/9/2023 |
In-class Discussion | Learning about projects presented by other groups | 12/14/2023 |