
itsayaanpatel/FedAviationDataAnalysis


FedAviationDataAnalysis

Question 1 — The DS Problem

In your own words, formulate the Data Science problem that you were given.

Question 2 — Clean the data (10 pts)

Load the data using pandas and inspect it.

Perform an initial inspection of the data (shape, types, etc.). Evaluate the dataset, perform at least three types of data preparation, and justify the approach taken to prepare the data for analysis. Data prep can include, but is not limited to, handling missing values, data types, and duplicates. Ensure that your data preparation addresses issues in at least 7 fields in the data. Prepare meaningful* summary statistics for 3 continuous variables and 3 categorical variables. *Note: meaningful summary statistics explain the statistical summary of relevant fields in a coherent manner.
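As a rough illustration of the inspection and preparation steps described above, the sketch below uses a tiny made-up DataFrame in place of the real dataset. Every column name (`INCIDENT_DATE`, `STATE`, `PHASE_OF_FLIGHT`, `HEIGHT`, `SPEED`) and every value is an assumption chosen for the example, not the actual schema, and median imputation is only one of several defensible choices.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the real file; all column names are assumptions.
raw = pd.DataFrame({
    "INCIDENT_DATE": ["2014-03-01", "2014-03-01", "2015-07-19", "not a date", "2016-11-02"],
    "STATE": ["ca", "ca", " TX", "FL", None],
    "PHASE_OF_FLIGHT": ["Approach", "Approach", "Landing Roll", "approach", "Climb"],
    "HEIGHT": ["1200", "1200", "0", "n/a", "3500"],
    "SPEED": [140.0, 140.0, np.nan, 135.0, 210.0],
})

# Initial inspection: size, column types, and missing values per column.
print(raw.shape)
print(raw.dtypes)
print(raw.isna().sum())

# Prep 1: drop exact duplicate rows.
df = raw.drop_duplicates().copy()

# Prep 2: fix data types; unparseable entries become NaT / NaN instead of raising.
df["INCIDENT_DATE"] = pd.to_datetime(df["INCIDENT_DATE"], errors="coerce")
df["HEIGHT"] = pd.to_numeric(df["HEIGHT"], errors="coerce")

# Prep 3: normalize categorical text and label missing categories explicitly.
df["STATE"] = df["STATE"].str.strip().str.upper().fillna("UNKNOWN")
df["PHASE_OF_FLIGHT"] = df["PHASE_OF_FLIGHT"].str.title().astype("category")

# Prep 4: impute a missing continuous value with the median (one possible choice).
df["SPEED"] = df["SPEED"].fillna(df["SPEED"].median())

# Summary statistics for continuous and categorical fields.
continuous_summary = df[["HEIGHT", "SPEED"]].describe()
phase_counts = df["PHASE_OF_FLIGHT"].value_counts()
print(continuous_summary)
print(phase_counts)
```

Whichever preparation steps are chosen, the written justification should explain why each one suits the field it is applied to.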

Question 3 — Perform Exploratory Data Analysis (40 pts)

Perform an in-depth analysis by creating visualizations to understand the distribution and relationships within the data, while ensuring that your analysis and explanation of the results tell a story. It's important to select appropriate visualizations based on the type of data. Explain all results.

1. Analyze the proportion of incidents that occur each year.
2. Evaluate the distribution of bird strikes in each state and display the top 20 states.
3. What is the most common bird species involved in bird strikes?
4. Perform a comparative analysis of the annual aircraft damage for each phase of flight. Evaluate the results over the last 10 years to determine what patterns exist. Note: this requires a year-over-year analysis.
5. Analyze the frequency of bird strike incidents that occur on a monthly basis over the last 10 years. Analyze the results to determine when these incidents are more prevalent. This question requires you to drill deeper into the analysis to determine if the distribution has changed over the last 10 years. Note: this requires a year-over-year analysis.
6. Perform ONE (1) additional analysis on the distribution. Ensure that you either 1) perform a comparative analysis by evaluating 3 or more groups within the data, or 2) perform a temporal analysis of the patterns over 10 or more years (this requires a year-over-year analysis of the patterns). Do not repeat any of the previous analyses.
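The yearly proportions, top-state counts, and year-over-year monthly breakdown asked for above could be tabulated along the following lines before plotting. This is a minimal sketch on made-up records; the column names `INCIDENT_DATE` and `STATE` are assumptions, and `head(2)` stands in for the top 20 on real data.

```python
import pandas as pd

# Made-up incident records; the column names are assumptions.
incidents = pd.DataFrame({
    "INCIDENT_DATE": pd.to_datetime([
        "2013-05-02", "2013-09-14", "2014-05-21", "2014-09-03",
        "2014-09-27", "2015-08-11", "2015-09-05", "2015-09-30",
    ]),
    "STATE": ["CA", "TX", "CA", "FL", "CA", "TX", "CA", "NY"],
})
incidents["YEAR"] = incidents["INCIDENT_DATE"].dt.year
incidents["MONTH"] = incidents["INCIDENT_DATE"].dt.month

# Proportion of all incidents that fall in each year.
yearly_share = incidents["YEAR"].value_counts(normalize=True).sort_index()

# States ranked by incident count (use head(20) on the real data).
top_states = incidents["STATE"].value_counts().head(2)

# Restrict to the most recent 10 years present in the data.
recent = incidents[incidents["YEAR"] > incidents["YEAR"].max() - 10]

# Year-by-month count table: one row per year, one column per month.
monthly = pd.crosstab(recent["YEAR"], recent["MONTH"])

# Share of each month within its own year, so years with different
# totals can be compared directly.
monthly_share = monthly.div(monthly.sum(axis=1), axis=0)

print(yearly_share)
print(top_states)
print(monthly_share.round(2))
```

Normalizing each row of the year-by-month table makes it easier to see whether the seasonal pattern itself shifts, rather than only the overall volume.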

Question 4 — Hypothesis Testing (30 points)

Perform pairwise analysis of select features and evaluate the significance of the pattern or trend. A suitable value for alpha is 5%. Explain all results.

1. Create a scatterplot that shows the relationship between aircraft height and speed. Evaluate the correlation, its strength, and the significance of the results.
2. Visualize the distribution of the aircraft speed during 1) the approach phase of flight and 2) the landing roll phase of flight. Perform a two-sample t-test and evaluate whether there is a statistically significant difference between the speeds during these two flight phases. Tip: if the data is skewed, you will need to address this prior to the statistical analysis.
3. Create a visualization of the aircraft damage grouped by phase of flight. Evaluate whether the results are statistically significant. Ensure that you use the appropriate test.
4. Perform ONE (1) additional statistical test. Explain what you are testing and why this information is useful. Visualize the data, state the hypothesis, and explain whether the result is statistically significant.
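A hedged sketch of how the three named tests might be run with `scipy.stats` follows. All data is synthetic and generated only for illustration. The log transform is one possible way to address skew, Welch's unequal-variance t-test is one reasonable variant of the two-sample t-test, and the chi-square test of independence is assumed here to be a suitable choice for damage (yes/no) by phase of flight.

```python
import numpy as np
from scipy import stats

alpha = 0.05
rng = np.random.default_rng(0)

# Synthetic height/speed pairs (not real data).
height = rng.uniform(0, 5000, size=200)
speed = 100 + 0.02 * height + rng.normal(0, 15, size=200)

# Pearson correlation: r gives direction and strength, p the significance.
r, p_corr = stats.pearsonr(height, speed)

# Synthetic right-skewed speeds for two flight phases.
approach = rng.lognormal(mean=4.9, sigma=0.2, size=150)
landing_roll = rng.lognormal(mean=4.5, sigma=0.3, size=150)

# Log-transform to reduce skew, then run Welch's two-sample t-test.
t_stat, p_t = stats.ttest_ind(np.log(approach), np.log(landing_roll), equal_var=False)

# Made-up contingency table: rows are three flight phases,
# columns are [damaged, not damaged].
table = np.array([[30, 170], [12, 188], [45, 155]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

for name, p in [("correlation", p_corr), ("t-test", p_t), ("chi-square", p_chi)]:
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```

Each p-value is compared against the same alpha of 5%; the interpretation of effect size and practical relevance still needs to be written up separately.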

Question 5 — Statistical Modeling (20 points)

Explore a simple linear relationship* within the data and create a model to predict the occurrence of bird strikes in a given FAA region. Ensure that you explain all results.

1. Extract all data for the AWP FAA Region and use a line chart to visualize the frequency of incidents between 1990 and 2015 inclusive.
2. Using the extracted data above, create a simple linear regression model and predict the incidents for 2016 and 2017. Display the r-squared and explain what this indicates about your model's ability to fit the data.
3. Visualize the dataset from (1) and include the predicted results for 2016 and 2017. Compare and contrast the predictions for 2016 and 2017 with the actual values.
4. Perform predictions for 2018 through 2020. Explain what transpired when your model was used to predict this time period. Comment on the reliability of your model.

*Tip: It is important to remember that you are creating a simple linear regression model, which is limited in features and does not include information about all factors that influence the frequency of bird strikes. Keep this in mind while framing your response and explanations.
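The fit-and-predict steps above might look roughly like the sketch below. The yearly counts are synthetic stand-ins for the AWP region totals, and `np.polyfit` is just one of several ways to fit a one-feature least-squares line; the r-squared is computed by hand from the residuals.

```python
import numpy as np

# Synthetic yearly incident counts for 1990..2015 inclusive (not real data).
years = np.arange(1990, 2016)
rng = np.random.default_rng(1)
counts = 200 + 25 * (years - 1990) + rng.normal(0, 30, size=years.size)

# Least-squares line: counts ~ slope * year + intercept.
slope, intercept = np.polyfit(years, counts, deg=1)

# R-squared: fraction of the variance in counts explained by the line.
fitted = slope * years + intercept
ss_res = np.sum((counts - fitted) ** 2)
ss_tot = np.sum((counts - counts.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Extrapolate beyond the training range.
future = np.array([2016, 2017])
predicted = slope * future + intercept

print(f"slope = {slope:.2f} incidents per year, R^2 = {r_squared:.3f}")
for year, value in zip(future, predicted):
    print(f"{year}: predicted {value:.0f} incidents")
```

A high r-squared on the training years says nothing about how the trend behaves outside them, which is why the comparison against actual values for later years matters.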

Question 6 — Summary

Summarize the overall analysis for questions 1 through 5 and share THREE (3) key findings.

Question 7 — OPTIONAL (10 points)

This optional bonus question requires that you perform a geospatial analysis on bird strikes and visualize the results using a map. Examples include, but are not limited to:

- Mapping bird strike incidents: visualize the locations of bird strike incidents to identify hot spots or patterns.
- Heat maps: create heat maps to show the density of bird strike incidents in different areas.
- Proximity analysis: analyze the proximity of bird strikes to airports, landmarks, etc.

Ensure that you clearly indicate the analysis that you are performing and provide detailed steps to explain ALL findings from your analysis.

Note: a thorough analysis that demonstrates multiple patterns is required to earn full credit for this question.
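For the proximity-analysis option, one possible building block is a great-circle distance between a strike location and an airport. The sketch below implements the haversine formula with a mean Earth radius of 6371 km; the airport and strike coordinates and the 10 km threshold are hypothetical values picked for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical airport and strike coordinates (not real records).
airport = (34.0, -118.0)
strikes = [(34.0, -118.0), (34.05, -118.0), (35.0, -118.0)]

# Keep only strikes within a hypothetical 10 km radius of the airport.
within_10km = [s for s in strikes if haversine_km(*airport, *s) <= 10]
print(within_10km)
```

The resulting distances can then feed a map layer or a histogram of strikes by distance band.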
