Skip to content

Flight Tragedy Analysis is a comprehensive data analysis project focused on examining aviation accidents and incidents from 1905 to 2009. This project provides users with valuable insights into historical plane crashes and their associated data.

License

Notifications You must be signed in to change notification settings

Hanan-Nawaz/FlightTragedyAnalysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Flight Tragedy Analysis

Overview

Flight Tragedy Analysis is a comprehensive data analysis project focused on examining aviation accidents and incidents from 1905 to 2009. This project provides users with valuable insights into historical plane crashes and their associated data.

Core Features

  1. Extensive Data Repository: The project houses a rich dataset containing detailed information about plane crashes, including date, time, location, operator, flight number, route, aircraft type, registration, casualties, and more.

  2. Data Transformation and Storage: The data is meticulously transformed and stored in a structured manner to enable efficient analysis, including data normalization, handling missing values, and ensuring data integrity.

  3. Robust ETL Processes: Python scripts are utilized for Extract, Transform, Load (ETL) operations, facilitating the transfer of data from diverse sources to Amazon Redshift, a high-performance, fully managed data warehouse.

  4. Interactive Power BI Dashboard: A dynamic Power BI dashboard has been crafted to offer users an intuitive interface for exploring and visualizing the dataset. Users can interact with the data to answer critical questions related to plane crashes.

  5. In-Depth Analysis: The project empowers users to gain insights by addressing questions such as identifying peak years for plane crashes, determining which armed forces experienced the most losses, identifying common reasons for crashes, assessing companies with the highest incident rates, examining survival statistics, ground casualties, and pinpointing countries with the highest crash occurrences.

Project Process

This repository contains the code and data for the FlightTragedyAnalysis Data Engineering Project. The project involves analyzing airplane crash data from Kaggle, building a data model, creating a relational database, performing ETL (Extract, Transform, Load) processes, storing the data in Amazon S3, and finally, creating a Power BI dashboard for comprehensive analysis.

1. Data Acquisition

  • Downloaded the dataset from Kaggle, which contains historical airplane crash data from 1908 to 2009.

2. Data Modeling

  • Created a data model to define the structure and relationships within the dataset.

3. Database Creation

  • Established a relational database (e.g., Amazon Redshift) to efficiently store and manage the structured data.

4. Data Storage

  • Uploaded the processed dataset into an object store on Amazon S3 for easy access and scalability.

5. ETL Process

  • Developed Python scripts for the ETL process to:
    • Extract data from the source dataset
    • Transform and clean the data
    • Load the data into the Amazon Redshift database for analysis

6. Power BI Dashboard

  • Created a Power BI dashboard to visualize and analyze the airplane crash data, answering questions like:
    • Which year had the highest number of plane crashes?
    • Which armed forces experienced the most plane losses?
    • What is the most common reason for crashes?
    • Which company has the highest number of crashes?
    • How many people survived the crashes?
    • How many people on the ground died in these incidents?
    • In which country did most plane crashes occur?

Technology Stack

  • ETL: Python (Pandas), Amazon Redshift, SQL
  • Database: Amazon Redshift
  • Data Storage: Amazon S3
  • Visualization: Power BI
  • Version Control: Git, GitHub

Repository Structure

  • data/: Contains the downloaded dataset and any intermediate data files.
  • data_model/: Contains the data model and any other data files.
  • scripts/: Includes Python scripts for data extraction, transformation, and loading (ETL).
  • database/: Contains scripts or documentation related to the database setup.
  • power_bi/: Stores files related to the Power BI dashboard.
  • README.md: You are here, providing an overview of the project.

Getting Started

To replicate this project, follow these steps:

  1. Download the dataset from Kaggle or provide a link to the dataset you are using.

  2. Set up your database (e.g., Amazon Redshift) and configure the necessary credentials.

  3. Use the ETL scripts to process and load the data into your database.

  4. Create a Power BI dashboard using the provided data to answer critical questions related to airplane crashes.

Data Source

Contributors

Feel free to contribute to this project or contact the contributors for more information.

License

This project is licensed under the MIT License.

🔗 Links

portfolio linkedin twitter

Let's Talk about your project 😃

About

Flight Tragedy Analysis is a comprehensive data analysis project focused on examining aviation accidents and incidents from 1905 to 2009. This project provides users with valuable insights into historical plane crashes and their associated data.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published