Flight Tragedy Analysis is a comprehensive data analysis project focused on examining aviation accidents and incidents from 1905 to 2009. This project provides users with valuable insights into historical plane crashes and their associated data.
-
Extensive Data Repository: The project houses a rich dataset containing detailed information about plane crashes, including date, time, location, operator, flight number, route, aircraft type, registration, casualties, and more.
-
Data Transformation and Storage: The data is meticulously transformed and stored in a structured manner to enable efficient analysis, including data normalization, handling missing values, and ensuring data integrity.
-
Robust ETL Processes: Python scripts are utilized for Extract, Transform, Load (ETL) operations, facilitating the transfer of data from diverse sources to Amazon Redshift, a high-performance, fully managed data warehouse.
-
Interactive Power BI Dashboard: A dynamic Power BI dashboard has been crafted to offer users an intuitive interface for exploring and visualizing the dataset. Users can interact with the data to answer critical questions related to plane crashes.
-
In-Depth Analysis: The project empowers users to gain insights by addressing questions such as identifying peak years for plane crashes, determining which armed forces experienced the most losses, identifying common reasons for crashes, assessing companies with the highest incident rates, examining survival statistics, ground casualties, and pinpointing countries with the highest crash occurrences.
This repository contains the code and data for the FlightTragedyAnalysis Data Engineering Project. The project involves analyzing airplane crash data from Kaggle, building a data model, creating a relational database, performing ETL (Extract, Transform, Load) processes, storing the data in Amazon S3, and finally, creating a Power BI dashboard for comprehensive analysis.
- Downloaded the dataset from Kaggle, which contains historical airplane crash data from 1908 to 2009.
- Created a data model to define the structure and relationships within the dataset.
- Established a relational database (e.g., Amazon Redshift) to efficiently store and manage the structured data.
- Uploaded the processed dataset into an object store on Amazon S3 for easy access and scalability.
- Developed Python scripts for the ETL process to:
- Extract data from the source dataset
- Transform and clean the data
- Load the data into the Amazon Redshift database for analysis
- Created a Power BI dashboard to visualize and analyze the airplane crash data, answering questions like:
- Which year had the highest number of plane crashes?
- Which armed forces experienced the most plane losses?
- What is the most common reason for crashes?
- Which company has the highest number of crashes?
- How many people survived the crashes?
- How many people on the ground died in these incidents?
- In which country did most plane crashes occur?
- ETL: Python (Pandas), Amazon Redshift, SQL
- Database: Amazon Redshift
- Data Storage: Amazon S3
- Visualization: Power BI
- Version Control: Git, GitHub
data/
: Contains the downloaded dataset and any intermediate data files.data_model/
: Contains the data model and any other data files.scripts/
: Includes Python scripts for data extraction, transformation, and loading (ETL).database/
: Contains scripts or documentation related to the database setup.power_bi/
: Stores files related to the Power BI dashboard.README.md
: You are here, providing an overview of the project.
To replicate this project, follow these steps:
-
Download the dataset from Kaggle or provide a link to the dataset you are using.
-
Set up your database (e.g., Amazon Redshift) and configure the necessary credentials.
-
Use the ETL scripts to process and load the data into your database.
-
Create a Power BI dashboard using the provided data to answer critical questions related to airplane crashes.
- Dataset: Kaggle - Airplane Crashes Since 1908
- Abdul Hanan Nawaz (Associate Data Engineer)
Feel free to contribute to this project or contact the contributors for more information.
This project is licensed under the MIT License.