This project analyzes worldwide earthquake events using cloud technologies and data engineering tools. It shows how to fetch, process, and visualize a large dataset of global earthquake occurrences.
- Microsoft Fabric:
  - Notebook for data preprocessing and exploration.
  - Data Factory Pipeline for automating data fetching and cleaning.
- PySpark: For efficient large-scale data processing.
- Power BI: For data visualization and reporting.
The earthquake data is fetched from the USGS Earthquake Catalog. You can view the raw data source at:
https://earthquake.usgs.gov
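In this project the fetch is automated by the Data Factory Pipeline, but the same request can be reproduced directly against the public USGS FDSN event service. The sketch below is illustrative only; the date range and output file name are placeholders, not part of the pipeline definition.

```python
import json
import requests

# Query the USGS FDSN event service for one day of worldwide events.
# The endpoint is the public USGS API; starttime/endtime are placeholder values.
url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
params = {
    "format": "geojson",
    "starttime": "2024-01-01",
    "endtime": "2024-01-02",
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
data = response.json()

print(f"Fetched {len(data['features'])} earthquake events")

# Persist the raw GeoJSON so the cleaning/processing steps can pick it up later.
with open("earthquakes_raw.json", "w") as f:
    json.dump(data, f)
```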
The workflow follows these steps:
- Data Ingestion: Automated using Data Factory Pipeline, fetching earthquake data from the USGS catalog.
- Data Cleaning: Performed within the pipeline to ensure the data is ready for analysis.
- Data Processing: PySpark processes the large dataset efficiently, aggregating events into summaries such as magnitude and frequency by region and time period (a sketch follows this list).
- Visualization & Reporting: Earthquake patterns, magnitudes, and geographical distributions are visualized using Power BI.
- Visualized earthquake magnitude and frequency across different regions.
- Analyzed geographical patterns of earthquake occurrences over time.
- Clone the repository to your local machine:
```bash
git clone https://github.com/your-username/worldwide-earthquake-events.git
```