This project analyses the number of COVID-19 cases and deaths in 2020 and transforms the population data for further use. The Azure components used here are:
- Azure Data Lake Storage Gen2
- Azure Blob Storage
- Azure Data Factory
- Azure Databricks
- Azure SQL Database
- Azure Service Principal
- Implement a for-each loop that fetches each file listed in ecdc_file_list.json from https://github.com/SharadChoudhury/Azure_Covid19_Analysis/raw/ecdc/main.
- Store the ingested files in the raw/ecdc folder in ADLS (see the sketch below).
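In ADF this loop is a ForEach with a Copy activity inside it. The Python sketch below mirrors the same logic; it assumes ecdc_file_list.json is a JSON array of file names and that `raw` is the ADLS container, and the account URL and service-principal credentials are placeholders.

```python
import json

import requests
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

BASE_URL = "https://github.com/SharadChoudhury/Azure_Covid19_Analysis/raw/ecdc/main"

# Authenticate with the service principal (all IDs are placeholders).
credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
fs = DataLakeServiceClient(
    "https://<storage-account>.dfs.core.windows.net",
    credential=credential).get_file_system_client("raw")

# ecdc_file_list.json is assumed to be a JSON array of file names.
with open("ecdc_file_list.json") as f:
    file_list = json.load(f)

for name in file_list:
    resp = requests.get(f"{BASE_URL}/{name}")
    resp.raise_for_status()
    # Land each file under raw/ecdc in ADLS.
    fs.get_file_client(f"ecdc/{name}").upload_data(resp.content, overwrite=True)
```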
- Store the raw population file under population_raw/BLOB in the Azure Blob container.
- Implement a pipeline that checks whether the raw file exists in the Blob container, fetches its metadata, and, if the column count matches the required count, copies the file to ADLS, as sketched below.
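In ADF this is a Get Metadata activity (checking `exists` and `columnCount`) feeding an If Condition and a Copy activity. A rough Python equivalent follows, with the container, path, and expected column count all assumed:

```python
from azure.storage.blob import BlobClient
from azure.storage.filedatalake import DataLakeServiceClient

EXPECTED_COLUMNS = 13  # hypothetical required column count

blob = BlobClient.from_connection_string(
    "<blob-connection-string>",
    container_name="populationdata",
    blob_name="population_raw/BLOB/population.tsv")

if blob.exists():                                    # Get Metadata: exists
    data = blob.download_blob().readall()
    header = data.split(b"\n", 1)[0].decode()
    if len(header.split("\t")) == EXPECTED_COLUMNS:  # If Condition: columnCount
        # Copy activity: write the validated file to ADLS.
        adls = DataLakeServiceClient(
            "https://<storage-account>.dfs.core.windows.net",
            credential="<account-key>")
        adls.get_file_system_client("raw") \
            .get_file_client("population/population.tsv") \
            .upload_data(data, overwrite=True)
```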
- Create a dataflow that processes the Cases and Deaths file as per the requirements below and stores the processed sink file in ADLS.
- Create a dataflow that processes the Hospital Admissions file as per the requirements below and stores the processed sink file in ADLS (both dataflows follow the pattern sketched below).
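Both dataflows follow the same shape: read the raw file, reshape it, and write a single sink file. The exact rules come from the requirements referenced above, so the PySpark sketch below is only illustrative; the column names, the Europe filter, and the pivot values are assumptions, not the project's actual spec.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.csv(
    "abfss://raw@<account>.dfs.core.windows.net/ecdc/cases_deaths.csv",
    header=True, inferSchema=True)

processed = (raw
    .filter(F.col("continent") == "Europe")             # hypothetical region filter
    .groupBy("country", "country_code", "date")         # one row per country/date
    .pivot("indicator", ["confirmed cases", "deaths"])  # long-to-wide reshape
    .sum("daily_count"))

processed.write.mode("overwrite").csv(
    "abfss://processed@<account>.dfs.core.windows.net/ecdc/cases_deaths",
    header=True)
```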
- Create an ADF pipeline that runs the Databricks notebook for the population file transformation and stores the processed file in ADLS (sketched below).
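A minimal sketch of the notebook logic, assuming a tab-separated file whose first column packs indicator and country code together (e.g. "PC_Y0_14,DE", as in Eurostat population extracts); the paths are placeholders, and `spark` is predefined in a Databricks notebook.

```python
from pyspark.sql import functions as F

pop = (spark.read
       .option("sep", "\t").option("header", True)
       .csv("abfss://populationdata@<account>.dfs.core.windows.net/"
            "population_raw/BLOB/population.tsv"))

key = pop.columns[0]  # the combined "indicator,country" column
pop = (pop.withColumnRenamed(key, "key")
          .withColumn("age_group", F.split("key", ",")[0])
          .withColumn("country_code", F.split("key", ",")[1])
          .drop("key"))

pop.write.mode("overwrite").parquet(
    "abfss://processed@<account>.dfs.core.windows.net/population")
```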
- Create a master pipeline that runs both child pipelines: 1. ingesting the population data, 2. transforming the population data using Databricks.
- This pipeline should be triggered when the blob for the raw population file is created (see the sketch below).
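The master pipeline and its storage event trigger are ADF configuration (two chained Execute Pipeline activities, with the trigger bound to the blob path of the raw population file). For completeness, here is a sketch of starting the master pipeline through the management SDK; every resource name, including the pipeline name, is a placeholder.

```python
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
adf = DataFactoryManagementClient(credential, "<subscription-id>")

# In production the storage event trigger starts this run automatically;
# here we start the master pipeline (hypothetical name) by hand.
run = adf.pipelines.create_run("<resource-group>", "<factory-name>",
                               "pl_master_population")
print(run.run_id)
```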
- Run SQL scripts to create the table schemas in your Azure SQL Database.
- Create pipelines with Copy activities that load the processed Cases and Deaths and Hospital Admissions files into their respective tables in the SQL database, as sketched below.
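In the project these last two steps are a SQL script plus ADF Copy activities; the pyodbc sketch below compresses both into one place, with the table layout and file name as assumptions rather than the project's actual schema.

```python
import csv

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;"
    "UID=<user>;PWD=<password>")
cur = conn.cursor()

# Schema creation, normally run once from the SQL script.
cur.execute("""
    IF OBJECT_ID('dbo.cases_and_deaths') IS NULL
    CREATE TABLE dbo.cases_and_deaths (
        country       VARCHAR(100),
        country_code  VARCHAR(3),
        reported_date DATE,
        cases_count   BIGINT,
        deaths_count  BIGINT
    )""")

# Copy-activity equivalent: load the processed file into the table.
with open("cases_deaths_processed.csv") as f:
    rows = list(csv.reader(f))[1:]  # skip the header row
cur.executemany(
    "INSERT INTO dbo.cases_and_deaths VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
```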