Commit

Update README
fabriziomiano committed May 24, 2022
1 parent 34321d4 commit 4aa7202
Showing 5 changed files with 16 additions and 15 deletions.
27 changes: 14 additions & 13 deletions README.md
@@ -1,18 +1,19 @@
# covid-etl-df
The scope of this project is the creation of insightful
The scope of this project is the creation of insightful
analytics on the Italian COVID pandemic and vaccine status.
![img.png](screenshots/img.png)
![pandemic.png](screenshots/pandemic.png)
![vax.png](screenshots/vax.png)
## Description
This repository hosts an Azure Data Factory (ADF) to perform the ingestion of the
This repository hosts an Azure Data Factory (ADF) to perform the ingestion of the
official COVID-19 pandemic- and vaccine-data on an Azure SQL Database.
It also contains the DDL needed for the creation of the data model
It also contains the DDL needed for the creation of the data model
on an Azure SQL Database.

## Data Factory
## Data Factory
The ADF consists of:
* 1 tumbling-window trigger
* 3 linked services
* 3 datasets
* 3 datasets
* 5 ingestion pipelines

<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FrealAngryAnalytics%2Fadf%2Fmaster%2Fazuredeploy.json" target="_blank">
@@ -38,10 +39,10 @@ Represents the CSV file from the civil-protection repository taken from its link
#### GithubOD
Represents the CSV file from the Italia open data repository taken from its linked service
#### SQLDB
Represents the dataset needed for the copy to the Azure SQL DB linked service
Represents the dataset needed for the copy to the Azure SQL DB linked service

### Pipelines
The main pipeline is the `Data Ingestion` pipeline.
The main pipeline is the `Data Ingestion` pipeline.
This calls two pipelines:
* `Ingest Pandemic Data`
* `Ingest Vax Data`
@@ -51,24 +52,24 @@ which in turn call the two parametric copy-activity pipelines:
* `PCM-DPC 2 SQL`

These are parametric in directory name and file name to be retrieved from the relevant
GitHub repositories and perform the copy activities from the CSV files to
GitHub repositories and perform the copy activities from the CSV files to
the relevant tables on SQL, together with a provided stored procedure
that updates the age ranges in the
that updates the age ranges in the
[Italia-Open-Data administrations CSV file](https://github.com/italia/covid19-opendata-vaccini/blob/master/dati/somministrazioni-vaccini-latest.csv)
to harmonize the age-range with their provided [population CSV file](https://github.com/italia/covid19-opendata-vaccini/blob/master/dati/platea.csv)
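
As a rough illustration of what "parametric in directory name and file name" can mean, the sketch below builds the raw-download URL of a CSV from those two values. The function name `raw_csv_url`, its arguments, and the default branch are assumptions made for this example; the actual pipeline parameters are not visible in this diff.

```python
from urllib.parse import quote


def raw_csv_url(repo: str, directory: str, file_name: str, branch: str = "master") -> str:
    """Build the raw.githubusercontent.com URL for a file in a GitHub repo.

    `repo` is "owner/name"; `directory` and `file_name` stand in for the two
    pipeline parameters described above (hypothetical names).
    """
    # Strip stray slashes so the parts join cleanly, skipping empty parts.
    parts = [part.strip("/") for part in (directory, file_name)]
    path = "/".join(part for part in parts if part)
    # quote() percent-encodes unsafe characters but leaves "/" untouched.
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{quote(path)}"


url = raw_csv_url("italia/covid19-opendata-vaccini", "dati", "platea.csv")
```

Changing only `directory` and `file_name` then points the same copy logic at a different CSV, which is the idea behind reusing one copy-activity pipeline for several files.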

## The DDL scripts
The needed tables, view, and procedure are defined under `SQL/`.
The scripts create the relevant tables needed for the ingestion;
an update procedure to harmonize the data; the views to be exposed to
The scripts create the relevant tables needed for the ingestion;
an update procedure to harmonize the data; the views to be exposed to
the data model.

## Usage
The repository does not contain any script for the automated deployment of the ADF.
However, to deploy the ADF, apart from clicking the button at the top of this repo,
you need to:
* fork this repo
* provision an Azure SQL Database (connection string needed in the template)
* provision an Azure SQL Database (connection string needed in the template)
* provision an Azure Data Factory

Once the services have been created, run:
4 changes: 2 additions & 2 deletions linkedService/AzureSqlDatabase.json
@@ -13,8 +13,8 @@
"annotations": [],
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": "integrated security=False;encrypt=True;connection timeout=30;data source=covidash-srv.database.windows.net;initial catalog=covid-db;user id=burbusql",
"encryptedCredential": "ew0KICAiVmVyc2lvbiI6ICIyMDE3LTExLTMwIiwNCiAgIlByb3RlY3Rpb25Nb2RlIjogIktleSIsDQogICJTZWNyZXRDb250ZW50VHlwZSI6ICJQbGFpbnRleHQiLA0KICAiQ3JlZGVudGlhbElkIjogIkRBVEFGQUNUT1JZQDg1NUEwMTVCLUFFMDUtNDJBRi1BMjE1LTgxQzExMzUyMERDN184MDUxZmE2My0zYzY4LTQ1YjctOTUwYy1jYjBkMjhlNWFkMzMiDQp9"
"connectionString": "YOUR_LINKED_SERVICE_CONNECTION_STRING",
"encryptedCredential": "YOUR_LINKED_SERVICE_CREDENTIALS"
}
}
}
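
Since this change swaps the real connection details for placeholders, whoever deploys the factory has to supply them again. A minimal sketch of one way to do that follows, assuming the definition is loaded as JSON and placeholders are matched by exact string value. The helper `fill_placeholders` and the sample value are hypothetical, and the JSON fragment mirrors only the fields visible in the diff.

```python
import json
from typing import Any, Mapping


def fill_placeholders(node: Any, values: Mapping[str, str]) -> Any:
    """Recursively replace string values that exactly match a placeholder."""
    if isinstance(node, dict):
        return {key: fill_placeholders(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(child, values) for child in node]
    if isinstance(node, str):
        # Unknown strings are returned unchanged.
        return values.get(node, node)
    return node


# Fragment mirroring the fields shown in the diff; the rest of the file is omitted.
template = json.loads("""
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "connectionString": "YOUR_LINKED_SERVICE_CONNECTION_STRING",
    "encryptedCredential": "YOUR_LINKED_SERVICE_CREDENTIALS"
  }
}
""")

filled = fill_placeholders(
    template,
    {"YOUR_LINKED_SERVICE_CONNECTION_STRING": "<your connection string>"},
)
```

Placeholders without a supplied value are left as they are, so a missing secret stays visible instead of being silently blanked. The real value would normally come from outside the repository, such as an environment variable or a secret store.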
Binary file removed screenshots/img.png
Binary file not shown.
Binary file added screenshots/pandemic.png
Binary file added screenshots/vax.png