This repository contains the implementation of the fifth project, Data Pipelines, in Udacity's Data Engineering Nanodegree.

Project Overview

This project will introduce you to the core concepts of Apache Airflow. To complete the project, you will need to create your own custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step.
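
To make that concrete, here is a minimal sketch of what one such custom operator could look like. The class name follows the staging operator file in this repo (stage_redshift.py), but the parameters and body are assumptions for illustration, not the project's actual implementation.

from airflow.models import BaseOperator


class StageToRedshiftOperator(BaseOperator):
    """Copies JSON data from S3 into a Redshift staging table (sketch only)."""

    def __init__(self, redshift_conn_id="", aws_credentials_id="",
                 table="", s3_bucket="", s3_key="", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.aws_credentials_id = aws_credentials_id
        self.table = table
        self.s3_bucket = s3_bucket
        self.s3_key = s3_key

    def execute(self, context):
        # The real operator would build a COPY statement from these parameters
        # and run it against Redshift; this stub only logs what it would do.
        self.log.info("Staging %s from s3://%s/%s",
                      self.table, self.s3_bucket, self.s3_key)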

We have provided you with a project template that takes care of all the imports and provides four empty operators that need to be implemented as functional pieces of a data pipeline. The template also contains a set of tasks that need to be linked to achieve a coherent and sensible data flow within the pipeline.
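
As a rough sketch of how those tasks can be linked with Airflow's dependency operators, the snippet below uses dummy stand-ins for the four custom operators; the DAG name and task ids are illustrative assumptions, not the template's exact names.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Dummy tasks stand in for the custom operators purely to show the ordering.
dag = DAG("sparkify_pipeline_sketch",
          start_date=datetime(2019, 1, 12),
          schedule_interval="@hourly")

start = DummyOperator(task_id="Begin_execution", dag=dag)
stage_events = DummyOperator(task_id="Stage_events", dag=dag)
stage_songs = DummyOperator(task_id="Stage_songs", dag=dag)
load_songplays = DummyOperator(task_id="Load_songplays_fact_table", dag=dag)
load_dims = [DummyOperator(task_id=f"Load_{t}_dim_table", dag=dag)
             for t in ("user", "song", "artist", "time")]
quality_checks = DummyOperator(task_id="Run_data_quality_checks", dag=dag)
end = DummyOperator(task_id="Stop_execution", dag=dag)

# Stage both sources, load the fact table, fan out to the dimensions,
# then validate before finishing.
start >> [stage_events, stage_songs]
[stage_events, stage_songs] >> load_songplays
load_songplays >> load_dims
load_dims >> quality_checks
quality_checks >> end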

A helpers class containing all the SQL transformations is provided, so you won't need to write the ETL yourself, but you will need to execute it with your custom operators.
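
For instance, inside an operator's execute method, one of those provided statements could be run through a Postgres hook roughly as shown below; the hook usage and the SqlQueries attribute name are assumptions about the template, not code copied from this repo.

from airflow.hooks.postgres_hook import PostgresHook

from helpers import SqlQueries  # the provided helpers class holding the SQL strings


def load_user_dimension(redshift_conn_id, table="users"):
    # Connect to Redshift via the given Airflow connection id and run the
    # provided SELECT wrapped in an INSERT into the target table.
    redshift = PostgresHook(postgres_conn_id=redshift_conn_id)
    redshift.run("INSERT INTO {} {}".format(table, SqlQueries.user_table_insert))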

DAG

Directory Structure

This is the folder structure of the project:

.
├── LICENSE
├── README.md                 # Project description
├── create_tables.sql         # SQL file to create the tables in Redshift
├── dags
│   └── udac_example_dag.py   # Code for the DAG file
└── plugins
    ├── __init__.py           # Python package init file
    ├── helpers
    │   ├── __init__.py       # Python package init file
    │   └── sql_queries.py    # File containing all the SQL to be executed
    └── operators
        ├── __init__.py       # Python package init file
        ├── data_quality.py   # Code for the data validation operator
        ├── load_dimension.py # Code for the dimension table data load
        ├── load_fact.py      # Code for the fact table data load
        └── stage_redshift.py # Code for loading data from S3 into Redshift

Setup

Prerequisites

  1. AWS Account
  2. Redshift Cluster

Instructions

  1. create_tables.sql needs to be executed on the Redshift cluster to create the tables.
  2. Airflow connections need to be set up for AWS credentials and for the Redshift cluster.
  3. udac_example_dag.py needs to be updated with your connection ids: redshift_conn_id="<Airflow Redshift connection id>" and aws_credentials_id="<Airflow AWS credentials id>" (see the sketch after this list).
  4. Run the DAG in Airflow.
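
Continuing the operator sketch from the overview above, step 3 amounts to passing your Airflow connection ids into the operator calls in udac_example_dag.py, roughly like this; the connection names, bucket, and key shown here are placeholders, not values required by this repo.

stage_events_to_redshift = StageToRedshiftOperator(
    task_id="Stage_events",
    dag=dag,
    redshift_conn_id="redshift",           # Airflow connection id for the Redshift cluster
    aws_credentials_id="aws_credentials",  # Airflow connection id holding the AWS keys
    table="staging_events",
    s3_bucket="<your-s3-bucket>",
    s3_key="log_data",
)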
