
LinkedIn


Logo

Airflow DAGs for Bringing Postgres Data to Google Cloud

These Airflow DAGs make it easier to transfer data from one place to another while keeping full oversight of the data pipeline. In this project, I take tables/queries from a Postgres instance and transfer them to a Google Cloud Storage bucket as a CSV or TSV. I also show how you can kick it up a notch and transfer the CSV to a BigQuery table.
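The pipeline described above boils down to two transfer operators chained together. Below is a minimal sketch of that shape, not the repo's actual DAG: the connection id, bucket, and table names are placeholders, and older Composer images (Airflow 1.10) would use the `contrib` import paths rather than the Airflow 2 Google provider paths shown here.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="postgres_table_to_gcs_to_bq_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Dump a Postgres table (or any query) to a CSV file in a GCS bucket.
    postgres_to_gcs = PostgresToGCSOperator(
        task_id="postgres_to_gcs",
        postgres_conn_id="my_aws_postgres",   # placeholder: the connection from step 2
        sql="SELECT * FROM public.my_table;",
        bucket="my-composer-bucket",
        filename="data/my_table.csv",
        export_format="csv",
    )

    # Load that CSV into a BigQuery table, letting BigQuery infer the schema.
    gcs_to_bq = GCSToBigQueryOperator(
        task_id="gcs_to_bq",
        bucket="my-composer-bucket",
        source_objects=["data/my_table.csv"],
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="CSV",
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )

    postgres_to_gcs >> gcs_to_bq
```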

Table of Contents
  1. About The Project
  2. Prerequisites
  3. Contact

About The Project

  • I wanted to set up a process where I can automatically bring all my data from a Postgres instance to a Google Cloud Storage bucket and then transfer the CSV files to Google Cloud BigQuery for analytics/warehousing.

  • Note: The data depicted in the gifs below is from one of my personal servers.

Overview

Built With

Prerequisites

  1. Install Airflow locally or create a Google Cloud Composer instance (~$10-$12/day). Pros and cons below.
  • Local Airflow Instance
    • Pros:
      • Free
      • The new Docker image makes it a bit easier to build.
      • Best if you're experienced and don't have money to spend (or don't want to spend it).
    • Cons:
      • Running an Ubuntu or other Linux VM on your computer can slow it down or overheat it unless it's a souped-up PC.
      • Can be very confusing for a newbie.
      • Many requirements, so it takes a while to set up (can be north of a full day's time).
  • GCP Cloud Composer
    • Pros:
      • If you have never used Google Cloud before, you can get a FREE $300 credit for your first month, so this would be a free environment for the first ~15-20 days.
      • Relatively fast setup (~25 minutes to initialize).
      • Easiest setup by far!
      • All the GCP connections come preset, so you can run your DAGs with minimal connection setup time.
      • The best choice if you're a newbie with $10-$20 to burn on a weekend and are looking for experience.
    • Cons:
      • Cost: $10-$12 USD/day.
      • You can't build it out over time; since time is money, you need to focus and dedicate yourself to coding.
  2. Create your Postgres connection in the Airflow UI (a code-based equivalent is sketched after the screenshot below).

Airflow Connections
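The screenshot above shows the connection form in the Airflow UI. The same connection can also be registered programmatically; this is only a sketch, and every value below (including the `my_aws_postgres` id reused in later snippets) is a placeholder.

```python
from airflow import settings
from airflow.models import Connection

# Placeholder values: point these at your own AWS-hosted Postgres server.
pg_conn = Connection(
    conn_id="my_aws_postgres",   # referenced by the DAG's postgres_conn_id
    conn_type="postgres",
    host="my-db.xxxxxxxx.us-east-1.rds.amazonaws.com",
    schema="my_database",        # the database name
    login="airflow_reader",
    password="********",
    port=5432,
)

# Persist the connection in Airflow's metadata database.
session = settings.Session()
session.add(pg_conn)
session.commit()
```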

  3. Test your new connection with the Airflow "Data Profiling" tool (or from code, as sketched below the screenshot).

Airflow Data Profiling
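If you prefer to test from code rather than the UI, a `PostgresHook` pointed at the same connection id does the trick. A minimal check, assuming the placeholder id from the previous sketch and the Airflow 2 provider import path:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Runs a trivial query against the connection to confirm it works end to end.
hook = PostgresHook(postgres_conn_id="my_aws_postgres")
print(hook.get_records("SELECT version();"))
```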

  4. Edit the DAG(s) so they pull the right data from Postgres and push it to Google Cloud in the right format (see the TSV sketch below).
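The usual knobs to edit are the SQL, the target bucket/object, and the output format. For example, the extract task from the earlier sketch can emit a TSV instead of a CSV by keeping the operator's CSV writer and swapping the delimiter; all names below are placeholders.

```python
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator

# Same extract task as in the sketch above, but writing a TSV instead of a CSV.
PostgresToGCSOperator(
    task_id="postgres_to_gcs_tsv",
    postgres_conn_id="my_aws_postgres",
    sql="SELECT id, name, created_at FROM public.my_table;",
    bucket="my-composer-bucket",
    filename="data/my_table.tsv",
    export_format="csv",      # the operator's CSV writer...
    field_delimiter="\t",     # ...with a tab delimiter produces a TSV
)
```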

  5. In the Google Cloud Storage bucket that Composer created for this Airflow instance, upload these DAGs to the dags folder (a code-based upload is sketched after the screenshot below).

Google Cloud Storage
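Uploading is normally done through the Cloud Console or `gsutil`, but for completeness, here is a sketch using the GCS Python client; the Composer bucket name and DAG file path are placeholders.

```python
from google.cloud import storage

# Push a local DAG file into the Composer environment's dags/ folder.
client = storage.Client()
bucket = client.bucket("us-central1-my-composer-env-bucket")  # placeholder bucket
bucket.blob("dags/postgres_table_to_gcs_to_bq.py").upload_from_filename(
    "dags/postgres_table_to_gcs_to_bq.py"
)
```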

  6. Run the DAG(s). Read the logs if the operation fails, and make sure the data reached the storage bucket in the right format (a quick check is sketched below the screenshot).

Airflow
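A quick way to confirm the export landed, using the GCS client with the placeholder names from the DAG sketch above:

```python
from google.cloud import storage

# Check that the exported object exists and peek at its header row.
blob = storage.Client().bucket("my-composer-bucket").blob("data/my_table.csv")
if blob.exists():
    # Download only the first ~200 bytes instead of the whole file.
    print(blob.download_as_bytes(start=0, end=200).decode("utf-8"))
else:
    print("export not found in bucket")
```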

  7. Check BigQuery to make sure the CSV data made it over (if you ran the postgre_table_to_gcs_to_bq DAG); a row-count query is sketched below the screenshot.

BigQuery
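To confirm the load on the BigQuery side, count the rows in the destination table (project/dataset/table names are placeholders):

```python
from google.cloud import bigquery

# Count the rows that the load job wrote to the destination table.
client = bigquery.Client()
rows = client.query(
    "SELECT COUNT(*) AS n FROM `my_project.my_dataset.my_table`"
).result()
print(list(rows)[0].n)
```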

Contact

Jared Fiacco - jaredfiacco2@gmail.com
