Documenting every step is difficult, so this manual presents only the most important steps to deploy the solution.
Please do not hesitate to contact me with improvement proposals.
- credentials: The .json credential files.
- data: The required data for the project, including backups.
- dataflows: The orchestrated pipeline, written in Python with Prefect.
- dbt: The dbt solution.
- environment: The required files to create the execution environment.
- images: The available images for the repo.
- terraform: The Terraform scripts for IaC deployment.
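As a quick sanity check after downloading the repo, the folder layout listed above can be verified with a short script. This is only a minimal sketch: the folder names come from the list above, while the function name `missing_folders` and the path `~/latam-cooliving` used in the example are illustrative assumptions.

```python
from pathlib import Path

# Folder names taken from the repository layout described above.
EXPECTED_FOLDERS = ["credentials", "data", "dataflows", "dbt",
                    "environment", "images", "terraform"]


def missing_folders(repo_root):
    """Return the expected top-level folders that are absent from repo_root."""
    root = Path(repo_root).expanduser()
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed download location; adjust it to wherever the repo actually lives.
    missing = missing_folders("~/latam-cooliving")
    print("Missing folders:", missing if missing else "none")
```

An empty result means every folder from the list is present; it says nothing about the contents of those folders.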
- A local instance (Ubuntu 22.04.2 LTS is recommended) with:
  - git installed.
  - Python installed.
  - Terraform v1.4.5+ installed.
- A Google account and a Google Cloud Platform account. If you do not have a GCP account, create one. This project can be completed using only the services included in the GCP free tier.
- A Kaggle account and API credential.
- A dbt cloud account.
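The command-line requirements above can be checked before going further. The sketch below only tests whether each tool is on the PATH; it does not verify the Terraform version. The name `missing_tools` is hypothetical, and using `python3` as the interpreter name is an assumption (some systems expose it as `python`).

```python
import shutil

# Command-line tools named in the requirements above.
# "python3" is an assumption; some systems expose the interpreter as "python".
REQUIRED_TOOLS = ["git", "python3", "terraform"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.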
- Download this repo into the folder latam-cooliving (the home path ~/latam-cooliving is recommended).
- Set up the Kaggle account for API use. Download the file kaggle.json and copy it to the folder ~/.kaggle
The kaggle.json file should look like this:
{"username":"<YOUR USERNAME>","key":"46s4f56a4fd98f47da416dc98e4cd89c486c4e894da6d16a8d4a984c9a614c6sa8f49a8f74ed89f489f4c5c61as62c1a63dc"}
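A small check can catch a misplaced or incomplete Kaggle credential file before the pipeline tries to use it. The sketch below assumes the file sits at `~/.kaggle/kaggle.json`, the Kaggle client's default location, and that it contains the two fields shown above. The function name `check_kaggle_credentials` is hypothetical, and the permission check is only meaningful on POSIX systems.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(path="~/.kaggle/kaggle.json"):
    """Return a list of problems found in the Kaggle credential file."""
    file_path = Path(path).expanduser()
    if not file_path.is_file():
        return [f"{file_path} does not exist"]
    try:
        content = json.loads(file_path.read_text())
    except json.JSONDecodeError:
        return [f"{file_path} is not valid JSON"]
    problems = []
    for field in ("username", "key"):
        if not isinstance(content, dict) or not content.get(field):
            problems.append(f"missing or empty field: {field}")
    # Flag any group/other permission bit; the Kaggle client warns about
    # credential files that other users can read.
    if file_path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file is accessible by other users; consider chmod 600")
    return problems


if __name__ == "__main__":
    for problem in check_kaggle_credentials():
        print("Problem:", problem)
```

The script prints nothing when the file is in place, well formed, and private to the current user.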
- Set up the GCP account. After creating your GCP account, create or modify the following resources to enable Terraform to provision your infrastructure, Prefect to access storage and BigQuery, and dbt to perform BigQuery operations:
  - A GCP project: GCP organizes resources into projects. Create one now in the GCP console and make note of the project ID. You can see a list of your projects in the cloud resource manager. A project name like "latam-cooliving13987" is recommended.
  - A GCP service account key: Create a service account key to enable Terraform, Prefect, and dbt to access your GCP account. When creating the key, use the following steps:
    - Select the project you created in the previous step.
    - Click "Create Service Account".
    - Give it any name you like and click "Create".
    - For the role, choose the editor roles ["BigQuery Data Editor", "BigQuery Resource Editor", "Editor"], then click "Continue".
    - Skip granting other users access, and click "Done".
    After you create your service account, download your service account key:
    - Select your service account from the list.
    - Select the "Keys" tab.
    - In the drop-down menu, select "Create new key".
    - Leave the "Key Type" as JSON.
    - Click "Create" to create the key and save the key file to your system.
    - Rename the file to "gcp_service_account.json". The file should look like this:
{
  "type": "service_account",
  "project_id": "<YOUR PROJECT ID>",
  "private_key_id": "f6sg196fwef691w589f4169wf4156ewf",
  "private_key": "-----BEGIN PRIVATE KEY-----\n****hash****\n-----END PRIVATE KEY-----\n",
  "client_email": "dfs1f51sd5f61sdf@f6df4156asd41f56a4d54fd.gserviceaccount.com",
  "client_id": "6f1s5e6f1ws65f41wfe",
  "auth_uri": "https://accounts.google.com",
  "token_uri": "https://oauth2.googleapis.com",
  "auth_provider_x509_cert_url": "https://www.googleapis.com",
  "client_x509_cert_url": "https://www.googleapis.com"
}
Warning! The service account key file provides access to your GCP project. It should be treated like any other secret credential. Specifically, it should never be checked into source control.
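Before handing the key to Terraform, Prefect, or dbt, it can help to confirm that the downloaded file has the expected shape. The following is a rough sketch, not an official validation: the field list is taken from the sample file above, the function name `check_service_account_key` is hypothetical, and the path in the example assumes the repo was downloaded to `~/latam-cooliving` with the key stored in its credentials folder.

```python
import json
from pathlib import Path

# Fields taken from the sample key file shown above.
REQUIRED_FIELDS = ["type", "project_id", "private_key_id", "private_key",
                   "client_email", "client_id", "token_uri"]


def check_service_account_key(path):
    """Return a list of problems found in a service account key file."""
    content = json.loads(Path(path).expanduser().read_text())
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in content]
    if content.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


if __name__ == "__main__":
    # Assumed location of the renamed key file; adjust as needed.
    key_path = Path("~/latam-cooliving/credentials/gcp_service_account.json")
    if key_path.expanduser().is_file():
        print(check_service_account_key(key_path) or "Key file looks complete")
    else:
        print("Key file not found at", key_path)
```

The check never prints the key material itself, which keeps the secret out of terminal logs.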
- Copy the file to the credentials folder. The folder should look like this:
- Save the project name and use it for the Terraform setup:
- Set up the dbt project:
- Set up the local environment
- Deploy the Terraform infrastructure:
terraform init
terraform plan
terraform apply
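If you prefer to drive the deployment from Python, the three commands above can be run in sequence so that a failure stops the process early. This is a minimal sketch: the function name `deploy` is hypothetical, and running the commands from the repo's terraform folder is an assumption based on the folder description.

```python
import subprocess

# The three commands from the step above, in the order they are listed.
TERRAFORM_STEPS = [
    ["terraform", "init"],
    ["terraform", "plan"],
    ["terraform", "apply"],
]


def deploy(workdir, run=subprocess.run):
    """Run each Terraform step in workdir, stopping at the first failure."""
    for command in TERRAFORM_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        run(command, cwd=workdir, check=True)


# Example call (assumed path to the repo's terraform folder):
# deploy("terraform")
```

Because the child process inherits the terminal, `terraform apply` still asks for confirmation as it does when typed by hand.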
- Run the Prefect server and open the Prefect UI:
prefect server start
- Run the Prefect dataflow (using the created environment):
python dataflows/orch_flow.py
If the execution ends successfully, a message like this should appear:
- Execute dbt run:
dbt run --vars 'is_test: false'
- (Optional) At this point, the data should be available to create the dashboard.
- (Optional) Destroy the Terraform infrastructure:
terraform destroy