About this documentation

Taking all the steps into account is a hard task.

This manual presents the most important steps to deploy the solution.

Please do not hesitate to contact me with improvement proposals.

Repository structure

  • credentials: The .json credential files.
  • data: The required data for the project, including backups.
  • dataflows: The orchestrated pipeline (written in Python) for Prefect.
  • dbt: The dbt solution.
  • environment: The required files to create the execution environment.
  • images: The available images for the repo.
  • terraform: The Terraform scripts for IaC deployment.

Prerequisites

  • A local instance (Ubuntu 22.04.2 LTS is recommended):
    • git installed.
    • python installed.
    • Terraform v1.4.5+ installed.
  • A Google account and a Google Cloud Platform account. If you do not have a GCP account, create one. This project can be completed using only the services included in the GCP free tier.
  • A Kaggle account and API credentials.
  • A dbt Cloud account.

Preparations

  1. Download this repo into the folder latam-cooliving (the home path ~/latam-cooliving is recommended); a clone sketch follows below.

    (Image: Home folder)
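
    A minimal sketch of this step from a terminal, assuming a hypothetical repository URL (replace it with the real one):

      cd ~
      git clone https://github.com/<YOUR USER>/latam-cooliving.git latam-cooliving   # hypothetical URL
      cd latam-cooliving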

  2. Set up the Kaggle account to use the API. Download the file kaggle.json and copy it to the folder ~/.kaggle (a sketch follows below).

    (Image: Kaggle folder)

    The kaggle.json file should look like this:

    {"username":"<YOUR USERNAME>","key":"46s4f56a4fd98f47da416dc98e4cd89c486c4e894da6d16a8d4a984c9a614c6sa8f49a8f74ed89f489f4c5c61as62c1a63dc"}
  3. Set up the GCP account. After creating your GCP account, create or modify the following resources so that Terraform can provision your infrastructure, Prefect can access storage and BigQuery, and dbt can perform BigQuery operations:

    • A GCP Project: GCP organizes resources into projects. Create one now in the GCP console and make a note of the project ID. You can see a list of your projects in the cloud resource manager. (It is recommended to name the project something like "latam-cooliving13987".)

    • A GCP service account key: Create a service account key to enable Terraform, Prefect and dbt to access your GCP account. When creating the key, use the following steps:

      • Select the project you created in the previous step.

      • Click "Create Service Account".

      • Give it any name you like and click "Create".

      • For the Role, choose the editor roles ["BigQuery Data Editor", "BigQuery Resource Editor", "Editor"], then click "Continue".

        (Image: GCP service roles)

      • Skip granting other users access, and click "Done".

      After you create your service account, download your service account key.

      • Select your service account from the list.

      • Select the "Keys" tab.

      • In the drop down menu, select "Create new key".

      • Leave the "Key Type" as JSON.

      • Click "Create" to create the key and save the key file to your system.

      • Rename the file to "gcp_service_account.json". The file should look like this:

        {
          "type": "service_account",
          "project_id": "<YOUR PROJECT ID>",
          "private_key_id": "f6sg196fwef691w589f4169wf4156ewf",
          "private_key": "-----BEGIN PRIVATE KEY-----\****hash****\n-----END PRIVATE KEY-----\n",
          "client_email": "dfs1f51sd5f61sdf@f6df4156asd41f56a4d54fd.gserviceaccount.com",
          "client_id": "6f1s5e6f1ws65f41wfe",
          "auth_uri": "https://accounts.google.com",
          "token_uri": "https://oauth2.googleapis.com",
          "auth_provider_x509_cert_url": "https://www.googleapis.com",
          "client_x509_cert_url": "https://www.googleapis.com"
        }

        Warning! The service account key file provides access to your GCP project. It should be treated like any other secret credential and, in particular, never be checked into source control.

      • Copy the file to the credentials folder, so the folder should look like this:

        (Image: Credentials folder)
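
      If you prefer the command line over the console, the same service account and key can be created with the gcloud CLI. This is only a sketch, assuming gcloud is installed and authenticated; the project ID and service-account name below are examples to replace with your own:

        PROJECT_ID="latam-cooliving13987"      # your project ID
        SA_NAME="latam-cooliving-sa"           # hypothetical service-account name
        gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
        # Grant the same roles chosen in the console (repeat for the other BigQuery roles as needed)
        gcloud projects add-iam-policy-binding "$PROJECT_ID" \
          --member "serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com" \
          --role "roles/editor"
        gcloud projects add-iam-policy-binding "$PROJECT_ID" \
          --member "serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com" \
          --role "roles/bigquery.dataEditor"
        # Download the JSON key directly into the credentials folder
        gcloud iam service-accounts keys create credentials/gcp_service_account.json \
          --iam-account "$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com"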

    • Save the project name and use it for terraform setup:

      • Open the file variables.tf in the terraform folder and modify the project name so that it looks like this:

        (Image: terraform variables file)

        Also set the preferred region and the path to the gcp_service_account.json file with the GCP credentials (these values can also be overridden via environment variables, as sketched below).
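
        As an alternative to editing variables.tf by hand, Terraform also reads TF_VAR_* environment variables. A sketch, assuming the variables are named project, region and credentials (check variables.tf for the real names):

          export TF_VAR_project="latam-cooliving13987"                                            # assumed variable name
          export TF_VAR_region="us-central1"                                                      # assumed variable name and region
          export TF_VAR_credentials="$HOME/latam-cooliving/credentials/gcp_service_account.json"  # assumed variable name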

  4. Set up the dbt project:

    • Create a new project.

      • Set the subfolder option to "dbt".

        (Image: dbt P1)

      • Select the BigQuery connection using the credentials in the "gcp_service_account.json" file.

        (Image: dbt P2)

        Set "dbt_latamcooliving" as the dataset name.

        (Image: dbt P3)

    • Import the dbt files (from the dbt folder of the repo) so that you see the following file explorer structure:

      (Image: dbt P4)

  5. Set up the local environment:

    • Create a virtual environment and install the environment requirements using pip:

      pip install -r requirements
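
      A minimal sketch of creating the virtual environment first (the environment name .venv is only an example):

        python -m venv .venv            # create an isolated environment
        source .venv/bin/activate       # activate it
        pip install -r requirements     # as above; adjust the path if the file lives elsewhere
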
    • Execute the "build_environment.py" script:

      python environment/build_environment.py 

      The following message should appear if the execution was successful:

      (Image: dbt P4)

Run the project

  1. Deploy the terraform infrastructure:

    terraform init
    terraform plan
    terraform apply
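
    The Terraform scripts live in the terraform folder, so assuming you start from the repository root, either change into it first or use -chdir (available in Terraform 0.14+):

      cd terraform && terraform init && terraform plan && terraform apply
      # or, equivalently, from the repo root:
      terraform -chdir=terraform init
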
  2. Run the Prefect server and open the Prefect UI:

    prefect server start
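
    By default, the Prefect 2 server serves its UI at http://127.0.0.1:4200. If the flow run in the next step cannot reach the server, pointing the client at the local API may help; a sketch, assuming Prefect 2.x defaults:

      prefect config set PREFECT_API_URL="http://127.0.0.1:4200/api"
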
  3. Run the Prefect dataflow (using the created environment):

    python dataflows/orch_flow.py
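
    Depending on how the flow authenticates to GCP, exporting the standard credentials variable before running it may be required; a sketch, assuming the key file from the preparations:

      export GOOGLE_APPLICATION_CREDENTIALS="$HOME/latam-cooliving/credentials/gcp_service_account.json"
      python dataflows/orch_flow.py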

    If the execution ends OK, a message like this should appear:

    (Image: prefect flow)

  4. Execute dbt run:

    dbt run --vars 'is_test: false'
  5. (optional) At this point, the data should be available to create the dashboard:

    (Image: BQ data 1)

    (Image: BQ data 2)
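
    The data can also be checked from the command line with the bq CLI; a sketch, assuming the dataset name from the dbt setup and a placeholder table name:

      bq ls dbt_latamcooliving                     # list the models dbt materialized
      bq query --use_legacy_sql=false \
        'SELECT COUNT(*) FROM `dbt_latamcooliving.<YOUR TABLE>`'   # <YOUR TABLE> is a placeholder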

  6. (optional) Destroy the terraform infrastructure:

    terraform destroy