This project contains four Google Cloud Functions, written in Python, that share common logic. The functions are triggered by a PubSub topic, which is in turn triggered by a cron job. The functions retrieve data from different API endpoints, each of which outputs a .csv file, then ETL (extract, transform, and load) that data into a BigQuery table.
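To illustrate the extract, transform, and load flow described above, here is a minimal sketch. It is not the project's actual code: the function names (`extract`, `transform`, `run_etl`) and the cleanup rule are hypothetical, and it uses only the standard library with injected `fetch_csv` and `load` callables, whereas the real functions use Pandas and BigQuery.

```python
import csv
import io
from typing import Callable, Iterable

Row = dict[str, str]


def extract(csv_text: str) -> list[Row]:
    """Parse the CSV text returned by an endpoint into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: Iterable[Row]) -> list[Row]:
    """Hypothetical cleanup: trim whitespace and drop rows with no values."""
    cleaned = []
    for row in rows:
        stripped = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if isinstance(key, str)  # skip overflow columns keyed by None
        }
        if any(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def run_etl(fetch_csv: Callable[[], str], load: Callable[[list[Row]], None]) -> int:
    """Run extract -> transform -> load and return the number of rows loaded.

    fetch_csv stands in for the API call; load stands in for the BigQuery upload.
    """
    rows = transform(extract(fetch_csv()))
    load(rows)
    return len(rows)
```

Passing the fetch and load steps in as callables keeps the shared logic testable without network access or a BigQuery table.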
All the commands necessary to set up, configure, and deploy the functions are written in the scripts folder and accessible through make commands.
In order to run this project, you'll need:
- make
- python 3.11
- a .env file based upon the .env.example file, with the values filled out
- Google Cloud CLI (gcloud)
- functions-framework for python
To install ALL the necessary python packages for development:
pip install -r requirements-dev.txt
To install just the production python packages:
pip install -r requirements.txt
To publish the cloud function:
make gcp_functions_publish
To delete the cloud function:
make gcp_functions_delete
To create the PubSub that triggers the cloud function:
make gcp_pubsub_create
To delete the PubSub that triggers the cloud function:
make gcp_pubsub_delete
To create the cronjob for the PubSub:
make gcp_cronjob_create
To edit the cronjob for the PubSub, edit the file:
scripts/google-cloud/cronjob/edit.sh
Then run the command:
make gcp_cronjob_edit
To delete the cronjob for the PubSub:
make gcp_cronjob_delete
To run all tests:
make test
To test the retrieval of the API endpoint CSV data:
make test_endpoint
To test the setup of the Pandas DataFrame and the data transformations:
make test_transactions
To test the upload of the transaction data to the BigQuery table:
make test_upload
To run the integration test on a function, first select the function you want to test by setting the FUNCTIONS_FRAMEWORK_TARGET env var to the name of that function.
Next, you will need 3 separate shells.
In the first shell, start the Functions Framework:
make ff_start
Next, in your second shell, start the PubSub emulator:
make gcp_em_pubsub_start
In your third shell, run the following command:
make gcp_em_pubsub_env_init
The output of the command should look something like:
export PUBSUB_EMULATOR_HOST=localhost:8085
Copy that output, paste it into your (third) shell, and hit enter.
That step is important, as the next set of commands will not work without it.
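The Google Cloud Pub/Sub client libraries send their requests to the emulator instead of the real service when PUBSUB_EMULATOR_HOST is set, which is why the export matters. As a rough sketch, a check like the following could confirm the variable is present and well formed before continuing; the helper name `emulator_target` is hypothetical and not part of this project.

```python
import os
from typing import Mapping, Optional


def emulator_target(env: Optional[Mapping[str, str]] = None) -> tuple[str, int]:
    """Return (host, port) parsed from PUBSUB_EMULATOR_HOST, or raise an error."""
    env = os.environ if env is None else env
    value = env.get("PUBSUB_EMULATOR_HOST", "").strip()
    if not value:
        raise RuntimeError(
            "PUBSUB_EMULATOR_HOST is not set; paste the export line into this shell"
        )
    # Split on the last colon so the host part is kept intact.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise RuntimeError(f"PUBSUB_EMULATOR_HOST looks malformed: {value!r}")
    return host, int(port)
```

Calling `emulator_target()` with no arguments reads the current shell environment, so it only succeeds in the shell where the export was pasted.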
Ensure for these next steps that your .env file has all the variables filled out.
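One way to verify this is to compare the keys in .env against those in .env.example. The sketch below assumes both files use simple `KEY=VALUE` lines (optionally prefixed with `export`); the helper names are hypothetical, and the real files may use syntax this parser does not handle.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        values[key] = value.strip().strip("'\"")
    return values


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """List keys from the example file that are absent or empty in the env file."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]
```

For example, passing the contents of .env.example and .env (read with `pathlib.Path.read_text`) returns the names of any variables that still need a value; an empty list means every variable is filled out.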
Then, run the following command (in your third shell) to create a PubSub topic:
make gcp_em_pubsub_create_topic
If successful, you should see something like:
Created topic: projects/my-project/topics/my-topic
Next, run the following command (in your third shell) to create a subscription for the PubSub topic:
make gcp_em_pubsub_create_sub
If successful, you should see something like:
Push subscription created: name: "projects/my-project/subscriptions/my-subscription"
topic: "projects/my-project/topics/my-topic"
push_config {
  push_endpoint: "http://localhost:8080"
}
ack_deadline_seconds: 10
message_retention_duration {
  seconds: 604800
}
.
Endpoint for subscription is: http://localhost:8080
Lastly, run the following command (in your third shell) to publish messages to the topic:
make gcp_em_pubsub_publish_topic
If successful, you should see something like:
1 2 3 4 5 6 7 8 9 Published messages to projects/my-project/topics/my-topic.
Now return to your first shell, where you should see the output of your integration test.
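For context on what arrives at the function: a push subscription delivers each published message to its endpoint (here http://localhost:8080) as an HTTP POST with a JSON envelope whose `data` field is base64-encoded. The sketch below builds and decodes such an envelope. The helper names are hypothetical, and how this project's functions actually read their input is not shown here.

```python
import base64
import json


def build_push_envelope(payload: bytes, subscription: str, message_id: str = "1") -> str:
    """Build the JSON body that a Pub/Sub push subscription POSTs to its endpoint."""
    envelope = {
        "message": {
            "data": base64.b64encode(payload).decode("ascii"),
            "messageId": message_id,
            "attributes": {},
        },
        "subscription": subscription,
    }
    return json.dumps(envelope)


def decode_push_envelope(body: str) -> bytes:
    """Extract and base64-decode the message payload from a push envelope."""
    message = json.loads(body)["message"]
    return base64.b64decode(message.get("data", ""))
```

Round-tripping a payload through these two helpers is a quick way to reason about what the function sees when the emulator forwards a message.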