Megalista

Sample integration code for onboarding offline/CRM data from BigQuery as custom audiences or offline conversions in Google Ads, Google Analytics 360, Google Display & Video 360 and Google Campaign Manager.

Disclaimer: This is not an officially supported Google product.

Supported integrations

Google Ads
- Contact Info Customer Match (email, phone, address) [details]
- Id Based Customer Match (device Id, user id)
- Offline Conversions through gclid [details]
- Store Sales Direct (SSD) conversions [details]
Google Analytics (Universal Analytics)
- Custom segments through Data Import [details]
- Measurement Protocol [details]
Campaign Manager
- Offline Conversions API (user id, device id, match id, gclid, dclid) [details]
Google Analytics 4
- Measurement Protocol (Web + App) [details]
AppsFlyer
- S2S Offline events API (conversion upload), to be used for audience creation and in-app events with Google Ads and DV360 [details]

How does it work

Megalista was designed to separate the configuration of conversion/audience upload rules from the engine, giving non-technical teams (e.g. Media and Business Intelligence) more freedom to set up multiple upload rules on their own.

The solution consists of two parts: (1) a Google Spreadsheet (template) in which all rules are defined by mapping a data source (a BigQuery table) to a destination (a data upload endpoint), and (2) an Apache Beam workflow running on Google Dataflow, scheduled to upload the data in batch mode.
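To make the rule model concrete, here is a minimal Python sketch of how a source-to-destination mapping could be represented and grouped by endpoint. The class names, field names, and endpoint labels are hypothetical illustrations; they are not Megalista's actual types or the real spreadsheet schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Source:
    """A BigQuery table that holds the rows to upload (hypothetical shape)."""
    name: str
    dataset: str
    table: str


@dataclass(frozen=True)
class Destination:
    """An upload endpoint plus its configuration metadata (hypothetical shape)."""
    name: str
    endpoint: str  # illustrative label, e.g. "ADS_OFFLINE_CONVERSION"
    metadata: Tuple[str, ...] = ()


@dataclass(frozen=True)
class UploadRule:
    """One spreadsheet row: upload everything in `source` to `destination`."""
    source: Source
    destination: Destination
    active: bool = True


def group_active_rules_by_endpoint(
    rules: Iterable[UploadRule],
) -> Dict[str, List[UploadRule]]:
    """Group active rules by endpoint so each uploader sees only its own work."""
    grouped: Dict[str, List[UploadRule]] = {}
    for rule in rules:
        if rule.active:
            grouped.setdefault(rule.destination.endpoint, []).append(rule)
    return grouped
```

Keeping rules as plain data, separate from the upload logic, mirrors the split described above: the spreadsheet owns the "what", and the pipeline owns the "how".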

Prerequisites

Google Cloud Services

Google Cloud Platform account
- Billing enabled
- BigQuery enabled
- Dataflow enabled
- Cloud Storage enabled
- Cloud Scheduler enabled
At least one of:
- Google Ads API Access
- Campaign Manager API Access
- Google Analytics API Access
Python3
Google Cloud SDK

Access Requirements

These are the minimum roles necessary to deploy Megalista:

OAuth Config Editor
BigQuery User
BigQuery Job User
BigQuery Data Viewer
Cloud Scheduler Admin
Storage Admin
Dataflow Admin
Service Account Admin
Logs Viewer
Service Consumer

APIs

The required APIs depend on the upload endpoints in use. We recommend enabling all of them:

Google Sheets (required for any use case) [link]
Google Analytics [link]
Google Analytics Reporting [link]
Google Ads [link]
Campaign Manager [link]
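As an alternative to enabling each API through the console links, the same thing can be done with the `gcloud services enable` command. The sketch below only builds that command; the service identifiers are the ones these APIs are commonly listed under and are an assumption here, so verify them against your project's API Library before running anything. The function name is hypothetical.

```python
from typing import Iterable, List, Optional

# Assumed service identifiers for the APIs listed above; verify before use.
API_SERVICES = {
    "Google Sheets": "sheets.googleapis.com",
    "Google Analytics": "analytics.googleapis.com",
    "Google Analytics Reporting": "analyticsreporting.googleapis.com",
    "Google Ads": "googleads.googleapis.com",
    "Campaign Manager": "dfareporting.googleapis.com",
}


def build_enable_command(
    project_id: str, apis: Optional[Iterable[str]] = None
) -> List[str]:
    """Return a `gcloud services enable` command for the chosen APIs.

    Google Sheets is always included because every use case requires it.
    """
    selected = list(apis) if apis is not None else list(API_SERVICES)
    if "Google Sheets" not in selected:
        selected.insert(0, "Google Sheets")
    services = [API_SERVICES[name] for name in selected]
    return ["gcloud", "services", "enable", *services, "--project", project_id]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where the Google Cloud SDK is installed and authenticated.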

Installation

Create a copy of the configuration Spreadsheet

WIP

Creating required access tokens

To access campaigns and user lists on Google's platforms, the Dataflow pipeline needs OAuth tokens for an account that can authenticate with those systems.

To create them, follow these steps:

Access GCP console
Go to the APIs & Services section in the top-left menu.
On the OAuth consent screen, configure an Application name
Then, go to Credentials and create an OAuth client ID with the Application type set to Desktop App
This will generate a Client Id and a Client secret
Run the generate_megalist_token.sh script in this folder providing these two values and follow the instructions
- Sample: ./generate_megalist_token.sh client_id client_secret
This will generate the Access Token and the Refresh token
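For background on why both tokens matter: in the standard OAuth 2.0 flow, the access token is short-lived and the refresh token is used to obtain a new one. The sketch below builds such a refresh request against Google's token endpoint. It is an illustration of the general flow, not a description of what `generate_megalist_token.sh` or the pipeline does internally, and the function name is hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Google's OAuth 2.0 token endpoint.
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(
    client_id: str, client_secret: str, refresh_token: str
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for an OAuth 2.0 refresh-token request."""
    body = urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
        }
    )
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return TOKEN_ENDPOINT, headers, body.encode("ascii")
```

Sending the request (for example with `urllib.request.Request(url, data=body, headers=headers)`) returns a JSON document that includes a fresh `access_token`. Treat the client secret and refresh token as credentials and keep them out of source control.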

Creating a bucket on Cloud Storage

This bucket will hold the deployed code for this solution. To create it, navigate to the Storage link in the top-left menu on GCP and click on Create bucket. You can use a Regional location and the Standard storage class for this bucket.

Running Megalista

We recommend first running Megalista locally to make sure that everything works. Create some sample tables in BigQuery for one of the uploaders and verify that the data reaches the destination correctly. Once that is done, upload the Dataflow template to GCP and run it manually through the UI to confirm it works. Lastly, configure Cloud Scheduler to run Megalista at the desired frequency, and you will have a fully functional data integration pipeline.

Running locally

python3 megalist_dataflow/main.py \
  --runner DirectRunner \
  --developer_token ${GOOGLE_ADS_DEVELOPER_TOKEN} \
  --setup_sheet_id ${CONFIGURATION_SHEET_ID} \
  --refresh_token ${REFRESH_TOKEN} \
  --access_token ${ACCESS_TOKEN} \
  --client_id ${CLIENT_ID} \
  --client_secret ${CLIENT_SECRET} \
  --project ${GCP_PROJECT_ID} \
  --region us-central1 \
  --temp_location gs://${GCS_BUCKET}/tmp
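Because the command depends on several environment variables, a missing one is easy to overlook. The following sketch assembles the same command in Python and fails early if any variable is unset. The helper name and the `REQUIRED_VARS` mapping are hypothetical conveniences built from the flags shown above; they are not part of the repository.

```python
from typing import List, Mapping

# Flag name -> environment variable, taken from the command above.
REQUIRED_VARS = {
    "developer_token": "GOOGLE_ADS_DEVELOPER_TOKEN",
    "setup_sheet_id": "CONFIGURATION_SHEET_ID",
    "refresh_token": "REFRESH_TOKEN",
    "access_token": "ACCESS_TOKEN",
    "client_id": "CLIENT_ID",
    "client_secret": "CLIENT_SECRET",
    "project": "GCP_PROJECT_ID",
}


def build_local_command(
    env: Mapping[str, str], region: str = "us-central1"
) -> List[str]:
    """Build the DirectRunner command, raising if a variable is missing."""
    needed = [*REQUIRED_VARS.values(), "GCS_BUCKET"]
    missing = [var for var in needed if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    cmd = ["python3", "megalist_dataflow/main.py", "--runner", "DirectRunner"]
    for flag, var in REQUIRED_VARS.items():
        cmd += [f"--{flag}", env[var]]
    cmd += ["--region", region]
    cmd += ["--temp_location", f"gs://{env['GCS_BUCKET']}/tmp"]
    return cmd
```

A typical call would be `subprocess.run(build_local_command(os.environ), check=True)`, run from the repository root. Passing the arguments as a list avoids shell quoting problems with token values.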

Deploying Pipeline

To deploy, use the following command: ./deploy_cloud.sh project_id bucket_name region_name

Manually executing pipeline using Dataflow UI

To execute the pipeline, use the following steps:

Go to Dataflow on GCP console
Click on Create job from template
On the template selection dropdown, select Custom template
Find the megalist file on the bucket you've created, on the templates folder
Fill in the parameters required and execute

Scheduling pipeline

To schedule daily/hourly runs, go to Cloud Scheduler:

Click on create job
Add a name and frequency as desired
Set the target to HTTP
Configure a POST request to the URL https://dataflow.googleapis.com/v1b3/projects/${YOUR_PROJECT_ID}/locations/${LOCATION}/templates:launch?gcsPath=gs://${BUCKET_NAME}/templates/megalist, replacing the parameters with the actual values
For a sample of the request body, check cloud_config/scheduler.json
Add an OAuth header
Scope: https://www.googleapis.com/auth/cloud-platform
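The URL and request body from the steps above can be sketched in Python as follows. The body follows the general shape of a Dataflow `templates:launch` request (`jobName`, `parameters`, `environment`); the specific parameter names passed in are an assumption that the template accepts the same options as the local command, so treat `cloud_config/scheduler.json` as the authoritative sample. Both function names are hypothetical.

```python
import json
from typing import Mapping


def build_launch_url(project_id: str, location: str, bucket_name: str) -> str:
    """Fill in the templates:launch URL used as the Cloud Scheduler target."""
    gcs_path = f"gs://{bucket_name}/templates/megalist"
    return (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project_id}/locations/{location}/templates:launch"
        f"?gcsPath={gcs_path}"
    )


def build_launch_body(
    job_name: str, parameters: Mapping[str, str], temp_location: str
) -> str:
    """Serialize a launch request body as JSON.

    `parameters` is assumed to carry the template's runtime options,
    for example setup_sheet_id or client_id.
    """
    body = {
        "jobName": job_name,
        "parameters": dict(parameters),
        "environment": {"tempLocation": temp_location},
    }
    return json.dumps(body, indent=2)
```

Generating both values from the same project, location, and bucket settings helps keep the scheduler job consistent with the deployed template path.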

Creating a Service Account

We recommend creating a new Service Account to be used with Cloud Scheduler.

Go to IAM & Admin > Service Accounts
Create a new Service Account with the following roles:
- Cloud Dataflow Service Agent
- Dataflow Admin
- Storage Objects Viewer

Usage

Every upload method expects a BigQuery table with specific fields as its source, in addition to specific configuration metadata. For details on how to set up your upload routines, refer to the Megalista Wiki or the Megalista user guide.

