BigQuery Lineage Connector

This connector extracts BigQuery lineage information from a Google Cloud project using the Python Client for Cloud Logging. It computes dataset lineage from the jobChange events in the BigQuery audit logs.
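As a rough illustration of what "jobChange events from the audit logs" means in practice, the sketch below builds a Cloud Logging filter string for completed BigQuery jobs within a lookback window. The function name `build_job_change_filter` is hypothetical, and the exact filter the connector uses is not stated here; the field paths follow the BigQueryAuditMetadata log format and should be treated as an assumption.

```python
from datetime import datetime, timedelta, timezone


def build_job_change_filter(lookback_days: int, now: datetime) -> str:
    """Build a Cloud Logging filter for completed BigQuery jobChange events.

    Hypothetical helper, not part of the connector. `now` must be a
    timezone-aware datetime; it is converted to UTC before formatting.
    """
    start = now.astimezone(timezone.utc) - timedelta(days=lookback_days)
    start_str = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        # Assumed resource type for job-level audit entries.
        'resource.type="bigquery_project"',
        # Only keep jobs that have finished.
        'protoPayload.metadata.jobChange.job.jobStatus.jobState="DONE"',
        # Restrict to the lookback window.
        f'timestamp>="{start_str}"',
    ]
    return " AND ".join(clauses)
```

A string like this could then be passed as the `filter_` argument of `list_entries` on a `google.cloud.logging.Client`, with `page_size` controlling how many entries are fetched per request.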

Setup

Create a service account based on the Setup guide for the general BigQuery connector.

See Access Control for more information.

Config File

The config file inherits all the required and optional fields from the general BigQuery connector Config File. In addition, you can specify the following configurations:

# (Optional) Whether to enable parsing view definition to build view lineage, default True
enable_view_lineage: <boolean>

# (Optional) Whether to enable parsing audit log to find table lineage information, default True
enable_lineage_from_log: <boolean>

# (Optional) Whether to include self-referencing loops in lineage, default True
include_self_lineage: <boolean>

# (Optional) Number of days of logs to extract for lineage analysis, default 7
lookback_days: <days>

# (Optional) Number of access logs fetched per batch, default 1000; value must be in the range 0 - 1000
batch_size: <batch_size>
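
To make the defaults and the `batch_size` constraint above concrete, here is a minimal sketch of how these options might be modeled and checked. The `LineageOptions` class and `filter_edges` function are hypothetical and not the connector's actual code; the reading of "self-referencing loops" as edges whose source equals their target is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LineageOptions:
    """Hypothetical container mirroring the documented optional fields."""

    enable_view_lineage: bool = True
    enable_lineage_from_log: bool = True
    include_self_lineage: bool = True
    lookback_days: int = 7
    batch_size: int = 1000

    def __post_init__(self) -> None:
        # The documented range for batch_size is 0 - 1000.
        if not 0 <= self.batch_size <= 1000:
            raise ValueError("batch_size must be in range 0 - 1000")


def filter_edges(
    edges: Iterable[Tuple[str, str]], options: LineageOptions
) -> List[Tuple[str, str]]:
    """Drop (source, target) edges with source == target when self lineage is off."""
    if options.include_self_lineage:
        return list(edges)
    return [(src, dst) for src, dst in edges if src != dst]
```

With `include_self_lineage` set to false, an edge such as a table that is both read and written by the same job would be dropped under this interpretation, while all other edges pass through unchanged.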

Testing

Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either the all or the bigquery extra.

Run the following command to test the connector locally:

metaphor bigquery.lineage <config_file>

Manually verify the output after the run finishes.