feat: new experimental ext BigQuery DLP #113

Open
wants to merge 37 commits into
base: next
a5044f1
feat: initiate BigQuery DLP extension
pr-Mais Dec 8, 2022
eb695e0
add path to emulator
pr-Mais Dec 8, 2022
64ae97d
lifecyle event
pr-Mais Dec 9, 2022
e2f15c6
fix the connectionId & add a new role
pr-Mais Dec 9, 2022
5d8b17e
fix: creating functions query
pr-Mais Dec 12, 2022
f385abd
chore: remove unused code
pr-Mais Dec 13, 2022
601ca69
fix: dlp role
pr-Mais Dec 14, 2022
8eca3cc
feat: JSON approach
pr-Mais Dec 15, 2022
30bd5ba
feat: record type transformation
pr-Mais Dec 19, 2022
34a0820
chore: todo fo supporting record transformation
pr-Mais Dec 21, 2022
0b3dffc
feat: handle connection ALREADY_EXISTS
pr-Mais Dec 21, 2022
5d6fb02
docs: generate readme
pr-Mais Dec 21, 2022
552906e
-
pr-Mais Dec 21, 2022
c4f7c28
feat: transformation types
pr-Mais Dec 21, 2022
1cfd5d3
feat: add technique as param
pr-Mais Dec 21, 2022
125a9e1
fix: reidentify exceptions
pr-Mais Dec 23, 2022
07f51b2
docs: preinstall
pr-Mais Dec 23, 2022
6eb5dde
docs: postinstall
pr-Mais Dec 23, 2022
f581cc8
chore: added changelog
dackers86 Dec 23, 2022
b3475d2
chore: renamed to functions
dackers86 Dec 23, 2022
a4fbe89
fix: remove default value in DATASET_ID param
pr-Mais Dec 23, 2022
fc72158
fix: build
pr-Mais Dec 23, 2022
081a2a8
chore: updated changelog before publish
dackers86 Dec 23, 2022
abbfe4b
fix: identation
pr-Mais Dec 23, 2022
f413f6d
chore: ext version bump
dackers86 Dec 23, 2022
fb45b82
fix: default value
pr-Mais Dec 23, 2022
e797b90
fix: instance id
pr-Mais Dec 23, 2022
9f48630
fix: warn when remote functions exists
pr-Mais Dec 23, 2022
e2ba67f
feat: record transformation
pr-Mais Dec 29, 2022
2e3ff75
chore(bigquery-dlp-function): updated extension version and chnagelog
dackers86 Dec 29, 2022
97c1296
feat: added replace transformations
pr-Mais Feb 28, 2023
7472bf4
fix: use the method declared in config
pr-Mais Mar 1, 2023
4dbf893
chore: added gitignore
dackers86 Apr 7, 2023
7c741cd
feat: update documentation
dackers86 Apr 7, 2023
8b7ab2e
chore: ran formatting
dackers86 Apr 7, 2023
4677ce7
frat: renamed folder and added jest testing
dackers86 Apr 13, 2023
e34bfd3
docs: re-generate readme
dackers86 Apr 24, 2023
3 changes: 1 addition & 2 deletions _emulator/firebase.json
@@ -1,7 +1,6 @@
 {
   "extensions": {
-    "firestore-record-user-acknowledgements": "../firestore-record-user-acknowledgements",
-    "firestore-bundle-server": "../firestore-bundle-server"
+    "bigquery-dlp-functions": "../bigquery-dlp-functions"
   },
   "storage": {
     "rules": "storage.rules"
11 changes: 11 additions & 0 deletions bigquery-dlp-functions/CHANGELOG.md
@@ -0,0 +1,11 @@
## Version 0.0.3

feat: added record transformations

## Version 0.0.2

fix: remove default value in DATASET_ID param

## Version 0.0.1

Alpha release allowing deidentify/reidentify with documentation.
28 changes: 28 additions & 0 deletions bigquery-dlp-functions/POSTINSTALL.md
@@ -0,0 +1,28 @@
### See it in action

1. Go to your project's [BigQuery](https://console.cloud.google.com/bigquery?cloudshell=false&project=${param:PROJECT_ID}) in the Google Cloud console.
2. If it doesn't exist already, create a dataset called `${param:DATASET_ID}`.
3. Create a table that contains the data you want to de-identify.
4. Run the following query to de-identify the data in the table:

```sql
SELECT
  val,
  `${param:PROJECT_ID}.${param:DATASET_ID}`.deindetify(TO_JSON(val))
FROM
  `${param:PROJECT_ID}.${param:DATASET_ID}.users` AS val
```

5. Run the following query to re-identify the data in the table:

```sql
SELECT
  val,
  `${param:PROJECT_ID}.${param:DATASET_ID}`.reindetify(TO_JSON(val))
FROM
  `${param:PROJECT_ID}.${param:DATASET_ID}.users` AS val
```
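
The two queries above differ only in the remote function they call. A minimal sketch of a helper that builds either query follows; `buildDlpQuery` and its parameters are hypothetical and not part of the extension, and the function names are copied verbatim from the queries above. The inputs are assumed to be trusted identifiers, so no escaping is attempted.

```typescript
// Function names spelled exactly as in the queries above.
type DlpFunctionName = "deindetify" | "reindetify";

// Hypothetical helper (not part of the extension): builds the query shown
// above for a given project, dataset, table, and remote function.
function buildDlpQuery(
  projectId: string,
  datasetId: string,
  tableId: string,
  fn: DlpFunctionName
): string {
  const dataset = `\`${projectId}.${datasetId}\``;
  const table = `\`${projectId}.${datasetId}.${tableId}\``;
  return [
    "SELECT",
    "  val,",
    `  ${dataset}.${fn}(TO_JSON(val))`,
    "FROM",
    `  ${table} AS val`,
  ].join("\n");
}
```

The resulting string can be pasted into the BigQuery console or passed to whichever BigQuery client you already use.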

### Monitoring

As a best practice, you can [monitor the activity](https://firebase.google.com/docs/extensions/manage-installed-extensions#monitor) of your installed extension, including checks on its health, usage, and logs.
25 changes: 25 additions & 0 deletions bigquery-dlp-functions/PREINSTALL.md
@@ -0,0 +1,25 @@
Use this extension to de-identify sensitive data in BigQuery using the [Data Loss Prevention API](https://cloud.google.com/dlp/docs/).

This extension deploys two BigQuery remote functions that:

- Perform de-identification on sensitive data passed as JSON from BigQuery.
- Re-identify sensitive data that was de-identified with reversible techniques.
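
For context, BigQuery calls a remote function by POSTing a JSON body with a `calls` array (one entry per row) and expects a `replies` array of the same length, or an `errorMessage` on failure. The sketch below illustrates that contract only; `handleCalls` and `maskStrings` are hypothetical names, and `maskStrings` is a local stand-in for the DLP call, not the extension's implementation.

```typescript
// Shape of the JSON body BigQuery sends to a remote function endpoint.
interface RemoteFunctionRequest {
  requestId: string;
  caller: string;
  sessionUser: string;
  userDefinedContext?: Record<string, string>;
  calls: unknown[][]; // one entry per row; each entry holds that row's arguments
}

type RemoteFunctionResponse = { replies: unknown[] } | { errorMessage: string };

// Hypothetical stand-in for the DLP call: masks every top-level string field.
function maskStrings(row: unknown): unknown {
  if (typeof row !== "object" || row === null) return row;
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    out[key] = typeof value === "string" ? "x".repeat(value.length) : value;
  }
  return out;
}

// Applies `transform` to the first argument of every call.
function handleCalls(
  request: RemoteFunctionRequest,
  transform: (row: unknown) => unknown
): RemoteFunctionResponse {
  try {
    // BigQuery expects exactly one reply per call, in the same order.
    return { replies: request.calls.map(([row]) => transform(row)) };
  } catch (err) {
    return { errorMessage: err instanceof Error ? err.message : String(err) };
  }
}
```

An HTTPS function would parse the request body, pass it to a handler like this, and return the result as JSON (with a non-200 status when `errorMessage` is set).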

You specify the desired DLP technique. All techniques are powered by the Google [Data Loss Prevention API](https://cloud.google.com/dlp/docs/transformations-reference). The options offered are:

- Replace with a masking character.
- Redact a value (remove it from the data).
- Replace with a fixed value.
- Replace with the InfoType value.
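
As a rough illustration of how a technique could translate into a DLP request, the sketch below maps the `TRANSFORMATION_TECHNIQUE` values declared in this PR's `extension.yaml` to the DLP API's `PrimitiveTransformation` fields. The function names, the default mask character, and the default replacement string are assumptions, not the extension's code.

```typescript
// Technique values as declared for TRANSFORMATION_TECHNIQUE in extension.yaml.
type Technique = "masking" | "redact" | "fixed" | "replaceWithInfoType";

// Subset of the DLP API's PrimitiveTransformation message used here.
interface PrimitiveTransformation {
  characterMaskConfig?: { maskingCharacter?: string; numberToMask?: number };
  redactConfig?: Record<string, never>;
  replaceConfig?: { newValue: { stringValue: string } };
  replaceWithInfoTypeConfig?: Record<string, never>;
}

// Hypothetical mapping; "x" and "REDACTED" are assumed defaults.
function toPrimitiveTransformation(
  technique: Technique,
  options: { maskingCharacter?: string; fixedValue?: string } = {}
): PrimitiveTransformation {
  switch (technique) {
    case "masking":
      return { characterMaskConfig: { maskingCharacter: options.maskingCharacter ?? "x" } };
    case "redact":
      return { redactConfig: {} };
    case "fixed":
      return { replaceConfig: { newValue: { stringValue: options.fixedValue ?? "REDACTED" } } };
    case "replaceWithInfoType":
      return { replaceWithInfoTypeConfig: {} };
  }
}

// Wraps the transformation so it applies to every detected infoType.
function toDeidentifyConfig(technique: Technique) {
  return {
    infoTypeTransformations: {
      transformations: [{ primitiveTransformation: toPrimitiveTransformation(technique) }],
    },
  };
}
```

The object returned by `toDeidentifyConfig` has the shape of the `deidentifyConfig` field in a DLP `deidentifyContent` request.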

#### Additional setup

Before installing this extension, make sure that you've set up a BigQuery [dataset](https://cloud.google.com/bigquery/docs/datasets) and [table](https://cloud.google.com/bigquery/docs/tables).

#### Billing

This extension uses other Firebase or Google Cloud Platform services which may have associated charges:

- Cloud Data Loss Prevention API
- BigQuery
- Cloud Functions

When you use Firebase Extensions, you're only charged for the underlying resources that you use. A paid-tier billing plan is only required if the extension uses a service that requires a paid-tier plan, for example, calling a Google Cloud Platform API or making outbound network requests to non-Google services. All Firebase services offer a free tier of usage. [Learn more about Firebase billing.](https://firebase.google.com/pricing)
92 changes: 92 additions & 0 deletions bigquery-dlp-functions/README.md
@@ -0,0 +1,92 @@
# BigQuery DLP Remote Function

**Author**: Firebase (**[https://firebase.google.com](https://firebase.google.com)**)

**Description**: This extension creates BigQuery functions to facilitate de-identification and re-identification in queries, providing configurable techniques, seamless integration, and ensuring better data privacy and compliance.

---

## 🧩 Install this experimental extension

> ⚠️ **Experimental**: This extension is available for testing as an _experimental_ release. It has not been as thoroughly tested as the officially released extensions, and future updates might introduce breaking changes. If you use this extension, please [report bugs and make feature requests](https://github.com/firebase/experimental-extensions/issues/new/choose) in our GitHub repository.

### Console

[![Install this extension in your Firebase project](../install-extension.png?raw=true "Install this extension in your Firebase project")](https://console.firebase.google.com/project/_/extensions/install?ref=firebase/bigquery-dlp-functions)

### Firebase CLI

```bash
firebase ext:install firebase/bigquery-dlp-functions --project=<your-project-id>
```

> Learn more about installing extensions in the Firebase Extensions documentation: [console](https://firebase.google.com/docs/extensions/install-extensions?platform=console), [CLI](https://firebase.google.com/docs/extensions/install-extensions?platform=cli)

---

**Details**: Use this extension to de-identify sensitive data in BigQuery using the [Data Loss Prevention API](https://cloud.google.com/dlp/docs/).

This extension deploys two BigQuery remote functions that:

- Perform de-identification on sensitive data passed as JSON from BigQuery.
- Re-identify sensitive data that was de-identified with reversible techniques.

You specify the desired DLP technique. All techniques are powered by the Google [Data Loss Prevention API](https://cloud.google.com/dlp/docs/transformations-reference). The options offered are:

- Replace with a masking character.
- Redact a value (remove it from the data).
- Replace with a fixed value.
- Replace with the InfoType value.

#### Additional setup

Before installing this extension, make sure that you've set up a BigQuery [dataset](https://cloud.google.com/bigquery/docs/datasets) and [table](https://cloud.google.com/bigquery/docs/tables).

#### Billing

This extension uses other Firebase or Google Cloud Platform services which may have associated charges:

- Cloud Data Loss Prevention API
- BigQuery
- Cloud Functions

When you use Firebase Extensions, you're only charged for the underlying resources that you use. A paid-tier billing plan is only required if the extension uses a service that requires a paid-tier plan, for example, calling a Google Cloud Platform API or making outbound network requests to non-Google services. All Firebase services offer a free tier of usage. [Learn more about Firebase billing.](https://firebase.google.com/pricing)

**Configuration Parameters:**

- DLP Transformation Method: The method used by Data Loss Prevention API to deidentify and/or encrypt sensitive information in the data.

- DLP Transformation Technique: The technique used by Data Loss Prevention API to deidentify and/or encrypt sensitive information in the data.

- List of fields to transform using record transformation (comma separated): The list of fields to transform using record transformation. This is only used when the transformation method is set to `RECORD`.

- BigQuery Dataset ID: The ID of the dataset where the extension will create a connection.

- Cloud Functions location: Where do you want to deploy the functions created for this extension? You usually want a location close to your database. For help selecting a location, refer to the [location selection guide](https://firebase.google.com/docs/functions/locations).
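
To make the `RECORD` method more concrete, here is a minimal sketch of how a comma-separated field list could be turned into a DLP `recordTransformations` config. `parseFields` and `toRecordTransformations` are hypothetical names, and masking with `x` is an assumed choice of technique; the extension's own handling is not shown in this diff.

```typescript
// Splits a comma-separated parameter value into trimmed, non-empty field names.
function parseFields(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((field) => field.trim())
    .filter((field) => field.length > 0);
}

// Builds a DLP record transformation that masks the listed fields.
function toRecordTransformations(raw: string | undefined) {
  return {
    recordTransformations: {
      fieldTransformations: [
        {
          // Each entry is a DLP FieldId naming a column to transform.
          fields: parseFields(raw).map((name) => ({ name })),
          primitiveTransformation: {
            characterMaskConfig: { maskingCharacter: "x" },
          },
        },
      ],
    },
  };
}
```

Trimming and dropping empty entries means values such as `email, phone,` behave the same as `email,phone`.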

**Cloud Functions:**

- **createBigQueryConnection:** Creates a BigQuery connection.

- **deidentifyData:** TODO

- **reidentifyData:** TODO
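
Commit 0b3dffc in this PR mentions handling `ALREADY_EXISTS` when creating the connection. One common way to make such a step idempotent is sketched below, assuming the client error carries the numeric gRPC status code in a `code` property; `ensureConnection` and the `createConnection` callback are hypothetical and stand in for the real BigQuery Connection API call.

```typescript
const ALREADY_EXISTS = 6; // gRPC status code for ALREADY_EXISTS

// `createConnection` is a hypothetical callback standing in for the real
// BigQuery Connection API call, which is not shown here.
async function ensureConnection<T>(
  createConnection: () => Promise<T>
): Promise<T | undefined> {
  try {
    return await createConnection();
  } catch (err) {
    const alreadyExists =
      typeof err === "object" &&
      err !== null &&
      (err as { code?: unknown }).code === ALREADY_EXISTS;
    if (alreadyExists) {
      // The connection was created by an earlier run: treat as success.
      return undefined;
    }
    throw err; // Anything else is a real failure.
  }
}
```

Returning `undefined` for the already-exists case lets the caller continue with the rest of the install step, while any other error still propagates.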

**APIs Used**:

- bigquery.googleapis.com (Reason: Powers all BigQuery tasks performed by the extension.)

- bigqueryconnection.googleapis.com (Reason: Allows the extension to create a BigQuery connection.)

- dlp.googleapis.com (Reason: Allows the extension to use DLP services.)

**Access Required**:

This extension will operate with the following project IAM roles:

- bigquery.jobUser (Reason: Allows the extension to create BigQuery jobs.)

- bigquery.dataOwner (Reason: Allows the extension to create BigQuery routines.)

- bigquery.connectionAdmin (Reason: Allows the extension to create a BigQuery connection.)

- dlp.user (Reason: Allows the extension to use DLP services.)
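
The routine that these roles allow the extension to create would be a BigQuery remote function. The sketch below builds a `CREATE FUNCTION ... REMOTE WITH CONNECTION` statement of that kind. Every name in it is hypothetical, the `JSON` argument and return types are inferred from the `TO_JSON(val)` calls in POSTINSTALL.md, and `IF NOT EXISTS` is an assumed choice for idempotency; the extension's actual statement is not part of this diff.

```typescript
// All field names are hypothetical.
interface RemoteFunctionSpec {
  projectId: string;
  datasetId: string;
  functionName: string;
  location: string; // location of the BigQuery connection, e.g. "us"
  connectionId: string;
  endpoint: string; // HTTPS URL of the Cloud Function backing the routine
}

// Builds the DDL for a remote function that takes and returns JSON.
function buildCreateFunctionStatement(spec: RemoteFunctionSpec): string {
  const dataset = `\`${spec.projectId}.${spec.datasetId}\``;
  const connection = `\`${spec.projectId}.${spec.location}.${spec.connectionId}\``;
  return [
    `CREATE FUNCTION IF NOT EXISTS ${dataset}.${spec.functionName}(data JSON) RETURNS JSON`,
    `REMOTE WITH CONNECTION ${connection}`,
    `OPTIONS (endpoint = '${spec.endpoint}')`,
  ].join("\n");
}
```

Inputs are assumed to be trusted configuration values; no quoting or escaping is attempted.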

160 changes: 160 additions & 0 deletions bigquery-dlp-functions/extension.yaml
@@ -0,0 +1,160 @@
# Learn detailed information about the fields of an extension.yaml file in the docs:
# https://firebase.google.com/docs/extensions/alpha/ref-extension-yaml

name: bigquery-dlp-functions # Identifier for your extension
version: 0.0.3 # Follow semver versioning
specVersion: v1beta # Version of the Firebase Extensions specification

author:
  authorName: Firebase
  url: https://firebase.google.com

displayName: BigQuery DLP Remote Function

description: This extension creates BigQuery functions to facilitate de-identification and re-identification in queries, providing configurable techniques, seamless integration, and ensuring better data privacy and compliance.

license: Apache-2.0 # https://spdx.org/licenses/

sourceUrl: TODO

billingRequired: true

apis:
  - apiName: bigquery.googleapis.com
    reason: Powers all BigQuery tasks performed by the extension.
  - apiName: bigqueryconnection.googleapis.com
    reason: Allows the extension to create a BigQuery connection.
  - apiName: dlp.googleapis.com
    reason: Allows the extension to use DLP services.

roles:
  - role: bigquery.jobUser
    reason: Allows the extension to create BigQuery jobs.
  - role: bigquery.dataOwner
    reason: Allows the extension to create BigQuery routines.
  - role: bigquery.connectionAdmin
    reason: Allows the extension to create a BigQuery connection.
  - role: dlp.user
    reason: Allows the extension to use DLP services.

resources:
  - name: createBigQueryConnection
    type: firebaseextensions.v1beta.function
    description: Creates a BigQuery connection.
    properties:
      location: ${param:LOCATION}
      runtime: nodejs14
      taskQueueTrigger: {}
  - name: deidentifyData
    type: firebaseextensions.v1beta.function
    description: TODO
    properties:
      location: ${param:LOCATION}
      runtime: nodejs14
      httpsTrigger: {}
  - name: reidentifyData
    type: firebaseextensions.v1beta.function
    description: TODO
    properties:
      location: ${param:LOCATION}
      runtime: nodejs14
      httpsTrigger: {}

params:
  - param: TRANSFORMATION_METHOD
    label: DLP Transformation Method
    description: >-
      The method used by Data Loss Prevention API to deidentify and/or encrypt sensitive information in the data.
    type: select
    options:
      - label: Info Type Transformations
        value: INFO_TYPE
      - label: Record Type Transformations
        value: RECORD
    default: INFO_TYPE

  - param: TRANSFORMATION_TECHNIQUE
    label: DLP Transformation Technique
    description: >-
      The technique used by Data Loss Prevention API to deidentify and/or encrypt sensitive information in the data.
    type: select
    options:
      - label: Replace with Masking Character
        value: masking
      - label: Redact a value (remove it from the data)
        value: redact
      - label: Replace with a fixed value
        value: fixed
      - label: Replace with InfoType value
        value: replaceWithInfoType
    default: masking

  - param: FIELDS_TO_TRANSFORM
    label: List of fields to transform using record transformation (comma separated)
    description: >-
      The list of fields to transform using record transformation. This is only used when the transformation method is set to `RECORD`.
    type: string

  - param: DATASET_ID
    label: BigQuery Dataset ID
    description: >-
      The ID of the dataset where the extension will create a connection.
    type: string
    required: true
    immutable: true

  - param: LOCATION
    label: Cloud Functions location
    description: >-
      Where do you want to deploy the functions created for this extension? You
      usually want a location close to your database. For help selecting a
      location, refer to the [location selection
      guide](https://firebase.google.com/docs/functions/locations).
    type: select
    options:
      - label: Iowa (us-central1)
        value: us-central1
      - label: South Carolina (us-east1)
        value: us-east1
      - label: Northern Virginia (us-east4)
        value: us-east4
      - label: Los Angeles (us-west2)
        value: us-west2
      - label: Salt Lake City (us-west3)
        value: us-west3
      - label: Las Vegas (us-west4)
        value: us-west4
      - label: Belgium (europe-west1)
        value: europe-west1
      - label: London (europe-west2)
        value: europe-west2
      - label: Frankfurt (europe-west3)
        value: europe-west3
      - label: Zurich (europe-west6)
        value: europe-west6
      - label: Hong Kong (asia-east2)
        value: asia-east2
      - label: Tokyo (asia-northeast1)
        value: asia-northeast1
      - label: Osaka (asia-northeast2)
        value: asia-northeast2
      - label: Seoul (asia-northeast3)
        value: asia-northeast3
      - label: Mumbai (asia-south1)
        value: asia-south1
      - label: Jakarta (asia-southeast2)
        value: asia-southeast2
      - label: Montreal (northamerica-northeast1)
        value: northamerica-northeast1
      - label: Sao Paulo (southamerica-east1)
        value: southamerica-east1
      - label: Sydney (australia-southeast1)
        value: australia-southeast1
    default: us-central1
    required: true
    immutable: true

lifecycleEvents:
  onInstall:
    function: createBigQueryConnection
    processingMessage: "Creating BigQuery connections"
8 changes: 8 additions & 0 deletions bigquery-dlp-functions/functions/.gitignore
@@ -0,0 +1,8 @@
## Compiled JavaScript files
**/*.js
**/*.js.map

# Typescript v1 declaration files
typings/

node_modules/
49 changes: 49 additions & 0 deletions bigquery-dlp-functions/functions/__tests__/__mocks__/index.ts
@@ -0,0 +1,49 @@
// Builds a mock deidentifyContent response: a one-element array whose
// first entry holds the transformed item under the given item type.
export function createMockDeidentifyContentResponse(itemType: any, value: any) {
  return [
    {
      item: {
        [itemType]: value,
      },
    },
  ];
}

// Minimal stand-in for the DLP protos namespace; every `create` returns
// the same fixed test object.
export const mockProtos = {
  google: {
    privacy: {
      dlp: {
        v2: {
          FieldId: {
            create: jest.fn().mockImplementation(() => {
              return { test: "data" };
            }),
          },
          Table: {
            Row: {
              create: jest.fn().mockImplementation(() => {
                return { test: "data" };
              }),
            },
          },
          Value: {
            create: jest.fn().mockImplementation(() => {
              return { test: "data" };
            }),
          },
        },
      },
    },
  },
};

// Mock task-queue API: `enqueue` only logs the payload it receives.
export const getFunctions = () => {
  return {
    taskQueue: (functionName: any, queueName: any) => {
      return {
        enqueue: async (payload: any) => {
          console.log("Enqueue payload:", payload);
        },
      };
    },
  };
};
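
The mocked `FieldId`, `Table.Row`, and `Value` messages are the building blocks of a DLP `Table`. As a rough guide to what those messages represent, the sketch below converts plain row objects into that table shape using local interfaces instead of the real protos. `rowsToTable` is a hypothetical helper, and stringifying every cell is an assumption; it is not the extension's conversion code.

```typescript
// Local copies of the DLP message shapes that the mocks above stand in for.
interface FieldId { name: string }
interface Value { stringValue: string }
interface Row { values: Value[] }
interface Table { headers: FieldId[]; rows: Row[] }

// Hypothetical conversion: every cell is stringified; missing cells become "".
function rowsToTable(records: Array<Record<string, unknown>>): Table {
  // Collect column names in first-seen order across all records.
  const names: string[] = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!names.includes(key)) names.push(key);
    }
  }
  return {
    headers: names.map((name) => ({ name })),
    rows: records.map((record) => ({
      values: names.map((name) => ({ stringValue: String(record[name] ?? "") })),
    })),
  };
}
```

Keeping the headers and each row's values in the same column order is what lets a record transformation address cells by field name.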