Skip to content
This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

Commit

Permalink
Add and document custom example (close #3)
Browse files Browse the repository at this point in the history
  • Loading branch information
adatzer committed Apr 1, 2021
1 parent d82bf22 commit a267ce3
Show file tree
Hide file tree
Showing 8 changed files with 332 additions and 2 deletions.
85 changes: 84 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -156,11 +156,91 @@ To configure your data model you only need to check the `sp.js` file in your `in
- `ua_parser`: (default: `false`). Whether to include data from the UA Parser enrichment.
- `yauaa`: (default: `false`). Whether to include data from the YAUAA enrichment.

8. **Custom staged dependencies**
- `custom_staged_dependencies` : This array parameter allows you ensure custom steps that depend on standard model's `staged` tables run before the complete steps that truncate them. You can read more in [Customization][customization] below.

Now your model is configured and you can finish by setting up your Dataform environment(s).


## Join the Snowplow community
## Customization

Following the [customization plugin principles][custom-sql] of the parent [v1 Bigquery web model][bq-web-model-dir], it is also possible to customize the Dataform model and leverage the incrementalization logic of the standatd modules.

In the `definitions/custom` subdirectory you can find the [same custom example][custom-sql] featured in the v1 BigQuery web model implemented for Dataform. In the directory structure you can see the custom modules parallel to standard:

```
├── dataform.json
├── definitions
│ ├── assertions
│ ├── custom
│ ├── standard
│ └── utils
├── environments.json
├── includes
│ └── sp.js
├── package.json
└── package-lock.json
```

In order to compliment the standard model with your custom incremental module, you only need to know which module's `_staged` tables it is meant to consume. Those modules' commit actions will be the dependencies of your custom module.

For example, the featured custom module, consumes both the `events_staged` and the `page_views_staged` tables, that are outputs of the `01_base` and `02_page_views` modules respectively. Since the `02_page_views` module already depends on the `01_base` module, it suffices to depend the first action of the custom module only on the last step of the page_views module (`07_commit_page_views`).

```
config {
type: "operations",
name: "01_link_clicks",
disabled: sp.model_disabled,
dependencies: [
"07_commit_page_views"
],
hermetic: true,
hasOutput: false,
tags: ["sp_web", "sp_custom"]
}
```

Since the `99_complete` steps of the standard model complete the incremental logic by truncating the `staged` tables, you also need to ensure that the part of custom module that depends on those tables, also runs before the `99_complete` steps. In order to do so, you only need to add the dependent step(s) in the `custom_staged_dependencies` array defined in the `sp.js` file.

```
const custom_staged_dependencies = [
"02_channel_engagement"
];
```

As you can see, for our custom example, the last step depending on `staged` tables is the `02_channel_engagement`, which gets added in `custom_staged_dependencies`.

Further notes:
1. The custom module follows the incrementalization logic of standard modules, by having the main, complete and detroy steps of its own:

- The main steps of the custom module run along the standard model, that's why in their config we set:

```
disabled: sp.model_disabled,
```

- The `complete` step of the custom module, need to _also_ run for a complete destruction of the whole model, that is why in `custom/99_custom_complete.sqlx`:

```
disabled: sp.model_disabled && sp.destroy_disabled,
```

- Similarly the destroy step of the custom module runs along the destroy steps of the standard model:

```
disabled: sp.destroy_disabled,
```


2. We generally suggest that any customization:

- Follows in analogy the same [guidelines and best practices][custom-best-practices] mentioned in Snowplow's v1 web model.
- Interacts with the standard model only through the `sp.js` file, avoiding to change the standard model's files directly.


# Join the Snowplow community

We welcome all ideas, questions and contributions!

Expand Down Expand Up @@ -203,3 +283,6 @@ limitations under the License.
[docs-datamodeling]: https://docs.snowplowanalytics.com/docs/modeling-your-data/the-snowplow-web-data-model/
[javascript-tracker]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/javascript-trackers/javascript-tracker/
[enrichments]: https://docs.snowplowanalytics.com/docs/enriching-your-data/available-enrichments/
[customization]: https://github.com/snowplow-incubator/dataform-data-models/tree/master#customization
[custom-sql]: https://github.com/snowplow/data-models/tree/master/web/v1/bigquery/sql-runner/sql/custom
[custom-best-practices]: https://github.com/snowplow/data-models/tree/master/web/v1/bigquery/sql-runner/sql/custom#guidelines--best-practice
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
/*
Copyright 2021 Snowplow Analytics Ltd. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

config {
type: "operations",
name: "01_link_clicks",
disabled: sp.model_disabled,
dependencies: [
"07_commit_page_views"
],
hermetic: true,
hasOutput: false,
tags: ["sp_web", "sp_custom"]
}

CREATE OR REPLACE TABLE `${sp.scratch_schema}.link_clicks${sp.entropy}` AS(
WITH click_count AS(
SELECT
page_view_id,
COUNT(DISTINCT event_id) AS link_clicks,

FROM
`${sp.scratch_schema}.events_staged${sp.entropy}`
WHERE
event_name = 'link_click'
GROUP BY 1

), first_and_last AS(
SELECT
page_view_id,

FIRST_VALUE(unstruct_event_com_snowplowanalytics_snowplow_link_click_1_0_1.target_url) OVER(PARTITION BY page_view_id
ORDER BY derived_tstamp desc
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_link_clicked,

LAST_VALUE(unstruct_event_com_snowplowanalytics_snowplow_link_click_1_0_1.target_url) OVER(PARTITION BY page_view_id
ORDER BY derived_tstamp desc
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_link_clicked

FROM `${sp.scratch_schema}.events_staged${sp.entropy}`
WHERE event_name = 'page_view'
)
SELECT
b.page_view_id,
COALESCE(a.link_clicks, 0) AS link_clicks,
b.first_link_clicked,
b.last_link_clicked

FROM
click_count a
LEFT JOIN
first_and_last b
ON b.page_view_id = a.page_view_id
);
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
/*
Copyright 2021 Snowplow Analytics Ltd. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

config {
type: "operations",
name: "02_channel_engagement",
disabled: sp.model_disabled,
dependencies: [
"01_link_clicks"
],
hermetic: true,
hasOutput: false,
tags: ["sp_web", "sp_custom"]
}

CREATE OR REPLACE TABLE `${sp.scratch_schema}.channel_engagement_staged${sp.entropy}` AS(
SELECT
-- get some basic info
pv.page_view_id,
pv.start_tstamp,
pv.page_url,

-- Add in our custom link click metrics
lc.link_clicks,
lc.first_link_clicked,
lc.last_link_clicked,

-- channel definition
CASE
WHEN pv.refr_medium = 'search'
AND (REGEXP_CONTAINS(lower(pv.mkt_medium), '%(cpc|ppc|sem|paidsearch)%')
OR REGEXP_CONTAINS(lower(pv.mkt_source), '%(cpc|ppc|sem|paidsearch)%')) THEN 'paidsearch'
WHEN lower(pv.mkt_medium) LIKE '%paidsearch%'
OR lower(pv.mkt_source) LIKE '%paidsearch%' THEN 'paidsearch'
WHEN REGEXP_CONTAINS(lower(pv.mkt_source), '%(adwords|google_paid|googleads)%')
OR REGEXP_CONTAINS(lower(pv.mkt_medium), '%(adwords|google_paid|googleads)%') THEN 'paidsearch'
WHEN lower(pv.mkt_source) LIKE '%google%'

AND lower(pv.mkt_medium) LIKE '%ads%' THEN 'paidsearch'
WHEN pv.refr_urlhost in ('www.googleadservices.com','googleads.g.doubleclick.net') then 'paidsearch'

WHEN REGEXP_CONTAINS(lower(pv.mkt_medium), '%(cpv|cpa|cpp|content-text|advertising|ads)%') THEN 'advertising'
WHEN REGEXP_CONTAINS(lower(pv.mkt_medium), '%(display|cpm|banner)%') THEN 'display'

WHEN pv.refr_medium IS NULL AND pv.page_url NOT LIKE '%utm_%' THEN 'direct'
WHEN (LOWER(pv.refr_medium) = 'search' AND pv.mkt_medium IS NULL)
OR (LOWER(pv.refr_medium) = 'search' AND LOWER(pv.mkt_medium) = 'organic') THEN 'organicsearch'
WHEN pv.refr_medium = 'social'
OR REGEXP_CONTAINS(LOWER(pv.mkt_source),'^((.*(facebook|linkedin|instagram|insta|slideshare|social|tweet|twitter|youtube|lnkd|pinterest|googleplus|instagram|plus.google.com|quora|reddit|t.co|twitch|viadeo|xing|youtube).*)|(yt|fb|li))$')
OR REGEXP_CONTAINS(LOWER(pv.mkt_medium),'^(.*)(social|facebook|linkedin|twitter|instagram|tweet)(.*)$') THEN 'social'
WHEN pv.refr_medium = 'email'
OR lower(pv.mkt_medium) LIKE '_mail' THEN 'email'
WHEN lower(pv.mkt_medium) LIKE 'affiliate' THEN 'affiliate'
WHEN pv.refr_medium = 'unknown' or lower(pv.mkt_medium) LIKE 'referral' OR lower(pv.mkt_medium) LIKE 'referal' THEN 'referral'
WHEN pv.refr_medium = 'internal' then 'internal'
ELSE 'others'
END AS channel,

-- some metrics to measure engagement
CASE
WHEN pv.engaged_time_in_s = 0 THEN TRUE
ELSE FALSE
END AS bounced_page_view,

(pv.vertical_percentage_scrolled / 100) * 0.3 + (pv.engaged_time_in_s / 600) * 0.7 AS engagement_score

FROM
-- Query from staged table to get most recent according to incremental logic
`${sp.scratch_schema}.page_views_staged${sp.entropy}` pv
LEFT JOIN
-- Join on the joinkey
`${sp.scratch_schema}.link_clicks${sp.entropy}` lc
ON pv.page_view_id = lc.page_view_id
);
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
Copyright 2021 Snowplow Analytics Ltd. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

config {
type: "operations",
name: "03_commit_custom",
disabled: sp.model_disabled,
dependencies: [
"02_channel_engagement"
],
hermetic: true,
hasOutput: false,
tags: ["sp_web", "sp_custom"]
}

-- Commit table procedure handles committing to prod, including table creation, and creation of new columns if 'automigrate' is set to TRUE

CALL `${sp.output_schema}.commit_table` (
'${sp.scratch_schema}', -- sourceDataset
'channel_engagement_staged${sp.entropy}', -- sourceTable
'${sp.output_schema}', -- targetDataset
'channel_engagement${sp.entropy}', -- targetTable
'page_view_id', -- joinKey
'start_tstamp', -- partitionKey
TRUE); -- automigrate
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
/*
Copyright 2021 Snowplow Analytics Ltd. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

config {
type: "operations",
name: "99_custom_cleanup",
disabled: sp.model_disabled && sp.destroy_disabled,
dependencies: [
"03_commit_custom"
],
hermetic: true,
hasOutput: false,
tags: ["sp_web", "sp_custom"]
}

-- We can run this cleanup straight after the commit step, as long as no subsequent logic depends on it.

DROP TABLE IF EXISTS `${sp.scratch_schema}.link_clicks${sp.entropy}`;
DROP TABLE IF EXISTS `${sp.scratch_schema}.channel_engagement_staged${sp.entropy}`;
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
/*
Copyright 2021 Snowplow Analytics Ltd. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

config {
type: "operations",
name: "XX_destroy_custom",
disabled: sp.destroy_disabled,
dependencies: [
"99_custom_cleanup"
],
hermetic: true,
hasOutput: false,
tags: ["sp_custom"]
}

DROP TABLE IF EXISTS `${sp.output_schema}.channel_engagement${sp.entropy}`;
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ config {
disabled: sp.model_disabled,
dependencies: [
"08_commit_users",
...sp.custom_staged_dependencies,
"010_events_staged",
"010_events_this_run",
"011_events_staged",
Expand Down
10 changes: 9 additions & 1 deletion web/bigquery/includes/sp.js
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,13 @@ const cleanup_mode = {
users: "all"
};

// For custom modules that depend on `staged` tables of the model:
// - add their (last) dependent steps in the `custom_staged_dependencies` array
// to ensure they run before the complete steps of the web model
const custom_staged_dependencies = [
"02_channel_engagement"
];


module.exports = {
model_disabled,
Expand All @@ -102,5 +109,6 @@ module.exports = {
yauaa,
ends_run,
cleanup_mode,
assertions_disabled
assertions_disabled,
custom_staged_dependencies
};

0 comments on commit a267ce3

Please sign in to comment.