feature/reduce-date-spine #57

Merged
merged 13 commits into from
Mar 5, 2025
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_mixpanel v0.12.0
[PR #57](https://github.com/fivetran/dbt_mixpanel/pull/57) includes the following updates:

## Breaking Changes
> To ensure all updates are applied correctly, you must run `dbt run --full-refresh` after upgrading.
- To reduce compute, the default date spine now starts from the earliest `first_event_day` of the `stg_mixpanel__user_first_event` model instead of the fixed date `'2010-01-01'`.
- If you need to override this behavior, you can still set a custom `date_range_start` in your `dbt_project.yml`. See the [README](https://github.com/fivetran/dbt_mixpanel?tab=readme-ov-file#event-date-range) for more details.

## Under the Hood
- Several variable declarations have been removed from `dbt_project.yml` as they were redundant with the inline defaults in the models. No action is needed from users.
- Removed the `date_today` macro as it is no longer necessary.

## Documentation
- Added missing column definitions to `src_mixpanel.yml`.

Contributor

I'd also recommend adding an Under the Hood entry explaining why we're removing the variable declarations from `dbt_project.yml`. It's mainly for our own future reference, but it's also worth calling out since it may look like a big change when it really isn't.

Contributor Author

Updated!

# dbt_mixpanel v0.11.0
[PR #53](https://github.com/fivetran/dbt_mixpanel/pull/53) and [PR #55](https://github.com/fivetran/dbt_mixpanel/pull/55) include the following updates:

2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright © 2025 Fivetran Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
11 changes: 7 additions & 4 deletions README.md
@@ -75,7 +75,7 @@ Include the following mixpanel package version in your `packages.yml` file:
```yaml
packages:
- package: fivetran/mixpanel
version: [">=0.11.0", "<0.12.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.12.0", "<0.13.0"] # we recommend using ranges to capture non-breaking changes automatically
```

### Step 3: Define database and schema variables
@@ -206,17 +206,20 @@ vars:
mixpanel__event_frequency_limit: 500 ## Default is 1000
```
#### Event Date Range
Because of the typical volume of event data, you may want to limit this package's models to work with a recent date range of your Mixpanel data (however, note that all final models are materialized as [incremental](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#incremental) tables).
Because of the typical volume of event data, you may want to limit this package's models to work with a more recent date range.

By default, the package looks at all events since January 1, 2010. To change this start date, add the following variable to your `dbt_project.yml` file:
By default, the package processes all events from your first recorded event. To override this and set a custom start date, add the following to your `dbt_project.yml`:

```yml
vars:
mixpanel:
date_range_start: 'yyyy-mm-dd'
```

**Note:** This date range will not affect the `number_of_new_users` column in the `mixpanel__daily_events` or `mixpanel__monthly_events` models. This metric will be *true* new users.
> NOTE:
> This date range will not affect the `number_of_new_users` column in the `mixpanel__daily_events` or `mixpanel__monthly_events` models. This metric will be *true* new users.
>
> Additionally, all final models are materialized as [incremental](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#incremental). Updating the `date_range_start` in `dbt_project.yml` will only apply to newly ingested data. If you modify the `date_range_start`, we recommend running `dbt run --full-refresh` to ensure consistency across the adjusted date range.

#### Global Event Filters
In addition to limiting the date range, you may want to employ other filters to remove noise from your event data.
9 changes: 1 addition & 8 deletions dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'mixpanel'
version: '0.11.0'
version: '0.12.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
mixpanel:
@@ -10,17 +10,10 @@ models:
vars:
mixpanel:
event_table: "{{ source('mixpanel', 'event') }}"
mixpanel__event_frequency_limit: 1000
date_range_start: '2010-01-01' # mostly global filter placed on mixpanel__event to limit the date range. does not apply to stg_mixpanel__event and stg_mixpanel__user_first_event
# global_event_filter: # global filter to place on this whole package in order to remove noise from events
event_custom_columns: [] # any custom column names in the source mixpanel.event table to include in mixpanel__event
# - name: "app_version" <- example format
# alias: "app_version_alias"
# transform_sql: "cast(app_version as string)"
event_properties_to_pivot: [] # list of events in mixpanel__event.event_properties (in the source table, event.properties) to pivot out into columns in mixpanel__event

sessionization_inactivity: 30 # number of minutes it takes for a session to timeout due to inactivity
# session_event_criteria: # filter to place on events in order to qualify for sessionization
sessionization_trailing_window: 3 # number of hours to look back at for each mixpanel__sessions run. this allows you to sessionize events that arrive late without requiring a full refresh
Contributor

Looks like it's appropriate to remove this, since we essentially deprecated it a few releases ago.

session_passthrough_columns: [] # choose event columns to pass through to mixpanel__sessions (values taken from first event of session)
mixpanel_sources: []
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'mixpanel_integration_tests'
version: '0.11.0'
version: '0.12.0'
config-version: 2
profile: 'integration_tests'

4 changes: 4 additions & 0 deletions integration_tests/models/srcs.yml
@@ -14,6 +14,8 @@ sources:
tables:
- name: event
description: Table of all events tracked by Mixpanel across web, ios, and android platforms.
config:
enabled: "{{ var('has_defined_sources', false) }}"
columns: &columns
# default properties regardless of platforms used
- name: _fivetran_id
@@ -193,5 +195,7 @@ sources:

tables:
- name: event
config:
enabled: "{{ var('has_defined_sources', false) }}"
description: Table of all events tracked by Mixpanel across web, ios, and android platforms.
columns: *columns
9 changes: 6 additions & 3 deletions integration_tests/seeds/event.csv
@@ -1,3 +1,6 @@
_file,_line,_fivetran_id,event_id,name,time,distinct_id,properties,insert_id,mp_processing_time_ms,_fivetran_synced,screen_width,wifi,app_release,app_version,os,mp_device_model,city,os_version,mp_country_code,lib_version,manufacturer,radio,carrier,screen_height,app_build_number,model,region,app_version_string,mp_lib,initial_referring_domain,device_id,referrer,current_url,browser,browser_version,initial_referrer,search_engine,referring_domain,bluetooth_version,has_nfc,brand,has_telephone,screen_dpi,google_play_services,had_persisted_distinct_id,bluetooth_enabled,ios_ifa,device,mp_keyword,distinct_id_before_identity,ae_session_length,insert_id_
sample_events.csv,10008,k4G7CSSjxr2mRtBxpVCxKHx+E8c=,9406000,plays,2020-08-01 19:16:46,9e308c27-42ab-57af-beef-555708cde092,"{""Operating System"": ""Roku"",""title"": ""spongebob"",""user_id"": ""114521"",""video_id"": 5279}",pnreuCmtCqgcDgkCaFhymwDgmwajggzaiDdn,406572,2020-08-07 08:59:49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
sample_events.csv,10009,tUc30K5Umuk03m8f3Vo3Np0uCAA=,1744000,playthrough_25,2020-08-01 11:35:44,622b169c-2187-5121-85e6-e5960b2512e3,"{""Operating System"": ""Roku"",""title"": ""rupaul's drag race"",""user_id"": ""129997"",""video_id"": 6369}",vqrpqghnlgcCzfjBmdsjcgqgDtwchbxAkuFu,744380,2020-08-07 08:58:02,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
_file,_line,_fivetran_id,event_id,name,time,distinct_id,properties,insert_id,mp_processing_time_ms,_fivetran_synced,screen_width,wifi,app_release,app_version,os,mp_device_model,city,os_version,mp_country_code,lib_version,manufacturer,radio,carrier,screen_height,app_build_number,model,region,app_version_string,mp_lib,initial_referring_domain,device_id,referrer,current_url,browser,browser_version,initial_referrer,search_engine,referring_domain,bluetooth_version,has_nfc,brand,has_telephone,screen_dpi,google_play_services,had_persisted_distinct_id,bluetooth_enabled,ios_ifa,device,mp_keyword,distinct_id_before_identity,ae_session_length
sample_events.csv,10008,k4G7CSSjxr2mRtBxpVCxKHx+E8c=,9406000,plays,2020-08-01 10:16:46,9e308c27-42ab-57af-beef-555708cde092,"{""Operating System"": ""Roku"",""title"": ""spongebob"",""user_id"": ""114521"",""video_id"": 5279}",pnreuCmtCqgcDgkCaFhymwDgmwajggzaiDdn,406572,2020-08-07 08:59:49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
sample_events.csv,10009,tUc30K5Umuk03m8f3Vo3Np0uCAA=,1744000,playthrough_25,2020-08-01 11:35:44,622b169c-2187-5121-85e6-e5960b2512e3,"{""Operating System"": ""Roku"",""title"": ""rupaul's drag race"",""user_id"": ""129997"",""video_id"": 6369}",vqrpqghnlgcCzfjBmdsjcgqgDtwchbxAkuFu,744380,2020-08-07 08:58:02,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
sample_events.csv,10010,tUc30K5Umuk03m8f3Vo3Np0uCAB=,1744000,playthrough_25,2024-02-02 08:58:02,622b169c-2187-5121-85e6-e5960b2512e3,"{""Operating System"": ""Roku"",""title"": ""rupaul's drag race"",""user_id"": ""129997"",""video_id"": 6369}",vqrpqghnlgcCzfjBmdsjcgqgDtwchbxAkuFu,744380,2024-01-31 08:58:02,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
new.csv,10011,tUc30K5Umuk03m8f3Vo3Np0uCAR=,1744001,playthrough_29,2024-01-31 08:58:02,622b169c-2187-5121-85e6-e5960b2512e7,"{""Operating System"": ""Roku"",""title"": ""spongebob"",""user_id"": ""129998"",""video_id"": 6367}",vqrpqghnlgcCzfjBmdsjcgqgDtwchbxAkuGu,744380,2024-01-31 08:58:02,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
new.csv,100121,tUc30K5Umuk03m8f3Vo3Np0uCAT=,1744001,playthrough_30,2025-02-27 08:58:02,622b169c-2187-5121-85e6-e5960b2512e7,"{""Operating System"": ""Roku"",""title"": ""spongebob"",""user_id"": ""129998"",""video_id"": 6367}",vqrpqghnlgcCzfjBmdsjcgqgDtwchbxAkuGu,744380,2024-01-31 08:58:02,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
17 changes: 0 additions & 17 deletions macros/date_today.sql

This file was deleted.

2 changes: 1 addition & 1 deletion models/mixpanel__daily_events.sql
@@ -124,7 +124,7 @@ final as (
trailing_users_28d,
trailing_users_7d,
event_type || '-' || date_day || '-' || source_relation as unique_key,
{{ mixpanel.date_today('dbt_run_date')}}
current_date as dbt_run_date

from agg_event_days

2 changes: 1 addition & 1 deletion models/mixpanel__event.sql
Contributor

Should the default date '2010-01-01' in line 28 also be removed?

Contributor Author

@fivetran-catfritz fivetran-catfritz Mar 4, 2025

I considered it, but it should stay. The issue with the date spine was that it internally generates all dates back to 2010 even if there weren't records that went back that far, whereas this model only references existing records. We need a default value of some sort but I wanted to avoid pinging the source and incurring costs if we didn't need to.
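
To make the cost difference concrete, here is a rough Python sketch (not part of the package) that counts how many days a spine would contain under each default. The run date of 2025-03-05 and the one-row-per-day, half-open range are assumptions for illustration; the 2020-08-01 earliest event is taken from the integration test seed.

```python
from datetime import date, timedelta


def spine_days(start: date, run_date: date) -> int:
    """Count daily spine rows from `start` up to one week past `run_date`.

    Assumes a half-open range [start, run_date + 7 days); the real macro's
    boundary handling may differ by a day.
    """
    end = run_date + timedelta(weeks=1)
    return max((end - start).days, 0)


run_date = date(2025, 3, 5)  # hypothetical run date, chosen for illustration

# Old default: fixed start of 2010-01-01, regardless of the data.
fixed_default = spine_days(date(2010, 1, 1), run_date)

# New default: start at the earliest first event (seed data value).
first_event_default = spine_days(date(2020, 8, 1), run_date)

print(fixed_default, first_event_default, fixed_default - first_event_default)
```

Under these assumptions the fixed default generates more than three times as many days as the data-driven one. Since the downstream spine appears to be built per user and event type, any reduction in days would be multiplied accordingly.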

Contributor

Gotcha! Thanks for explaining!

@@ -56,7 +56,7 @@ pivot_properties as (

select
*,
{{ mixpanel.date_today('dbt_run_date')}}
current_date as dbt_run_date
{% if var('event_properties_to_pivot') %}
, {{ fivetran_utils.pivot_json_extract(string = 'event_properties', list_of_properties = var('event_properties_to_pivot')) }}
{% endif %}
2 changes: 1 addition & 1 deletion models/mixpanel__monthly_events.sql
@@ -106,7 +106,7 @@ final as (
-- note: churned users refer to users who did something last month and not this month
coalesce(lag(number_of_users, 1) over(partition by event_type, source_relation order by date_month asc) - number_of_repeat_users, 0) as number_of_churn_users,
date_month || '-' || event_type || '-' || source_relation as unique_key, -- for incremental model :)
{{ mixpanel.date_today('dbt_run_date')}}
current_date as dbt_run_date

from monthly_metrics
)
2 changes: 1 addition & 1 deletion models/mixpanel__sessions.sql
@@ -145,7 +145,7 @@ session_join as (
session_ids.device_id,
session_ids.total_number_of_events,
agg_event_types.event_frequencies,
{{ mixpanel.date_today('dbt_run_date')}}
current_date as dbt_run_date

{% if var('session_passthrough_columns', []) != [] %}
,
25 changes: 24 additions & 1 deletion models/staging/src_mixpanel.yml
@@ -186,4 +186,27 @@ sources:
description: >
Verifies that Google Play services is installed and enabled on this device,
and that the version installed on this device is no older than the one required by this client.


- name: app_version
description: The version of the application from which the event originated.

- name: app_release
description: The specific release version or build of the application.

- name: mp_device_model
description: The model of the device on which the event was triggered.

- name: had_persisted_distinct_id
description: Indicates whether the user had a persistent distinct ID before the current session.

- name: ios_ifa
description: The iOS Identifier for Advertisers (IFA), used for advertising tracking.

- name: insert_id
description: A unique identifier for each event to prevent duplication.

- name: _file
description: The filename in the source code where the event was triggered.

- name: _line
description: The line number in the source code where the event was initiated.
52 changes: 38 additions & 14 deletions models/staging/stg_mixpanel__user_event_date_spine.sql
@@ -20,21 +20,44 @@ with user_first_events as (
),

spine as (
{% if execute and flags.WHICH in ('run', 'build') %}
{% if is_incremental() %}
-- For incremental runs, the first_date is 14 days prior to the max date since we need to
-- account for the week that is added to the end_date.
{%- set first_date_query %}
select
cast({{ mixpanel.mixpanel_lookback(from_date="max(date_day)", interval=14, datepart='day') }} as date)
{% endset -%}
{%- set first_date = dbt_utils.get_single_value(first_date_query) %}

select *

from (
{{ dbt_utils.date_spine(
datepart = "day",
start_date = "cast('" ~ var('date_range_start', '2010-01-01') ~ "' as date)",
end_date = dbt.dateadd("week", 1, dbt.date_trunc('day', dbt.current_timestamp_backcompat()))
)
}}
) as spine
{% if is_incremental() %}
-- every user-event_type will have the same last day. Add 7 days to the lookback to account for the week added above.
where date_day >= {{ mixpanel.mixpanel_lookback(from_date="max(date_day)", interval=14, datepart='day') }}
{% else %}
-- For full-refresh runs, use either the date from var(date_range_start) or the min date.
{%- set first_date_query %}
select
coalesce(
min(cast(first_event_day as date)),
cast({{ dbt.dateadd("month", -1, "current_date") }} as date)
) as min_date
from {{ ref('stg_mixpanel__user_first_event') }}
{% endset -%}
{%- set first_date = var('date_range_start', dbt_utils.get_single_value(first_date_query)) %}
{% endif %}

{% else %}
{%- set first_date_query %}
select
cast({{ dbt.dateadd("month", -1, "current_date") }} as date)
{% endset -%}
{%- set first_date = dbt_utils.get_single_value(first_date_query) %}
{% endif %}

-- Every user-event_type shares the same final date.
{{ dbt_utils.date_spine(
datepart = "day",
start_date = "cast('" ~ first_date ~ "' as date)",
end_date = dbt.dateadd("week", 1, "current_date")
)
}}
),

user_event_spine as (
@@ -57,4 +80,5 @@ user_event_spine as (

)

select * from user_event_spine
select *
from user_event_spine
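
The branching in the new `spine` CTE can be summarized with a small Python sketch. This illustrates the selection logic only and is not the package's code: `resolve_spine_start`, `one_month_before`, and all parameter names are hypothetical, and clamping to the last day of the month is an assumption about how the warehouse's month arithmetic behaves.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def one_month_before(day: date) -> date:
    """Return the same day one month earlier, clamped to that month's length."""
    year, month = (day.year, day.month - 1) if day.month > 1 else (day.year - 1, 12)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def resolve_spine_start(
    today: date,
    *,
    executing: bool,
    incremental: bool,
    max_spine_day: Optional[date] = None,
    earliest_first_event_day: Optional[date] = None,
    date_range_start: Optional[date] = None,
) -> date:
    """Pick the spine start date, mirroring the branches in the diff above."""
    if not executing:
        # Parse/compile phase or a non-run/build command: use a cheap placeholder.
        return one_month_before(today)
    if incremental:
        # 14 days back: 7 for the week added to the end date, 7 of lookback.
        if max_spine_day is None:
            raise ValueError("incremental runs need the current max date_day")
        return max_spine_day - timedelta(days=14)
    if date_range_start is not None:
        # An explicit variable always wins on a full refresh.
        return date_range_start
    # Otherwise start at the earliest first event, or fall back if there is no data.
    return earliest_first_event_day or one_month_before(today)


today = date(2025, 3, 5)  # hypothetical run date
full_refresh_start = resolve_spine_start(
    today, executing=True, incremental=False, earliest_first_event_day=date(2020, 8, 1)
)
print(full_refresh_start)
```

Keeping the override check ahead of the data-driven default matches the README's description: setting `date_range_start` replaces the earliest-event start rather than combining with it.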