-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add weighted mean with the migration fixes #28458
base: master
Are you sure you want to change the base?
fix: Add weighted mean with the migration fixes #28458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR adds weighted mean calculation functionality for retention insights and fixes migration issues that caused dashboard failures. Here's a summary of the key changes:
- Added RetentionMeanDropdown component with options for 'none', 'simple', and 'weighted' mean calculations in
/frontend/src/scenes/insights/filters/RetentionMeanDropdown.tsx
- Modified schema to change
showMean
from boolean to string enum ('none', 'simple', 'weighted') in/frontend/src/queries/schema.json
- Added migration script in
/posthog/migrations/0560_migrate_retention_show_mean.py
to convert existing boolean values to string format - Fixed bug in
RetentionTable.tsx
to exclude cohorts with 0 users from mean calculations - Updated test files to reflect new string-based mean type instead of boolean
The changes appear well-structured and address both the weighted mean feature request and the zero-user cohort bug while ensuring backward compatibility.
💡 (5/5) You can turn off certain types of comments like style here!
17 file(s) reviewed, 5 comment(s)
Edit PR Review Bot Settings | Greptile
const { retentionFilter } = useValues(insightVizDataLogic(insightProps)) | ||
const { updateInsightFilter } = useActions(insightVizDataLogic(insightProps)) | ||
|
||
const showMean = retentionFilter?.showMean || RETENTION_MEAN_NONE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Default fallback to RETENTION_MEAN_NONE is correct but could be more explicit with nullish coalescing (??) instead of logical OR (||) to handle edge cases where showMean is intentionally set to false
onChange={(showMean) => { | ||
updateInsightFilter({ showMean }) | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider adding type safety by explicitly typing the onChange parameter as RetentionMeanType
onChange={(showMean) => { | |
updateInsightFilter({ showMean }) | |
}} | |
onChange={(showMean: RetentionMeanType) => { | |
updateInsightFilter({ showMean }) | |
}} |
const weights = validRows.map((row) => | ||
parseInt(row[1]?.toString() || '0') | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: parseInt without radix parameter could lead to unexpected results with certain string formats. Consider using parseInt(x, 10) for explicit base-10 parsing.
const weights = validRows.map((row) => | |
parseInt(row[1]?.toString() || '0') | |
) | |
const weights = validRows.map((row) => | |
parseInt(row[1]?.toString() || '0', 10) | |
) |
return !( | ||
(columnIndex >= row.length - 1 && isLatestPeriod) || | ||
!row[columnIndex] || | ||
row[columnIndex].count <= 0 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The row filtering logic is duplicated between weighted and simple mean calculations. Consider extracting this into a shared helper function.
) | ||
|
||
for insight in retention_insights.iterator(chunk_size=100): | ||
show_mean_value = insight.query["source"]["retentionFilter"]["showMean"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Accessing nested dictionary keys without .get() could raise KeyError if structure is malformed
📸 UI snapshots have been updated12 snapshot changes in total. 0 added, 12 modified, 0 deleted:
Triggered by this commit. |
Size Change: +441 B (0%) Total Size: 9.73 MB ℹ️ View Unchanged
|
📸 UI snapshots have been updated32 snapshot changes in total. 0 added, 32 modified, 0 deleted:
Triggered by this commit. |
a594a3b
to
8e50320
Compare
📸 UI snapshots have been updated22 snapshot changes in total. 0 added, 22 modified, 0 deleted:
Triggered by this commit. |
📸 UI snapshots have been updated10 snapshot changes in total. 0 added, 10 modified, 0 deleted:
Triggered by this commit. |
599b8e0
to
537f6e6
Compare
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
I tried testing this locally, but if you have a saved insight that has 'showMean' set to true, I'm not sure that it ends up being properly set to 'simple' once you switch to this branch. I don't want to give bad advice here - @thmsobrmlr is there a place where you think it would best to have that sort of logic at the current time? |
@aspicer That behavior was expected. Since it's just a config value, I thought this would be okay in the brief time we do the migration to port the values. The only impact it would have on our users would be to set the mean retention once again. After which, the setting would be correctly set on the showMeanRetention attribute if they were to toggle it. I didn't want to allow the value of showMean to override the value of showMeanRetention because if a user were to change the showMeanRetention value using the updated UI they wouldn't see the correct data due to the override, which I think would be more confusing. I do agree that this isn't ideal, but it felt the least confusing/annoying. If you or @thmsobrmlr think some other approach would be better, more than happy to make any changes. |
654be72
to
d2c92b1
Compare
📸 UI snapshots have been updated12 snapshot changes in total. 0 added, 12 modified, 0 deleted:
Triggered by this commit. |
58fd242
to
e1df09c
Compare
📸 UI snapshots have been updated8 snapshot changes in total. 0 added, 8 modified, 0 deleted:
Triggered by this commit. |
313c7f8
to
2dc83f7
Compare
📸 UI snapshots have been updated30 snapshot changes in total. 0 added, 30 modified, 0 deleted:
Triggered by this commit. |
📸 UI snapshots have been updated30 snapshot changes in total. 0 added, 30 modified, 0 deleted:
Triggered by this commit. |
e29f25e
to
a9a27b2
Compare
Just cross posting, I think it's easier + makes sense to just default to showing the mean. It makes sense to always show the mean and the user can hide it if they don't want it. |
f724c9b
to
424b98f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work @Sriram-bk ! I left a couple of comments inline, but the PR should be safe to merge in the sense that the insights won't break. In the edge cases users have to configure the show mean setting again, which is acceptable I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for clarification: This function is used for migrating filter-based insights to query-based ones. We've already converted all existing insights a couple of weeks ago. Only people who use our API directly can still generate these insights. Today we have about 130 filter-based insights and only two of these are retention ones.
Since the retention mean feature was never available with filter based insights, I imagine we don't need to add any handling for this legacy use case. If we were to do so however, it would only work if we also added a mixin here https://github.com/PostHog/posthog/blob/master/posthog/models/filters/filter.py, as otherwise the filter will get lost when serializing. We'd probably also want to handle both showMean
and showMeanRetention
then.
@@ -669,7 +669,7 @@ describe('filtersToQueryNode', () => { | |||
returning_entity: { id: '1' }, | |||
target_entity: { id: '1' }, | |||
period: RetentionPeriod.Day, | |||
show_mean: true, | |||
mean_retention_calculation: 'simple', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be better to test that a show_mean
filter gets converted to a meanRetentionCalculation
field on the query. legacy queries exist with legacy properties.
@@ -539,6 +539,7 @@ def _insight_filter(filter: dict, allow_variables: bool = False): | |||
), | |||
period=filter.get("period"), | |||
showMean=filter.get("show_mean"), | |||
meanRetentionCalculation=filter.get("mean_retention_calculation"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the conversion in this method should do the same thing that we do frontend side i.e. in frontend/src/queries/nodes/InsightQuery/utils/filtersToQueryNode.ts. here you had this code:
meanRetentionCalculation: filters.mean_retention_calculation || (typeof filters.show_mean === 'boolean' ? (filters.show_mean ? 'simple' : 'none') : 'simple'),
it doesn't really matter in this case, as there shouldn't be legacy insights with a show_mean filter. just commenting this for completeness.
export function parseDraftQueryFromLocalStorage( | ||
query: string | ||
): { query: Node<Record<string, any>>; timestamp: number } | null { | ||
function parseAndMigrateQuery<T>(query: string): T | null { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added to the notebooks migration https://github.com/PostHog/posthog/blob/master/frontend/src/scenes/notebooks/Notebook/migrations/migrate.ts#L44-L56 as well. Ideally there should also be a unit test.
Migrate the showMean boolean field to meanRetentionCalculation string field in retention insights. | ||
""" | ||
retention_insights = Insight.objects.filter( | ||
deleted=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to add deleted=False
, as that's the default for the InsightsManager. I'd argue that we want to migrate delete insights as well though, and thus we should use objects_including_soft_deleted
.
if live_run: | ||
with transaction.atomic(): | ||
# Convert boolean to string - if True, use 'simple' else 'none' | ||
insight.query["source"]["retentionFilter"]["meanRetentionCalculation"] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we default to show the mean retention now, the field should only be None
if showMean
is explicitly False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work @Sriram-bk ! I left a couple of comments inline, but the PR should be safe to merge in the sense that the insights won't break. In the edge cases users have to configure the show mean setting again, which is acceptable I think.
2605cf4
to
b87d496
Compare
Fixed another test
Update UI snapshots for `chromium` (1)
5ef7fe3
to
8617865
Compare
Problem
This PR fixes the issues with the migration in this PR. The issues with the migration caused dashboards for certain customers to fail.
The original PR added the functionality to allow our customers to get a weighted mean for their retention insights. It also fixes a bug where cohort sizes of 0 were being included in the mean calculations.
All the changes from the original PR are included in this one since it was reverted.
Closes #25998
Closes #26217
Changes
No mean
Simple Mean
Weighted Mean
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Does this work well for both Cloud and self-hosted?
How did you test this code?
Tested locally.