Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add weighted mean with the migration fixes #28458

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

Sriram-bk
Copy link
Contributor

Problem

This PR fixes the issues with the migration in this PR. The issues with the migration caused dashboards for certain customers to fail.

The original PR added the functionality to allow our customers to get a weighted mean for their retention insights. It also fixes a bug where cohort sizes of 0 were being included in the mean calculations.

All the changes from the original PR are included in this one since it was reverted.

Closes #25998
Closes #26217

Changes

No mean

Screenshot 2025-02-04 at 1 06 55 PM

Simple Mean

Screenshot 2025-02-04 at 12 56 18 PM 6-bea6-19c71efdaad4" />

Weighted Mean

Screenshot 2025-02-04 at 12 56 39 PM

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

Tested locally.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR adds weighted mean calculation functionality for retention insights and fixes migration issues that caused dashboard failures. Here's a summary of the key changes:

  • Added RetentionMeanDropdown component with options for 'none', 'simple', and 'weighted' mean calculations in /frontend/src/scenes/insights/filters/RetentionMeanDropdown.tsx
  • Modified schema to change showMean from boolean to string enum ('none', 'simple', 'weighted') in /frontend/src/queries/schema.json
  • Added migration script in /posthog/migrations/0560_migrate_retention_show_mean.py to convert existing boolean values to string format
  • Fixed bug in RetentionTable.tsx to exclude cohorts with 0 users from mean calculations
  • Updated test files to reflect new string-based mean type instead of boolean

The changes appear well-structured and address both the weighted mean feature request and the zero-user cohort bug while ensuring backward compatibility.

💡 (5/5) You can turn off certain types of comments like style here!

17 file(s) reviewed, 5 comment(s)
Edit PR Review Bot Settings | Greptile

const { retentionFilter } = useValues(insightVizDataLogic(insightProps))
const { updateInsightFilter } = useActions(insightVizDataLogic(insightProps))

const showMean = retentionFilter?.showMean || RETENTION_MEAN_NONE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Default fallback to RETENTION_MEAN_NONE is correct but could be more explicit with nullish coalescing (??) instead of logical OR (||) to handle edge cases where showMean is intentionally set to false

Comment on lines 26 to 27
onChange={(showMean) => {
updateInsightFilter({ showMean })
}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider adding type safety by explicitly typing the onChange parameter as RetentionMeanType

Suggested change
onChange={(showMean) => {
updateInsightFilter({ showMean })
}}
onChange={(showMean: RetentionMeanType) => {
updateInsightFilter({ showMean })
}}

Comment on lines 62 to 69
const weights = validRows.map((row) =>
parseInt(row[1]?.toString() || '0')
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: parseInt without radix parameter could lead to unexpected results with certain string formats. Consider using parseInt(x, 10) for explicit base-10 parsing.

Suggested change
const weights = validRows.map((row) =>
parseInt(row[1]?.toString() || '0')
)
const weights = validRows.map((row) =>
parseInt(row[1]?.toString() || '0', 10)
)

Comment on lines 53 to 62
return !(
(columnIndex >= row.length - 1 && isLatestPeriod) ||
!row[columnIndex] ||
row[columnIndex].count <= 0
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The row filtering logic is duplicated between weighted and simple mean calculations. Consider extracting this into a shared helper function.

)

for insight in retention_insights.iterator(chunk_size=100):
show_mean_value = insight.query["source"]["retentionFilter"]["showMean"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Accessing nested dictionary keys without .get() could raise KeyError if structure is malformed

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

12 snapshot changes in total. 0 added, 12 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Copy link
Contributor

github-actions bot commented Feb 7, 2025

Size Change: +441 B (0%)

Total Size: 9.73 MB

ℹ️ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 9.73 MB +441 B (0%)

compressed-size-action

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

32 snapshot changes in total. 0 added, 32 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from a594a3b to 8e50320 Compare February 8, 2025 05:33
@Sriram-bk Sriram-bk requested review from a team and aspicer February 8, 2025 05:38
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

22 snapshot changes in total. 0 added, 22 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

10 snapshot changes in total. 0 added, 10 modified, 0 deleted:

  • chromium: 0 added, 10 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from 599b8e0 to 537f6e6 Compare February 10, 2025 18:25
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@aspicer
Copy link
Contributor

aspicer commented Feb 11, 2025

I tried testing this locally, but if you have a saved insight that has 'showMean' set to true, I'm not sure that it ends up being properly set to 'simple' once you switch to this branch.

I don't want to give bad advice here - @thmsobrmlr is there a place where you think it would best to have that sort of logic at the current time?

@Sriram-bk
Copy link
Contributor Author

I tried testing this locally, but if you have a saved insight that has 'showMean' set to true, I'm not sure that it ends up being properly set to 'simple' once you switch to this branch.

@aspicer That behavior was expected. Since it's just a config value, I thought this would be okay in the brief time we do the migration to port the values. The only impact it would have on our users would be to set the mean retention once again. After which, the setting would be correctly set on the showMeanRetention attribute if they were to toggle it.

I didn't want to allow the value of showMean to override the value of showMeanRetention because if a user were to change the showMeanRetention value using the updated UI they wouldn't see the correct data due to the override, which I think would be more confusing.

I do agree that this isn't ideal, but it felt the least confusing/annoying. If you or @thmsobrmlr think some other approach would be better, more than happy to make any changes.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from 654be72 to d2c92b1 Compare February 13, 2025 07:59
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

12 snapshot changes in total. 0 added, 12 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from 58fd242 to e1df09c Compare February 18, 2025 20:41
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

8 snapshot changes in total. 0 added, 8 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Sriram-bk Sriram-bk requested a review from thmsobrmlr February 18, 2025 21:04
@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from 313c7f8 to 2dc83f7 Compare February 18, 2025 21:06
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

30 snapshot changes in total. 0 added, 30 modified, 0 deleted:

  • chromium: 0 added, 22 modified, 0 deleted (wasn't pushed!)
  • webkit: 0 added, 8 modified, 0 deleted (wasn't pushed!)

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

30 snapshot changes in total. 0 added, 30 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from e29f25e to a9a27b2 Compare February 18, 2025 23:14
@anirudhpillai
Copy link
Contributor

Just cross posting, I think it's easier + makes sense to just default to showing the mean.
https://posthog.slack.com/archives/C0368RPHLQH/p1739870843981749

It makes sense to always show the mean and the user can hide it if they don't want it.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from f724c9b to 424b98f Compare February 20, 2025 18:49
Copy link
Contributor

@thmsobrmlr thmsobrmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work @Sriram-bk ! I left a couple of comments inline, but the PR should be safe to merge in the sense that the insights won't break. In the edge cases users have to configure the show mean setting again, which is acceptable I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarification: This function is used for migrating filter-based insights to query-based ones. We've already converted all existing insights a couple of weeks ago. Only people who use our API directly can still generate these insights. Today we have about 130 filter-based insights and only two of these are retention ones.

Since the retention mean feature was never available with filter based insights, I imagine we don't need to add any handling for this legacy use case. If we were to do so however, it would only work if we also added a mixin here https://github.com/PostHog/posthog/blob/master/posthog/models/filters/filter.py, as otherwise the filter will get lost when serializing. We'd probably also want to handle both showMean and showMeanRetention then.

@@ -669,7 +669,7 @@ describe('filtersToQueryNode', () => {
returning_entity: { id: '1' },
target_entity: { id: '1' },
period: RetentionPeriod.Day,
show_mean: true,
mean_retention_calculation: 'simple',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be better to test that a show_mean filter gets converted to a meanRetentionCalculation field on the query. legacy queries exist with legacy properties.

@@ -539,6 +539,7 @@ def _insight_filter(filter: dict, allow_variables: bool = False):
),
period=filter.get("period"),
showMean=filter.get("show_mean"),
meanRetentionCalculation=filter.get("mean_retention_calculation"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the conversion in this method should do the same thing that we do frontend side i.e. in frontend/src/queries/nodes/InsightQuery/utils/filtersToQueryNode.ts. here you had this code:

 meanRetentionCalculation: filters.mean_retention_calculation || (typeof filters.show_mean === 'boolean' ? (filters.show_mean ? 'simple' : 'none') : 'simple'),

it doesn't really matter in this case, as there shouldn't be legacy insights with a show_mean filter. just commenting this for completeness.

export function parseDraftQueryFromLocalStorage(
query: string
): { query: Node<Record<string, any>>; timestamp: number } | null {
function parseAndMigrateQuery<T>(query: string): T | null {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added to the notebooks migration https://github.com/PostHog/posthog/blob/master/frontend/src/scenes/notebooks/Notebook/migrations/migrate.ts#L44-L56 as well. Ideally there should also be a unit test.

Migrate the showMean boolean field to meanRetentionCalculation string field in retention insights.
"""
retention_insights = Insight.objects.filter(
deleted=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to add deleted=False, as that's the default for the InsightsManager. I'd argue that we want to migrate delete insights as well though, and thus we should use objects_including_soft_deleted.

if live_run:
with transaction.atomic():
# Convert boolean to string - if True, use 'simple' else 'none'
insight.query["source"]["retentionFilter"]["meanRetentionCalculation"] = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we default to show the mean retention now, the field should only be None if showMean is explicitly False.

Copy link
Contributor

@thmsobrmlr thmsobrmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work @Sriram-bk ! I left a couple of comments inline, but the PR should be safe to merge in the sense that the insights won't break. In the edge cases users have to configure the show mean setting again, which is acceptable I think.

@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch 2 times, most recently from 2605cf4 to b87d496 Compare February 25, 2025 18:13
@Sriram-bk Sriram-bk force-pushed the sri/retention-add-weighted-mean-migration-fixed branch from 5ef7fe3 to 8617865 Compare February 28, 2025 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants