posthog-destination: fix an issue with the tracking of anonymous users #1179
+36
−49
Currently, it seems to us that there are a few issues with the tracking of anonymous users. First, I want to point out that the term "anonymous" is overloaded here. There are 3 categories of events, and I'll define the terms I use for them to keep them straight:

- `Anonymous/Unidentified` - Events that occur before a user has identified themselves, which should have some sort of `anonymousId` associated with them. Importantly, we do intend to merge these events into an identified user/"Person" at some point.
- `Identified` - Events that occur after a user has identified themselves.
- `Anonymous/Anonymous` - Posthog "anonymous" events which are intended never to have identifiable information associated with them. Importantly, these events should never be merged into a "Person" record.

When `Enable Anonymous User Profiles` Is Enabled

When this setting is enabled and a user starts as `Anonymous/Unidentified` and then becomes `Identified` over multiple sessions, you currently end up with duplicate "Person" records in Posthog for that user. While there is an alias event that connects the two, they don't get merged into a single record as they should be. Therefore, over time you accumulate a large set of distinct "Person" records in Posthog, each containing the `Anonymous/Unidentified` event history of a user prior to identification, with aliases back to the true identity that holds the actual `Identified` events.

By default, all Posthog events are associated with a person profile. Therefore, `identify` should not be called for an `Anonymous/Unidentified` user, but rather only for the actual `Identified` user, who is then immediately aliased to the `Anonymous/Unidentified` user's previous `distinctId`. This merges all of the user's data (both `Anonymous/Unidentified` & `Identified`) into a single record.

When `Enable Anonymous User Profiles` Is Disabled

If you disable this setting, no events are sent to Posthog at all, whereas what I would expect is for `Anonymous/Anonymous` events to be sent to Posthog.
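
To make the expected behavior for the three event categories concrete, here is a rough sketch of the routing described above. It is not the destination's actual code: `AnalyticsClient`, `IncomingEvent`, `RecordingClient`, and `routeEvent` are hypothetical names, the client surface is only loosely modeled on a typical analytics SDK, and the `$process_person_profile` property used to mark `Anonymous/Anonymous` events is an assumption that should be checked against the Posthog documentation.

```typescript
type Props = Record<string, unknown>;

// Hypothetical client surface; the real SDK's method names and shapes may differ.
interface AnalyticsClient {
  capture(msg: { distinctId: string; event: string; properties?: Props }): void;
  identify(msg: { distinctId: string; properties?: Props }): void;
  alias(msg: { distinctId: string; alias: string }): void;
}

// Hypothetical shape of an event arriving at the destination.
interface IncomingEvent {
  type: "track" | "identify";
  name?: string;
  userId?: string;
  anonymousId?: string;
  traits?: Props;
}

// Test double that records each call as a string instead of sending anything.
class RecordingClient implements AnalyticsClient {
  calls: string[] = [];
  capture(msg: { distinctId: string; event: string; properties?: Props }): void {
    this.calls.push(`capture ${msg.distinctId} ${msg.event} ${JSON.stringify(msg.properties ?? {})}`);
  }
  identify(msg: { distinctId: string; properties?: Props }): void {
    this.calls.push(`identify ${msg.distinctId}`);
  }
  alias(msg: { distinctId: string; alias: string }): void {
    this.calls.push(`alias ${msg.distinctId} -> ${msg.alias}`);
  }
}

function routeEvent(
  client: AnalyticsClient,
  event: IncomingEvent,
  enableAnonymousUserProfiles: boolean
): void {
  const name = event.name ?? "unnamed";

  if (event.userId) {
    // Identified: identify only the real user, then alias the previous
    // anonymous id so both histories end up on a single Person record.
    if (event.type === "identify") {
      client.identify({ distinctId: event.userId, properties: event.traits });
      if (event.anonymousId) {
        client.alias({ distinctId: event.userId, alias: event.anonymousId });
      }
      return;
    }
    client.capture({ distinctId: event.userId, event: name });
    return;
  }

  // No userId: never call identify, and skip events that carry no id at all.
  if (event.type === "identify" || !event.anonymousId) return;

  if (enableAnonymousUserProfiles) {
    // Anonymous/Unidentified: plain capture, to be merged later via alias.
    client.capture({ distinctId: event.anonymousId, event: name });
  } else {
    // Anonymous/Anonymous: still sent, but flagged as personless.
    // ASSUMPTION: "$process_person_profile" is the property Posthog uses for this.
    client.capture({
      distinctId: event.anonymousId,
      event: name,
      properties: { $process_person_profile: false },
    });
  }
}
```

With the setting enabled, a `track` event carrying only `anonymousId: "anon-1"` followed by an `identify` event for `userId: "user-1"` with the same `anonymousId` would produce one capture, one identify, and one alias call. With the setting disabled, the same `track` event is still captured, just without a person profile.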