tech draft: Telemetry PRD #8973

Draft
royendo wants to merge 1 commit into main from tech-draft-telemetry

@royendo royendo commented Mar 4, 2026

Tech draft outlining an overhaul of our telemetry. This will give us more insight into what our developers are clicking (or not clicking) in the Rill Developer (RD) UI.

PM-107: [Updated] Improve Internal Metrics Gathering for Rill

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@royendo royendo requested a review from begelundmuller March 4, 2026 15:23
Comment on lines +38 to +43
Current common fields per event:
```
app_name, install_id, client_id, build_id, version, is_dev,
project_id, user_id, organization_id (cloud only),
analytics_enabled, mode, service_name
```

Also event_id, event_time, event_type, event_name, service_name, service_version. See here:

type Event struct {

1. **No connector funnel tracking** — we track `source-add` and `source-cancel` but not `connector_selected`, `connector_form_started`, `connector_form_submitted`, or `connector_add_error`
2. **No model/SQL editor events** — zero coverage
3. **No dashboard interaction events** — no filter, time range, drill, export, share tracking
4. **No AI feature events** — only PostHog autocapture, no structured custom events

There are several events on the backend for this: ai_message, ai_completion, ai_generated_metrics_view_yaml, and ai_generated_canvas_dashboard_yaml.

5. **No deploy funnel granularity** — just `deploy-intent` and `deploy-success`, missing org selection, naming, errors
6. **No cloud admin events** — member management, access policies, embed config are untracked
7. **No navigation/UI engagement events** — modals, tabs, tooltips, onboarding, empty state CTAs
8. **Missing global properties** — no `session_id`, `anonymous_id`, `rd_project_id` / `rc_project_id` distinction, `environment`, `platform`, `current_page`

install_id is kind of a session/anonymous ID. is_dev is a (bad) proxy for environment.


### Decision: Extend the Rill Custom Telemetry Pipeline

**Recommendation:** Route all new events through the existing Rill custom telemetry system (`MetricsService` → Intake API / Kafka), not PostHog.

Yes! Although it's called TelemetryService:

service TelemetryService {

│ MetricsService.dispatch() │
│ │ │
│ ├──► RillIntakeClient (web-local) ──► /local/track │
│ │ ──► CLI proxy ──► intake.rilldata.io │

This should be refactored to also use the admin server's TelemetryService (which then passes it to Kafka). This would mean all events end up on the same Kafka topic.

The only reason we haven't done this already is that it means breaking/migrating existing pipelines.

Comment on lines +512 to +513
| `model_created` | File creation handler | `model_type` |
| `model_saved` | File save handler | `model_type`, `has_error` |

We have lots of resource types, so perhaps use a single file_write event with properties like resource_type=model.

Also, Rill Developer saves keystroke-by-keystroke, so there needs to be some protection to avoid 1000s of events when editing a file.
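
One way to get that protection is to coalesce saves on the client before anything is sent. The following is a minimal sketch, not the existing implementation: the `FileWriteCoalescer` class, the `write_count` property, and the quiet-period parameter are all hypothetical, and only the `file_write` name and `resource_type` property come from the comment above. The clock is passed in explicitly so the behavior is easy to test.

```typescript
// Hypothetical event shape: only `file_write` and `resource_type` come from the review comment.
type FileWriteEvent = {
  event_name: "file_write";
  path: string;
  resource_type: string;
  write_count: number; // how many saves were folded into this one event
};

type PendingWrite = { resourceType: string; count: number; lastWriteMs: number };

// Folds rapid saves of the same file into a single event per quiet period.
class FileWriteCoalescer {
  private pending: { [path: string]: PendingWrite } = {};
  private quietMs: number;
  private emit: (e: FileWriteEvent) => void;

  constructor(quietMs: number, emit: (e: FileWriteEvent) => void) {
    this.quietMs = quietMs;
    this.emit = emit;
  }

  // Record a save. Nothing is emitted here.
  recordWrite(path: string, resourceType: string, nowMs: number): void {
    const entry = this.pending[path];
    if (entry) {
      entry.count += 1;
      entry.lastWriteMs = nowMs;
      entry.resourceType = resourceType;
    } else {
      this.pending[path] = { resourceType: resourceType, count: 1, lastWriteMs: nowMs };
    }
  }

  // Emit one event for every file that has had no saves for at least quietMs.
  flush(nowMs: number): void {
    for (const path of Object.keys(this.pending)) {
      const entry = this.pending[path];
      if (nowMs - entry.lastWriteMs >= this.quietMs) {
        this.emit({
          event_name: "file_write",
          path: path,
          resource_type: entry.resourceType,
          write_count: entry.count,
        });
        delete this.pending[path];
      }
    }
  }
}
```

In an app, `flush(Date.now())` would presumably run on a timer and once more on page unload. Counting the folded saves keeps some signal about editing intensity without sending one event per keystroke.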

Comment on lines +514 to +517
| `model_run` | Run/refresh button handler | `model_type` |
| `model_run_success` | Run success callback | `row_count`, `duration_ms` |
| `model_run_error` | Run error callback | `error_code` |
| `model_deleted` | File delete confirm handler | — |

These should probably be generic reconcile events emitted from the backend. Note that they can also be very chatty/high-volume during keystroke-by-keystroke editing.

| `model_run_success` | Run success callback | `row_count`, `duration_ms` |
| `model_run_error` | Run error callback | `error_code` |
| `model_deleted` | File delete confirm handler | — |
| `sql_editor_opened` | Editor pane mount | — |

Perhaps a generic page_view event with a route property? I won't repeat this comment everywhere, but I think it's worth considering whether a generic name makes sense for many of the events listed in this doc.
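
To illustrate the generic shape, here is a small sketch of a `page_view` builder. The `pageView` function and the example paths are hypothetical, and stripping the query string, fragment, and trailing slashes is an assumption made here to keep `route` values low-cardinality. It is not something the draft specifies.

```typescript
// Hypothetical generic event: one name, with the page carried in a property.
type PageViewEvent = { event_name: "page_view"; route: string };

// Build a page_view event from a URL path.
function pageView(url: string): PageViewEvent {
  // Drop the query string and fragment so equivalent pages share one route value.
  const path = url.split(/[?#]/)[0];
  // Remove trailing slashes, but keep the root route as "/".
  const trimmed = path.replace(/\/+$/, "");
  return { event_name: "page_view", route: trimmed === "" ? "/" : trimmed };
}
```

With this shape, an event such as `sql_editor_opened` becomes a `page_view` whose `route` points at the editor page, and new pages need no new event names.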

**Yes.** PostHog JS SDK is initialized in both `web-local` and `web-admin` via `initPosthog()` in `web-common/src/lib/analytics/posthog.ts`. It currently handles autocapture, session recording, and heatmaps. **New custom events will NOT go through PostHog** — they route through the Rill custom telemetry pipeline. PostHog continues running as-is for autocapture/recording until a future sunset effort replaces those capabilities.

### Are `rd_project_id` and `rc_project_id` stable identifiers?
**Partially.** Cloud `project_id` is a stable UUID from the admin database. Local `project_id` is currently an **MD5 hash of the project directory name** — this is not ideal because renaming the folder changes the ID. Recommend either:

Is it an MD5 hash?

Honestly I would probably just keep project_id blank until the first deploy (at which point we can infer it from Git remotes). For cloud editing, it would always be available. And if you really want names from as early as possible, then maybe have a separate directory_name on Rill Developer.
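
A minimal sketch of that suggestion, under stated assumptions: the `projectIdentity` helper and its option names are hypothetical, and how the cloud ID is actually obtained (for example from Git remotes after the first deploy) is outside the sketch.

```typescript
// Properties attached to every event. directory_name is only set on Rill Developer.
type ProjectIdentity = { project_id: string; directory_name?: string };

// Hypothetical helper: project_id stays blank until a cloud project ID is known.
function projectIdentity(opts: { cloudProjectId?: string; directoryPath?: string }): ProjectIdentity {
  const identity: ProjectIdentity = { project_id: opts.cloudProjectId || "" };
  if (opts.directoryPath) {
    // Use the last non-empty path segment as the directory name.
    const parts = opts.directoryPath.split(/[\\/]/).filter((p) => p.length > 0);
    if (parts.length > 0) {
      identity.directory_name = parts[parts.length - 1];
    }
  }
  return identity;
}
```

Keeping the directory name in its own property means a renamed folder changes a descriptive field rather than the identifier itself.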

Comment on lines +690 to +691
### Server-side event forwarding from Go runtime?
**Not for this phase.** Keep CLI events in the existing `activity.Client` pipeline. The frontend covers all UI interactions. Since both frontend and CLI events now flow through the same Rill pipeline (intake API / Kafka → warehouse), unified funnel analysis (e.g. `rill deploy` CLI → cloud dashboard view) is possible at the warehouse/query layer without any additional plumbing.

Since I presume this doc implies a breaking change to our pipelines, I definitely want to take this chance to simplify the backend flows, as I outlined in some of my earlier comments.
