products/llm_analytics/backend/api/otel/README.md
Lines changed: 96 additions & 17 deletions
@@ -124,9 +124,15 @@ Determines event type based on span characteristics:
124
124
- `$ai_trace`: Root spans (no parent) for v2 frameworks
125
125
- `$ai_span`: All other spans, including root spans from v1 frameworks
126
126
127
-
**v1 Detection**: Checks for `prompt` or `completion` attributes OR framework scope name (e.g., `@mastra/otel`). v1 spans bypass the event merger.
127
+
**Pattern Detection**: Uses `OtelInstrumentationPattern` enum to determine routing:
128
128
129
-
**Event Type Logic**: For v1 frameworks like Mastra, root spans are marked as `$ai_span` (not `$ai_trace`) to ensure they appear in the tree hierarchy. This is necessary because `TraceQueryRunner` filters out `$ai_trace` events from the events array.
129
+
1. Provider declares pattern via `get_instrumentation_pattern()` (most reliable)
130
+
2. Span has `prompt` or `completion` attributes (indicates V1 data present)
131
+
3. Default to V2 (safer: waits for logs rather than sending incomplete events)
132
+
133
+
V1 spans bypass the event merger and are sent immediately.
134
+
135
+
**Event Type Logic**: For V1 frameworks, root spans are marked as `$ai_span` (not `$ai_trace`) to ensure they appear in the tree hierarchy. This is necessary because `TraceQueryRunner` filters out `$ai_trace` events from the events array.
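
The three-step pattern detection described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `detect_pattern`, the enum member `V2_LOGS`, the enum values, and the `DeclaresV1` stub are all assumptions; only `OtelInstrumentationPattern`, `V1_ATTRIBUTES`, and `get_instrumentation_pattern()` appear in the README.

```python
from enum import Enum
from typing import Any, Mapping, Optional


class OtelInstrumentationPattern(Enum):
    V1_ATTRIBUTES = "v1_attributes"  # value is a placeholder
    V2_LOGS = "v2_logs"  # hypothetical member name for the V2 pattern


class DeclaresV1:
    """Hypothetical stand-in for a provider transformer such as Mastra's."""

    def get_instrumentation_pattern(self) -> OtelInstrumentationPattern:
        return OtelInstrumentationPattern.V1_ATTRIBUTES


def detect_pattern(
    provider: Optional[Any], properties: Mapping[str, Any]
) -> OtelInstrumentationPattern:
    # 1. A matched provider declares its pattern explicitly (most reliable).
    if provider is not None:
        return provider.get_instrumentation_pattern()
    # 2. Prompt or completion attributes mean V1 data is already on the span.
    if "prompt" in properties or "completion" in properties:
        return OtelInstrumentationPattern.V1_ATTRIBUTES
    # 3. Unknown providers default to V2, which waits for logs.
    return OtelInstrumentationPattern.V2_LOGS
```

With this ordering, a span from an unknown framework that carries no content falls through to V2 and is held for merging rather than sent incomplete.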
**posthog_native.py**: Extracts PostHog-specific attributes prefixed with `posthog.ai.*`. These take precedence in the waterfall.
155
161
156
-
**genai.py**: Extracts OpenTelemetry GenAI semantic convention attributes (`gen_ai.*`). Handles indexed message fields by collecting attributes like `gen_ai.prompt.0.role` into structured message arrays. Supports provider-specific transformations for frameworks that use custom OTEL formats.
162
+
**genai.py**: Extracts OpenTelemetry GenAI semantic convention attributes (`gen_ai.*`). Handles indexed message fields by collecting attributes like `gen_ai.prompt.0.role` into structured message arrays. Provides `detect_provider()` function for centralized provider detection. Supports provider-specific transformations for frameworks that use custom OTEL formats.
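
As an illustration of the indexed-field handling, the sketch below collects flat attributes such as `gen_ai.prompt.0.role` into an ordered list of message dicts. The helper name `collect_indexed_messages` and the regular expression are assumptions made for this example; the real `genai.py` may structure this differently.

```python
import re
from typing import Any, Dict, List, Mapping

# Matches keys like "gen_ai.prompt.0.role" and captures the index and field.
_INDEXED_PROMPT = re.compile(r"^gen_ai\.prompt\.(\d+)\.(\w+)$")


def collect_indexed_messages(attributes: Mapping[str, Any]) -> List[Dict[str, Any]]:
    by_index: Dict[int, Dict[str, Any]] = {}
    for key, value in attributes.items():
        match = _INDEXED_PROMPT.match(key)
        if match is None:
            continue  # not an indexed prompt attribute
        index, field = int(match.group(1)), match.group(2)
        by_index.setdefault(index, {})[field] = value
    # Emit messages in index order, regardless of attribute order.
    return [by_index[i] for i in sorted(by_index)]
```

For example, the attributes `gen_ai.prompt.0.role = "system"` and `gen_ai.prompt.1.role = "user"` (with matching `content` fields) would produce a two-element array with the system message first.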
157
163
158
164
**providers/**: Framework-specific transformers for handling custom OTEL formats:
159
165
160
-
- **base.py**: Abstract base class defining the provider transformer interface (`can_handle()`, `transform_prompt()`, `transform_completion()`)
161
-
- **mastra.py**: Transforms Mastra's wrapped message format (e.g., `{"messages": [...]}` for input, `{"text": "...", "files": [], ...}` for output) into standard PostHog format. Detected by instrumentation scope name `@mastra/otel`.
166
+
- **base.py**: Abstract base class defining the provider transformer interface:
167
+
  - `can_handle()`: Detect if transformer handles this span
168
+
  - `transform_prompt()`: Transform provider-specific prompt format
169
+
  - `transform_completion()`: Transform provider-specific completion format
170
+
  - `get_instrumentation_pattern()`: Declare V1 or V2 pattern (returns `OtelInstrumentationPattern` enum)
171
+
- **mastra.py**: Transforms Mastra's wrapped message format (e.g., `{"messages": [...]}` for input, `{"text": "...", "files": [], ...}` for output) into standard PostHog format. Detected by instrumentation scope name `@mastra/otel`. Declares `V1_ATTRIBUTES` pattern.
162
172
163
173
## Event Schema
164
174
@@ -195,13 +205,13 @@ All events conform to the PostHog LLM Analytics schema:
@@ -252,24 +262,25 @@ The merger returns None on first arrival rather than blocking. This prevents the
252
262
253
263
v2 can send multiple log events in a single HTTP request. The ingestion layer groups these by (trace_id, span_id) and accumulates their properties before calling the merger. This prevents race conditions where partial log data gets merged before all logs arrive.
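
The grouping step can be sketched as below. This assumes each parsed log is a dict with `trace_id`, `span_id`, and `properties` keys, and that later logs overwrite earlier ones on key collisions; both are assumptions for illustration, and `group_log_properties` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple

SpanKey = Tuple[str, str]  # (trace_id, span_id)


def group_log_properties(
    logs: Iterable[Mapping[str, Any]],
) -> Dict[SpanKey, Dict[str, Any]]:
    grouped: Dict[SpanKey, Dict[str, Any]] = defaultdict(dict)
    for log in logs:
        key = (log["trace_id"], log["span_id"])
        # Accumulate all properties for the span before any merge happens.
        grouped[key].update(log["properties"])
    return dict(grouped)
```

The merger would then be called once per key with the accumulated properties, so it never sees a partial set of logs from the same request.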
254
264
255
-
### v1/v2 Detection
265
+
### Pattern Detection via Provider Transformers
256
266
257
-
Rather than requiring explicit configuration, the transformer auto-detects instrumentation version by:
267
+
Rather than hardcoding framework names, the transformer uses a layered detection approach:
258
268
259
-
1. Checking for `prompt` or `completion` attributes (after extraction)
260
-
2. Detecting framework via instrumentation scope name (e.g., `@mastra/otel`)
2. **Content detection** (fallback): Span has `prompt` or `completion` attributes after extraction
271
+
3. **Safe default**: Unknown providers default to V2 (waits for logs rather than sending incomplete events)
261
272
262
-
This allows both patterns to coexist without configuration, and supports frameworks that don't follow standard attribute conventions.
273
+
This allows both patterns to coexist without configuration, and new providers only need to declare their pattern in one place.
263
274
264
275
### Provider Transformers
265
276
266
277
Some frameworks (like Mastra) wrap OTEL data in custom structures that don't match standard GenAI conventions. Provider transformers detect these frameworks (via instrumentation scope or attribute prefixes) and unwrap their data into standard format. This keeps framework-specific logic isolated while maintaining compatibility with the core transformer pipeline.
267
278
268
279
**Example**: Mastra wraps prompts as `{"messages": [{"role": "user", "content": [...]}]}` where content is an array of `{"type": "text", "text": "..."}` objects. The Mastra transformer unwraps this into standard `[{"role": "user", "content": "..."}]` format.
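
A rough sketch of that unwrapping is shown below. It is not the Mastra transformer itself: the function name `unwrap_mastra_prompt` is hypothetical, and joining multiple text parts into a single string is an assumption about how array content should be flattened.

```python
import json
from typing import Any


def unwrap_mastra_prompt(prompt: Any) -> Any:
    # Only JSON strings can be wrapped; pass anything else through unchanged.
    if not isinstance(prompt, str):
        return prompt
    try:
        parsed = json.loads(prompt)
    except json.JSONDecodeError:
        return prompt
    if not isinstance(parsed, dict) or not isinstance(parsed.get("messages"), list):
        return prompt

    messages = []
    for message in parsed["messages"]:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, list):
            # Flatten [{"type": "text", "text": "..."}] parts into one string.
            content = "".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        messages.append({"role": message.get("role"), "content": content})
    return messages
```

Input that is not a wrapped JSON object is returned untouched, so the function is safe to apply to prompts from other frameworks.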
269
280
270
-
### Event Type Determination for v1 Frameworks
281
+
### Event Type Determination for V1 Frameworks
271
282
272
-
v1 frameworks create root spans that should appear in the tree hierarchy alongside their children. These root spans are marked as `$ai_span` (not `$ai_trace`) because `TraceQueryRunner` filters out `$ai_trace` events from the events array. This ensures v1 framework traces display correctly with proper parent-child relationships in the UI.
283
+
V1 frameworks create root spans that should appear in the tree hierarchy alongside their children. The `determine_event_type()` function checks `provider.get_instrumentation_pattern()` and marks V1 root spans as `$ai_span` (not `$ai_trace`) because `TraceQueryRunner` filters out `$ai_trace` events from the events array. This ensures V1 framework traces display correctly with proper parent-child relationships in the UI.
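
The root-span rule can be illustrated with the sketch below. It covers only the `$ai_trace` versus `$ai_span` decision described here, not any other event types the real `determine_event_type()` may return. The function name `sketch_event_type`, its parameters, the enum member `V2_LOGS`, and the `DeclaresV1` stub are assumptions for this example.

```python
from enum import Enum
from typing import Any, Optional


class OtelInstrumentationPattern(Enum):
    V1_ATTRIBUTES = "v1_attributes"  # value is a placeholder
    V2_LOGS = "v2_logs"  # hypothetical member name for the V2 pattern


class DeclaresV1:
    """Hypothetical stand-in for a V1 provider transformer."""

    def get_instrumentation_pattern(self) -> OtelInstrumentationPattern:
        return OtelInstrumentationPattern.V1_ATTRIBUTES


def sketch_event_type(parent_span_id: Optional[str], provider: Optional[Any]) -> str:
    is_root = not parent_span_id
    is_v1 = (
        provider is not None
        and provider.get_instrumentation_pattern()
        is OtelInstrumentationPattern.V1_ATTRIBUTES
    )
    # V1 root spans stay $ai_span so they remain visible in the tree.
    if is_root and not is_v1:
        return "$ai_trace"
    return "$ai_span"
```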
273
284
274
285
### TTL-Based Cleanup
275
286
@@ -282,8 +293,9 @@ The event merger uses 60-second TTL on cache entries. This automatically cleans
282
293
Create a new transformer in `conventions/providers/`:
283
294
284
295
```python
285
-
from .base import ProviderTransformer
296
+
from .base import OtelInstrumentationPattern, ProviderTransformer
        """Declare V1 or V2 pattern - determines event routing."""
310
+
        # V1: All data in span attributes, send immediately
311
+
        # V2: Metadata in spans, content in logs, requires merge
312
+
        return OtelInstrumentationPattern.V1_ATTRIBUTES
313
+
296
314
    def transform_prompt(self, prompt: Any) -> Any:
297
315
        """Transform wrapped prompt format to standard."""
298
316
        if not isinstance(prompt, str):
@@ -369,6 +387,67 @@ Extend `build_event_properties()` in `transformer.py` to map additional attribut
369
387
- **Memory**: Redis cache bounded by TTL (60s max retention)
370
388
- **Concurrency**: Simple Redis operations enable fast merging with minimal race condition risk
371
389
390
+
## Provider Reference
391
+
392
+
Different LLM frameworks implement OTEL instrumentation with their own nuances. This section documents known provider behaviors to help understand what to expect from each.
393
+
394
+
### Mastra (`@mastra/otel`)
395
+
396
+
**Detection**: Instrumentation scope name `@mastra/otel` or `mastra.*` attribute prefix
397
+
398
+
**OTEL Pattern**: `V1_ATTRIBUTES` (all data in span attributes)
399
+
400
+
**Key Behaviors**:
401
+
402
+
- **No conversation history accumulation**: Each `agent.generate()` call creates a separate, independent trace. The `gen_ai.prompt` only contains that specific call's input (typically system message + current user message), not the accumulated conversation history from previous turns.
403
+
- **Wrapped message format**: Prompts are JSON-wrapped as `{"messages": [{"role": "user", "content": [{"type": "text", "text": "..."}]}]}` where content is an array of typed objects.
404
+
- **Wrapped completion format**: Completions are JSON-wrapped as `{"text": "...", "files": [], "warnings": [], ...}`.
405
+
- **Multi-turn traces**: In a multi-turn conversation, you'll see multiple separate traces (one per `agent.generate()` call), each showing only that turn's input/output.
406
+
407
+
**Implications for PostHog**:
408
+
409
+
- Each turn appears as a separate trace in LLM Analytics
410
+
- To see full conversation context, users need to look at the sequence of traces
411
+
- The Mastra transformer unwraps the custom JSON format into standard PostHog message arrays
412
+
413
+
**Example**: A 4-turn conversation produces 4 traces, where turn 4's input only shows "Thanks, bye!" (not the previous greeting, weather query, and joke request).
**Detection**: Spans without prompt/completion attributes, accompanied by OTEL log events (no custom provider transformer needed - detected by absence of V1 content)