
Conversation

@ajcasagrande (Contributor) commented Oct 3, 2025

Export Per-Record Metrics

Scripts

examples/parse_profile_export.py

Purpose: Load and display profile_export.jsonl files using AIPerf's native Pydantic models.

Example Artifacts

The artifacts/run1/ directory contains sample output files from an actual AIPerf run.

profile_export.jsonl

Format: JSON Lines (one record per line)
Purpose: Per-request metric data with full detail

Each line contains:

  • Request metadata (worker ID, timestamps, conversation ID)
  • Metrics for that specific request (latency, tokens, etc.)
  • Error information (if request failed)
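For example, a minimal standard-library sketch of walking the file (the path assumes the sample artifacts directory, and field names follow the sample records shown below):

import json
from pathlib import Path

path = Path("artifacts/run1/profile_export.jsonl")
with path.open(encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        worker = record["metadata"]["worker_id"]
        if record["error"] is not None:
            print(f"{worker}: error {record['error']['code']} ({record['error']['type']})")
            continue
        latency = record["metrics"]["request_latency"]
        print(f"{worker}: {latency['value']:.2f} {latency['unit']}")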

Data Models

AIPerf uses Pydantic models for type-safe data handling:

from aiperf.common.models import (
    MetricRecordInfo,
    MetricRecordMetadata,
    MetricValue,
    ErrorDetails,
)

Model Definitions:
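The canonical definitions live in aiperf/common/models/record_models.py; the sketch below is inferred from the example records that follow and shows the approximate shape of each model rather than the exact source:

from pydantic import BaseModel

class MetricValue(BaseModel):
    value: float | int | list[float]
    unit: str

class ErrorDetails(BaseModel):
    code: int | None = None
    type: str | None = None
    message: str | None = None

class MetricRecordMetadata(BaseModel):
    x_request_id: str | None = None
    x_correlation_id: str | None = None
    conversation_id: str | None = None
    turn_index: int = 0
    timestamp_ns: int
    worker_id: str
    record_processor_id: str
    credit_phase: str  # e.g. "profiling"

class MetricRecordInfo(BaseModel):
    metadata: MetricRecordMetadata
    metrics: dict[str, MetricValue]
    error: ErrorDetails | None = None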

Example: Successful Request

{
  "metadata": {
    "x_request_id": "dc9ffe9b-0f48-4863-9db7-e106ea64d094",
    "x_correlation_id": "6f24617d-f29b-4399-af87-b98a61fe43f7",
    "conversation_id": "af0041e4-aec8-412b-82a5-e6e2455b892a",
    "turn_index": 0,
    "timestamp_ns": 1759522419158355424,
    "worker_id": "worker_aec1f387",
    "record_processor_id": "record_processor_5cdb3e92",
    "credit_phase": "profiling"
  },
  "metrics": {
    "input_sequence_length": {"value": 550, "unit": "tokens"},
    "ttft": {"value": 3340.543979, "unit": "ms"},
    "request_count": {"value": 1, "unit": "requests"},
    "request_latency": {"value": 8214.202336, "unit": "ms"},
    "min_request_timestamp": {"value": 1759522419158355424, "unit": "ns"},
    "output_token_count": {"value": 62, "unit": "tokens"},
    "reasoning_token_count": {"value": 131, "unit": "tokens"},
    "ttst": {"value": 139.01211999999998, "unit": "ms"},
    "inter_chunk_latency": {
      "value": [139.01211999999998,137.504221,138.46932999999999,45.724841,22.820729,22.354865999999998,24.233856,21.191751,21.803236,21.112617,22.138389999999998,22.290063,21.568081,21.639267999999998,21.043643,21.699054999999998,21.465737,21.357903,23.664227,19.843211,21.429326,20.807807999999998,22.244322999999998,22.980819999999998,21.714579999999998,20.311258,22.320152,20.716417,21.489044,23.120455999999997,21.194105,21.747794,21.775451,22.772171,21.177619999999997,21.435968,22.72408,22.319703999999998,23.697609999999997,22.692925,24.573838,24.935859999999998,26.220257,31.696505,27.352555,24.474261,30.586574,22.706429,27.079940999999998,21.097013,21.312921,19.886108,21.975094,25.711636,23.944499999999998,22.047128,19.041073,25.347305,21.117617,20.374716,21.078395999999998,22.556409,21.256626,22.730458,20.697526999999997,24.304420999999998,19.036089999999998,22.208375999999998,21.108458,22.866515,26.124654,19.439919,24.660149,24.480218999999998,22.055654999999998,24.99418,18.583989,23.828048,22.653662999999998,20.263586,22.421452,22.796861,23.564021,22.431328,22.228718999999998,21.330883,21.859503,22.474016,22.873683,22.787454999999998,22.573733999999998,21.460922,22.424144,22.442114999999998,23.179195,22.802578999999998,22.545786,22.882702,23.31232,23.126859,21.893006,23.557437999999998,22.776183,22.061291999999998,22.107775999999998,22.255364,22.322226999999998,24.980131999999998,21.467501,21.797259999999998,23.437003999999998,23.993665999999997,22.305018999999998,23.036853999999998,22.524950999999998,22.406306,22.918474,22.922335999999998,21.904897,21.565794,23.226157999999998,23.259197999999998,23.434093999999998,21.758516999999998,22.842456,22.888417999999998,21.407372,22.814517,22.408683,22.539944,160.85134,23.339579999999998,22.765987,22.429622,22.025340999999997,22.615395,22.957057,23.911932,22.003268,22.03979,23.155224,22.854999,23.844901,23.013745,22.209705,23.692,22.305362,22.788823,24.418011999999997,21.410004,23.309737,22.293789,24.580631999999998,21.682264,22.708857,22.872097,23.393947,23.339647,23.22015,24.162468999999998,22.352579,24.407208999999998,23.268773,23.927725,23.887949,22.625245,23.777784999999998,23.140172999999997,22.655293,23.344238999999998,23.776709999999998,22.741847,24.011459,22.901256,24.119477999999997,21.972716,23.987218,22.558432999999997,22.693851,22.350789,23.023360999999998,22.424141,22.478153,26.871579999999998,23.642765999999998,19.764049999999997,23.363139,22.169117999999997,23.956905,21.800013999999997,22.825948999999998,24.294238],
      "unit": "ms"
    },
    "output_sequence_length": {"value": 193, "unit": "tokens"},
    "max_response_timestamp": {"value": 1759522427372557760, "unit": "ns"},
    "inter_token_latency": {"value": 25.383637276041668, "unit": "ms"},
    "output_token_throughput_per_user": {"value": 39.395457361969534, "unit": "tokens/sec/user"}
  },
  "error": null
}

Example: Failed Request

{
  "metadata": {
    "x_request_id": "c087da26-5552-4785-958a-cad75c7b22de",
    "x_correlation_id": "51a3af05-bcd3-4ad6-ab75-3eec065ffe6c",
    "conversation_id": "82c50e66-608b-4cc1-bcab-a71090077120",
    "turn_index": 0,
    "timestamp_ns": 1759522419154829652,
    "worker_id": "worker_1d62deaa",
    "record_processor_id": "record_processor_ef1716ef",
    "credit_phase": "profiling"
  },
  "metrics": {
    "error_request_count": {"value": 1, "unit": "requests"}
  },
  "error": {
    "code": 499,
    "type": "RequestCancellationError",
    "message": "Request was cancelled after 0.000 seconds"
  }
}

Summary by CodeRabbit

  • New Features

    • Per-record JSONL profile export with selectable export levels and richer per-request metadata.
    • New error-focused metrics and totals for input sequence lengths.
    • Credit numbering across timing flows for better traceability.
  • Changes

    • Default profile export filename now uses .jsonl.
    • Per-record exports hide certain aggregate/internal metrics by default.
    • Input token counts are now recorded for error/invalid requests.
  • Documentation

    • Added "Working with Benchmark Data" tutorial and README link.
  • Tests

    • Extensive new and updated tests for exports, processors, parsing, metrics, and timings.


coderabbitai bot commented Oct 3, 2025

Warning

Rate limit exceeded

@ajcasagrande has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 36 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between e51cbba and 22ccc4b.

📒 Files selected for processing (1)
  • aiperf/common/models/base_models.py (1 hunks)

Walkthrough

Adds per-record export support, new ExportLevel enum and record models, refactors MetricRecordsMessage to carry metadata, updates processors/protocols to accept messages, implements RecordExportResultsProcessor (JSONL), propagates credit_num through timing/workers, updates metrics flags and token-counting parser, and expands tests/docs.
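Schematically, the new message-centric interface looks roughly like this (a sketch of the shapes described below, not the actual source in aiperf/common/protocols.py and aiperf/common/messages/inference_messages.py):

from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class MetricRecordsMessage:  # sketch; the real class is a Pydantic message model
    metadata: Any            # MetricRecordMetadata: per-request header fields, no longer flat
    results: list[dict]      # one metric dict per record processor
    error: Any | None        # ErrorDetails for failed requests, else None

class ResultsProcessorProtocol(Protocol):
    async def process_result(self, message: MetricRecordsMessage) -> None: ...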

Changes

• Config & output (aiperf/common/config/config_defaults.py, aiperf/common/config/output_config.py, aiperf/common/config/__init__.py): Add ExportLevel import and EXPORT_LEVEL default; change PROFILE_EXPORT_FILE to .jsonl; expose profile_export_file and export_level on OutputConfig; export parse_str_as_numeric_dict.
• Enums (aiperf/common/enums/__init__.py, aiperf/common/enums/data_exporter_enums.py, aiperf/common/enums/metric_enums.py, aiperf/common/enums/post_processor_enums.py): Introduce ExportLevel enum and re-export it; remove MetricDateTimeUnit; add MetricFlags.NO_INDIVIDUAL_RECORDS; add ResultsProcessorType.RECORD_EXPORT; adjust public exports.
• Models & base (aiperf/common/models/record_models.py, aiperf/common/models/__init__.py, aiperf/common/models/base_models.py): Add MetricValue, MetricRecordMetadata, MetricRecordInfo; add RequestRecord.credit_num; re-export new models; set AIPerfBaseModel config to allow extra fields.
• Messages (aiperf/common/messages/inference_messages.py, aiperf/common/messages/credit_messages.py): MetricRecordsMessage now has metadata: MetricRecordMetadata and removes flat header fields; CreditDropMessage adds credit_num and refines phase description.
• Protocols & processors (aiperf/common/protocols.py, aiperf/post_processors/metric_results_processor.py, aiperf/post_processors/record_export_results_processor.py, aiperf/post_processors/__init__.py, aiperf/post_processors/base_metrics_processor.py): ResultsProcessorProtocol.process_result now accepts MetricRecordsMessage; refactor MetricResultsProcessor to iterate message.results; add RecordExportResultsProcessor (JSONL, gated by ExportLevel) and export it; use applicable_tags in setup.
• Records pipeline & services (aiperf/records/records_manager.py, aiperf/records/record_processor_service.py): Wrap processor creation to handle PostProcessorDisabled; convert and operate on MetricRecordDict instances; build MetricRecordMetadata via perf→wall-clock conversion and send MetricRecordsMessage to processors.
• Metrics core & types (aiperf/metrics/metric_dicts.py, aiperf/metrics/metric_registry.py, aiperf/metrics/types/*): Add MetricRecordDict.to_display_dict honoring flags and display units; allow NONE flags in tags_applicable_to; remove date-time display units; add NO_INDIVIDUAL_RECORDS to several metrics; add error-specific ISL metrics and totals; adjust request count display order/flags.
• Timing & workers (aiperf/timing/credit_manager.py, aiperf/timing/timing_manager.py, aiperf/timing/fixed_schedule_strategy.py, aiperf/timing/request_rate_strategy.py, aiperf/workers/worker.py): Add credit_num parameter to drop_credit and propagate it; strategies pass phase_stats.sent as credit_num and increment after; worker helper methods made keyword-only and assign record.credit_num.
• Parser (aiperf/parsers/inference_result_parser.py): get_tokenizer on-demand; compute and attach input_token_count for error/invalid paths; compute_input_token_count obtains tokenizer internally (signature changed).
• Utilities & exceptions (aiperf/common/utils.py, aiperf/common/exceptions.py): Add compute_time_ns helper; add PostProcessorDisabled exception.
• Docs (README.md, docs/tutorials/working-with-profile-exports.md): Add "Working with Benchmark Data / Profile Exports" tutorial and link; document JSONL profile exports and Pydantic models.
• Tests: post_processors & records (tests/post_processors/*, tests/records/conftest.py): Add factories create_metric_metadata, create_metric_records_message; update processor tests to message-based flow; add tests for RecordExportResultsProcessor; add records fixtures/constants.
• Tests: metrics & parser (tests/metrics/test_input_sequence_length_metric.py, tests/parsers/test_inference_result_parser.py): Add tests for error ISL metrics and totals; rework parser tests to assert input token counting on error/invalid paths; add error-related fixtures/helpers.
• Tests: timing & workers (tests/timing_manager/conftest.py, tests/workers/test_worker.py): Update mocks and tests to pass credit_num through drop_credit and CreditDropMessage.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Strategy
  participant TimingMgr as TimingManager
  participant Worker as Worker
  participant RecordSvc as RecordProcessorService
  Note over Strategy,TimingMgr: pass credit_num
  Strategy->>TimingMgr: drop_credit(phase, credit_num=phase_stats.sent, ...)
  TimingMgr->>Worker: CreditDropMessage{phase, credit_num, ...}
  Worker->>Worker: _build_response_record(..., message=CreditDropMessage)
  Worker->>Worker: record.credit_num = message.credit_num
  Worker-->>RecordSvc: ParsedResponseRecord -> Record creation
sequenceDiagram
  autonumber
  participant RecordSvc as RecordProcessorService
  participant Utils as compute_time_ns
  participant Msg as MetricRecordsMessage
  participant ResultsProc as MetricResultsProcessor
  participant ExportProc as RecordExportResultsProcessor
  participant FS as FileSystem
  RecordSvc->>Utils: compute_time_ns(start_time, start_perf, perf_ns)
  Utils-->>RecordSvc: wall_clock_ns
  RecordSvc->>Msg: create MetricRecordsMessage(metadata, results, error)
  RecordSvc-->>ResultsProc: process_result(message)
  ResultsProc->>ResultsProc: iterate message.results -> MetricRecordDict
  ResultsProc-->>ExportProc: process_result(message)
  ExportProc->>ExportProc: to_display_dict(registry, show_internal?)
  alt display non-empty
    ExportProc->>FS: append JSONL line (async)
  else empty
    ExportProc-->>ExportProc: skip write
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

I thump my paw: new tracks to chart,
JSONL crumbs, a burrower’s art.
Metadata tucked in every line,
Credits counted, timestamps align.
Tokens hopped and exports shine. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
• Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
• Title Check: ✅ Passed. The title succinctly captures the primary feature being introduced (exporting per-record metrics to the profile_export.jsonl file) and directly reflects the main objective of the changeset. It is concise, clear, and specific enough for a teammate to understand the purpose of the pull request at a glance without extraneous detail. The use of "feat:" aligns with conventional commit styling but does not detract from readability. Overall, the title accurately summarizes the most important change.
• Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


codecov bot commented Oct 3, 2025

@ajcasagrande ajcasagrande changed the title from "Ajc/per record metrics" to "feat: Add per-record metrics export to profile_export.jsonl" on Oct 3, 2025
@github-actions github-actions bot added the feat label on Oct 3, 2025
@ajcasagrande (Contributor, Author) commented:

@coderabbitai review


coderabbitai bot commented Oct 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
aiperf/common/config/output_config.py (1)

48-50: Consider making export_level configurable.

The export_level property currently returns a hardcoded ExportLevel.RECORDS. While this aligns with the PR objectives and default configuration, consider whether this should be a configurable field (similar to profile_export_file) to allow users to select different export levels (SUMMARY, RECORDS, RAW) via CLI or config file.

If making this configurable, you could add:

+    export_level: Annotated[
+        ExportLevel,
+        Field(
+            description="The level of detail to export in the profile export file.",
+        ),
+        CLIParameter(
+            name=("--export-level",),
+            group=_CLI_GROUP,
+        ),
+    ] = OutputDefaults.EXPORT_LEVEL
+
-    @property
-    def export_level(self) -> ExportLevel:
-        return ExportLevel.RECORDS
tests/post_processors/conftest.py (1)

200-238: Stop passing unsupported error into MetricRecordMetadata

MetricRecordMetadata (see aiperf/common/models/record_models.py) no longer defines an error field, so forwarding error=... here is either ignored or triggers a validation error depending on the model config. It also makes the helper’s docstring lie about what callers get back. Please drop the parameter/forwarding so the helper reflects the actual metadata surface.

-def create_metric_metadata(
+def create_metric_metadata(
     conversation_id: str | None = None,
     turn_index: int = 0,
     timestamp_ns: int = 1_000_000_000,
     worker_id: str = "worker-1",
     record_processor_id: str = "processor-1",
     credit_phase: CreditPhase = CreditPhase.PROFILING,
     x_request_id: str | None = None,
-    x_correlation_id: str | None = None,
-    error: ErrorDetails | None = None,
+    x_correlation_id: str | None = None,
 ) -> MetricRecordMetadata:
@@
-        error: Error details if any
@@
-        x_correlation_id=x_correlation_id,
-        error=error,
+        x_correlation_id=x_correlation_id,
aiperf/post_processors/record_export_results_processor.py (1)

50-78: Consider batching writes for better performance.

The per-record file open/close pattern is safe but may become a bottleneck under high throughput. The broad exception catch (line 77) is appropriate here since it prevents a single malformed record from halting the entire export process.

For improved performance with high record counts, consider batching writes:

 async def process_result(self, message: MetricRecordsMessage) -> None:
+    json_lines = []
     record_dicts = [MetricRecordDict(result) for result in message.results]
     for record_dict in record_dicts:
         try:
             display_metrics = record_dict.to_display_dict(
                 MetricRegistry, self.show_internal
             )
             if not display_metrics:
                 continue
 
             record_info = MetricRecordInfo(
                 metadata=message.metadata,
                 metrics=display_metrics,
                 error=message.error,
             )
-            json_str = record_info.model_dump_json()
-
-            async with aiofiles.open(
-                self.output_file, mode="a", encoding="utf-8"
-            ) as f:
-                await f.write(json_str)
-                await f.write("\n")
+            json_lines.append(record_info.model_dump_json())
 
             self.record_count += 1
             if self.record_count % 100 == 0:
                 self.debug(f"Wrote {self.record_count} record metrics")
 
         except Exception as e:
             self.error(f"Failed to write record metrics: {e}")
+    
+    if json_lines:
+        async with aiofiles.open(
+            self.output_file, mode="a", encoding="utf-8"
+        ) as f:
+            await f.write("\n".join(json_lines) + "\n")
examples/parse_profile_export.py (1)

19-40: LGTM! Loader implementations are correct.

Both sync and async loaders properly parse JSONL files using Pydantic's model_validate_json. The conditional import of aiofiles in the async function is appropriate.

For more robust example code, consider adding error handling for malformed lines:

 def load_records(file_path: Path) -> list[MetricRecordInfo]:
     """Load profile_export.jsonl file into structured Pydantic models in sync mode."""
     records = []
     with open(file_path, encoding="utf-8") as f:
         for line in f:
             if line.strip():
-                record = MetricRecordInfo.model_validate_json(line)
-                records.append(record)
+                try:
+                    record = MetricRecordInfo.model_validate_json(line)
+                    records.append(record)
+                except Exception as e:
+                    print(f"Warning: Skipping malformed line: {e}")
     return records
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c414866 and d47cc26.

📒 Files selected for processing (26)
  • aiperf/common/config/config_defaults.py (2 hunks)
  • aiperf/common/config/output_config.py (2 hunks)
  • aiperf/common/enums/__init__.py (2 hunks)
  • aiperf/common/enums/data_exporter_enums.py (1 hunks)
  • aiperf/common/enums/metric_enums.py (1 hunks)
  • aiperf/common/enums/post_processor_enums.py (1 hunks)
  • aiperf/common/exceptions.py (1 hunks)
  • aiperf/common/messages/inference_messages.py (2 hunks)
  • aiperf/common/models/__init__.py (2 hunks)
  • aiperf/common/models/record_models.py (3 hunks)
  • aiperf/common/protocols.py (2 hunks)
  • aiperf/metrics/metric_dicts.py (2 hunks)
  • aiperf/metrics/types/max_response_metric.py (1 hunks)
  • aiperf/metrics/types/min_request_metric.py (1 hunks)
  • aiperf/post_processors/__init__.py (1 hunks)
  • aiperf/post_processors/metric_results_processor.py (2 hunks)
  • aiperf/post_processors/record_export_results_processor.py (1 hunks)
  • aiperf/records/record_processor_service.py (4 hunks)
  • aiperf/records/records_manager.py (6 hunks)
  • examples/README.md (1 hunks)
  • examples/parse_profile_export.py (1 hunks)
  • tests/post_processors/conftest.py (2 hunks)
  • tests/post_processors/test_metric_results_processor.py (4 hunks)
  • tests/post_processors/test_post_processor_integration.py (4 hunks)
  • tests/post_processors/test_record_export_results_processor.py (1 hunks)
  • tests/records/conftest.py (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
examples/README.md

90-90: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


125-125: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.13.2)
tests/post_processors/test_record_export_results_processor.py

98-98: Unused lambda argument: self

(ARG005)


205-205: Unused method argument: mock_metric_registry

(ARG002)


254-254: Unused method argument: mock_metric_registry

(ARG002)


279-279: Unused method argument: mock_metric_registry

(ARG002)


309-309: Unused method argument: sample_metric_records_message

(ARG002)


310-310: Unused method argument: mock_metric_registry

(ARG002)


363-363: Unused method argument: mock_metric_registry

(ARG002)


398-398: Unused method argument: mock_metric_registry

(ARG002)


447-447: Unused method argument: mock_metric_registry

(ARG002)


486-486: Unused method argument: mock_metric_registry

(ARG002)


517-517: Unused method argument: sample_metric_records_message

(ARG002)


518-518: Unused method argument: mock_metric_registry

(ARG002)

aiperf/post_processors/record_export_results_processor.py

28-28: Unused method argument: service_id

(ARG002)


37-39: Avoid specifying long messages outside the exception class

(TRY003)


77-77: Do not catch blind exception: Exception

(BLE001)

aiperf/post_processors/metric_results_processor.py

93-93: Abstract raise to an inner function

(TRY301)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (14)
aiperf/common/models/__init__.py (1)

53-56: LGTM! New model exports are correct.

The re-export of MetricRecordInfo, MetricRecordMetadata, and MetricValue from record_models is correctly reflected in the __all__ list, making these classes part of the public API surface.

Also applies to: 96-99

aiperf/metrics/types/min_request_metric.py (1)

5-5: LGTM! MetricDateTimeUnit removal is consistent.

The removal of the MetricDateTimeUnit import and the associated display_unit assignment aligns with the PR-wide removal of this enum. The metric now relies on default display behavior with unit = MetricTimeUnit.NANOSECONDS.

aiperf/common/config/config_defaults.py (1)

14-14: LGTM! Configuration updates align with new export features.

The addition of ExportLevel import, the update to PROFILE_EXPORT_FILE to use .jsonl extension, and the new EXPORT_LEVEL constant correctly support the new per-record metrics export functionality.

Also applies to: 117-117, 123-123

aiperf/common/exceptions.py (1)

146-148: LGTM! New exception class is well-defined.

The PostProcessorDisabled exception is correctly defined as a subclass of AIPerfError with a clear docstring. This exception enables processor initialization logic to skip disabled post-processors gracefully.

aiperf/common/enums/data_exporter_enums.py (1)

19-29: LGTM! ExportLevel enum is well-defined.

The new ExportLevel enum correctly defines three export levels (SUMMARY, RECORDS, RAW) with clear docstrings describing their semantics. This provides a clean abstraction for controlling export granularity.

aiperf/common/enums/metric_enums.py (1)

171-171: LGTM! MetricDateTimeUnit removal is correct.

The removal of MetricDateTimeUnit and the narrowing of the type check in convert_to to MetricTimeUnit | MetricTimeUnitInfo is consistent with the PR-wide removal of this enum. The datetime import and conversion path have been correctly removed.

tests/post_processors/test_metric_results_processor.py (1)

12-12: LGTM! Test updates correctly adopt message-based processing.

The tests have been correctly updated to use the new create_metric_records_message helper from tests.post_processors.conftest, replacing direct MetricRecordDict construction with message-based inputs. This aligns with the broader refactor to message-centric processing.

Also applies to: 17-17, 43-47, 54-59, 71-75, 91-103

aiperf/post_processors/record_export_results_processor.py (2)

26-48: LGTM! Initialization logic is correct.

The initialization properly validates export level, sets up the output file path, creates necessary directories, and clears any existing output file. The service_id parameter is unused but required by the factory protocol pattern.
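As a self-contained illustration of those initialization steps (a hypothetical helper, not the processor's actual code; the file name mirrors the default profile_export.jsonl):

from pathlib import Path

class PostProcessorDisabled(Exception):
    """Raised by a post-processor to opt out of the current run."""

def init_record_export(output_dir: Path, export_level: str) -> Path:
    if export_level not in ("records", "raw"):
        raise PostProcessorDisabled(
            f"Record export results processor is disabled for export level {export_level}"
        )
    output_file = output_dir / "profile_export.jsonl"
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.unlink(missing_ok=True)  # start each run with a clean file
    return output_file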


80-88: LGTM! Summarize and shutdown implementations are correct.

The empty summarize method is appropriate for this export-only processor, and the shutdown hook provides useful diagnostic information about the total records written.

examples/parse_profile_export.py (1)

71-87: LGTM! Main function implementation is correct.

The CLI setup with cyclopts and conditional async/sync loading is well-structured and appropriate for an example script.

aiperf/records/records_manager.py (4)

108-123: LGTM! Defensive processor initialization is well-implemented.

The try/except wrapper correctly handles processors that opt-out via PostProcessorDisabled, allowing the system to gracefully skip disabled processors without failing initialization.
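A minimal, self-contained sketch of that opt-out pattern (class names here are illustrative stand-ins, not the real processors):

class PostProcessorDisabled(Exception):
    """Raised by a processor's __init__ to opt out of the current configuration."""

class SummaryProcessor:
    pass

class RecordExportProcessor:
    def __init__(self):
        # e.g. when the configured export level does not include per-record output
        raise PostProcessorDisabled("record export disabled for this export level")

processors = []
for cls in (SummaryProcessor, RecordExportProcessor):
    try:
        processors.append(cls())
    except PostProcessorDisabled as exc:
        print(f"Skipping {cls.__name__}: {exc}")

assert len(processors) == 1  # only the enabled processor was created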


126-144: LGTM! Metadata access refactoring is correct.

The migration to access credit_phase and worker_id through message.metadata aligns with the new MetricRecordMetadata structure, and the introduction of MetricRecordDict provides structured metric access.


177-212: LGTM! Duration filtering correctly adapted to MetricRecordDict.

The refactored filtering logic properly uses MetricRecordDict.get() to retrieve metric values while preserving the all-or-nothing filtering behavior.


264-273: LGTM! Results processor invocation correctly updated.

The refactored method properly passes the complete MetricRecordsMessage to each processor, enabling them to access metadata, metrics, and error information in a structured way.

Base automatically changed from ajc/x-request-id to main on October 3, 2025 21:22
@ajcasagrande ajcasagrande self-assigned this on Oct 3, 2025
@ajcasagrande ajcasagrande force-pushed the ajc/per-record-metrics branch from 8cfc99e to b04d012 on October 3, 2025 23:00
coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
examples/README.md (1)

90-90: Use proper heading syntax for example sections.

The bold text for "Example: Successful Request" and "Example: Failed Request" should use proper Markdown heading syntax (e.g., ### Example: Successful Request) for better document structure and navigation.

Apply this diff:

-**Example: Successful Request**
+### Example: Successful Request
-**Example: Failed Request**
+### Example: Failed Request

Also applies to: 125-125

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc45705 and b04d012.

📒 Files selected for processing (24)
  • aiperf/common/config/config_defaults.py (2 hunks)
  • aiperf/common/config/output_config.py (2 hunks)
  • aiperf/common/enums/__init__.py (2 hunks)
  • aiperf/common/enums/data_exporter_enums.py (1 hunks)
  • aiperf/common/enums/metric_enums.py (1 hunks)
  • aiperf/common/enums/post_processor_enums.py (1 hunks)
  • aiperf/common/exceptions.py (1 hunks)
  • aiperf/common/models/__init__.py (2 hunks)
  • aiperf/common/models/record_models.py (3 hunks)
  • aiperf/common/protocols.py (2 hunks)
  • aiperf/metrics/metric_dicts.py (2 hunks)
  • aiperf/metrics/types/max_response_metric.py (1 hunks)
  • aiperf/metrics/types/min_request_metric.py (1 hunks)
  • aiperf/post_processors/__init__.py (1 hunks)
  • aiperf/post_processors/metric_results_processor.py (2 hunks)
  • aiperf/post_processors/record_export_results_processor.py (1 hunks)
  • aiperf/records/records_manager.py (6 hunks)
  • examples/README.md (1 hunks)
  • examples/parse_profile_export.py (1 hunks)
  • tests/post_processors/conftest.py (2 hunks)
  • tests/post_processors/test_metric_results_processor.py (4 hunks)
  • tests/post_processors/test_post_processor_integration.py (4 hunks)
  • tests/post_processors/test_record_export_results_processor.py (1 hunks)
  • tests/records/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • aiperf/common/protocols.py
  • tests/records/conftest.py
  • aiperf/common/models/__init__.py
  • aiperf/metrics/types/min_request_metric.py
  • aiperf/common/exceptions.py
  • aiperf/metrics/types/max_response_metric.py
  • aiperf/common/enums/data_exporter_enums.py
  • aiperf/common/config/output_config.py
  • tests/post_processors/test_metric_results_processor.py
  • aiperf/common/config/config_defaults.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present. A short sketch of this behavior follows the block below.

Applied to files:

  • aiperf/metrics/metric_dicts.py
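A small sketch of those semantics (an assumed, simplified implementation of missing_flags, not the real MetricFlags class):

from enum import Flag, auto

class MetricFlags(Flag):
    NONE = 0
    EXPERIMENTAL = auto()
    INTERNAL = auto()

    def missing_flags(self, flags: "MetricFlags") -> bool:
        # True only if this metric has NONE of the provided flags
        return not (self & flags)

flags = MetricFlags.INTERNAL
# INTERNAL is present, so missing_flags(...) is False and the surrounding `not`
# makes the filter condition True: the metric is treated as internal and hidden.
assert not flags.missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)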
🧬 Code graph analysis (12)
aiperf/post_processors/__init__.py (1)
aiperf/post_processors/record_export_results_processor.py (1)
  • RecordExportResultsProcessor (23-88)
aiperf/post_processors/metric_results_processor.py (7)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (30-61)
aiperf/common/protocols.py (5)
  • process_result (505-505)
  • is_trace_enabled (58-58)
  • trace (66-66)
  • debug (67-67)
  • warning (70-70)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
aiperf/metrics/metric_dicts.py (4)
  • MetricRecordDict (50-107)
  • MetricArray (133-207)
  • extend (148-155)
  • append (157-163)
aiperf/common/enums/metric_enums.py (1)
  • MetricType (273-288)
aiperf/metrics/base_aggregate_metric.py (2)
  • BaseAggregateMetric (13-103)
  • current_value (58-60)
aiperf/common/exceptions.py (1)
  • NoMetricValue (142-143)
aiperf/common/enums/__init__.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
  • ExportLevel (19-29)
tests/post_processors/test_post_processor_integration.py (3)
aiperf/metrics/metric_dicts.py (1)
  • MetricArray (133-207)
tests/post_processors/conftest.py (1)
  • create_metric_records_message (241-278)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/post_processors/record_export_results_processor.py (13)
aiperf/common/config/service_config.py (1)
  • ServiceConfig (30-171)
aiperf/common/config/user_config.py (1)
  • UserConfig (34-280)
aiperf/common/enums/data_exporter_enums.py (1)
  • ExportLevel (19-29)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/common/factories.py (1)
  • ResultsProcessorFactory (576-596)
aiperf/common/hooks.py (1)
  • on_stop (270-287)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (30-61)
aiperf/common/models/record_models.py (2)
  • MetricRecordInfo (97-110)
  • MetricResult (24-56)
aiperf/common/protocols.py (4)
  • ResultsProcessorProtocol (501-507)
  • process_result (505-505)
  • error (72-72)
  • summarize (507-507)
aiperf/metrics/metric_dicts.py (2)
  • MetricRecordDict (50-107)
  • to_display_dict (61-107)
aiperf/metrics/metric_registry.py (1)
  • MetricRegistry (27-292)
aiperf/post_processors/base_metrics_processor.py (1)
  • BaseMetricsProcessor (13-66)
aiperf/common/config/output_config.py (1)
  • export_level (49-50)
tests/post_processors/conftest.py (5)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/enums/message_enums.py (1)
  • MessageType (7-47)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (30-61)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
aiperf/common/models/record_models.py (2)
  • MetricRecordMetadata (66-94)
  • timestamp_ns (513-515)
aiperf/common/enums/post_processor_enums.py (1)
aiperf/common/enums/base_enums.py (1)
  • CaseInsensitiveStrEnum (15-73)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
  • AIPerfBaseModel (25-53)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
  • AIPerfLogger (25-239)
aiperf/common/enums/metric_enums.py (8)
  • MetricType (273-288)
  • MetricFlags (365-448)
  • missing_flags (440-448)
  • convert_to (34-44)
  • convert_to (55-57)
  • convert_to (70-75)
  • convert_to (169-174)
  • convert_to (212-232)
aiperf/common/exceptions.py (3)
  • MetricTypeError (126-127)
  • MetricUnitError (130-131)
  • NoMetricValue (142-143)
aiperf/common/models/record_models.py (2)
  • MetricResult (24-56)
  • MetricValue (59-63)
aiperf/metrics/metric_registry.py (2)
  • MetricRegistry (27-292)
  • get_class (112-121)
examples/parse_profile_export.py (2)
aiperf/common/models/record_models.py (1)
  • MetricRecordInfo (97-110)
aiperf/metrics/metric_registry.py (1)
  • MetricRegistry (27-292)
tests/post_processors/test_record_export_results_processor.py (6)
aiperf/common/config/output_config.py (2)
  • OutputConfig (16-50)
  • export_level (49-50)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (30-61)
aiperf/common/models/record_models.py (4)
  • MetricRecordInfo (97-110)
  • MetricRecordMetadata (66-94)
  • MetricValue (59-63)
  • timestamp_ns (513-515)
aiperf/metrics/metric_dicts.py (1)
  • MetricRecordDict (50-107)
aiperf/post_processors/record_export_results_processor.py (4)
  • RecordExportResultsProcessor (23-88)
  • process_result (50-78)
  • _shutdown (85-88)
  • summarize (80-82)
tests/post_processors/conftest.py (2)
  • create_metric_records_message (241-278)
  • mock_metric_registry (176-197)
aiperf/records/records_manager.py (6)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/metrics/metric_dicts.py (2)
  • MetricRecordDict (50-107)
  • append (157-163)
aiperf/common/factories.py (16)
  • ResultsProcessorFactory (576-596)
  • create_instance (183-209)
  • create_instance (284-315)
  • create_instance (354-364)
  • create_instance (375-381)
  • create_instance (390-395)
  • create_instance (406-414)
  • create_instance (425-430)
  • create_instance (439-447)
  • create_instance (456-464)
  • create_instance (491-499)
  • create_instance (536-550)
  • create_instance (561-573)
  • create_instance (584-596)
  • create_instance (607-611)
  • create_instance (620-628)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (30-61)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
🪛 markdownlint-cli2 (0.18.1)
examples/README.md

90-90: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


125-125: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.13.2)
aiperf/post_processors/metric_results_processor.py

93-93: Abstract raise to an inner function

(TRY301)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Do not catch blind exception: Exception

(BLE001)

aiperf/post_processors/record_export_results_processor.py

28-28: Unused method argument: service_id

(ARG002)


37-39: Avoid specifying long messages outside the exception class

(TRY003)


77-77: Do not catch blind exception: Exception

(BLE001)

tests/post_processors/test_record_export_results_processor.py

98-98: Unused lambda argument: self

(ARG005)


205-205: Unused method argument: mock_metric_registry

(ARG002)


254-254: Unused method argument: mock_metric_registry

(ARG002)


279-279: Unused method argument: mock_metric_registry

(ARG002)


309-309: Unused method argument: sample_metric_records_message

(ARG002)


310-310: Unused method argument: mock_metric_registry

(ARG002)


363-363: Unused method argument: mock_metric_registry

(ARG002)


398-398: Unused method argument: mock_metric_registry

(ARG002)


447-447: Unused method argument: mock_metric_registry

(ARG002)


486-486: Unused method argument: mock_metric_registry

(ARG002)


517-517: Unused method argument: sample_metric_records_message

(ARG002)


518-518: Unused method argument: mock_metric_registry

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (9)
aiperf/post_processors/__init__.py (1)

20-22: LGTM!

The new RecordExportResultsProcessor is properly exposed through the public API.

Also applies to: 24-29

aiperf/common/enums/metric_enums.py (1)

169-174: LGTM!

The removal of MetricDateTimeUnit is clean, and the type check in convert_to has been appropriately narrowed to only handle MetricTimeUnit | MetricTimeUnitInfo.

aiperf/common/enums/__init__.py (1)

26-30: LGTM!

The public API correctly exposes ExportLevel and removes the deprecated MetricDateTimeUnit.

Also applies to: 103-157

aiperf/common/enums/post_processor_enums.py (1)

7-12: LGTM!

The enhanced docstrings provide better context about the processing pipeline stages, and the new RECORD_EXPORT member is well-documented.

Also applies to: 19-33

tests/post_processors/test_post_processor_integration.py (1)

36-50: LGTM!

The test updates correctly reflect the shift to message-based processing with MetricRecordsMessage. The tests now properly construct messages with metadata for each batch.

Also applies to: 62-79

aiperf/metrics/metric_dicts.py (2)

7-7: LGTM!

The new imports and module-level logger are properly configured following AIPerf conventions.

Also applies to: 15-17, 21-21, 28-29


61-107: LGTM!

The to_display_dict method is well-implemented with proper:

  • Registry lookup and missing metric handling
  • Internal/experimental metric filtering (logic confirmed correct per previous review)
  • Unit conversion with error handling
  • MetricValue construction

Based on learnings: The missing_flags check correctly excludes metrics with either EXPERIMENTAL or INTERNAL flag when show_internal=False.
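A simplified, hypothetical rendering of that behavior (the real method lives in aiperf/metrics/metric_dicts.py and works against MetricRegistry and MetricFlags; the dict-based registry here is a stand-in):

def to_display_dict(record: dict, registry: dict, show_internal: bool = False) -> dict:
    display = {}
    for tag, raw in record.items():
        meta = registry.get(tag)
        if meta is None:
            continue  # unknown metric tag: skip rather than fail
        if meta["internal"] and not show_internal:
            continue  # hide internal/experimental metrics by default
        factor = meta.get("display_factor", 1)  # e.g. ns -> ms conversion
        value = [v * factor for v in raw] if isinstance(raw, list) else raw * factor
        display[tag] = {"value": value, "unit": meta["display_unit"]}
    return display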

tests/post_processors/conftest.py (2)

10-12: LGTM!

The new imports properly support the message-based test infrastructure.

Also applies to: 20-21


200-278: LGTM!

Both factory functions (create_metric_metadata and create_metric_records_message) are well-designed with:

  • Sensible defaults that reduce test boilerplate
  • Flexible parameter overrides
  • Clean handling of optional metadata construction
  • Proper typing

The special handling of x_request_id in create_metric_records_message is a nice ergonomic touch.

Comment on lines +50 to +77
async def process_result(self, message: MetricRecordsMessage) -> None:
    record_dicts = [MetricRecordDict(result) for result in message.results]
    for record_dict in record_dicts:
        try:
            display_metrics = record_dict.to_display_dict(
                MetricRegistry, self.show_internal
            )
            if not display_metrics:
                continue

            record_info = MetricRecordInfo(
                metadata=message.metadata,
                metrics=display_metrics,
                error=message.error,
            )
            json_str = record_info.model_dump_json()

            async with aiofiles.open(
                self.output_file, mode="a", encoding="utf-8"
            ) as f:
                await f.write(json_str)
                await f.write("\n")

            self.record_count += 1
            if self.record_count % 100 == 0:
                self.debug(f"Wrote {self.record_count} record metrics")

        except Exception as e:

⚠️ Potential issue | 🟠 Major

Merge per-request metrics before writing JSONL

MetricRecordsMessage can carry multiple MetricRecordDicts for the same request (one per record processor). Emitting a line per dict duplicates metadata and splits the metrics across several rows, breaking the “one request per JSONL line with full detail” contract and double-counting once extensions add extra record processors. Please merge the dicts first and write a single record per message. For example:

-        record_dicts = [MetricRecordDict(result) for result in message.results]
-        for record_dict in record_dicts:
-            try:
-                display_metrics = record_dict.to_display_dict(
-                    MetricRegistry, self.show_internal
-                )
-                if not display_metrics:
-                    continue
+        record_dicts = [MetricRecordDict(result) for result in message.results]
+        merged_record = MetricRecordDict()
+        for record_dict in record_dicts:
+            merged_record.update(record_dict)
+
+        try:
+            display_metrics = merged_record.to_display_dict(
+                MetricRegistry, self.show_internal
+            )
+            if not display_metrics:
+                return
@@
-                record_info = MetricRecordInfo(
-                    metadata=message.metadata,
-                    metrics=display_metrics,
-                    error=message.error,
-                )
+            record_info = MetricRecordInfo(
+                metadata=message.metadata,
+                metrics=display_metrics,
+                error=message.error,
+            )
@@
-                self.record_count += 1
-                if self.record_count % 100 == 0:
-                    self.debug(f"Wrote {self.record_count} record metrics")
-
-            except Exception as e:
-                self.error(f"Failed to write record metrics: {e}")
+            self.record_count += 1
+            if self.record_count % 100 == 0:
+                self.debug(f"Wrote {self.record_count} record metrics")
+
+        except Exception as e:
+            self.error(f"Failed to write record metrics: {e}")

(Adjust control flow accordingly.)

This keeps the export aligned with the promised per-request detail. Based on learnings

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.13.2)

77-77: Do not catch blind exception: Exception

(BLE001)

Comment on lines +226 to +234
assert processor.record_count == 2  # Two result dicts in the message
assert processor.output_file.exists()

# Verify file content
with open(processor.output_file, "rb") as f:
    lines = f.readlines()

assert len(lines) == 2
for line in lines:

⚠️ Potential issue | 🟠 Major

Align expectations with per-request export semantics

Once we merge all per-processor metric dicts into a single record (see processor fix), each MetricRecordsMessage should emit exactly one JSONL entry. Please update the assertions in this test (and related ones) to expect a single line/count per message instead of record_count == 2 / len(lines) == 2.

🤖 Prompt for AI Agents
In tests/post_processors/test_record_export_results_processor.py around lines
226 to 234, the test still expects two records/lines per MetricRecordsMessage
(assert processor.record_count == 2 and assert len(lines) == 2) but per-request
export now emits a single merged JSONL entry per message; change the assertions
to expect a single record/line (assert processor.record_count == 1 and assert
len(lines) == 1) and adjust any loop or per-line checks accordingly to validate
that single JSONL entry.

Comment on lines +337 to +344
assert processor.record_count == 10  # 5 messages * 2 results each
assert processor.output_file.exists()

with open(processor.output_file, "rb") as f:
    lines = f.readlines()

assert len(lines) == 10


⚠️ Potential issue | 🟠 Major

Adjust multi-message count to match merged-record output

After consolidating all metric dicts for a message into a single export record, the accumulator should report one record per message. Update this test to expect record_count == 5 and len(lines) == 5 (one per request), and add an assertion that the merged metrics include contributions from the different dicts.

🤖 Prompt for AI Agents
In tests/post_processors/test_record_export_results_processor.py around lines
337 to 344, the test still expects 10 records/lines even though messages are now
merged into a single export record per message; update the two assertions to
assert processor.record_count == 5 and assert len(lines) == 5, and add an
assertion verifying that each output record contains merged metrics from the
different input dicts (for example by loading each line and asserting expected
metric keys/values from both dicts are present in the merged record).

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b04d012 and 7062ef9.

📒 Files selected for processing (2)
  • aiperf/common/config/config_defaults.py (2 hunks)
  • aiperf/common/enums/metric_enums.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
aiperf/common/config/config_defaults.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
  • ExportLevel (19-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (3)
aiperf/common/config/config_defaults.py (2)

14-14: LGTM: Import addition is correct.

The ExportLevel enum is properly imported and used on line 124.


124-124: LGTM: New export level default aligns with PR objectives.

The new EXPORT_LEVEL constant with default value ExportLevel.RECORDS correctly enables per-record metrics export as intended by this PR.

aiperf/common/enums/metric_enums.py (1)

171-171: LGTM: Type narrowing correctly reflects removal of MetricDateTimeUnit.

The narrowed type check is consistent with the removal of MetricDateTimeUnit from the codebase. As noted in previous review comments, keeping timestamps as epoch nanoseconds simplifies usage for end users.

Based on past review comments.

@ajcasagrande ajcasagrande force-pushed the ajc/per-record-metrics branch from 7062ef9 to dc45705 on October 3, 2025 23:57
coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/post_processors/test_record_export_results_processor.py (1)

226-247: Export one merged record per MetricRecordsMessage

These assertions only pass because RecordExportResultsProcessor.process_result loops over message.results and emits a separate JSONL line for every sub-dict. Every MetricRecordsMessage represents a single request (complete metadata + error), and the PR objective promises “profile_export.jsonl … one record per line” with the full metric map. Duplicating metadata across multiple lines for the same request breaks that contract and forces downstream consumers to stitch partial metric fragments back together.

Please merge the per-processor dicts before serializing so that each message yields exactly one JSONL entry, then update this test (and the related count-based expectations below) to assert a single record per request.

-        record_dicts = [MetricRecordDict(result) for result in message.results]
-        for record_dict in record_dicts:
-            try:
-                display_metrics = record_dict.to_display_dict(
-                    MetricRegistry, self.show_internal
-                )
-                if not display_metrics:
-                    continue
-
-                record_info = MetricRecordInfo(
-                    metadata=message.metadata,
-                    metrics=display_metrics,
-                    error=message.error,
-                )
-                json_str = record_info.model_dump_json()
-
-                async with aiofiles.open(
-                    self.output_file, mode="a", encoding="utf-8"
-                ) as f:
-                    await f.write(json_str)
-                    await f.write("\n")
-
-                self.record_count += 1
-                if self.record_count % 100 == 0:
-                    self.debug(f"Wrote {self.record_count} record metrics")
-
-            except Exception as e:
-                self.error(f"Failed to write record metrics: {e}")
+        combined_record = MetricRecordDict()
+        for result in message.results:
+            combined_record.update(result)
+
+        try:
+            display_metrics = combined_record.to_display_dict(
+                MetricRegistry, self.show_internal
+            )
+            if not display_metrics:
+                return
+
+            record_info = MetricRecordInfo(
+                metadata=message.metadata,
+                metrics=display_metrics,
+                error=message.error,
+            )
+            json_str = record_info.model_dump_json()
+
+            async with aiofiles.open(
+                self.output_file, mode="a", encoding="utf-8"
+            ) as f:
+                await f.write(json_str)
+                await f.write("\n")
+
+            self.record_count += 1
+            if self.record_count % 100 == 0:
+                self.debug(f"Wrote {self.record_count} record metrics")
+
+        except Exception as e:
+            self.error(f"Failed to write record metrics: {e}")
aiperf/post_processors/record_export_results_processor.py (1)

50-78: Merge per-request metrics before writing JSONL.

The current implementation writes one JSONL line per MetricRecordDict in message.results. However, MetricRecordsMessage can carry multiple dicts for the same request (one per record processor). This duplicates metadata and splits metrics across several rows, breaking the "one request per JSONL line with full detail" contract.

Please merge the dicts first and write a single record per message, as suggested in the previous review.

Based on learnings

🧹 Nitpick comments (6)
examples/parse_profile_export.py (3)

19-27: Consider adding error handling for file I/O operations.

The sync load_records function does not handle potential file I/O errors (e.g., FileNotFoundError, PermissionError, encoding issues). While this is an example script, adding a try-except block with informative error messages would improve robustness and user experience.

Apply this diff to add error handling:

 def load_records(file_path: Path) -> list[MetricRecordInfo]:
     """Load profile_export.jsonl file into structured Pydantic models in sync mode."""
     records = []
-    with open(file_path, encoding="utf-8") as f:
-        for line in f:
-            if line.strip():
-                record = MetricRecordInfo.model_validate_json(line)
-                records.append(record)
+    try:
+        with open(file_path, encoding="utf-8") as f:
+            for line in f:
+                if line.strip():
+                    record = MetricRecordInfo.model_validate_json(line)
+                    records.append(record)
+    except FileNotFoundError:
+        print(f"Error: File not found: {file_path}")
+        raise
+    except PermissionError:
+        print(f"Error: Permission denied: {file_path}")
+        raise
+    except Exception as e:
+        print(f"Error loading records: {e}")
+        raise
     return records

30-40: Consider adding error handling for async file I/O operations.

The async load_records_async function does not handle potential file I/O errors (e.g., FileNotFoundError, PermissionError, encoding issues). Adding a try-except block with informative error messages would improve robustness and user experience.

Apply this diff to add error handling:

 async def load_records_async(file_path: Path) -> list[MetricRecordInfo]:
     """Load profile_export.jsonl file into structured Pydantic models in async mode."""
     import aiofiles

     records = []
-    async with aiofiles.open(file_path, encoding="utf-8") as f:
-        async for line in f:
-            if line.strip():
-                record = MetricRecordInfo.model_validate_json(line)
-                records.append(record)
+    try:
+        async with aiofiles.open(file_path, encoding="utf-8") as f:
+            async for line in f:
+                if line.strip():
+                    record = MetricRecordInfo.model_validate_json(line)
+                    records.append(record)
+    except FileNotFoundError:
+        print(f"Error: File not found: {file_path}")
+        raise
+    except PermissionError:
+        print(f"Error: Permission denied: {file_path}")
+        raise
+    except Exception as e:
+        print(f"Error loading records: {e}")
+        raise
     return records

58-63: Consider handling potential MetricTypeError from MetricRegistry.get_class.

Line 60 calls MetricRegistry.get_class(metric_tag), which can raise MetricTypeError if the metric tag is not registered. While this is an example script, adding error handling would prevent crashes and provide a better user experience if the profile export contains unregistered metrics.

Apply this diff to add error handling:

         console.rule("[bold]Example Metrics[/]")
         for metric_tag, metric_value in record.metrics.items():
-            metric_cls = MetricRegistry.get_class(metric_tag)
-            console.print(
-                f"[bold cyan]{metric_cls.header} ({metric_tag})[/]: [green]{metric_value.value} ({metric_value.unit})[/]"
-            )
+            try:
+                metric_cls = MetricRegistry.get_class(metric_tag)
+                console.print(
+                    f"[bold cyan]{metric_cls.header} ({metric_tag})[/]: [green]{metric_value.value} ({metric_value.unit})[/]"
+                )
+            except Exception as e:
+                console.print(
+                    f"[bold cyan]{metric_tag}[/]: [green]{metric_value.value} ({metric_value.unit})[/] [dim](header lookup failed: {e})[/]"
+                )
aiperf/post_processors/record_export_results_processor.py (3)

28-28: Consider removing unused service_id parameter.

The service_id parameter is not used in the class. If it's not required by the factory signature or for future extensibility, consider removing it to reduce clutter.

Apply this diff:

     def __init__(
         self,
-        service_id: str,
         service_config: ServiceConfig,
         user_config: UserConfig,
         **kwargs,
     ):

37-39: Consider extracting exception message to class constant.

The static analysis tool flags the long exception message. While this is a minor nitpick, extracting it to a class constant or using a shorter message would align with TRY003 guidance.

Example:

+    _DISABLED_MESSAGE_TEMPLATE = "Record export results processor is disabled for export level {}"
+
     def __init__(
         ...
     ):
         ...
         if export_level not in (ExportLevel.RECORDS, ExportLevel.RAW):
             raise PostProcessorDisabled(
-                f"Record export results processor is disabled for export level {export_level}"
+                self._DISABLED_MESSAGE_TEMPLATE.format(export_level)
             )

77-78: Consider narrowing exception handling or logging full traceback.

The broad Exception catch is used to prevent one record failure from stopping the processor, which is reasonable. However, it could mask unexpected errors. Consider logging the full traceback for debugging:

Apply this diff:

             except Exception as e:
-                self.error(f"Failed to write record metrics: {e}")
+                self.error(f"Failed to write record metrics: {e}", exc_info=True)

Alternatively, catch specific exceptions (e.g., IOError, ValueError) if the expected failure modes are known.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7062ef9 and ce7ed08.

⛔ Files ignored due to path filters (1)
  • examples/artifacts/run1/profile_export_aiperf.csv is excluded by !**/*.csv
📒 Files selected for processing (27)
  • aiperf/common/config/config_defaults.py (2 hunks)
  • aiperf/common/config/output_config.py (2 hunks)
  • aiperf/common/enums/__init__.py (2 hunks)
  • aiperf/common/enums/data_exporter_enums.py (1 hunks)
  • aiperf/common/enums/metric_enums.py (1 hunks)
  • aiperf/common/enums/post_processor_enums.py (1 hunks)
  • aiperf/common/exceptions.py (1 hunks)
  • aiperf/common/messages/inference_messages.py (2 hunks)
  • aiperf/common/models/__init__.py (2 hunks)
  • aiperf/common/models/record_models.py (3 hunks)
  • aiperf/common/protocols.py (2 hunks)
  • aiperf/metrics/metric_dicts.py (2 hunks)
  • aiperf/metrics/types/max_response_metric.py (1 hunks)
  • aiperf/metrics/types/min_request_metric.py (1 hunks)
  • aiperf/post_processors/__init__.py (1 hunks)
  • aiperf/post_processors/metric_results_processor.py (2 hunks)
  • aiperf/post_processors/record_export_results_processor.py (1 hunks)
  • aiperf/records/record_processor_service.py (4 hunks)
  • aiperf/records/records_manager.py (6 hunks)
  • examples/README.md (1 hunks)
  • examples/artifacts/run1/profile_export_aiperf.json (1 hunks)
  • examples/parse_profile_export.py (1 hunks)
  • tests/post_processors/conftest.py (2 hunks)
  • tests/post_processors/test_metric_results_processor.py (4 hunks)
  • tests/post_processors/test_post_processor_integration.py (4 hunks)
  • tests/post_processors/test_record_export_results_processor.py (1 hunks)
  • tests/records/conftest.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/artifacts/run1/profile_export_aiperf.json
🚧 Files skipped from review as they are similar to previous changes (9)
  • aiperf/common/enums/post_processor_enums.py
  • aiperf/common/config/output_config.py
  • aiperf/metrics/types/min_request_metric.py
  • aiperf/post_processors/__init__.py
  • aiperf/common/enums/data_exporter_enums.py
  • tests/post_processors/test_metric_results_processor.py
  • aiperf/common/config/config_defaults.py
  • aiperf/metrics/types/max_response_metric.py
  • aiperf/common/exceptions.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.

Applied to files:

  • aiperf/metrics/metric_dicts.py
🧬 Code graph analysis (15)
aiperf/common/protocols.py (3)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
tests/records/conftest.py (2)
aiperf/common/models/record_models.py (7)
  • ParsedResponse (476-486)
  • ParsedResponseRecord (489-568)
  • RequestRecord (226-416)
  • TextResponseData (427-434)
  • start_perf_ns (508-510)
  • timestamp_ns (513-515)
  • end_perf_ns (520-531)
tests/post_processors/conftest.py (2)
  • sample_request_record (45-55)
  • sample_parsed_record (59-77)
aiperf/post_processors/metric_results_processor.py (6)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/protocols.py (3)
  • process_result (505-505)
  • debug (67-67)
  • warning (70-70)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
aiperf/metrics/metric_dicts.py (4)
  • MetricRecordDict (50-107)
  • MetricArray (133-207)
  • extend (148-155)
  • append (157-163)
aiperf/metrics/base_aggregate_metric.py (2)
  • BaseAggregateMetric (13-103)
  • current_value (58-60)
aiperf/common/exceptions.py (1)
  • NoMetricValue (142-143)
aiperf/common/models/__init__.py (1)
aiperf/common/models/record_models.py (4)
  • MetricRecordInfo (97-110)
  • MetricRecordMetadata (66-94)
  • MetricResult (24-56)
  • MetricValue (59-63)
tests/post_processors/test_post_processor_integration.py (4)
aiperf/metrics/metric_dicts.py (1)
  • MetricArray (133-207)
tests/post_processors/conftest.py (1)
  • create_metric_records_message (241-278)
aiperf/common/protocols.py (1)
  • process_result (505-505)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/common/enums/__init__.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
  • ExportLevel (19-29)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
  • AIPerfLogger (25-239)
aiperf/common/enums/metric_enums.py (8)
  • MetricType (273-288)
  • MetricFlags (365-451)
  • missing_flags (443-451)
  • convert_to (34-44)
  • convert_to (55-57)
  • convert_to (70-75)
  • convert_to (169-174)
  • convert_to (212-232)
aiperf/common/exceptions.py (3)
  • MetricTypeError (126-127)
  • MetricUnitError (130-131)
  • NoMetricValue (142-143)
aiperf/common/models/record_models.py (1)
  • MetricValue (59-63)
aiperf/metrics/metric_registry.py (1)
  • get_class (112-121)
tests/post_processors/conftest.py (4)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/enums/message_enums.py (1)
  • MessageType (7-47)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/models/record_models.py (2)
  • MetricRecordMetadata (66-94)
  • timestamp_ns (513-515)
aiperf/post_processors/record_export_results_processor.py (12)
aiperf/common/config/user_config.py (1)
  • UserConfig (34-280)
aiperf/common/enums/data_exporter_enums.py (1)
  • ExportLevel (19-29)
aiperf/common/enums/post_processor_enums.py (1)
  • ResultsProcessorType (19-33)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/common/factories.py (1)
  • ResultsProcessorFactory (576-596)
aiperf/common/hooks.py (1)
  • on_stop (270-287)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/models/record_models.py (2)
  • MetricRecordInfo (97-110)
  • MetricResult (24-56)
aiperf/common/protocols.py (4)
  • ResultsProcessorProtocol (501-507)
  • process_result (505-505)
  • error (72-72)
  • summarize (507-507)
aiperf/metrics/metric_dicts.py (2)
  • MetricRecordDict (50-107)
  • to_display_dict (61-107)
aiperf/metrics/metric_registry.py (1)
  • MetricRegistry (27-292)
aiperf/post_processors/base_metrics_processor.py (1)
  • BaseMetricsProcessor (14-99)
tests/post_processors/test_record_export_results_processor.py (6)
aiperf/common/config/output_config.py (2)
  • OutputConfig (16-50)
  • export_level (49-50)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/models/record_models.py (4)
  • MetricRecordInfo (97-110)
  • MetricRecordMetadata (66-94)
  • MetricValue (59-63)
  • timestamp_ns (513-515)
aiperf/metrics/metric_dicts.py (1)
  • MetricRecordDict (50-107)
aiperf/post_processors/record_export_results_processor.py (4)
  • RecordExportResultsProcessor (23-88)
  • process_result (50-78)
  • _shutdown (85-88)
  • summarize (80-82)
tests/post_processors/conftest.py (2)
  • create_metric_records_message (241-278)
  • mock_metric_registry (176-197)
aiperf/records/records_manager.py (7)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/metrics/metric_dicts.py (2)
  • MetricRecordDict (50-107)
  • append (157-163)
aiperf/common/factories.py (15)
  • create_instance (183-209)
  • create_instance (284-315)
  • create_instance (354-364)
  • create_instance (375-381)
  • create_instance (390-395)
  • create_instance (406-414)
  • create_instance (425-430)
  • create_instance (439-447)
  • create_instance (456-464)
  • create_instance (491-499)
  • create_instance (536-550)
  • create_instance (561-573)
  • create_instance (584-596)
  • create_instance (607-611)
  • create_instance (620-628)
aiperf/common/protocols.py (4)
  • debug (67-67)
  • is_trace_enabled (58-58)
  • trace (66-66)
  • process_result (505-505)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
examples/parse_profile_export.py (2)
aiperf/common/models/record_models.py (1)
  • MetricRecordInfo (97-110)
aiperf/metrics/metric_registry.py (1)
  • MetricRegistry (27-292)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
  • AIPerfBaseModel (25-53)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
aiperf/common/messages/inference_messages.py (4)
aiperf/common/enums/message_enums.py (1)
  • MessageType (7-47)
aiperf/common/messages/service_messages.py (1)
  • BaseServiceMessage (18-25)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
aiperf/common/models/record_models.py (2)
  • MetricRecordMetadata (66-94)
  • MetricResult (24-56)
aiperf/records/record_processor_service.py (6)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/common/models/record_models.py (3)
  • MetricRecordMetadata (66-94)
  • ParsedResponseRecord (489-568)
  • timestamp_ns (513-515)
aiperf/common/factories.py (17)
  • RecordProcessorFactory (553-573)
  • get_all_class_types (240-242)
  • create_instance (183-209)
  • create_instance (284-315)
  • create_instance (354-364)
  • create_instance (375-381)
  • create_instance (390-395)
  • create_instance (406-414)
  • create_instance (425-430)
  • create_instance (439-447)
  • create_instance (456-464)
  • create_instance (491-499)
  • create_instance (536-550)
  • create_instance (561-573)
  • create_instance (584-596)
  • create_instance (607-611)
  • create_instance (620-628)
aiperf/common/protocols.py (1)
  • push (155-155)
aiperf/zmq/push_client.py (1)
  • push (106-115)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
🪛 markdownlint-cli2 (0.18.1)
examples/README.md

90-90: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


125-125: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.13.3)
aiperf/post_processors/metric_results_processor.py

93-93: Abstract raise to an inner function

(TRY301)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Do not catch blind exception: Exception

(BLE001)

aiperf/post_processors/record_export_results_processor.py

28-28: Unused method argument: service_id

(ARG002)


37-39: Avoid specifying long messages outside the exception class

(TRY003)


77-77: Do not catch blind exception: Exception

(BLE001)

tests/post_processors/test_record_export_results_processor.py

98-98: Unused lambda argument: self

(ARG005)


205-205: Unused method argument: mock_metric_registry

(ARG002)


254-254: Unused method argument: mock_metric_registry

(ARG002)


279-279: Unused method argument: mock_metric_registry

(ARG002)


309-309: Unused method argument: sample_metric_records_message

(ARG002)


310-310: Unused method argument: mock_metric_registry

(ARG002)


363-363: Unused method argument: mock_metric_registry

(ARG002)


398-398: Unused method argument: mock_metric_registry

(ARG002)


447-447: Unused method argument: mock_metric_registry

(ARG002)


486-486: Unused method argument: mock_metric_registry

(ARG002)


517-517: Unused method argument: sample_metric_records_message

(ARG002)


518-518: Unused method argument: mock_metric_registry

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (12)
aiperf/common/enums/metric_enums.py (1)

169-175: LGTM!

The removal of MetricDateTimeUnit from the type check is consistent with the PR objective to eliminate the datetime unit. The change aligns with the developer's decision to use epoch nanoseconds directly for timestamp values.

tests/post_processors/test_post_processor_integration.py (2)

36-61: LGTM!

The test correctly uses the new message-based API by creating MetricRecordsMessage objects via create_metric_records_message and passing them to process_result. The assertions verify that the results processor correctly accumulates data from the message.


62-86: LGTM!

The test correctly verifies accumulation across multiple messages by creating distinct MetricRecordsMessage objects in a loop and processing each one. The use of unique x_request_id, timestamp_ns, and x_correlation_id per message ensures proper isolation and accumulation testing.

aiperf/records/record_processor_service.py (2)

84-97: LGTM!

The addition of error handling to catch PostProcessorDisabled during processor initialization is a good practice. It allows the service to gracefully skip disabled processors and log the action without failing initialization.
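
A rough sketch of the pattern being praised, for readers who have not opened the diff (factory and constructor details beyond the names listed in the code graph above are assumptions, not the actual implementation):

from aiperf.common.exceptions import PostProcessorDisabled
from aiperf.common.factories import RecordProcessorFactory


def create_enabled_processors(service_config, user_config):
    """Hypothetical helper mirroring the described init loop; kwargs are illustrative."""
    processors = []
    for processor_type in RecordProcessorFactory.get_all_class_types():
        try:
            processors.append(
                RecordProcessorFactory.create_instance(
                    processor_type,
                    service_config=service_config,
                    user_config=user_config,
                )
            )
        except PostProcessorDisabled:
            # The processor opted out (e.g. wrong export level); skip it without failing startup.
            continue
    return processors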


131-147: LGTM!

The updated message construction correctly encapsulates request metadata in a MetricRecordMetadata object and sends it via MetricRecordsMessage. All required fields are properly populated from the incoming message and service ID.

examples/README.md (1)

1-148: LGTM!

The README provides clear, comprehensive documentation for the example scripts and artifacts. The descriptions of data models, usage patterns, and example payloads are helpful for users understanding the new per-record metrics export feature.

Note: There are minor markdownlint warnings (lines 90, 125) about using emphasis instead of headings for "Example: Successful Request" and "Example: Failed Request". These can be safely ignored or optionally converted to proper headings (### Example: Successful Request) if strict markdown compliance is desired.

aiperf/common/enums/__init__.py (2)

26-30: LGTM!

The addition of ExportLevel to the public API is consistent with the PR's introduction of per-record metrics export levels.


103-157: LGTM!

The __all__ list correctly includes ExportLevel and no longer includes MetricDateTimeUnit, aligning with the PR's public API changes.

aiperf/common/models/__init__.py (2)

49-68: LGTM!

The addition of MetricRecordInfo, MetricRecordMetadata, and MetricValue to the imports is consistent with the PR's introduction of new per-record export models.


76-123: LGTM!

The __all__ list correctly includes the new model classes MetricRecordInfo, MetricRecordMetadata, and MetricValue, making them part of the public API.

aiperf/records/records_manager.py (2)

108-123: LGTM: Defensive processor initialization.

The try/except pattern gracefully handles processors that self-disable via PostProcessorDisabled, allowing the system to skip unavailable processors without crashing. This is a good extensibility practice.


264-273: LGTM: Consistent message-based interface.

The updated signature and call to process_result(message) align with the ResultsProcessorProtocol and allow processors to access full request metadata and error context. This is a clean refactor.

Comment on lines +14 to +55
# Constants for test data
DEFAULT_START_TIME_NS = 1_000_000
DEFAULT_FIRST_RESPONSE_NS = 1_050_000
DEFAULT_LAST_RESPONSE_NS = 1_100_000
DEFAULT_INPUT_TOKENS = 5
DEFAULT_OUTPUT_TOKENS = 2


@pytest.fixture
def sample_request_record() -> RequestRecord:
    """Create a sample RequestRecord for testing."""
    return RequestRecord(
        conversation_id="test-conversation",
        turn_index=0,
        model_name="test-model",
        start_perf_ns=DEFAULT_START_TIME_NS,
        timestamp_ns=DEFAULT_START_TIME_NS,
        end_perf_ns=DEFAULT_LAST_RESPONSE_NS,
        error=None,
    )


@pytest.fixture
def sample_parsed_record(sample_request_record: RequestRecord) -> ParsedResponseRecord:
    """Create a valid ParsedResponseRecord for testing."""
    responses = [
        ParsedResponse(
            perf_ns=DEFAULT_FIRST_RESPONSE_NS,
            data=TextResponseData(text="Hello"),
        ),
        ParsedResponse(
            perf_ns=DEFAULT_LAST_RESPONSE_NS,
            data=TextResponseData(text=" world"),
        ),
    ]

    return ParsedResponseRecord(
        request=sample_request_record,
        responses=responses,
        input_token_count=DEFAULT_INPUT_TOKENS,
        output_token_count=DEFAULT_OUTPUT_TOKENS,
    )

🛠️ Refactor suggestion | 🟠 Major

Eliminate code duplication with tests/post_processors/conftest.py.

The constants (DEFAULT_START_TIME_NS, DEFAULT_FIRST_RESPONSE_NS, DEFAULT_LAST_RESPONSE_NS, DEFAULT_INPUT_TOKENS, DEFAULT_OUTPUT_TOKENS) and fixtures (sample_request_record, sample_parsed_record) in this file are duplicates of those in tests/post_processors/conftest.py (lines 44-76).

Consider moving these shared test utilities to a common location (e.g., tests/conftest.py or a new tests/fixtures.py) and importing them in both test modules to eliminate duplication and ensure consistency.
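
One possible shape for the shared module, assuming a top-level tests/conftest.py that pytest discovers for every test package (the fixture body below simply mirrors the duplicated one above; import paths are assumptions):

# tests/conftest.py — hypothetical shared home for the duplicated fixtures
import pytest

from aiperf.common.models import RequestRecord  # import path assumed

DEFAULT_START_TIME_NS = 1_000_000
DEFAULT_LAST_RESPONSE_NS = 1_100_000


@pytest.fixture
def sample_request_record() -> RequestRecord:
    """Shared RequestRecord fixture, auto-discovered by pytest below tests/."""
    return RequestRecord(
        conversation_id="test-conversation",
        turn_index=0,
        model_name="test-model",
        start_perf_ns=DEFAULT_START_TIME_NS,
        timestamp_ns=DEFAULT_START_TIME_NS,
        end_perf_ns=DEFAULT_LAST_RESPONSE_NS,
        error=None,
    )

With the fixture living in tests/conftest.py, neither tests/records/conftest.py nor tests/post_processors/conftest.py needs its own copy or an explicit import.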

Contributor

@the-david-oy the-david-oy left a comment


Excellent work! The code looks good.

For the documentation, it's different than what we do now. Are we sure we want to store sample profile export artifacts in our repository as well as a parsing script? If so, we should be testing all of these.

Also, the new README format looks strange to me, as this PR is creating a new examples directory that's entirely focused on the per-record metrics feature rather than integrating with the current docs.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
aiperf/common/models/base_models.py (1)

26-28: Update the docstring to reflect the new configuration.

The docstring still describes the old behavior ("allow arbitrary types") but the configuration has changed to extra="allow", which accepts undeclared fields rather than arbitrary types. This mismatch will confuse developers.

Apply this diff to update the docstring:

-class AIPerfBaseModel(BaseModel):
-    """Base model for all AIPerf Pydantic models. This class is configured to allow
-    arbitrary types to be used as fields as to allow for more flexible model definitions
-    by end users without breaking the existing code.
+class AIPerfBaseModel(BaseModel):
+    """Base model for all AIPerf Pydantic models. This class is configured to allow
+    extra fields not defined in the model schema, enabling flexible model definitions
+    by end users without breaking existing code.
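
For reference, a minimal standalone Pydantic example (not AIPerf code) showing what extra="allow" actually does — undeclared fields are accepted and retained, which is distinct from allowing arbitrary field types:

from pydantic import BaseModel, ConfigDict


class ExtraFieldsExample(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str


record = ExtraFieldsExample(name="ttft", source="profile_export")  # "source" is undeclared
print(record.source)        # extra fields remain accessible as attributes
print(record.model_dump())  # {'name': 'ttft', 'source': 'profile_export'}
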
🧹 Nitpick comments (4)
aiperf/post_processors/base_metrics_processor.py (1)

85-93: Variable rename improves clarity.

The rename from supported_tags to applicable_tags better reflects the semantics of MetricRegistry.tags_applicable_to, which filters tags based on flags and types. This is a readability improvement with no functional change.

aiperf/parsers/inference_result_parser.py (1)

114-118: Consider narrowing exception handling scope.

The broad Exception catches in lines 116, 143, and 162 handle all failures when computing input token counts. While this prevents breaking the processing flow, it may silently hide unexpected errors. Consider:

  1. Catching more specific exceptions (e.g., TokenizationError, ValueError)
  2. Using self.exception() instead of self.warning() for unexpected failures to preserve stack traces
  3. Adding exception type to the log message for better debugging

Example refactor for one of the catches:

 try:
     input_token_count = await self.compute_input_token_count(request_record)
-except Exception as e:
-    self.warning(f"Error computing input token count for error record: {e}")
+except (TokenizationError, ValueError) as e:
+    self.warning(f"Error computing input token count for error record: {e}")
+    input_token_count = None
+except Exception:
+    self.exception("Unexpected error computing input token count for error record")
     input_token_count = None

Also applies to: 139-144, 160-163

tests/metrics/test_input_sequence_length_metric.py (1)

93-93: Consider moving ErrorDetails import to module level.

The ErrorDetails import appears locally within test methods. For consistency with the rest of the codebase and to make dependencies explicit, consider moving it to the top of the file alongside other imports.

Apply this diff to move the import to the top of the file:

 from aiperf.common.enums import MetricFlags
 from aiperf.common.exceptions import NoMetricValue
+from aiperf.common.models import ErrorDetails
 from aiperf.metrics.metric_dicts import MetricRecordDict, MetricResultsDict

Then remove the local imports from the test methods:

     def test_error_isl_basic(self):
         """Test basic error input sequence length extraction"""
-        from aiperf.common.models import ErrorDetails
-
         record = create_record(
     def test_error_isl_none_raises(self):
         """Test handling of None input tokens raises error"""
-        from aiperf.common.models import ErrorDetails
-
         record = create_record(

Also applies to: 106-106

tests/post_processors/test_record_export_results_processor.py (1)

149-174: Simplify file clearing assertion.

The implementation calls self.output_file.unlink(missing_ok=True) in __init__, which removes the file if it exists and silently ignores a missing one. The conditional check at lines 169-174 is therefore unnecessarily ambiguous.

Apply this diff to assert the expected behavior more directly:

-        # File should be cleared or not exist
-        if processor.output_file.exists():
-            content = processor.output_file.read_text()
-            assert content == ""
-        else:
-            assert not processor.output_file.exists()
+        # File should be removed by unlink(missing_ok=True)
+        assert not processor.output_file.exists()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7ed08 and ae5d225.

📒 Files selected for processing (32)
  • README.md (1 hunks)
  • aiperf/common/config/__init__.py (1 hunks)
  • aiperf/common/enums/metric_enums.py (2 hunks)
  • aiperf/common/messages/credit_messages.py (1 hunks)
  • aiperf/common/models/base_models.py (1 hunks)
  • aiperf/common/models/record_models.py (4 hunks)
  • aiperf/common/utils.py (1 hunks)
  • aiperf/metrics/metric_dicts.py (2 hunks)
  • aiperf/metrics/metric_registry.py (2 hunks)
  • aiperf/metrics/types/error_request_count.py (1 hunks)
  • aiperf/metrics/types/input_sequence_length_metric.py (3 hunks)
  • aiperf/metrics/types/max_response_metric.py (2 hunks)
  • aiperf/metrics/types/min_request_metric.py (2 hunks)
  • aiperf/metrics/types/request_count_metric.py (1 hunks)
  • aiperf/parsers/inference_result_parser.py (7 hunks)
  • aiperf/post_processors/base_metrics_processor.py (1 hunks)
  • aiperf/records/record_processor_service.py (5 hunks)
  • aiperf/records/records_manager.py (6 hunks)
  • aiperf/timing/credit_manager.py (1 hunks)
  • aiperf/timing/fixed_schedule_strategy.py (1 hunks)
  • aiperf/timing/request_rate_strategy.py (1 hunks)
  • aiperf/timing/timing_manager.py (2 hunks)
  • aiperf/workers/worker.py (4 hunks)
  • docs/tutorials/working-with-profile-exports.md (1 hunks)
  • tests/metrics/test_input_sequence_length_metric.py (2 hunks)
  • tests/parsers/test_inference_result_parser.py (2 hunks)
  • tests/post_processors/conftest.py (2 hunks)
  • tests/post_processors/test_metric_results_processor.py (4 hunks)
  • tests/post_processors/test_post_processor_integration.py (4 hunks)
  • tests/post_processors/test_record_export_results_processor.py (1 hunks)
  • tests/timing_manager/conftest.py (2 hunks)
  • tests/workers/test_worker.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/post_processors/test_post_processor_integration.py
  • aiperf/metrics/types/min_request_metric.py
  • aiperf/metrics/types/max_response_metric.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.

Applied to files:

  • aiperf/metrics/metric_dicts.py
  • aiperf/common/enums/metric_enums.py
🧬 Code graph analysis (17)
aiperf/metrics/metric_registry.py (2)
aiperf/common/enums/metric_enums.py (2)
  • MetricFlags (365-455)
  • has_flags (438-441)
aiperf/metrics/base_metric.py (1)
  • has_flags (147-149)
aiperf/post_processors/base_metrics_processor.py (1)
aiperf/metrics/metric_registry.py (3)
  • MetricRegistry (27-295)
  • tags_applicable_to (141-170)
  • create_dependency_order_for (255-295)
aiperf/metrics/types/error_request_count.py (1)
aiperf/common/enums/metric_enums.py (1)
  • MetricFlags (365-455)
aiperf/common/utils.py (1)
aiperf/common/models/record_models.py (1)
  • start_perf_ns (544-546)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
  • AIPerfLogger (25-239)
aiperf/common/enums/metric_enums.py (4)
  • MetricType (273-288)
  • MetricFlags (365-455)
  • missing_flags (447-455)
  • has_flags (438-441)
aiperf/common/exceptions.py (3)
  • MetricTypeError (126-127)
  • MetricUnitError (130-131)
  • NoMetricValue (142-143)
aiperf/common/models/record_models.py (1)
  • MetricValue (59-63)
aiperf/metrics/metric_registry.py (2)
  • MetricRegistry (27-295)
  • get_class (112-121)
tests/post_processors/conftest.py (4)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/enums/message_enums.py (1)
  • MessageType (7-47)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/models/record_models.py (1)
  • MetricRecordMetadata (66-123)
aiperf/metrics/types/input_sequence_length_metric.py (2)
aiperf/common/enums/metric_enums.py (2)
  • GenericMetricUnit (182-189)
  • MetricFlags (365-455)
aiperf/metrics/derived_sum_metric.py (1)
  • DerivedSumMetric (15-63)
aiperf/common/messages/credit_messages.py (2)
aiperf/common/models/progress_models.py (1)
  • phase (143-145)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
tests/parsers/test_inference_result_parser.py (5)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
aiperf/common/models/record_models.py (3)
  • RequestRecord (256-452)
  • has_error (365-367)
  • has_error (582-584)
aiperf/common/models/dataset_models.py (2)
  • Text (24-27)
  • Turn (43-70)
aiperf/common/tokenizer.py (2)
  • Tokenizer (25-161)
  • encode (79-94)
aiperf/parsers/inference_result_parser.py (3)
  • get_tokenizer (92-101)
  • get_turn (216-240)
  • parse_request_record (103-169)
aiperf/records/records_manager.py (7)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/metrics/metric_dicts.py (2)
  • MetricRecordDict (50-111)
  • append (161-167)
aiperf/common/factories.py (16)
  • ResultsProcessorFactory (576-596)
  • create_instance (183-209)
  • create_instance (284-315)
  • create_instance (354-364)
  • create_instance (375-381)
  • create_instance (390-395)
  • create_instance (406-414)
  • create_instance (425-430)
  • create_instance (439-447)
  • create_instance (456-464)
  • create_instance (491-499)
  • create_instance (536-550)
  • create_instance (561-573)
  • create_instance (584-596)
  • create_instance (607-611)
  • create_instance (620-628)
aiperf/common/protocols.py (4)
  • debug (67-67)
  • is_trace_enabled (58-58)
  • trace (66-66)
  • process_result (505-505)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/post_processors/record_export_results_processor.py (1)
  • process_result (50-78)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
aiperf/metrics/types/request_count_metric.py (1)
aiperf/common/enums/metric_enums.py (1)
  • MetricFlags (365-455)
tests/metrics/test_input_sequence_length_metric.py (5)
aiperf/metrics/types/input_sequence_length_metric.py (3)
  • ErrorInputSequenceLengthMetric (66-80)
  • InputSequenceLengthMetric (12-42)
  • TotalErrorInputSequenceLengthMetric (83-104)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
tests/metrics/conftest.py (2)
  • create_record (21-63)
  • create_metric_array (113-118)
aiperf/common/exceptions.py (1)
  • NoMetricValue (142-143)
aiperf/common/enums/metric_enums.py (2)
  • has_flags (438-441)
  • MetricFlags (365-455)
aiperf/parsers/inference_result_parser.py (2)
aiperf/common/models/record_models.py (2)
  • ParsedResponseRecord (525-604)
  • RequestRecord (256-452)
aiperf/records/record_processor_service.py (1)
  • get_tokenizer (107-116)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
  • AIPerfBaseModel (25-54)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/models/error_models.py (1)
  • ErrorDetails (11-47)
tests/post_processors/test_metric_results_processor.py (2)
aiperf/post_processors/metric_results_processor.py (1)
  • process_result (68-100)
tests/post_processors/conftest.py (1)
  • create_metric_records_message (247-284)
aiperf/records/record_processor_service.py (6)
aiperf/common/exceptions.py (1)
  • PostProcessorDisabled (146-147)
aiperf/common/models/record_models.py (5)
  • MetricRecordMetadata (66-123)
  • ParsedResponseRecord (525-604)
  • timestamp_ns (549-551)
  • start_perf_ns (544-546)
  • end_perf_ns (556-567)
aiperf/common/protocols.py (3)
  • PushClientProtocol (154-155)
  • RecordProcessorProtocol (492-497)
  • push (155-155)
aiperf/common/utils.py (1)
  • compute_time_ns (110-128)
aiperf/common/factories.py (15)
  • create_instance (183-209)
  • create_instance (284-315)
  • create_instance (354-364)
  • create_instance (375-381)
  • create_instance (390-395)
  • create_instance (406-414)
  • create_instance (425-430)
  • create_instance (439-447)
  • create_instance (456-464)
  • create_instance (491-499)
  • create_instance (536-550)
  • create_instance (561-573)
  • create_instance (584-596)
  • create_instance (607-611)
  • create_instance (620-628)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
tests/post_processors/test_record_export_results_processor.py (7)
aiperf/common/config/output_config.py (2)
  • OutputConfig (16-50)
  • export_level (49-50)
aiperf/common/enums/timing_enums.py (1)
  • CreditPhase (32-42)
aiperf/common/messages/inference_messages.py (1)
  • MetricRecordsMessage (24-43)
aiperf/common/models/record_models.py (3)
  • MetricRecordInfo (126-140)
  • MetricRecordMetadata (66-123)
  • MetricValue (59-63)
aiperf/metrics/metric_dicts.py (1)
  • MetricRecordDict (50-111)
aiperf/post_processors/record_export_results_processor.py (4)
  • RecordExportResultsProcessor (23-88)
  • process_result (50-78)
  • _shutdown (85-88)
  • summarize (80-82)
tests/post_processors/conftest.py (2)
  • create_metric_records_message (247-284)
  • mock_metric_registry (176-197)
🪛 Ruff (0.13.3)
aiperf/common/utils.py

127-127: Avoid specifying long messages outside the exception class

(TRY003)

aiperf/parsers/inference_result_parser.py

116-116: Do not catch blind exception: Exception

(BLE001)


143-143: Do not catch blind exception: Exception

(BLE001)


162-162: Do not catch blind exception: Exception

(BLE001)

tests/post_processors/test_record_export_results_processor.py

98-98: Unused lambda argument: self

(ARG005)


205-205: Unused method argument: mock_metric_registry

(ARG002)


254-254: Unused method argument: mock_metric_registry

(ARG002)


279-279: Unused method argument: mock_metric_registry

(ARG002)


309-309: Unused method argument: sample_metric_records_message

(ARG002)


310-310: Unused method argument: mock_metric_registry

(ARG002)


363-363: Unused method argument: mock_metric_registry

(ARG002)


398-398: Unused method argument: mock_metric_registry

(ARG002)


447-447: Unused method argument: mock_metric_registry

(ARG002)


486-486: Unused method argument: mock_metric_registry

(ARG002)


517-517: Unused method argument: sample_metric_records_message

(ARG002)


518-518: Unused method argument: mock_metric_registry

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (43)
aiperf/common/config/__init__.py (1)

174-174: Verify that this change was made through mkinit regeneration.

This file is auto-generated by mkinit with an explicit warning not to edit below line 10. If parse_str_as_numeric_dict was manually added to __all__, the change will be lost on the next mkinit run.

Please confirm this change was made by:

  1. Updating the mkinit configuration or flags (if needed)
  2. Running mkinit to regenerate the file

If added manually, consider following the proper mkinit workflow to ensure the change persists.

aiperf/timing/credit_manager.py (1)

26-35: LGTM! Protocol signature extended for per-credit tracking.

The addition of the required credit_num: int parameter to drop_credit is consistent with the broader PR goal of enabling per-record metrics export. The parameter is positioned logically after credit_phase and the change propagates correctly through implementations (timing_manager, strategies, and tests).
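
An approximate sketch of the extended protocol method (only credit_num and its position are taken from the diff; the other parameter names and the import path are assumptions):

from typing import Protocol

from aiperf.common.enums import CreditPhase  # import path assumed


class CreditManagerSketch(Protocol):
    async def drop_credit(
        self,
        credit_phase: CreditPhase,
        credit_num: int,                     # new: 0-based position of the credit within its phase
        *,                                   # remaining parameters stay keyword-only
        conversation_id: str | None = None,  # illustrative keyword-only parameter
    ) -> None: ...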

aiperf/metrics/types/request_count_metric.py (1)

24-25: LGTM! Appropriate flag and display order adjustments for aggregate metric.

The addition of NO_INDIVIDUAL_RECORDS flag is correct for an aggregate counter metric—request count is a summary statistic not meaningful at the per-record level. The display order adjustment (1000 → 1100) is a minor reordering that doesn't affect functionality.

tests/timing_manager/conftest.py (1)

109-131: LGTM! Test fixture correctly updated for protocol change.

The MockCreditManager.drop_credit signature and CreditDropMessage construction are properly updated to include the new credit_num parameter, maintaining consistency with the CreditManagerProtocol change.

aiperf/metrics/types/error_request_count.py (1)

24-24: LGTM! Correct flag for aggregate error metric.

Adding NO_INDIVIDUAL_RECORDS to ErrorRequestCountMetric is appropriate—error count is a summary statistic not relevant at the per-record level. This is consistent with the same flag addition to RequestCountMetric.

tests/workers/test_worker.py (3)

225-233: LGTM! Test correctly includes new credit_num parameter.

The CreditDropMessage construction is properly updated with credit_num=1, aligning with the protocol change and ensuring test coverage of the new per-credit tracking field.


275-283: LGTM! Test correctly includes new credit_num parameter.

Consistent with other test updates, credit_num=1 is properly included in the message construction.


326-330: LGTM! Test correctly includes new credit_num parameter.

The test properly constructs CreditDropMessage with the required credit_num field.

aiperf/common/utils.py (1)

110-128: LGTM! Utility function correctly converts perf_counter to wall clock time.

The compute_time_ns function properly handles:

  • Timestamp conversion from perf_counter_ns to time_ns reference frame
  • None propagation for optional timestamps
  • Input validation preventing time travel (perf_ns < start_perf_ns)

The error message at line 127 includes contextual values for debugging, which is appropriate despite the static analysis hint (TRY003). The message is concise and aids troubleshooting.
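
A minimal sketch of the mapping described above (the real signature and error message in aiperf/common/utils.py may differ):

def compute_time_ns_sketch(start_time_ns: int, start_perf_ns: int, perf_ns: int | None) -> int | None:
    """Map a perf_counter_ns() reading onto the wall-clock time_ns() reference frame."""
    if perf_ns is None:
        return None  # propagate optional/missing timestamps unchanged
    if perf_ns < start_perf_ns:
        # reject "time travel": a reading cannot precede the recorded start
        raise ValueError(f"perf_ns {perf_ns} is before start_perf_ns {start_perf_ns}")
    return start_time_ns + (perf_ns - start_perf_ns)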

aiperf/metrics/metric_registry.py (1)

154-170: Align flag logic with documentation.

The condition now correctly treats MetricFlags.NONE as “no required flags,” matching the docstring. No external callers were found relying on the previous stricter behavior.

aiperf/timing/fixed_schedule_strategy.py (1)

107-117: LGTM!

The credit sequencing logic is correct. Passing phase_stats.sent before incrementing ensures each credit gets a unique, sequential number. The explanatory comment helpfully clarifies why the increment occurs after the call.
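
A standalone illustration of why the pre-increment value is passed (not the strategy code itself):

sent = 0
credit_nums = []
for _ in range(3):
    credit_nums.append(sent)  # drop_credit(..., credit_num=sent) would be called here
    sent += 1                 # increment afterwards so numbering stays 0, 1, 2, ...
assert credit_nums == [0, 1, 2]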

aiperf/timing/request_rate_strategy.py (1)

70-77: LGTM!

Credit numbering follows the same pattern as fixed_schedule_strategy.py, maintaining consistency across timing strategies.

docs/tutorials/working-with-profile-exports.md (1)

1-314: Excellent documentation!

This comprehensive guide provides clear examples for working with profile exports, including:

  • Well-structured data model documentation with source references
  • Detailed field descriptions for both successful and failed requests
  • Practical code examples for synchronous and asynchronous loading
  • Useful correlation patterns between inputs and results

The documentation will significantly improve the developer experience when working with AIPerf outputs.

aiperf/timing/timing_manager.py (1)

175-198: LGTM!

The signature change correctly adds credit_num as a required positional parameter and forwards it to CreditDropMessage. Making subsequent parameters keyword-only (with *) is good practice for API stability.

Note: This is a breaking change to the drop_credit interface, but all call sites in the PR have been updated consistently.

aiperf/common/enums/metric_enums.py (2)

169-174: LGTM!

The type check now accepts both MetricTimeUnit and MetricTimeUnitInfo, providing more flexibility for unit conversions.


434-436: LGTM!

The NO_INDIVIDUAL_RECORDS flag is well-documented and provides a clear mechanism to exclude aggregate metrics from per-record exports.

aiperf/common/messages/credit_messages.py (1)

25-33: LGTM!

The new credit_num field is well-documented with appropriate validation (ge=0). The description clearly explains its purpose for tracking progress and request ordering within a credit phase.
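
A reduced stand-in showing how such a constrained field is typically declared (only credit_num is shown; the real message has more fields):

from pydantic import BaseModel, Field


class CreditDropMessageSketch(BaseModel):
    credit_num: int = Field(
        ...,
        ge=0,  # credits are numbered from zero within a phase
        description="Sequential number of this credit within its credit phase.",
    )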

aiperf/records/record_processor_service.py (3)

86-98: LGTM!

Wrapping processor initialization in try/except to handle PostProcessorDisabled allows processors to be gracefully skipped when disabled. The debug message provides helpful feedback.


118-153: LGTM!

The _create_metric_record_metadata helper cleanly consolidates metadata construction. Using compute_time_ns to convert perf_ns timestamps to wall-clock time_ns ensures consistent timestamp representation for export.


169-178: LGTM!

The refactoring consolidates metadata into a single structured field, improving message schema consistency and reducing the number of top-level fields in MetricRecordsMessage.

tests/post_processors/test_metric_results_processor.py (1)

43-103: LGTM!

Test updates consistently migrate from direct dict construction to using create_metric_records_message, aligning with the new message-based processing flow. The mechanical nature of these changes reduces the risk of test-specific bugs.

aiperf/parsers/inference_result_parser.py (1)

187-187: LGTM! Cleaner tokenizer retrieval pattern.

The refactor to fetch the tokenizer internally within compute_input_token_count() (line 250) rather than pre-fetching it (removed from line 187) improves encapsulation and reduces coupling. This aligns well with the on-demand tokenizer creation pattern in get_tokenizer().

Also applies to: 242-254

aiperf/common/models/record_models.py (4)

59-64: LGTM! Clean value-unit pairing for export.

The MetricValue model provides a simple and effective structure for pairing metric values with their units, which is essential for the per-record export functionality.
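
A rough approximation of such a value/unit pair (the actual field types in record_models.py may be broader):

from pydantic import BaseModel


class MetricValueSketch(BaseModel):
    value: float | int | list[float]  # per-record scalar, count, or list of per-chunk values
    unit: str                         # e.g. "ms", "tokens", "ns"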


66-124: LGTM! Comprehensive metadata model.

The MetricRecordMetadata model captures all essential request metadata for export. The field descriptions clearly explain each attribute's purpose and measurement method (e.g., time.time_ns() for timestamps).


126-141: LGTM! Well-structured record info model.

MetricRecordInfo effectively combines metadata, metrics, and error information into a single exportable structure, aligning with the JSON Lines format described in the PR objectives.


263-268: LGTM! credit_num field properly constrained.

The new credit_num field includes appropriate constraints (ge=0) and a clear description explaining its purpose in tracking credit phase progress and request ordering.

aiperf/records/records_manager.py (4)

109-123: LGTM! Graceful handling of disabled processors.

The try/except block properly handles PostProcessorDisabled exceptions during processor initialization, allowing the service to continue with remaining processors. The debug messages provide clear feedback about which processors were created or skipped.


137-139: LGTM! Clean conversion to MetricRecordDict.

The conversion to MetricRecordDict (lines 137-139) enables structured access to metric values and the to_display_dict functionality. The updated _should_include_request_by_duration() signature and implementation correctly operate on the converted record dictionaries.

Also applies to: 178-212


264-273: LGTM! Improved processor interface.

Passing the full MetricRecordsMessage to processors (line 270) provides processors with complete context including metadata, results, and error information, which is cleaner than passing individual fields.


131-135: All message metadata fields are consistently accessed via message.metadata. No remaining direct message.credit_phase, message.worker_id, or message.benchmark_phase usages were detected.

aiperf/metrics/types/input_sequence_length_metric.py (3)

14-14: LGTM: Docstring clarity improvement.

The docstring updates clarify that the existing metrics apply to "valid records," which makes sense given the addition of error-specific variants.

Also applies to: 47-47


66-80: LGTM: Error-specific metric follows established pattern.

The ErrorInputSequenceLengthMetric correctly:

  • Inherits from InputSequenceLengthMetric to reuse the extraction logic
  • Adds the ERROR_ONLY flag to ensure it only computes for error records
  • Maintains consistent metadata and flag combinations

83-104: Review NO_INDIVIDUAL_RECORDS on DerivedSumMetric classes.

TotalErrorInputSequenceLengthMetric (and its siblings TotalInputSequenceLengthMetric, TotalReasoningTokensMetric, TotalOutputTokensMetric) currently omit MetricFlags.NO_INDIVIDUAL_RECORDS and will therefore be included in profile_export.jsonl. If these "total" metrics should be excluded from per-record exports, add MetricFlags.NO_INDIVIDUAL_RECORDS to their flags tuples.

tests/metrics/test_input_sequence_length_metric.py (2)

90-122: LGTM: Comprehensive test coverage for error-specific metrics.

The test class covers:

  • Basic extraction of input tokens from error records
  • Handling of None input tokens (expecting NoMetricValue)
  • Metadata and flags validation

The tests appropriately mirror the structure of the non-error variant tests, ensuring consistency.


124-146: LGTM: Complete test coverage for total error metric.

The parametrized tests cover various sum calculation scenarios (multiple values, single value, empty list), and metadata validation confirms correct flag configuration.

tests/post_processors/test_record_export_results_processor.py (8)

1-73: Test setup and fixtures are well-structured.

The imports, fixtures, and sample data are appropriate for testing the record export processor. The sample_metric_records_message fixture creates a message with 2 result dicts, which aligns with the multi-record-per-message semantics used throughout the tests.


75-148: Comprehensive initialization test coverage.

The tests properly verify processor initialization behavior across different export levels, output directory creation, and file handling. The parametrized test approach is effective for exercising different configurations.


226-234: Verify the intended per-message vs per-result-dict export semantics.

The tests expect multiple JSONL records per MetricRecordsMessage (e.g., line 226 expects record_count == 2 for a message with 2 result dicts). However, past review comments on these exact lines indicated the processor should merge all result dicts into a single record per message and were marked as "✅ Addressed".

The current implementation iterates through message.results and writes one JSONL entry per result dict:

for record_dict in record_dicts:
    # ... write one line per dict
    self.record_count += 1

Please verify whether:

  1. The current behavior (multiple records per message) is correct, and the past comments were mistaken, OR
  2. The merging fix was not fully applied, despite being marked as addressed

If the current multi-record-per-message behavior is correct, you may want to generate a shell script to verify that message.results truly contains separate records from different processors (not redundant data that should be merged).

Also applies to: 337-344


248-352: Excellent coverage of edge cases and error scenarios.

The tests for empty display metrics, error handling, and multi-message accumulation are thorough and properly isolated using mocks. The error handling test correctly verifies that exceptions are logged rather than propagated.


354-437: Thorough validation of JSONL output format.

The file format tests properly verify both the JSONL structure (newline-delimited JSON) and the nested record structure using Pydantic model validation. The use of orjson for parsing aligns with the implementation.


439-507: Logging behavior is properly tested.

The periodic debug logging test correctly verifies that the processor logs every 100 records (achieved by sending 50 messages with 2 results each). The error logging test validates proper error messages.


509-570: Shutdown and summarization tests are complete.

The tests properly verify that the processor logs final statistics on shutdown and returns an empty list from summarize() (since no aggregation is performed for per-record export).


98-98: Static analysis warning is a false positive.

The unused lambda argument self on line 98 is intentional for property mocking with monkeypatch.setattr. This is a standard pattern for mocking property getters in pytest. The Ruff warning can be safely ignored.
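
A small self-contained example of the pattern (toy class and paths, not the processor under test):

class Widget:
    @property
    def output_file(self):
        return "/real/path/profile_export.jsonl"


def test_property_can_be_mocked(monkeypatch, tmp_path):
    # The lambda must accept `self` to satisfy the property protocol, even though it is unused.
    monkeypatch.setattr(
        Widget, "output_file", property(lambda self: tmp_path / "profile_export.jsonl")
    )
    assert Widget().output_file == tmp_path / "profile_export.jsonl"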

@the-david-oy
Contributor

Excellent work, this looks good! Can you address or respond to CodeRabbit's comments? This should then be ready to approve.
