feat: Add per-record metrics export to profile_export.jsonl #325
Conversation
Walkthrough

Adds per-record export support, a new ExportLevel enum and record models, refactors MetricRecordsMessage to carry metadata, updates processors/protocols to accept messages, implements RecordExportResultsProcessor (JSONL), propagates credit_num through timing/workers, updates metrics flags and the token-counting parser, and expands tests/docs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Strategy
    participant TimingMgr as TimingManager
    participant Worker as Worker
    participant RecordSvc as RecordProcessorService
    Note over Strategy,TimingMgr: pass credit_num
    Strategy->>TimingMgr: drop_credit(phase, credit_num=phase_stats.sent, ...)
    TimingMgr->>Worker: CreditDropMessage{phase, credit_num, ...}
    Worker->>Worker: _build_response_record(..., message=CreditDropMessage)
    Worker->>Worker: record.credit_num = message.credit_num
    Worker-->>RecordSvc: ParsedResponseRecord -> Record creation
```

```mermaid
sequenceDiagram
    autonumber
    participant RecordSvc as RecordProcessorService
    participant Utils as compute_time_ns
    participant Msg as MetricRecordsMessage
    participant ResultsProc as MetricResultsProcessor
    participant ExportProc as RecordExportResultsProcessor
    participant FS as FileSystem
    RecordSvc->>Utils: compute_time_ns(start_time, start_perf, perf_ns)
    Utils-->>RecordSvc: wall_clock_ns
    RecordSvc->>Msg: create MetricRecordsMessage(metadata, results, error)
    RecordSvc-->>ResultsProc: process_result(message)
    ResultsProc->>ResultsProc: iterate message.results -> MetricRecordDict
    ResultsProc-->>ExportProc: process_result(message)
    ExportProc->>ExportProc: to_display_dict(registry, show_internal?)
    alt display non-empty
        ExportProc->>FS: append JSONL line (async)
    else empty
        ExportProc-->>ExportProc: skip write
    end
```
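To make the second diagram concrete, the sketch below shows the general "one JSON line per request" append step in plain Python. It is illustrative only and does not reproduce the PR's actual `RecordExportResultsProcessor`, which uses the project's models and async file I/O; the field layout (`metadata`, `metrics`, `error`) simply mirrors the model names above.

```python
import json
from pathlib import Path
from typing import Any


def append_record_line(
    output_file: Path,
    metadata: dict[str, Any],
    metrics: dict[str, Any],
    error: dict[str, Any] | None = None,
) -> None:
    """Append one request's metrics as a single JSONL line (illustrative sketch)."""
    record = {"metadata": metadata, "metrics": metrics, "error": error}
    with output_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# One line per request, each carrying the full metric map for that request.
append_record_line(
    Path("profile_export.jsonl"),
    metadata={"conversation_id": "conv-1", "turn_index": 0},
    metrics={"request_latency": {"value": 42.0, "unit": "ms"}},
)
```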
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Codecov Report: ❌ Patch coverage is …
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (4)
aiperf/common/config/output_config.py (1)

48-50: Consider making export_level configurable.

The `export_level` property currently returns a hardcoded `ExportLevel.RECORDS`. While this aligns with the PR objectives and default configuration, consider whether this should be a configurable field (similar to `profile_export_file`) to allow users to select different export levels (`SUMMARY`, `RECORDS`, `RAW`) via CLI or config file.

If making this configurable, you could add:

```diff
+    export_level: Annotated[
+        ExportLevel,
+        Field(
+            description="The level of detail to export in the profile export file.",
+        ),
+        CLIParameter(
+            name=("--export-level",),
+            group=_CLI_GROUP,
+        ),
+    ] = OutputDefaults.EXPORT_LEVEL
+
-    @property
-    def export_level(self) -> ExportLevel:
-        return ExportLevel.RECORDS
```

tests/post_processors/conftest.py (1)
200-238: Stop passing unsupported `error` into `MetricRecordMetadata`

`MetricRecordMetadata` (see aiperf/common/models/record_models.py) no longer defines an `error` field, so forwarding `error=...` here is either ignored or triggers a validation error depending on the model config. It also makes the helper's docstring lie about what callers get back. Please drop the parameter/forwarding so the helper reflects the actual metadata surface.

```diff
-def create_metric_metadata(
+def create_metric_metadata(
     conversation_id: str | None = None,
     turn_index: int = 0,
     timestamp_ns: int = 1_000_000_000,
     worker_id: str = "worker-1",
     record_processor_id: str = "processor-1",
     credit_phase: CreditPhase = CreditPhase.PROFILING,
     x_request_id: str | None = None,
-    x_correlation_id: str | None = None,
-    error: ErrorDetails | None = None,
+    x_correlation_id: str | None = None,
 ) -> MetricRecordMetadata:
@@
-        error: Error details if any
@@
-        x_correlation_id=x_correlation_id,
-        error=error,
+        x_correlation_id=x_correlation_id,
```

aiperf/post_processors/record_export_results_processor.py (1)
50-78: Consider batching writes for better performance.

The per-record file open/close pattern is safe but may become a bottleneck under high throughput. The broad exception catch (line 77) is appropriate here since it prevents a single malformed record from halting the entire export process.

For improved performance with high record counts, consider batching writes:

```diff
 async def process_result(self, message: MetricRecordsMessage) -> None:
+    json_lines = []
     record_dicts = [MetricRecordDict(result) for result in message.results]
     for record_dict in record_dicts:
         try:
             display_metrics = record_dict.to_display_dict(
                 MetricRegistry, self.show_internal
             )
             if not display_metrics:
                 continue

             record_info = MetricRecordInfo(
                 metadata=message.metadata,
                 metrics=display_metrics,
                 error=message.error,
             )
-            json_str = record_info.model_dump_json()
-
-            async with aiofiles.open(
-                self.output_file, mode="a", encoding="utf-8"
-            ) as f:
-                await f.write(json_str)
-                await f.write("\n")
+            json_lines.append(record_info.model_dump_json())

             self.record_count += 1
             if self.record_count % 100 == 0:
                 self.debug(f"Wrote {self.record_count} record metrics")

         except Exception as e:
             self.error(f"Failed to write record metrics: {e}")
+
+    if json_lines:
+        async with aiofiles.open(
+            self.output_file, mode="a", encoding="utf-8"
+        ) as f:
+            await f.write("\n".join(json_lines) + "\n")
```

examples/parse_profile_export.py (1)
19-40: LGTM! Loader implementations are correct.

Both sync and async loaders properly parse JSONL files using Pydantic's `model_validate_json`. The conditional import of `aiofiles` in the async function is appropriate.

For more robust example code, consider adding error handling for malformed lines:

```diff
 def load_records(file_path: Path) -> list[MetricRecordInfo]:
     """Load profile_export.jsonl file into structured Pydantic models in sync mode."""
     records = []
     with open(file_path, encoding="utf-8") as f:
         for line in f:
             if line.strip():
-                record = MetricRecordInfo.model_validate_json(line)
-                records.append(record)
+                try:
+                    record = MetricRecordInfo.model_validate_json(line)
+                    records.append(record)
+                except Exception as e:
+                    print(f"Warning: Skipping malformed line: {e}")
     return records
```
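For orientation, a hypothetical usage of the `load_records()` helper shown above might look like the snippet below; the artifact path and the `metadata.conversation_id` field access are assumptions for illustration, not guaranteed by the PR.

```python
# Hypothetical usage of the loader sketched in the diff above.
from pathlib import Path

records = load_records(Path("examples/artifacts/run1/profile_export.jsonl"))
print(f"Loaded {len(records)} records")
for rec in records[:3]:
    # Each record carries per-request metadata plus a tag -> value metric map.
    print(rec.metadata.conversation_id, sorted(rec.metrics))
```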
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)

- aiperf/common/config/config_defaults.py (2 hunks)
- aiperf/common/config/output_config.py (2 hunks)
- aiperf/common/enums/__init__.py (2 hunks)
- aiperf/common/enums/data_exporter_enums.py (1 hunks)
- aiperf/common/enums/metric_enums.py (1 hunks)
- aiperf/common/enums/post_processor_enums.py (1 hunks)
- aiperf/common/exceptions.py (1 hunks)
- aiperf/common/messages/inference_messages.py (2 hunks)
- aiperf/common/models/__init__.py (2 hunks)
- aiperf/common/models/record_models.py (3 hunks)
- aiperf/common/protocols.py (2 hunks)
- aiperf/metrics/metric_dicts.py (2 hunks)
- aiperf/metrics/types/max_response_metric.py (1 hunks)
- aiperf/metrics/types/min_request_metric.py (1 hunks)
- aiperf/post_processors/__init__.py (1 hunks)
- aiperf/post_processors/metric_results_processor.py (2 hunks)
- aiperf/post_processors/record_export_results_processor.py (1 hunks)
- aiperf/records/record_processor_service.py (4 hunks)
- aiperf/records/records_manager.py (6 hunks)
- examples/README.md (1 hunks)
- examples/parse_profile_export.py (1 hunks)
- tests/post_processors/conftest.py (2 hunks)
- tests/post_processors/test_metric_results_processor.py (4 hunks)
- tests/post_processors/test_post_processor_integration.py (4 hunks)
- tests/post_processors/test_record_export_results_processor.py (1 hunks)
- tests/records/conftest.py (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
examples/README.md
90-90: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
125-125: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.13.2)
tests/post_processors/test_record_export_results_processor.py
98-98: Unused lambda argument: self
(ARG005)
205-205: Unused method argument: mock_metric_registry
(ARG002)
254-254: Unused method argument: mock_metric_registry
(ARG002)
279-279: Unused method argument: mock_metric_registry
(ARG002)
309-309: Unused method argument: sample_metric_records_message
(ARG002)
310-310: Unused method argument: mock_metric_registry
(ARG002)
363-363: Unused method argument: mock_metric_registry
(ARG002)
398-398: Unused method argument: mock_metric_registry
(ARG002)
447-447: Unused method argument: mock_metric_registry
(ARG002)
486-486: Unused method argument: mock_metric_registry
(ARG002)
517-517: Unused method argument: sample_metric_records_message
(ARG002)
518-518: Unused method argument: mock_metric_registry
(ARG002)
aiperf/post_processors/record_export_results_processor.py
28-28: Unused method argument: service_id
(ARG002)
37-39: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Do not catch blind exception: Exception
(BLE001)
aiperf/post_processors/metric_results_processor.py
93-93: Abstract raise
to an inner function
(TRY301)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (14)
aiperf/common/models/__init__.py (1)

53-56: LGTM! New model exports are correct. The re-export of `MetricRecordInfo`, `MetricRecordMetadata`, and `MetricValue` from `record_models` is correctly reflected in the `__all__` list, making these classes part of the public API surface. Also applies to: 96-99

aiperf/metrics/types/min_request_metric.py (1)

5-5: LGTM! MetricDateTimeUnit removal is consistent. The removal of the `MetricDateTimeUnit` import and the associated `display_unit` assignment aligns with the PR-wide removal of this enum. The metric now relies on default display behavior with `unit = MetricTimeUnit.NANOSECONDS`.

aiperf/common/config/config_defaults.py (1)

14-14: LGTM! Configuration updates align with new export features. The addition of the `ExportLevel` import, the update to `PROFILE_EXPORT_FILE` to use the `.jsonl` extension, and the new `EXPORT_LEVEL` constant correctly support the new per-record metrics export functionality. Also applies to: 117-117, 123-123

aiperf/common/exceptions.py (1)

146-148: LGTM! New exception class is well-defined. The `PostProcessorDisabled` exception is correctly defined as a subclass of `AIPerfError` with a clear docstring. This exception enables processor initialization logic to skip disabled post-processors gracefully.

aiperf/common/enums/data_exporter_enums.py (1)

19-29: LGTM! ExportLevel enum is well-defined. The new `ExportLevel` enum correctly defines three export levels (`SUMMARY`, `RECORDS`, `RAW`) with clear docstrings describing their semantics. This provides a clean abstraction for controlling export granularity.

aiperf/common/enums/metric_enums.py (1)

171-171: LGTM! MetricDateTimeUnit removal is correct. The removal of `MetricDateTimeUnit` and the narrowing of the type check in `convert_to` to `MetricTimeUnit | MetricTimeUnitInfo` is consistent with the PR-wide removal of this enum. The datetime import and conversion path have been correctly removed.

tests/post_processors/test_metric_results_processor.py (1)

12-12: LGTM! Test updates correctly adopt message-based processing. The tests have been correctly updated to use the new `create_metric_records_message` helper from `tests.post_processors.conftest`, replacing direct `MetricRecordDict` construction with message-based inputs. This aligns with the broader refactor to message-centric processing. Also applies to: 17-17, 43-47, 54-59, 71-75, 91-103

aiperf/post_processors/record_export_results_processor.py (2)

26-48: LGTM! Initialization logic is correct. The initialization properly validates the export level, sets up the output file path, creates necessary directories, and clears any existing output file. The `service_id` parameter is unused but required by the factory protocol pattern.

80-88: LGTM! Summarize and shutdown implementations are correct. The empty `summarize` method is appropriate for this export-only processor, and the shutdown hook provides useful diagnostic information about the total records written.

examples/parse_profile_export.py (1)

71-87: LGTM! Main function implementation is correct. The CLI setup with cyclopts and conditional async/sync loading is well-structured and appropriate for an example script.

aiperf/records/records_manager.py (4)

108-123: LGTM! Defensive processor initialization is well-implemented. The try/except wrapper correctly handles processors that opt out via `PostProcessorDisabled`, allowing the system to gracefully skip disabled processors without failing initialization.

126-144: LGTM! Metadata access refactoring is correct. The migration to access `credit_phase` and `worker_id` through `message.metadata` aligns with the new `MetricRecordMetadata` structure, and the introduction of `MetricRecordDict` provides structured metric access.

177-212: LGTM! Duration filtering correctly adapted to MetricRecordDict. The refactored filtering logic properly uses `MetricRecordDict.get()` to retrieve metric values while preserving the all-or-nothing filtering behavior.

264-273: LGTM! Results processor invocation correctly updated. The refactored method properly passes the complete `MetricRecordsMessage` to each processor, enabling them to access metadata, metrics, and error information in a structured way.
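A rough sketch of the skip-on-disabled initialization pattern praised above; the factory and logging calls here are simplified stand-ins, not the repository's exact code.

```python
# Illustrative only: processors that raise PostProcessorDisabled are skipped, not fatal.
class PostProcessorDisabled(Exception):
    """Raised by a processor that opts out under the current configuration."""


def create_results_processors(processor_types, create_instance, log):
    processors = []
    for processor_type in processor_types:
        try:
            processors.append(create_instance(processor_type))
        except PostProcessorDisabled as e:
            # Disabled processors are logged and skipped instead of failing initialization.
            log(f"Skipping disabled post-processor {processor_type}: {e}")
    return processors
```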
8cfc99e to b04d012 (Compare)
Actionable comments posted: 4
🧹 Nitpick comments (1)
examples/README.md (1)

90-90: Use proper heading syntax for example sections.

The bold text for "Example: Successful Request" and "Example: Failed Request" should use proper Markdown heading syntax (e.g., `### Example: Successful Request`) for better document structure and navigation.

Apply this diff:

```diff
-**Example: Successful Request**
+### Example: Successful Request

-**Example: Failed Request**
+### Example: Failed Request
```

Also applies to: 125-125
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)

- aiperf/common/config/config_defaults.py (2 hunks)
- aiperf/common/config/output_config.py (2 hunks)
- aiperf/common/enums/__init__.py (2 hunks)
- aiperf/common/enums/data_exporter_enums.py (1 hunks)
- aiperf/common/enums/metric_enums.py (1 hunks)
- aiperf/common/enums/post_processor_enums.py (1 hunks)
- aiperf/common/exceptions.py (1 hunks)
- aiperf/common/models/__init__.py (2 hunks)
- aiperf/common/models/record_models.py (3 hunks)
- aiperf/common/protocols.py (2 hunks)
- aiperf/metrics/metric_dicts.py (2 hunks)
- aiperf/metrics/types/max_response_metric.py (1 hunks)
- aiperf/metrics/types/min_request_metric.py (1 hunks)
- aiperf/post_processors/__init__.py (1 hunks)
- aiperf/post_processors/metric_results_processor.py (2 hunks)
- aiperf/post_processors/record_export_results_processor.py (1 hunks)
- aiperf/records/records_manager.py (6 hunks)
- examples/README.md (1 hunks)
- examples/parse_profile_export.py (1 hunks)
- tests/post_processors/conftest.py (2 hunks)
- tests/post_processors/test_metric_results_processor.py (4 hunks)
- tests/post_processors/test_post_processor_integration.py (4 hunks)
- tests/post_processors/test_record_export_results_processor.py (1 hunks)
- tests/records/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- aiperf/common/protocols.py
- tests/records/conftest.py
- aiperf/common/models/__init__.py
- aiperf/metrics/types/min_request_metric.py
- aiperf/common/exceptions.py
- aiperf/metrics/types/max_response_metric.py
- aiperf/common/enums/data_exporter_enums.py
- aiperf/common/config/output_config.py
- tests/post_processors/test_metric_results_processor.py
- aiperf/common/config/config_defaults.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.
Applied to files:
aiperf/metrics/metric_dicts.py
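A tiny, self-contained illustration of the documented semantics, using a stand-in flag enum rather than the real `MetricFlags` class:

```python
from enum import Flag, auto


class Flags(Flag):  # stand-in for MetricFlags, illustration only
    NONE = 0
    EXPERIMENTAL = auto()
    INTERNAL = auto()


def missing_flags(have: Flags, flags: Flags) -> bool:
    """True only when *none* of the given flags are present (per the learning above)."""
    return not (have & flags)


# `not missing_flags(...)` is True when either EXPERIMENTAL or INTERNAL is set.
assert not missing_flags(Flags.EXPERIMENTAL, Flags.EXPERIMENTAL | Flags.INTERNAL)
assert missing_flags(Flags.NONE, Flags.EXPERIMENTAL | Flags.INTERNAL)
```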
🧬 Code graph analysis (12)
aiperf/post_processors/__init__.py (1)
aiperf/post_processors/record_export_results_processor.py (1)
RecordExportResultsProcessor
(23-88)
aiperf/post_processors/metric_results_processor.py (7)
aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(30-61)aiperf/common/protocols.py (5)
process_result
(505-505)is_trace_enabled
(58-58)trace
(66-66)debug
(67-67)warning
(70-70)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)aiperf/metrics/metric_dicts.py (4)
MetricRecordDict
(50-107)MetricArray
(133-207)extend
(148-155)append
(157-163)aiperf/common/enums/metric_enums.py (1)
MetricType
(273-288)aiperf/metrics/base_aggregate_metric.py (2)
BaseAggregateMetric
(13-103)current_value
(58-60)aiperf/common/exceptions.py (1)
NoMetricValue
(142-143)
aiperf/common/enums/__init__.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
ExportLevel
(19-29)
tests/post_processors/test_post_processor_integration.py (3)
aiperf/metrics/metric_dicts.py (1)
MetricArray
(133-207)tests/post_processors/conftest.py (1)
create_metric_records_message
(241-278)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)
aiperf/post_processors/record_export_results_processor.py (13)
aiperf/common/config/service_config.py (1)
ServiceConfig
(30-171)aiperf/common/config/user_config.py (1)
UserConfig
(34-280)aiperf/common/enums/data_exporter_enums.py (1)
ExportLevel
(19-29)aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/common/factories.py (1)
ResultsProcessorFactory
(576-596)aiperf/common/hooks.py (1)
on_stop
(270-287)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(30-61)aiperf/common/models/record_models.py (2)
MetricRecordInfo
(97-110)MetricResult
(24-56)aiperf/common/protocols.py (4)
ResultsProcessorProtocol
(501-507)process_result
(505-505)error
(72-72)summarize
(507-507)aiperf/metrics/metric_dicts.py (2)
MetricRecordDict
(50-107)to_display_dict
(61-107)aiperf/metrics/metric_registry.py (1)
MetricRegistry
(27-292)aiperf/post_processors/base_metrics_processor.py (1)
BaseMetricsProcessor
(13-66)aiperf/common/config/output_config.py (1)
export_level
(49-50)
tests/post_processors/conftest.py (5)
aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/enums/message_enums.py (1)
MessageType
(7-47)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(30-61)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)aiperf/common/models/record_models.py (2)
MetricRecordMetadata
(66-94)timestamp_ns
(513-515)
aiperf/common/enums/post_processor_enums.py (1)
aiperf/common/enums/base_enums.py (1)
CaseInsensitiveStrEnum
(15-73)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
AIPerfBaseModel
(25-53)aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
AIPerfLogger
(25-239)aiperf/common/enums/metric_enums.py (8)
MetricType
(273-288)MetricFlags
(365-448)missing_flags
(440-448)convert_to
(34-44)convert_to
(55-57)convert_to
(70-75)convert_to
(169-174)convert_to
(212-232)aiperf/common/exceptions.py (3)
MetricTypeError
(126-127)MetricUnitError
(130-131)NoMetricValue
(142-143)aiperf/common/models/record_models.py (2)
MetricResult
(24-56)MetricValue
(59-63)aiperf/metrics/metric_registry.py (2)
MetricRegistry
(27-292)get_class
(112-121)
examples/parse_profile_export.py (2)
aiperf/common/models/record_models.py (1)
MetricRecordInfo
(97-110)aiperf/metrics/metric_registry.py (1)
MetricRegistry
(27-292)
tests/post_processors/test_record_export_results_processor.py (6)
aiperf/common/config/output_config.py (2)
OutputConfig
(16-50)export_level
(49-50)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(30-61)aiperf/common/models/record_models.py (4)
MetricRecordInfo
(97-110)MetricRecordMetadata
(66-94)MetricValue
(59-63)timestamp_ns
(513-515)aiperf/metrics/metric_dicts.py (1)
MetricRecordDict
(50-107)aiperf/post_processors/record_export_results_processor.py (4)
RecordExportResultsProcessor
(23-88)process_result
(50-78)_shutdown
(85-88)summarize
(80-82)tests/post_processors/conftest.py (2)
create_metric_records_message
(241-278)mock_metric_registry
(176-197)
aiperf/records/records_manager.py (6)
aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/metrics/metric_dicts.py (2)
MetricRecordDict
(50-107)append
(157-163)aiperf/common/factories.py (16)
ResultsProcessorFactory
(576-596)create_instance
(183-209)create_instance
(284-315)create_instance
(354-364)create_instance
(375-381)create_instance
(390-395)create_instance
(406-414)create_instance
(425-430)create_instance
(439-447)create_instance
(456-464)create_instance
(491-499)create_instance
(536-550)create_instance
(561-573)create_instance
(584-596)create_instance
(607-611)create_instance
(620-628)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(30-61)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)
🪛 markdownlint-cli2 (0.18.1)
examples/README.md
90-90: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
125-125: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.13.2)
aiperf/post_processors/metric_results_processor.py
93-93: Abstract raise
to an inner function
(TRY301)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Do not catch blind exception: Exception
(BLE001)
aiperf/post_processors/record_export_results_processor.py
28-28: Unused method argument: service_id
(ARG002)
37-39: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Do not catch blind exception: Exception
(BLE001)
tests/post_processors/test_record_export_results_processor.py
98-98: Unused lambda argument: self
(ARG005)
205-205: Unused method argument: mock_metric_registry
(ARG002)
254-254: Unused method argument: mock_metric_registry
(ARG002)
279-279: Unused method argument: mock_metric_registry
(ARG002)
309-309: Unused method argument: sample_metric_records_message
(ARG002)
310-310: Unused method argument: mock_metric_registry
(ARG002)
363-363: Unused method argument: mock_metric_registry
(ARG002)
398-398: Unused method argument: mock_metric_registry
(ARG002)
447-447: Unused method argument: mock_metric_registry
(ARG002)
486-486: Unused method argument: mock_metric_registry
(ARG002)
517-517: Unused method argument: sample_metric_records_message
(ARG002)
518-518: Unused method argument: mock_metric_registry
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (9)
aiperf/post_processors/__init__.py (1)

20-22: LGTM! The new `RecordExportResultsProcessor` is properly exposed through the public API. Also applies to: 24-29

aiperf/common/enums/metric_enums.py (1)

169-174: LGTM! The removal of `MetricDateTimeUnit` is clean, and the type check in `convert_to` has been appropriately narrowed to only handle `MetricTimeUnit | MetricTimeUnitInfo`.

aiperf/common/enums/__init__.py (1)

26-30: LGTM! The public API correctly exposes `ExportLevel` and removes the deprecated `MetricDateTimeUnit`. Also applies to: 103-157

aiperf/common/enums/post_processor_enums.py (1)

7-12: LGTM! The enhanced docstrings provide better context about the processing pipeline stages, and the new `RECORD_EXPORT` member is well-documented. Also applies to: 19-33

tests/post_processors/test_post_processor_integration.py (1)

36-50: LGTM! The test updates correctly reflect the shift to message-based processing with `MetricRecordsMessage`. The tests now properly construct messages with metadata for each batch. Also applies to: 62-79

aiperf/metrics/metric_dicts.py (2)

7-7: LGTM! The new imports and module-level logger are properly configured following AIPerf conventions. Also applies to: 15-17, 21-21, 28-29

61-107: LGTM! The `to_display_dict` method is well-implemented with proper:

- Registry lookup and missing metric handling
- Internal/experimental metric filtering (logic confirmed correct per previous review)
- Unit conversion with error handling
- `MetricValue` construction

Based on learnings: The `missing_flags` check correctly excludes metrics with either EXPERIMENTAL or INTERNAL flag when `show_internal=False`.

tests/post_processors/conftest.py (2)

10-12: LGTM! The new imports properly support the message-based test infrastructure. Also applies to: 20-21

200-278: LGTM! Both factory functions (`create_metric_metadata` and `create_metric_records_message`) are well-designed with:

- Sensible defaults that reduce test boilerplate
- Flexible parameter overrides
- Clean handling of optional metadata construction
- Proper typing

The special handling of `x_request_id` in `create_metric_records_message` is a nice ergonomic touch.
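For readers unfamiliar with the shape being described, here is a rough, hypothetical sketch of a display-dict builder along the lines of `to_display_dict` (registry lookup, internal filtering, simple unit handling); the names and signatures are simplified assumptions, not the actual method.

```python
# Illustrative only: build a tag -> {value, unit} map, skipping unknown/internal metrics.
def build_display_dict(record: dict, registry: dict, show_internal: bool = False) -> dict:
    display = {}
    for tag, value in record.items():
        metric_cls = registry.get(tag)
        if metric_cls is None:
            continue  # unknown metric tag: skip rather than fail the whole record
        if not show_internal and getattr(metric_cls, "internal", False):
            continue  # hide internal/experimental metrics by default
        display[tag] = {"value": value, "unit": getattr(metric_cls, "unit", None)}
    return display
```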
```python
async def process_result(self, message: MetricRecordsMessage) -> None:
    record_dicts = [MetricRecordDict(result) for result in message.results]
    for record_dict in record_dicts:
        try:
            display_metrics = record_dict.to_display_dict(
                MetricRegistry, self.show_internal
            )
            if not display_metrics:
                continue

            record_info = MetricRecordInfo(
                metadata=message.metadata,
                metrics=display_metrics,
                error=message.error,
            )
            json_str = record_info.model_dump_json()

            async with aiofiles.open(
                self.output_file, mode="a", encoding="utf-8"
            ) as f:
                await f.write(json_str)
                await f.write("\n")

            self.record_count += 1
            if self.record_count % 100 == 0:
                self.debug(f"Wrote {self.record_count} record metrics")

        except Exception as e:
```
Merge per-request metrics before writing JSONL

`MetricRecordsMessage` can carry multiple `MetricRecordDict`s for the same request (one per record processor). Emitting a line per dict duplicates metadata and splits the metrics across several rows, breaking the "one request per JSONL line with full detail" contract and double-counting once extensions add extra record processors. Please merge the dicts first and write a single record per message. For example:
- record_dicts = [MetricRecordDict(result) for result in message.results]
- for record_dict in record_dicts:
- try:
- display_metrics = record_dict.to_display_dict(
- MetricRegistry, self.show_internal
- )
- if not display_metrics:
- continue
+ record_dicts = [MetricRecordDict(result) for result in message.results]
+ merged_record = MetricRecordDict()
+ for record_dict in record_dicts:
+ merged_record.update(record_dict)
+
+ try:
+ display_metrics = merged_record.to_display_dict(
+ MetricRegistry, self.show_internal
+ )
+ if not display_metrics:
+ return
@@
- record_info = MetricRecordInfo(
- metadata=message.metadata,
- metrics=display_metrics,
- error=message.error,
- )
+ record_info = MetricRecordInfo(
+ metadata=message.metadata,
+ metrics=display_metrics,
+ error=message.error,
+ )
@@
- self.record_count += 1
- if self.record_count % 100 == 0:
- self.debug(f"Wrote {self.record_count} record metrics")
-
- except Exception as e:
- self.error(f"Failed to write record metrics: {e}")
+ self.record_count += 1
+ if self.record_count % 100 == 0:
+ self.debug(f"Wrote {self.record_count} record metrics")
+
+ except Exception as e:
+ self.error(f"Failed to write record metrics: {e}")
(Adjust control flow accordingly.)
This keeps the export aligned with the promised per-request detail. Based on learnings
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.13.2)
77-77: Do not catch blind exception: Exception
(BLE001)
```python
    assert processor.record_count == 2  # Two result dicts in the message
    assert processor.output_file.exists()

    # Verify file content
    with open(processor.output_file, "rb") as f:
        lines = f.readlines()

    assert len(lines) == 2
    for line in lines:
```
Align expectations with per-request export semantics
Once we merge all per-processor metric dicts into a single record (see processor fix), each `MetricRecordsMessage` should emit exactly one JSONL entry. Please update the assertions in this test (and related ones) to expect a single line/count per message instead of `record_count == 2` / `len(lines) == 2`.
🤖 Prompt for AI Agents
In tests/post_processors/test_record_export_results_processor.py around lines
226 to 234, the test still expects two records/lines per MetricRecordsMessage
(assert processor.record_count == 2 and assert len(lines) == 2) but per-request
export now emits a single merged JSONL entry per message; change the assertions
to expect a single record/line (assert processor.record_count == 1 and assert
len(lines) == 1) and adjust any loop or per-line checks accordingly to validate
that single JSONL entry.
```python
    assert processor.record_count == 10  # 5 messages * 2 results each
    assert processor.output_file.exists()

    with open(processor.output_file, "rb") as f:
        lines = f.readlines()

    assert len(lines) == 10
```
Adjust multi-message count to match merged-record output
After consolidating all metric dicts for a message into a single export record, the accumulator should report one record per message. Update this test to expect `record_count == 5` and `len(lines) == 5` (one per request), and add an assertion that the merged metrics include contributions from the different dicts.
🤖 Prompt for AI Agents
In tests/post_processors/test_record_export_results_processor.py around lines
337 to 344, the test still expects 10 records/lines even though messages are now
merged into a single export record per message; update the two assertions to
assert processor.record_count == 5 and assert len(lines) == 5, and add an
assertion verifying that each output record contains merged metrics from the
different input dicts (for example by loading each line and asserting expected
metric keys/values from both dicts are present in the merged record).
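A hedged sketch of what that merged-record assertion could look like; the helper name and the `metrics` key are assumptions for illustration, not part of the existing test suite.

```python
import json


def assert_merged_record_lines(output_file, expected_messages, expected_metric_keys):
    """Illustrative only: one JSONL line per message, containing keys from every input dict."""
    with open(output_file, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    assert len(records) == expected_messages
    for record in records:
        for key in expected_metric_keys:
            assert key in record["metrics"]
```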
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- aiperf/common/config/config_defaults.py (2 hunks)
- aiperf/common/enums/metric_enums.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
aiperf/common/config/config_defaults.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
ExportLevel
(19-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (3)
aiperf/common/config/config_defaults.py (2)

14-14: LGTM: Import addition is correct. The `ExportLevel` enum is properly imported and used on line 124.

124-124: LGTM: New export level default aligns with PR objectives. The new `EXPORT_LEVEL` constant with default value `ExportLevel.RECORDS` correctly enables per-record metrics export as intended by this PR.

aiperf/common/enums/metric_enums.py (1)

171-171: LGTM: Type narrowing correctly reflects removal of MetricDateTimeUnit. The narrowed type check is consistent with the removal of `MetricDateTimeUnit` from the codebase. As noted in previous review comments, keeping timestamps as epoch nanoseconds simplifies usage for end users. Based on past review comments.
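For orientation, the enum/default pair referenced above looks roughly like the sketch below; the string values and the plain `str`-based enum are assumptions (the project uses its own case-insensitive enum base), so treat this as illustrative only.

```python
from enum import Enum


class ExportLevel(str, Enum):
    """Illustrative stand-in for aiperf's ExportLevel (string values assumed)."""
    SUMMARY = "summary"   # aggregate statistics only
    RECORDS = "records"   # one JSONL line per request in profile_export.jsonl
    RAW = "raw"           # maximum per-record detail


# Default discussed above: per-record export enabled out of the box.
EXPORT_LEVEL = ExportLevel.RECORDS
```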
7062ef9 to dc45705 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/post_processors/test_record_export_results_processor.py (1)
226-247
: Export one merged record perMetricRecordsMessage
These assertions only pass because
RecordExportResultsProcessor.process_result
loops overmessage.results
and emits a separate JSONL line for every sub-dict. EveryMetricRecordsMessage
represents a single request (complete metadata + error), and the PR objective promises “profile_export.jsonl … one record per line” with the full metric map. Duplicating metadata across multiple lines for the same request breaks that contract and forces downstream consumers to stitch partial metric fragments back together.Please merge the per-processor dicts before serializing so that each message yields exactly one JSONL entry, then update this test (and the related count-based expectations below) to assert a single record per request.
- record_dicts = [MetricRecordDict(result) for result in message.results] - for record_dict in record_dicts: - try: - display_metrics = record_dict.to_display_dict( - MetricRegistry, self.show_internal - ) - if not display_metrics: - continue - - record_info = MetricRecordInfo( - metadata=message.metadata, - metrics=display_metrics, - error=message.error, - ) - json_str = record_info.model_dump_json() - - async with aiofiles.open( - self.output_file, mode="a", encoding="utf-8" - ) as f: - await f.write(json_str) - await f.write("\n") - - self.record_count += 1 - if self.record_count % 100 == 0: - self.debug(f"Wrote {self.record_count} record metrics") - - except Exception as e: - self.error(f"Failed to write record metrics: {e}") + combined_record = MetricRecordDict() + for result in message.results: + combined_record.update(result) + + try: + display_metrics = combined_record.to_display_dict( + MetricRegistry, self.show_internal + ) + if not display_metrics: + return + + record_info = MetricRecordInfo( + metadata=message.metadata, + metrics=display_metrics, + error=message.error, + ) + json_str = record_info.model_dump_json() + + async with aiofiles.open( + self.output_file, mode="a", encoding="utf-8" + ) as f: + await f.write(json_str) + await f.write("\n") + + self.record_count += 1 + if self.record_count % 100 == 0: + self.debug(f"Wrote {self.record_count} record metrics") + + except Exception as e: + self.error(f"Failed to write record metrics: {e}")aiperf/post_processors/record_export_results_processor.py (1)
50-78
: Merge per-request metrics before writing JSONL.The current implementation writes one JSONL line per
MetricRecordDict
inmessage.results
. However,MetricRecordsMessage
can carry multiple dicts for the same request (one per record processor). This duplicates metadata and splits metrics across several rows, breaking the "one request per JSONL line with full detail" contract.Please merge the dicts first and write a single record per message, as suggested in the previous review.
Based on learnings
🧹 Nitpick comments (6)
examples/parse_profile_export.py (3)
19-27
: Consider adding error handling for file I/O operations.The sync
load_records
function does not handle potential file I/O errors (e.g., FileNotFoundError, PermissionError, encoding issues). While this is an example script, adding a try-except block with informative error messages would improve robustness and user experience.Apply this diff to add error handling:
def load_records(file_path: Path) -> list[MetricRecordInfo]: """Load profile_export.jsonl file into structured Pydantic models in sync mode.""" records = [] - with open(file_path, encoding="utf-8") as f: - for line in f: - if line.strip(): - record = MetricRecordInfo.model_validate_json(line) - records.append(record) + try: + with open(file_path, encoding="utf-8") as f: + for line in f: + if line.strip(): + record = MetricRecordInfo.model_validate_json(line) + records.append(record) + except FileNotFoundError: + print(f"Error: File not found: {file_path}") + raise + except PermissionError: + print(f"Error: Permission denied: {file_path}") + raise + except Exception as e: + print(f"Error loading records: {e}") + raise return records
30-40
: Consider adding error handling for async file I/O operations.The async
load_records_async
function does not handle potential file I/O errors (e.g., FileNotFoundError, PermissionError, encoding issues). Adding a try-except block with informative error messages would improve robustness and user experience.Apply this diff to add error handling:
async def load_records_async(file_path: Path) -> list[MetricRecordInfo]: """Load profile_export.jsonl file into structured Pydantic models in async mode.""" import aiofiles records = [] - async with aiofiles.open(file_path, encoding="utf-8") as f: - async for line in f: - if line.strip(): - record = MetricRecordInfo.model_validate_json(line) - records.append(record) + try: + async with aiofiles.open(file_path, encoding="utf-8") as f: + async for line in f: + if line.strip(): + record = MetricRecordInfo.model_validate_json(line) + records.append(record) + except FileNotFoundError: + print(f"Error: File not found: {file_path}") + raise + except PermissionError: + print(f"Error: Permission denied: {file_path}") + raise + except Exception as e: + print(f"Error loading records: {e}") + raise return records
58-63
: Consider handling potential MetricTypeError from MetricRegistry.get_class.Line 60 calls
MetricRegistry.get_class(metric_tag)
, which can raiseMetricTypeError
if the metric tag is not registered. While this is an example script, adding error handling would prevent crashes and provide a better user experience if the profile export contains unregistered metrics.Apply this diff to add error handling:
console.rule("[bold]Example Metrics[/]") for metric_tag, metric_value in record.metrics.items(): - metric_cls = MetricRegistry.get_class(metric_tag) - console.print( - f"[bold cyan]{metric_cls.header} ({metric_tag})[/]: [green]{metric_value.value} ({metric_value.unit})[/]" - ) + try: + metric_cls = MetricRegistry.get_class(metric_tag) + console.print( + f"[bold cyan]{metric_cls.header} ({metric_tag})[/]: [green]{metric_value.value} ({metric_value.unit})[/]" + ) + except Exception as e: + console.print( + f"[bold cyan]{metric_tag}[/]: [green]{metric_value.value} ({metric_value.unit})[/] [dim](header lookup failed: {e})[/]" + )aiperf/post_processors/record_export_results_processor.py (3)
28-28
: Consider removing unusedservice_id
parameter.The
service_id
parameter is not used in the class. If it's not required by the factory signature or for future extensibility, consider removing it to reduce clutter.Apply this diff:
def __init__( self, - service_id: str, service_config: ServiceConfig, user_config: UserConfig, **kwargs, ):
37-39
: Consider extracting exception message to class constant.The static analysis tool flags the long exception message. While this is a minor nitpick, extracting it to a class constant or using a shorter message would align with TRY003 guidance.
Example:
+ _DISABLED_MESSAGE_TEMPLATE = "Record export results processor is disabled for export level {}" + def __init__( ... ): ... if export_level not in (ExportLevel.RECORDS, ExportLevel.RAW): raise PostProcessorDisabled( - f"Record export results processor is disabled for export level {export_level}" + self._DISABLED_MESSAGE_TEMPLATE.format(export_level) )
77-78
: Consider narrowing exception handling or logging full traceback.The broad
Exception
catch is used to prevent one record failure from stopping the processor, which is reasonable. However, it could mask unexpected errors. Consider logging the full traceback for debugging:Apply this diff:
except Exception as e: - self.error(f"Failed to write record metrics: {e}") + self.error(f"Failed to write record metrics: {e}", exc_info=True)Alternatively, catch specific exceptions (e.g.,
IOError
,ValueError
) if the expected failure modes are known.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
examples/artifacts/run1/profile_export_aiperf.csv
is excluded by `!**/*.csv`
📒 Files selected for processing (27)

- aiperf/common/config/config_defaults.py (2 hunks)
- aiperf/common/config/output_config.py (2 hunks)
- aiperf/common/enums/__init__.py (2 hunks)
- aiperf/common/enums/data_exporter_enums.py (1 hunks)
- aiperf/common/enums/metric_enums.py (1 hunks)
- aiperf/common/enums/post_processor_enums.py (1 hunks)
- aiperf/common/exceptions.py (1 hunks)
- aiperf/common/messages/inference_messages.py (2 hunks)
- aiperf/common/models/__init__.py (2 hunks)
- aiperf/common/models/record_models.py (3 hunks)
- aiperf/common/protocols.py (2 hunks)
- aiperf/metrics/metric_dicts.py (2 hunks)
- aiperf/metrics/types/max_response_metric.py (1 hunks)
- aiperf/metrics/types/min_request_metric.py (1 hunks)
- aiperf/post_processors/__init__.py (1 hunks)
- aiperf/post_processors/metric_results_processor.py (2 hunks)
- aiperf/post_processors/record_export_results_processor.py (1 hunks)
- aiperf/records/record_processor_service.py (4 hunks)
- aiperf/records/records_manager.py (6 hunks)
- examples/README.md (1 hunks)
- examples/artifacts/run1/profile_export_aiperf.json (1 hunks)
- examples/parse_profile_export.py (1 hunks)
- tests/post_processors/conftest.py (2 hunks)
- tests/post_processors/test_metric_results_processor.py (4 hunks)
- tests/post_processors/test_post_processor_integration.py (4 hunks)
- tests/post_processors/test_record_export_results_processor.py (1 hunks)
- tests/records/conftest.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- examples/artifacts/run1/profile_export_aiperf.json
🚧 Files skipped from review as they are similar to previous changes (9)
- aiperf/common/enums/post_processor_enums.py
- aiperf/common/config/output_config.py
- aiperf/metrics/types/min_request_metric.py
- aiperf/post_processors/__init__.py
- aiperf/common/enums/data_exporter_enums.py
- tests/post_processors/test_metric_results_processor.py
- aiperf/common/config/config_defaults.py
- aiperf/metrics/types/max_response_metric.py
- aiperf/common/exceptions.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.
Applied to files:
aiperf/metrics/metric_dicts.py
🧬 Code graph analysis (15)
aiperf/common/protocols.py (3)
aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)
tests/records/conftest.py (2)
aiperf/common/models/record_models.py (7)
ParsedResponse
(476-486)ParsedResponseRecord
(489-568)RequestRecord
(226-416)TextResponseData
(427-434)start_perf_ns
(508-510)timestamp_ns
(513-515)end_perf_ns
(520-531)tests/post_processors/conftest.py (2)
sample_request_record
(45-55)sample_parsed_record
(59-77)
aiperf/post_processors/metric_results_processor.py (6)
aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/protocols.py (3)
process_result
(505-505)debug
(67-67)warning
(70-70)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)aiperf/metrics/metric_dicts.py (4)
MetricRecordDict
(50-107)MetricArray
(133-207)extend
(148-155)append
(157-163)aiperf/metrics/base_aggregate_metric.py (2)
BaseAggregateMetric
(13-103)current_value
(58-60)aiperf/common/exceptions.py (1)
NoMetricValue
(142-143)
aiperf/common/models/__init__.py (1)
aiperf/common/models/record_models.py (4)
MetricRecordInfo
(97-110)MetricRecordMetadata
(66-94)MetricResult
(24-56)MetricValue
(59-63)
tests/post_processors/test_post_processor_integration.py (4)
aiperf/metrics/metric_dicts.py (1)
MetricArray
(133-207)tests/post_processors/conftest.py (1)
create_metric_records_message
(241-278)aiperf/common/protocols.py (1)
process_result
(505-505)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)
aiperf/common/enums/__init__.py (1)
aiperf/common/enums/data_exporter_enums.py (1)
ExportLevel
(19-29)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
AIPerfLogger
(25-239)aiperf/common/enums/metric_enums.py (8)
MetricType
(273-288)MetricFlags
(365-451)missing_flags
(443-451)convert_to
(34-44)convert_to
(55-57)convert_to
(70-75)convert_to
(169-174)convert_to
(212-232)aiperf/common/exceptions.py (3)
MetricTypeError
(126-127)MetricUnitError
(130-131)NoMetricValue
(142-143)aiperf/common/models/record_models.py (1)
MetricValue
(59-63)aiperf/metrics/metric_registry.py (1)
get_class
(112-121)
tests/post_processors/conftest.py (4)
aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/enums/message_enums.py (1)
MessageType
(7-47)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/models/record_models.py (2)
MetricRecordMetadata
(66-94)timestamp_ns
(513-515)
aiperf/post_processors/record_export_results_processor.py (12)
aiperf/common/config/user_config.py (1)
UserConfig
(34-280)aiperf/common/enums/data_exporter_enums.py (1)
ExportLevel
(19-29)aiperf/common/enums/post_processor_enums.py (1)
ResultsProcessorType
(19-33)aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/common/factories.py (1)
ResultsProcessorFactory
(576-596)aiperf/common/hooks.py (1)
on_stop
(270-287)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/models/record_models.py (2)
MetricRecordInfo
(97-110)MetricResult
(24-56)aiperf/common/protocols.py (4)
ResultsProcessorProtocol
(501-507)process_result
(505-505)error
(72-72)summarize
(507-507)aiperf/metrics/metric_dicts.py (2)
MetricRecordDict
(50-107)to_display_dict
(61-107)aiperf/metrics/metric_registry.py (1)
MetricRegistry
(27-292)aiperf/post_processors/base_metrics_processor.py (1)
BaseMetricsProcessor
(14-99)
tests/post_processors/test_record_export_results_processor.py (6)
aiperf/common/config/output_config.py (2)
OutputConfig
(16-50)export_level
(49-50)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/models/record_models.py (4)
MetricRecordInfo
(97-110)MetricRecordMetadata
(66-94)MetricValue
(59-63)timestamp_ns
(513-515)aiperf/metrics/metric_dicts.py (1)
MetricRecordDict
(50-107)aiperf/post_processors/record_export_results_processor.py (4)
RecordExportResultsProcessor
(23-88)process_result
(50-78)_shutdown
(85-88)summarize
(80-82)tests/post_processors/conftest.py (2)
create_metric_records_message
(241-278)mock_metric_registry
(176-197)
aiperf/records/records_manager.py (7)
aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/metrics/metric_dicts.py (2)
MetricRecordDict
(50-107)append
(157-163)aiperf/common/factories.py (15)
create_instance
(183-209)create_instance
(284-315)create_instance
(354-364)create_instance
(375-381)create_instance
(390-395)create_instance
(406-414)create_instance
(425-430)create_instance
(439-447)create_instance
(456-464)create_instance
(491-499)create_instance
(536-550)create_instance
(561-573)create_instance
(584-596)create_instance
(607-611)create_instance
(620-628)aiperf/common/protocols.py (4)
debug
(67-67)is_trace_enabled
(58-58)trace
(66-66)process_result
(505-505)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)
examples/parse_profile_export.py (2)
aiperf/common/models/record_models.py (1)
MetricRecordInfo
(97-110)aiperf/metrics/metric_registry.py (1)
MetricRegistry
(27-292)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
AIPerfBaseModel
(25-53)aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)
aiperf/common/messages/inference_messages.py (4)
aiperf/common/enums/message_enums.py (1)
MessageType
(7-47)aiperf/common/messages/service_messages.py (1)
BaseServiceMessage
(18-25)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)aiperf/common/models/record_models.py (2)
MetricRecordMetadata
(66-94)MetricResult
(24-56)
aiperf/records/record_processor_service.py (6)
aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/common/models/record_models.py (3)
MetricRecordMetadata
(66-94)ParsedResponseRecord
(489-568)timestamp_ns
(513-515)aiperf/common/factories.py (17)
RecordProcessorFactory
(553-573)get_all_class_types
(240-242)create_instance
(183-209)create_instance
(284-315)create_instance
(354-364)create_instance
(375-381)create_instance
(390-395)create_instance
(406-414)create_instance
(425-430)create_instance
(439-447)create_instance
(456-464)create_instance
(491-499)create_instance
(536-550)create_instance
(561-573)create_instance
(584-596)create_instance
(607-611)create_instance
(620-628)aiperf/common/protocols.py (1)
push
(155-155)aiperf/zmq/push_client.py (1)
push
(106-115)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)
🪛 markdownlint-cli2 (0.18.1)
examples/README.md
90-90: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
125-125: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.13.3)
aiperf/post_processors/metric_results_processor.py
93-93: Abstract raise
to an inner function
(TRY301)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
96-96: Do not catch blind exception: Exception
(BLE001)
aiperf/post_processors/record_export_results_processor.py
28-28: Unused method argument: service_id
(ARG002)
37-39: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Do not catch blind exception: Exception
(BLE001)
tests/post_processors/test_record_export_results_processor.py
98-98: Unused lambda argument: self
(ARG005)
205-205: Unused method argument: mock_metric_registry
(ARG002)
254-254: Unused method argument: mock_metric_registry
(ARG002)
279-279: Unused method argument: mock_metric_registry
(ARG002)
309-309: Unused method argument: sample_metric_records_message
(ARG002)
310-310: Unused method argument: mock_metric_registry
(ARG002)
363-363: Unused method argument: mock_metric_registry
(ARG002)
398-398: Unused method argument: mock_metric_registry
(ARG002)
447-447: Unused method argument: mock_metric_registry
(ARG002)
486-486: Unused method argument: mock_metric_registry
(ARG002)
517-517: Unused method argument: sample_metric_records_message
(ARG002)
518-518: Unused method argument: mock_metric_registry
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (12)
aiperf/common/enums/metric_enums.py (1)

169-175: LGTM! The removal of `MetricDateTimeUnit` from the type check is consistent with the PR objective to eliminate the datetime unit. The change aligns with the developer's decision to use epoch nanoseconds directly for timestamp values.

tests/post_processors/test_post_processor_integration.py (2)

36-61: LGTM! The test correctly uses the new message-based API by creating `MetricRecordsMessage` objects via `create_metric_records_message` and passing them to `process_result`. The assertions verify that the results processor correctly accumulates data from the message.

62-86: LGTM! The test correctly verifies accumulation across multiple messages by creating distinct `MetricRecordsMessage` objects in a loop and processing each one. The use of unique `x_request_id`, `timestamp_ns`, and `x_correlation_id` per message ensures proper isolation and accumulation testing.

aiperf/records/record_processor_service.py (2)

84-97: LGTM! The addition of error handling to catch `PostProcessorDisabled` during processor initialization is a good practice. It allows the service to gracefully skip disabled processors and log the action without failing initialization.

131-147: LGTM! The updated message construction correctly encapsulates request metadata in a `MetricRecordMetadata` object and sends it via `MetricRecordsMessage`. All required fields are properly populated from the incoming message and service ID.

examples/README.md (1)

1-148: LGTM! The README provides clear, comprehensive documentation for the example scripts and artifacts. The descriptions of data models, usage patterns, and example payloads are helpful for users understanding the new per-record metrics export feature.

Note: There are minor markdownlint warnings (lines 90, 125) about using emphasis instead of headings for "Example: Successful Request" and "Example: Failed Request". These can be safely ignored or optionally converted to proper headings (`### Example: Successful Request`) if strict markdown compliance is desired.

aiperf/common/enums/__init__.py (2)

26-30: LGTM! The addition of `ExportLevel` to the public API is consistent with the PR's introduction of per-record metrics export levels.

103-157: LGTM! The `__all__` list correctly includes `ExportLevel` and no longer includes `MetricDateTimeUnit`, aligning with the PR's public API changes.

aiperf/common/models/__init__.py (2)

49-68: LGTM! The addition of `MetricRecordInfo`, `MetricRecordMetadata`, and `MetricValue` to the imports is consistent with the PR's introduction of new per-record export models.

76-123: LGTM! The `__all__` list correctly includes the new model classes `MetricRecordInfo`, `MetricRecordMetadata`, and `MetricValue`, making them part of the public API.

aiperf/records/records_manager.py (2)

108-123: LGTM: Defensive processor initialization. The try/except pattern gracefully handles processors that self-disable via `PostProcessorDisabled`, allowing the system to skip unavailable processors without crashing. This is a good extensibility practice.

264-273: LGTM: Consistent message-based interface. The updated signature and call to `process_result(message)` align with the `ResultsProcessorProtocol` and allow processors to access full request metadata and error context. This is a clean refactor.
```python
# Constants for test data
DEFAULT_START_TIME_NS = 1_000_000
DEFAULT_FIRST_RESPONSE_NS = 1_050_000
DEFAULT_LAST_RESPONSE_NS = 1_100_000
DEFAULT_INPUT_TOKENS = 5
DEFAULT_OUTPUT_TOKENS = 2


@pytest.fixture
def sample_request_record() -> RequestRecord:
    """Create a sample RequestRecord for testing."""
    return RequestRecord(
        conversation_id="test-conversation",
        turn_index=0,
        model_name="test-model",
        start_perf_ns=DEFAULT_START_TIME_NS,
        timestamp_ns=DEFAULT_START_TIME_NS,
        end_perf_ns=DEFAULT_LAST_RESPONSE_NS,
        error=None,
    )


@pytest.fixture
def sample_parsed_record(sample_request_record: RequestRecord) -> ParsedResponseRecord:
    """Create a valid ParsedResponseRecord for testing."""
    responses = [
        ParsedResponse(
            perf_ns=DEFAULT_FIRST_RESPONSE_NS,
            data=TextResponseData(text="Hello"),
        ),
        ParsedResponse(
            perf_ns=DEFAULT_LAST_RESPONSE_NS,
            data=TextResponseData(text=" world"),
        ),
    ]

    return ParsedResponseRecord(
        request=sample_request_record,
        responses=responses,
        input_token_count=DEFAULT_INPUT_TOKENS,
        output_token_count=DEFAULT_OUTPUT_TOKENS,
    )
```
🛠️ Refactor suggestion | 🟠 Major
Eliminate code duplication with tests/post_processors/conftest.py.
The constants (DEFAULT_START_TIME_NS, DEFAULT_FIRST_RESPONSE_NS, DEFAULT_LAST_RESPONSE_NS, DEFAULT_INPUT_TOKENS, DEFAULT_OUTPUT_TOKENS) and fixtures (sample_request_record, sample_parsed_record) in this file are duplicates of those in tests/post_processors/conftest.py (lines 44-76).
Consider moving these shared test utilities to a common location (e.g., tests/conftest.py or a new tests/fixtures.py) and importing them in both test modules to eliminate duplication and ensure consistency.
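One possible shape for that consolidation is sketched below. The shared location (`tests/conftest.py`) and the import path for the response models are assumptions based on the snippets in this review, not the PR's current layout; fixture bodies are copied from the duplicated code above.

```python
# tests/conftest.py -- hypothetical shared home for the duplicated test data
import pytest

from aiperf.common.models import (  # import path assumed from the review context
    ParsedResponse,
    ParsedResponseRecord,
    RequestRecord,
    TextResponseData,
)

# Constants previously duplicated in both test modules
DEFAULT_START_TIME_NS = 1_000_000
DEFAULT_FIRST_RESPONSE_NS = 1_050_000
DEFAULT_LAST_RESPONSE_NS = 1_100_000
DEFAULT_INPUT_TOKENS = 5
DEFAULT_OUTPUT_TOKENS = 2


@pytest.fixture
def sample_request_record() -> RequestRecord:
    """Single shared RequestRecord fixture."""
    return RequestRecord(
        conversation_id="test-conversation",
        turn_index=0,
        model_name="test-model",
        start_perf_ns=DEFAULT_START_TIME_NS,
        timestamp_ns=DEFAULT_START_TIME_NS,
        end_perf_ns=DEFAULT_LAST_RESPONSE_NS,
        error=None,
    )


@pytest.fixture
def sample_parsed_record(sample_request_record: RequestRecord) -> ParsedResponseRecord:
    """Single shared ParsedResponseRecord built on sample_request_record."""
    responses = [
        ParsedResponse(perf_ns=DEFAULT_FIRST_RESPONSE_NS, data=TextResponseData(text="Hello")),
        ParsedResponse(perf_ns=DEFAULT_LAST_RESPONSE_NS, data=TextResponseData(text=" world")),
    ]
    return ParsedResponseRecord(
        request=sample_request_record,
        responses=responses,
        input_token_count=DEFAULT_INPUT_TOKENS,
        output_token_count=DEFAULT_OUTPUT_TOKENS,
    )
```

Because pytest auto-discovers fixtures defined in a top-level conftest, neither test module would need an explicit import after the move.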
Excellent work! The code looks good.
For the documentation, it's different than what we do now. Are we sure we want to store sample profile export artifacts in our repository as well as a parsing script? If so, we should be testing all of these.
Also, the new README format looks strange to me, as this PR is creating a new examples directory that's entirely focused on the per-record metrics feature rather than integrating with the current docs.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
aiperf/common/models/base_models.py (1)
26-28: Update the docstring to reflect the new configuration. The docstring still describes the old behavior ("allow arbitrary types") but the configuration has changed to `extra="allow"`, which accepts undeclared fields rather than arbitrary types. This mismatch will confuse developers.

Apply this diff to update the docstring:

```diff
-class AIPerfBaseModel(BaseModel):
-    """Base model for all AIPerf Pydantic models. This class is configured to allow
-    arbitrary types to be used as fields as to allow for more flexible model definitions
-    by end users without breaking the existing code.
+class AIPerfBaseModel(BaseModel):
+    """Base model for all AIPerf Pydantic models. This class is configured to allow
+    extra fields not defined in the model schema, enabling flexible model definitions
+    by end users without breaking existing code.
```
🧹 Nitpick comments (4)
aiperf/post_processors/base_metrics_processor.py (1)
85-93: Variable rename improves clarity. The rename from `supported_tags` to `applicable_tags` better reflects the semantics of `MetricRegistry.tags_applicable_to`, which filters tags based on flags and types. This is a readability improvement with no functional change.

aiperf/parsers/inference_result_parser.py (1)
114-118: Consider narrowing exception handling scope. The broad `Exception` catches in lines 116, 143, and 162 handle all failures when computing input token counts. While this prevents breaking the processing flow, it may silently hide unexpected errors. Consider:

- Catching more specific exceptions (e.g., `TokenizationError`, `ValueError`)
- Using `self.exception()` instead of `self.warning()` for unexpected failures to preserve stack traces
- Adding the exception type to the log message for better debugging

Example refactor for one of the catches:

```diff
 try:
     input_token_count = await self.compute_input_token_count(request_record)
-except Exception as e:
-    self.warning(f"Error computing input token count for error record: {e}")
+except (TokenizationError, ValueError) as e:
+    self.warning(f"Error computing input token count for error record: {e}")
+    input_token_count = None
+except Exception:
+    self.exception("Unexpected error computing input token count for error record")
     input_token_count = None
```

Also applies to: 139-144, 160-163
tests/metrics/test_input_sequence_length_metric.py (1)
93-93: Consider moving the `ErrorDetails` import to module level. The `ErrorDetails` import appears locally within test methods. For consistency with the rest of the codebase and to make dependencies explicit, consider moving it to the top of the file alongside other imports.

Apply this diff to move the import to the top of the file:

```diff
 from aiperf.common.enums import MetricFlags
 from aiperf.common.exceptions import NoMetricValue
+from aiperf.common.models import ErrorDetails
 from aiperf.metrics.metric_dicts import MetricRecordDict, MetricResultsDict
```

Then remove the local imports from the test methods:

```diff
 def test_error_isl_basic(self):
     """Test basic error input sequence length extraction"""
-    from aiperf.common.models import ErrorDetails
-
     record = create_record(

 def test_error_isl_none_raises(self):
     """Test handling of None input tokens raises error"""
-    from aiperf.common.models import ErrorDetails
-
     record = create_record(
```

Also applies to: 106-106
tests/post_processors/test_record_export_results_processor.py (1)
149-174: Simplify the file clearing assertion. The implementation calls `self.output_file.unlink(missing_ok=True)` in `__init__`, which always removes the file. The conditional check at lines 169-174 is unnecessarily ambiguous.

Apply this diff to assert the expected behavior more directly:

```diff
-        # File should be cleared or not exist
-        if processor.output_file.exists():
-            content = processor.output_file.read_text()
-            assert content == ""
-        else:
-            assert not processor.output_file.exists()
+        # File should be removed by unlink(missing_ok=True)
+        assert not processor.output_file.exists()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
- `README.md` (1 hunks)
- `aiperf/common/config/__init__.py` (1 hunks)
- `aiperf/common/enums/metric_enums.py` (2 hunks)
- `aiperf/common/messages/credit_messages.py` (1 hunks)
- `aiperf/common/models/base_models.py` (1 hunks)
- `aiperf/common/models/record_models.py` (4 hunks)
- `aiperf/common/utils.py` (1 hunks)
- `aiperf/metrics/metric_dicts.py` (2 hunks)
- `aiperf/metrics/metric_registry.py` (2 hunks)
- `aiperf/metrics/types/error_request_count.py` (1 hunks)
- `aiperf/metrics/types/input_sequence_length_metric.py` (3 hunks)
- `aiperf/metrics/types/max_response_metric.py` (2 hunks)
- `aiperf/metrics/types/min_request_metric.py` (2 hunks)
- `aiperf/metrics/types/request_count_metric.py` (1 hunks)
- `aiperf/parsers/inference_result_parser.py` (7 hunks)
- `aiperf/post_processors/base_metrics_processor.py` (1 hunks)
- `aiperf/records/record_processor_service.py` (5 hunks)
- `aiperf/records/records_manager.py` (6 hunks)
- `aiperf/timing/credit_manager.py` (1 hunks)
- `aiperf/timing/fixed_schedule_strategy.py` (1 hunks)
- `aiperf/timing/request_rate_strategy.py` (1 hunks)
- `aiperf/timing/timing_manager.py` (2 hunks)
- `aiperf/workers/worker.py` (4 hunks)
- `docs/tutorials/working-with-profile-exports.md` (1 hunks)
- `tests/metrics/test_input_sequence_length_metric.py` (2 hunks)
- `tests/parsers/test_inference_result_parser.py` (2 hunks)
- `tests/post_processors/conftest.py` (2 hunks)
- `tests/post_processors/test_metric_results_processor.py` (4 hunks)
- `tests/post_processors/test_post_processor_integration.py` (4 hunks)
- `tests/post_processors/test_record_export_results_processor.py` (1 hunks)
- `tests/timing_manager/conftest.py` (2 hunks)
- `tests/workers/test_worker.py` (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/post_processors/test_post_processor_integration.py
- aiperf/metrics/types/min_request_metric.py
- aiperf/metrics/types/max_response_metric.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-03T21:15:21.514Z
Learnt from: ajcasagrande
PR: ai-dynamo/aiperf#325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.514Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.
Applied to files:
aiperf/metrics/metric_dicts.py
aiperf/common/enums/metric_enums.py
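To make the recorded semantics concrete, here is a tiny self-contained illustration. This is a simplified stand-in, not the actual `MetricFlags` class from `aiperf.common.enums`; it only mirrors the `missing_flags` behavior described in the learning above.

```python
from enum import Flag, auto


class Flags(Flag):
    """Simplified stand-in for MetricFlags, just to demonstrate missing_flags()."""

    NONE = 0
    EXPERIMENTAL = auto()
    INTERNAL = auto()
    ERROR_ONLY = auto()

    def missing_flags(self, flags: "Flags") -> bool:
        # True only when this metric has NONE of the provided flags.
        return not (self & flags)


metric_flags = Flags.INTERNAL
# Has neither flag provided -> missing_flags is True
assert metric_flags.missing_flags(Flags.ERROR_ONLY)
# Has one of the two flags -> missing_flags is False, so
# `not missing_flags(EXPERIMENTAL | INTERNAL)` is True, exactly as the learning states.
assert not metric_flags.missing_flags(Flags.EXPERIMENTAL | Flags.INTERNAL)
```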
🧬 Code graph analysis (17)
aiperf/metrics/metric_registry.py (2)
aiperf/common/enums/metric_enums.py (2)
MetricFlags
(365-455)has_flags
(438-441)aiperf/metrics/base_metric.py (1)
has_flags
(147-149)
aiperf/post_processors/base_metrics_processor.py (1)
aiperf/metrics/metric_registry.py (3)
MetricRegistry
(27-295)tags_applicable_to
(141-170)create_dependency_order_for
(255-295)
aiperf/metrics/types/error_request_count.py (1)
aiperf/common/enums/metric_enums.py (1)
MetricFlags
(365-455)
aiperf/common/utils.py (1)
aiperf/common/models/record_models.py (1)
start_perf_ns
(544-546)
aiperf/metrics/metric_dicts.py (5)
aiperf/common/aiperf_logger.py (1)
AIPerfLogger
(25-239)aiperf/common/enums/metric_enums.py (4)
MetricType
(273-288)MetricFlags
(365-455)missing_flags
(447-455)has_flags
(438-441)aiperf/common/exceptions.py (3)
MetricTypeError
(126-127)MetricUnitError
(130-131)NoMetricValue
(142-143)aiperf/common/models/record_models.py (1)
MetricValue
(59-63)aiperf/metrics/metric_registry.py (2)
MetricRegistry
(27-295)get_class
(112-121)
tests/post_processors/conftest.py (4)
aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/enums/message_enums.py (1)
MessageType
(7-47)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/models/record_models.py (1)
MetricRecordMetadata
(66-123)
aiperf/metrics/types/input_sequence_length_metric.py (2)
aiperf/common/enums/metric_enums.py (2)
GenericMetricUnit
(182-189)MetricFlags
(365-455)aiperf/metrics/derived_sum_metric.py (1)
DerivedSumMetric
(15-63)
aiperf/common/messages/credit_messages.py (2)
aiperf/common/models/progress_models.py (1)
phase
(143-145)aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)
tests/parsers/test_inference_result_parser.py (5)
aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)aiperf/common/models/record_models.py (3)
RequestRecord
(256-452)has_error
(365-367)has_error
(582-584)aiperf/common/models/dataset_models.py (2)
Text
(24-27)Turn
(43-70)aiperf/common/tokenizer.py (2)
Tokenizer
(25-161)encode
(79-94)aiperf/parsers/inference_result_parser.py (3)
get_tokenizer
(92-101)get_turn
(216-240)parse_request_record
(103-169)
aiperf/records/records_manager.py (7)
aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/metrics/metric_dicts.py (2)
MetricRecordDict
(50-111)append
(161-167)aiperf/common/factories.py (16)
ResultsProcessorFactory
(576-596)create_instance
(183-209)create_instance
(284-315)create_instance
(354-364)create_instance
(375-381)create_instance
(390-395)create_instance
(406-414)create_instance
(425-430)create_instance
(439-447)create_instance
(456-464)create_instance
(491-499)create_instance
(536-550)create_instance
(561-573)create_instance
(584-596)create_instance
(607-611)create_instance
(620-628)aiperf/common/protocols.py (4)
debug
(67-67)is_trace_enabled
(58-58)trace
(66-66)process_result
(505-505)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/post_processors/record_export_results_processor.py (1)
process_result
(50-78)aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)
aiperf/metrics/types/request_count_metric.py (1)
aiperf/common/enums/metric_enums.py (1)
MetricFlags
(365-455)
tests/metrics/test_input_sequence_length_metric.py (5)
aiperf/metrics/types/input_sequence_length_metric.py (3)
ErrorInputSequenceLengthMetric
(66-80)InputSequenceLengthMetric
(12-42)TotalErrorInputSequenceLengthMetric
(83-104)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)tests/metrics/conftest.py (2)
create_record
(21-63)create_metric_array
(113-118)aiperf/common/exceptions.py (1)
NoMetricValue
(142-143)aiperf/common/enums/metric_enums.py (2)
has_flags
(438-441)MetricFlags
(365-455)
aiperf/parsers/inference_result_parser.py (2)
aiperf/common/models/record_models.py (2)
ParsedResponseRecord
(525-604)RequestRecord
(256-452)aiperf/records/record_processor_service.py (1)
get_tokenizer
(107-116)
aiperf/common/models/record_models.py (3)
aiperf/common/models/base_models.py (1)
AIPerfBaseModel
(25-54)aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/models/error_models.py (1)
ErrorDetails
(11-47)
tests/post_processors/test_metric_results_processor.py (2)
aiperf/post_processors/metric_results_processor.py (1)
process_result
(68-100)tests/post_processors/conftest.py (1)
create_metric_records_message
(247-284)
aiperf/records/record_processor_service.py (6)
aiperf/common/exceptions.py (1)
PostProcessorDisabled
(146-147)aiperf/common/models/record_models.py (5)
MetricRecordMetadata
(66-123)ParsedResponseRecord
(525-604)timestamp_ns
(549-551)start_perf_ns
(544-546)end_perf_ns
(556-567)aiperf/common/protocols.py (3)
PushClientProtocol
(154-155)RecordProcessorProtocol
(492-497)push
(155-155)aiperf/common/utils.py (1)
compute_time_ns
(110-128)aiperf/common/factories.py (15)
create_instance
(183-209)create_instance
(284-315)create_instance
(354-364)create_instance
(375-381)create_instance
(390-395)create_instance
(406-414)create_instance
(425-430)create_instance
(439-447)create_instance
(456-464)create_instance
(491-499)create_instance
(536-550)create_instance
(561-573)create_instance
(584-596)create_instance
(607-611)create_instance
(620-628)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)
tests/post_processors/test_record_export_results_processor.py (7)
aiperf/common/config/output_config.py (2)
OutputConfig
(16-50)export_level
(49-50)aiperf/common/enums/timing_enums.py (1)
CreditPhase
(32-42)aiperf/common/messages/inference_messages.py (1)
MetricRecordsMessage
(24-43)aiperf/common/models/record_models.py (3)
MetricRecordInfo
(126-140)MetricRecordMetadata
(66-123)MetricValue
(59-63)aiperf/metrics/metric_dicts.py (1)
MetricRecordDict
(50-111)aiperf/post_processors/record_export_results_processor.py (4)
RecordExportResultsProcessor
(23-88)process_result
(50-78)_shutdown
(85-88)summarize
(80-82)tests/post_processors/conftest.py (2)
create_metric_records_message
(247-284)mock_metric_registry
(176-197)
🪛 Ruff (0.13.3)
aiperf/common/utils.py
127-127: Avoid specifying long messages outside the exception class
(TRY003)
aiperf/parsers/inference_result_parser.py
116-116: Do not catch blind exception: Exception
(BLE001)
143-143: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
tests/post_processors/test_record_export_results_processor.py
98-98: Unused lambda argument: self
(ARG005)
205-205: Unused method argument: mock_metric_registry
(ARG002)
254-254: Unused method argument: mock_metric_registry
(ARG002)
279-279: Unused method argument: mock_metric_registry
(ARG002)
309-309: Unused method argument: sample_metric_records_message
(ARG002)
310-310: Unused method argument: mock_metric_registry
(ARG002)
363-363: Unused method argument: mock_metric_registry
(ARG002)
398-398: Unused method argument: mock_metric_registry
(ARG002)
447-447: Unused method argument: mock_metric_registry
(ARG002)
486-486: Unused method argument: mock_metric_registry
(ARG002)
517-517: Unused method argument: sample_metric_records_message
(ARG002)
518-518: Unused method argument: mock_metric_registry
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (43)
aiperf/common/config/__init__.py (1)
174-174: Verify that this change was made through mkinit regeneration. This file is auto-generated by mkinit with an explicit warning not to edit below line 10. If `parse_str_as_numeric_dict` was manually added to `__all__`, the change will be lost on the next mkinit run.

Please confirm this change was made by:

- Updating the mkinit configuration or flags (if needed)
- Running mkinit to regenerate the file

If added manually, consider following the proper mkinit workflow to ensure the change persists.

aiperf/timing/credit_manager.py (1)

26-35: LGTM! Protocol signature extended for per-credit tracking. The addition of the required `credit_num: int` parameter to `drop_credit` is consistent with the broader PR goal of enabling per-record metrics export. The parameter is positioned logically after `credit_phase`, and the change propagates correctly through the implementations (timing_manager, strategies, and tests).

aiperf/metrics/types/request_count_metric.py (1)

24-25: LGTM! Appropriate flag and display order adjustments for an aggregate metric. The addition of the `NO_INDIVIDUAL_RECORDS` flag is correct for an aggregate counter metric; request count is a summary statistic that is not meaningful at the per-record level. The display order adjustment (1000 → 1100) is a minor reordering that doesn't affect functionality.

tests/timing_manager/conftest.py (2)

109-131: LGTM! Test fixture correctly updated for the protocol change. The `MockCreditManager.drop_credit` signature and `CreditDropMessage` construction are properly updated to include the new `credit_num` parameter, maintaining consistency with the `CreditManagerProtocol` change.

aiperf/metrics/types/error_request_count.py (1)

24-24: LGTM! Correct flag for the aggregate error metric. Adding `NO_INDIVIDUAL_RECORDS` to `ErrorRequestCountMetric` is appropriate; error count is a summary statistic that is not relevant at the per-record level. This is consistent with the same flag addition to `RequestCountMetric`.

tests/workers/test_worker.py (3)

225-233: LGTM! Test correctly includes the new credit_num parameter. The `CreditDropMessage` construction is properly updated with `credit_num=1`, aligning with the protocol change and ensuring test coverage of the new per-credit tracking field.

275-283: LGTM! Test correctly includes the new credit_num parameter. Consistent with the other test updates, `credit_num=1` is properly included in the message construction.

326-330: LGTM! Test correctly includes the new credit_num parameter. The test properly constructs `CreditDropMessage` with the required `credit_num` field.

aiperf/common/utils.py (1)

110-128: LGTM! Utility function correctly converts perf_counter time to wall-clock time. The `compute_time_ns` function properly handles:

- Timestamp conversion from the perf_counter_ns reference frame to the time_ns reference frame
- None propagation for optional timestamps
- Input validation preventing time travel (perf_ns < start_perf_ns)

The error message at line 127 includes contextual values for debugging, which is appropriate despite the static analysis hint (TRY003). The message is concise and aids troubleshooting.
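For reference, a rough stand-alone sketch of the conversion described above. The behavior is inferred from this review, not copied from the actual `aiperf.common.utils` implementation.

```python
import time


def compute_time_ns(start_time_ns: int, start_perf_ns: int, perf_ns: int | None) -> int | None:
    """Map a perf_counter_ns() reading onto the wall-clock (time_ns) axis."""
    if perf_ns is None:
        return None  # propagate missing optional timestamps
    if perf_ns < start_perf_ns:
        raise ValueError(f"perf_ns {perf_ns} is before start_perf_ns {start_perf_ns}")
    return start_time_ns + (perf_ns - start_perf_ns)


# Usage: capture one paired snapshot, then convert later perf_counter readings.
start_time_ns, start_perf_ns = time.time_ns(), time.perf_counter_ns()
later_perf_ns = time.perf_counter_ns()
wall_clock_ns = compute_time_ns(start_time_ns, start_perf_ns, later_perf_ns)
```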
aiperf/metrics/metric_registry.py (1)

154-170: Align flag logic with documentation. The condition now correctly treats `MetricFlags.NONE` as "no required flags," matching the docstring. No external callers were found relying on the previous stricter behavior.

aiperf/timing/fixed_schedule_strategy.py (1)

107-117: LGTM! The credit sequencing logic is correct. Passing `phase_stats.sent` before incrementing ensures each credit gets a unique, sequential number. The explanatory comment helpfully clarifies why the increment occurs after the call.
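A self-contained toy version of that ordering is shown below. The class and method names are stand-ins; the real code goes through `CreditManagerProtocol.drop_credit` and the timing strategies.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PhaseStats:
    sent: int = 0


@dataclass
class FakeCreditManager:
    dropped: list[int] = field(default_factory=list)

    async def drop_credit(self, phase: str, credit_num: int) -> None:
        self.dropped.append(credit_num)


async def run_phase(manager: FakeCreditManager, stats: PhaseStats, total: int) -> None:
    for _ in range(total):
        # Pass the pre-increment value so credits are numbered 0..total-1.
        await manager.drop_credit("profiling", credit_num=stats.sent)
        stats.sent += 1


manager, stats = FakeCreditManager(), PhaseStats()
asyncio.run(run_phase(manager, stats, 3))
assert manager.dropped == [0, 1, 2]
```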
aiperf/timing/request_rate_strategy.py (1)

70-77: LGTM! Credit numbering follows the same pattern as `fixed_schedule_strategy.py`, maintaining consistency across timing strategies.

docs/tutorials/working-with-profile-exports.md (1)

1-314: Excellent documentation! This comprehensive guide provides clear examples for working with profile exports, including:

- Well-structured data model documentation with source references
- Detailed field descriptions for both successful and failed requests
- Practical code examples for synchronous and asynchronous loading
- Useful correlation patterns between inputs and results

The documentation will significantly improve the developer experience when working with AIPerf outputs.
aiperf/timing/timing_manager.py (1)
175-198
: LGTM!The signature change correctly adds
credit_num
as a required positional parameter and forwards it toCreditDropMessage
. Making subsequent parameters keyword-only (with*
) is good practice for API stability.Note: This is a breaking change to the
drop_credit
interface, but all call sites in the PR have been updated consistently.aiperf/common/enums/metric_enums.py (2)
169-174
: LGTM!The type check now accepts both
MetricTimeUnit
andMetricTimeUnitInfo
, providing more flexibility for unit conversions.
434-436
: LGTM!The
NO_INDIVIDUAL_RECORDS
flag is well-documented and provides a clear mechanism to exclude aggregate metrics from per-record exports.aiperf/common/messages/credit_messages.py (1)
25-33
: LGTM!The new
credit_num
field is well-documented with appropriate validation (ge=0
). The description clearly explains its purpose for tracking progress and request ordering within a credit phase.aiperf/records/record_processor_service.py (3)
86-98
: LGTM!Wrapping processor initialization in try/except to handle
PostProcessorDisabled
allows processors to be gracefully skipped when disabled. The debug message provides helpful feedback.
118-153
: LGTM!The
_create_metric_record_metadata
helper cleanly consolidates metadata construction. Usingcompute_time_ns
to convertperf_ns
timestamps to wall-clocktime_ns
ensures consistent timestamp representation for export.
169-178
: LGTM!The refactoring consolidates metadata into a single structured field, improving message schema consistency and reducing the number of top-level fields in
MetricRecordsMessage
.tests/post_processors/test_metric_results_processor.py (1)
43-103
: LGTM!Test updates consistently migrate from direct dict construction to using
create_metric_records_message
, aligning with the new message-based processing flow. The mechanical nature of these changes reduces the risk of test-specific bugs.aiperf/parsers/inference_result_parser.py (1)
187-187
: LGTM! Cleaner tokenizer retrieval pattern.The refactor to fetch the tokenizer internally within
compute_input_token_count()
(line 250) rather than pre-fetching it (removed from line 187) improves encapsulation and reduces coupling. This aligns well with the on-demand tokenizer creation pattern inget_tokenizer()
.Also applies to: 242-254
aiperf/common/models/record_models.py (4)
59-64
: LGTM! Clean value-unit pairing for export.The
MetricValue
model provides a simple and effective structure for pairing metric values with their units, which is essential for the per-record export functionality.
66-124
: LGTM! Comprehensive metadata model.The
MetricRecordMetadata
model captures all essential request metadata for export. The field descriptions clearly explain each attribute's purpose and measurement method (e.g.,time.time_ns()
for timestamps).
126-141
: LGTM! Well-structured record info model.
MetricRecordInfo
effectively combines metadata, metrics, and error information into a single exportable structure, aligning with the JSON Lines format described in the PR objectives.
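To visualize what that combination looks like on one line of profile_export.jsonl, here is a sketch using stand-in Pydantic models. The nesting (metadata, metrics, error) and the value/unit pairing follow the descriptions in this review; every other field name is an assumption, and the authoritative schema lives in `aiperf/common/models/record_models.py`.

```python
from pydantic import BaseModel


class Value(BaseModel):
    value: float
    unit: str


class Metadata(BaseModel):
    credit_phase: str
    credit_num: int
    timestamp_ns: int
    worker_id: str


class RecordInfo(BaseModel):
    metadata: Metadata
    metrics: dict[str, Value]
    error: dict | None = None


record = RecordInfo(
    metadata=Metadata(
        credit_phase="profiling", credit_num=0, timestamp_ns=1_000_000, worker_id="worker-1"
    ),
    metrics={"request_latency": Value(value=42.0, unit="ms")},
)
print(record.model_dump_json())  # roughly the shape of one exported JSON line
```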
263-268
: LGTM! credit_num field properly constrained.The new
credit_num
field includes appropriate constraints (ge=0
) and a clear description explaining its purpose in tracking credit phase progress and request ordering.aiperf/records/records_manager.py (4)
109-123
: LGTM! Graceful handling of disabled processors.The try/except block properly handles
PostProcessorDisabled
exceptions during processor initialization, allowing the service to continue with remaining processors. The debug messages provide clear feedback about which processors were created or skipped.
137-139
: LGTM! Clean conversion to MetricRecordDict.The conversion to
MetricRecordDict
(lines 137-139) enables structured access to metric values and theto_display_dict
functionality. The updated_should_include_request_by_duration()
signature and implementation correctly operate on the converted record dictionaries.Also applies to: 178-212
264-273
: LGTM! Improved processor interface.Passing the full
MetricRecordsMessage
to processors (line 270) provides processors with complete context including metadata, results, and error information, which is cleaner than passing individual fields.
131-135
: All message metadata fields consistently accessed viamessage.metadata
. No remaining directmessage.credit_phase
,message.worker_id
, ormessage.benchmark_phase
usages detected.aiperf/metrics/types/input_sequence_length_metric.py (3)
14-14
: LGTM: Docstring clarity improvement.The docstring updates clarify that the existing metrics apply to "valid records," which makes sense given the addition of error-specific variants.
Also applies to: 47-47
66-80
: LGTM: Error-specific metric follows established pattern.The
ErrorInputSequenceLengthMetric
correctly:
- Inherits from
InputSequenceLengthMetric
to reuse the extraction logic- Adds the
ERROR_ONLY
flag to ensure it only computes for error records- Maintains consistent metadata and flag combinations
83-104
: Review NO_INDIVIDUAL_RECORDS on DerivedSumMetric classes
TotalErrorInputSequenceLengthMetric (and its siblings TotalInputSequenceLengthMetric, TotalReasoningTokensMetric, TotalOutputTokensMetric) currently omit MetricFlags.NO_INDIVIDUAL_RECORDS and will therefore be included in profile_export.jsonl. If these “total” metrics should be excluded from per‐record exports, add MetricFlags.NO_INDIVIDUAL_RECORDS to their flags tuples.tests/metrics/test_input_sequence_length_metric.py (2)
90-122
: LGTM: Comprehensive test coverage for error-specific metrics.The test class covers:
- Basic extraction of input tokens from error records
- Handling of
None
input tokens (expectingNoMetricValue
)- Metadata and flags validation
The tests appropriately mirror the structure of the non-error variant tests, ensuring consistency.
124-146
: LGTM: Complete test coverage for total error metric.The parametrized tests cover various sum calculation scenarios (multiple values, single value, empty list), and metadata validation confirms correct flag configuration.
tests/post_processors/test_record_export_results_processor.py (8)
1-73
: Test setup and fixtures are well-structured.The imports, fixtures, and sample data are appropriate for testing the record export processor. The
sample_metric_records_message
fixture creates a message with 2 result dicts, which aligns with the multi-record-per-message semantics used throughout the tests.
75-148
: Comprehensive initialization test coverage.The tests properly verify processor initialization behavior across different export levels, output directory creation, and file handling. The parametrized test approach is effective for exercising different configurations.
226-234
: Verify the intended per-message vs per-result-dict export semantics.The tests expect multiple JSONL records per
MetricRecordsMessage
(e.g., line 226 expectsrecord_count == 2
for a message with 2 result dicts). However, past review comments on these exact lines indicated the processor should merge all result dicts into a single record per message and were marked as "✅ Addressed".The current implementation iterates through
message.results
and writes one JSONL entry per result dict:for record_dict in record_dicts: # ... write one line per dict self.record_count += 1Please verify whether:
- The current behavior (multiple records per message) is correct, and the past comments were mistaken, OR
- The merging fix was not fully applied, despite being marked as addressed
If the current multi-record-per-message behavior is correct, you may want to generate a shell script to verify that
message.results
truly contains separate records from different processors (not redundant data that should be merged).Also applies to: 337-344
248-352
: Excellent coverage of edge cases and error scenarios.The tests for empty display metrics, error handling, and multi-message accumulation are thorough and properly isolated using mocks. The error handling test correctly verifies that exceptions are logged rather than propagated.
354-437
: Thorough validation of JSONL output format.The file format tests properly verify both the JSONL structure (newline-delimited JSON) and the nested record structure using Pydantic model validation. The use of
orjson
for parsing aligns with the implementation.
439-507
: Logging behavior is properly tested.The periodic debug logging test correctly verifies that the processor logs every 100 records (achieved by sending 50 messages with 2 results each). The error logging test validates proper error messages.
509-570
: Shutdown and summarization tests are complete.The tests properly verify that the processor logs final statistics on shutdown and returns an empty list from
summarize()
(since no aggregation is performed for per-record export).
98-98
: Static analysis warning is a false positive.The unused lambda argument
self
on line 98 is intentional for property mocking withmonkeypatch.setattr
. This is a standard pattern for mocking property getters in pytest. The Ruff warning can be safely ignored.
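For readers unfamiliar with the pattern, a minimal illustration follows (generic names, not the actual test under review):

```python
import pytest


class Exporter:
    @property
    def output_file(self) -> str:
        return "/real/path/profile_export.jsonl"


def test_output_file_is_mocked(monkeypatch: pytest.MonkeyPatch, tmp_path) -> None:
    # The lambda must accept `self` to satisfy the property protocol, even
    # though the fake getter ignores it -- hence Ruff's "unused argument" hint.
    monkeypatch.setattr(
        Exporter, "output_file", property(lambda self: str(tmp_path / "out.jsonl"))
    )
    assert Exporter().output_file.endswith("out.jsonl")
```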
Excellent work, this looks good! Can you address or respond to CodeRabbit's comments? This should then be ready to approve. |
Export Per-Record Metrics
Scripts
examples/parse_profile_export.py
Purpose: Load and display `profile_export.jsonl` files using AIPerf's native Pydantic models.
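For a quick look without the example script, a minimal stand-alone reader is enough. This sketch only decodes the raw JSON lines; the shipped `parse_profile_export.py` validates against AIPerf's Pydantic models instead, and the top-level keys used here ("metadata", "metrics", "credit_num") are assumed from the data models listed below.

```python
import json
from pathlib import Path


def iter_records(path: Path):
    """Yield one dict per non-empty line of a profile_export.jsonl file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


for record in iter_records(Path("artifacts/run1/profile_export.jsonl")):
    metadata = record.get("metadata", {})
    print(metadata.get("credit_num"), sorted(record.get("metrics", {})))
```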
Example Artifacts

The `artifacts/run1/` directory contains sample output files from an actual AIPerf run.

profile_export.jsonl
Format: JSON Lines (one record per line)
Purpose: Per-request metric data with full detail
Each line contains:
Data Models
AIPerf uses Pydantic models for type-safe data handling:
Model Definitions:
MetricRecordInfo - Complete per-request record structure
MetricRecordMetadata - Request metadata (timestamps, IDs, worker info)
MetricValue - Individual metric value with unit
ErrorDetails - Error information (code, type, message)

Example: Successful Request
Example: Failed Request
Summary by CodeRabbit
New Features
Changes
Documentation
Tests