Add model latency endpoint #4599
Conversation
Docker Image Sizes
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull Request Overview
This PR adds model latency tracking capabilities by implementing a MetricsCollector that uses shared memory for process-safe storage of inference latency measurements, along with a new API endpoint to retrieve pipeline metrics.
- Implements a MetricsCollector using shared memory to track model inference latencies across processes
- Adds pipeline metrics API endpoint that returns latency statistics (avg, min, max, p95, latest) over configurable time windows
- Integrates latency collection into the inference workflow by recording start/end times and storing measurements
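The actual implementation lives in backend/app/services/metrics_collector.py; the sketch below only illustrates the general idea of a process-safe, shared-memory circular buffer. The class name matches the PR, but the memory layout, struct formats, and method names (`record`, `measurements`) are assumptions for illustration, and locking and cleanup concerns are omitted.

```python
import struct
import time
from multiprocessing import shared_memory


class MetricsCollector:
    """Ring buffer of (timestamp, latency) pairs kept in shared memory.

    Hypothetical layout: a 4-byte write index followed by MAX_ENTRIES
    fixed-size records, so any process can attach to the segment by name.
    """

    MAX_ENTRIES = 1024
    RECORD = struct.Struct("dd")  # (unix timestamp, latency in seconds)
    HEADER = struct.Struct("I")   # next write position

    def __init__(self, name: str = "model_latency_metrics") -> None:
        size = self.HEADER.size + self.MAX_ENTRIES * self.RECORD.size
        try:
            self._shm = shared_memory.SharedMemory(name=name, create=True, size=size)
            self.HEADER.pack_into(self._shm.buf, 0, 0)  # initialise write index
        except FileExistsError:
            # Another process already created the segment; attach to it.
            self._shm = shared_memory.SharedMemory(name=name)

    def record(self, latency_s: float) -> None:
        """Store one measurement, overwriting the oldest entry when full."""
        (index,) = self.HEADER.unpack_from(self._shm.buf, 0)
        offset = self.HEADER.size + (index % self.MAX_ENTRIES) * self.RECORD.size
        self.RECORD.pack_into(self._shm.buf, offset, time.time(), latency_s)
        self.HEADER.pack_into(self._shm.buf, 0, index + 1)

    def measurements(self, window_s: float) -> list[float]:
        """Return latencies recorded within the last window_s seconds."""
        cutoff = time.time() - window_s
        result = []
        for i in range(self.MAX_ENTRIES):
            offset = self.HEADER.size + i * self.RECORD.size
            ts, latency = self.RECORD.unpack_from(self._shm.buf, offset)
            if ts >= cutoff:  # unwritten slots read as ts == 0.0 and are skipped
                result.append(latency)
        return result
```

On the inference side, the integration presumably amounts to timestamping before and after the model call and passing the elapsed time to `record()`.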
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| backend/app/services/metrics_collector.py | Core MetricsCollector implementation with shared memory and circular buffer |
| backend/app/workers/inference.py | Integration of latency measurement recording in inference workflow |
| backend/app/services/pipeline_service.py | Pipeline metrics calculation and percentile computation logic |
| backend/app/api/endpoints/pipelines.py | New GET endpoint for retrieving pipeline metrics with validation |
| backend/app/schemas/metrics.py | Pydantic models for metrics API response structure |
| backend/app/services/model_service.py | Enhanced LoadedModel to include model ID for metrics tracking |
| backend/app/schemas/model_activation.py | Added active_model_id field to ModelActivationState |
| backend/tests/unit/services/test_metrics_collector.py | Comprehensive unit tests for MetricsCollector functionality |
| backend/tests/unit/services/test_pipeline_service.py | Unit tests for pipeline metrics calculation |
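As a rough illustration of the statistics the endpoint returns (avg, min, max, p95, latest), an aggregation over the measurements inside the requested time window could look like the sketch below. The function name, dictionary keys, and the nearest-rank percentile method are assumptions, not necessarily what pipeline_service.py implements.

```python
from typing import Optional


def compute_latency_stats(latencies: list[float]) -> Optional[dict[str, float]]:
    """Aggregate raw latency measurements into summary statistics."""
    if not latencies:
        return None  # no measurements in the requested window
    ordered = sorted(latencies)
    # Nearest-rank p95: the value below which roughly 95% of measurements fall.
    p95_index = max(0, round(0.95 * len(ordered)) - 1)
    return {
        "avg": sum(ordered) / len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[p95_index],
        "latest": latencies[-1],  # most recently recorded measurement
    }
```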
…nsions into aurelien/4537-model-latency-endpoint
itallix left a comment:
Great progress! I've added a few suggestions for further improvement.
…nsions into aurelien/4537-model-latency-endpoint # Conflicts: # backend/app/workers/inference.py
itallix left a comment:
LGTM!
Summary
Added a MetricsCollector which collects and retrieves the model's inference latencies. It uses shared memory so that access to the recorded measurements is process-safe.
The latencies are stored in memory in a circular buffer, and only the most recent 1024 entries are kept.
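For reference, a hypothetical shape for the response models in backend/app/schemas/metrics.py, inferred only from the statistics listed above; the actual class and field names in the PR may differ.

```python
from typing import Optional

from pydantic import BaseModel


class LatencyStats(BaseModel):
    """Hypothetical latency block returned by the metrics endpoint."""
    avg: float
    min: float
    max: float
    p95: float
    latest: float


class PipelineMetrics(BaseModel):
    """Hypothetical top-level response for GET /api/pipelines/{id}/metrics."""
    pipeline_id: str
    window_seconds: int
    inference_latency: Optional[LatencyStats] = None  # None when no measurements yet
```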
How to test
http://geti-tune.localhost/api/pipelines/ace3f1da-fdd9-4048-a95e-a647ed969442/metrics
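A quick way to exercise the endpoint from Python, assuming the `requests` package is available and the stack is running locally at the URL above:

```python
import requests

# Pipeline ID taken from the test URL above.
response = requests.get(
    "http://geti-tune.localhost/api/pipelines/ace3f1da-fdd9-4048-a95e-a647ed969442/metrics"
)
print(response.json())
```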
Checklist
License
Feel free to contact the maintainers if that's a concern.