-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[GPU] Add event completion for optimized out instance #33122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[GPU] Add event completion for optimized out instance #33122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical synchronization issue in the GPU plugin where optimized-out operations (like concat) were not properly completing their dependent events before CPU implementations (like NMS) accessed their outputs, causing accuracy degradation in MSCOCO validation (39.3% → 30.4% mAP).
Key Changes:
- Added event completion check for optimized-out instances that have CPU consumers requiring synchronization
- Ensures
wait_for_events()is called when an optimized instance needs completion events before returning aggregated events
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hyunback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| if (instance.needs_completion_event()) { | ||
| stream.wait_for_events(events); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can return here instead of calling the next line aggregate_events().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e-ddykim I added return here. Thanks!
| if (instance.can_be_optimized()) { | ||
| if (instance.needs_completion_event()) { | ||
| stream.wait_for_events(events); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you review existing places to handle needs_completion_event? If I remember correctly, we already handled such case and it is surprising that it is not working as expected..
I found this comment in primitive_inst.cpp and this seems to be relevant.
// Prepare dependencies events in case of OOO queue, CPU implementation,
// or optimized_out impl which has CPU users (needs_completion_event() && !is_output() condition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isanghao Yes, the dep_events is created from prepare_primitive() as concat is optimized out/needs_completion_event()/!is_output(). And this dep_events is not waited in execute() because BMG has SyncMethods=none. That's why this accuracy happens.
Description of the issue(symptom, root-cause, how it was resolved)
The code and line that caused this issue (if it is not changed directly)
openvino/src/plugins/intel_gpu/src/graph/impls/ocl/primitive_base.hpp
Line 247 in fb89d8b
Problematic graph
Checklist
Tickets: