@@ -244,6 +244,10 @@ struct typed_primitive_impl_ocl : public typed_primitive_impl<PType> {
                            typed_primitive_inst<PType>& instance) override {
        stream& stream = instance.get_network().get_stream();
        if (instance.can_be_optimized()) {
            if (instance.needs_completion_event()) {
                stream.wait_for_events(events);
                return instance.is_output() ? stream.create_user_event(true) : nullptr;
            }
Contributor

Did you review the existing places that handle needs_completion_event? If I remember correctly, we already handled such a case, and it is surprising that it is not working as expected.
I found this comment in primitive_inst.cpp and this seems to be relevant.

        // Prepare dependencies events in case of OOO queue, CPU implementation,
        // or optimized_out impl which has CPU users (needs_completion_event() && !is_output() condition)

Contributor Author

@isanghao Yes. The dep_events are created in prepare_primitive() because the concat is optimized out, needs_completion_event() is true, and it is not an output (!is_output()). However, these dep_events are not waited on in execute() because BMG has SyncMethods=none. That is why this accuracy issue happens.
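
To make the control flow of the change easier to follow, here is a minimal, self-contained sketch of the optimized-out branch. `Event`, `MockStream`, `MockInstance`, and `execute_optimized_out` are hypothetical stand-ins written for illustration, not the real cldnn types, and the mock `aggregate_events` only assumes that aggregation does not block the host; the actual plugin semantics may differ.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an event; only tracks completion.
struct Event {
    bool complete = false;
};
using EventPtr = std::shared_ptr<Event>;

// Hypothetical stand-in for the stream. Method names mirror the calls in the
// diff, but the bodies are simplified assumptions.
struct MockStream {
    int waits = 0;  // number of blocking host-side waits performed

    // Blocks until every dependency is done (modeled by marking it complete).
    void wait_for_events(const std::vector<EventPtr>& deps) {
        ++waits;
        for (const auto& e : deps) {
            if (e) e->complete = true;
        }
    }

    EventPtr create_user_event(bool set) {
        auto e = std::make_shared<Event>();
        e->complete = set;
        return e;
    }

    // Assumption: aggregation does not wait; the result is complete only if
    // all dependencies already are.
    EventPtr aggregate_events(const std::vector<EventPtr>& deps) {
        auto out = std::make_shared<Event>();
        out->complete = true;
        for (const auto& e : deps) {
            if (e && !e->complete) out->complete = false;
        }
        return out;
    }
};

// Hypothetical stand-in for the two instance queries used in the branch.
struct MockInstance {
    bool needs_completion_event;
    bool is_output;
};

// Mirrors the branch taken when instance.can_be_optimized() is true.
EventPtr execute_optimized_out(MockStream& stream,
                               const MockInstance& inst,
                               const std::vector<EventPtr>& deps) {
    if (inst.needs_completion_event) {
        // Added path: wait on the dependencies explicitly, so the result does
        // not rely on a later sync that may never happen.
        stream.wait_for_events(deps);
        return inst.is_output ? stream.create_user_event(true) : nullptr;
    }
    // Existing path: hand back an aggregated event without waiting.
    return stream.aggregate_events(deps);
}
```

In this model, an optimized-out node with a CPU user (`needs_completion_event` set, not an output) waits once on its dependencies and returns no event, while a node without that flag returns an aggregated event and never blocks.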

            return stream.aggregate_events(events, false, instance.is_output());
        }
        std::vector<event::ptr> tmp_events(events);
@@ -141,24 +141,21 @@ TEST(network_test, has_proper_event_for_in_order_queue_optimized_out) {
    net.set_input_data("input1", input_mem);
    net.execute();

    // reshape is optimized out with need_completion_event=true. So it doesn't have event.
    ASSERT_TRUE(net.has_event("concat"));
    ASSERT_TRUE(net.has_event("reshape"));
    ASSERT_TRUE(net.has_event("reorder"));
    ASSERT_TRUE(net.has_event("activation"));

    auto concat_ev = net.get_primitive_event("concat");
    auto reshape_ev = net.get_primitive_event("reshape");
    auto reorder_ev = net.get_primitive_event("reorder");
    auto activation_ev = net.get_primitive_event("activation");

    OV_ASSERT_NO_THROW(downcast<ocl::ocl_base_event>(concat_ev.get()));
    OV_ASSERT_NO_THROW(downcast<ocl::ocl_base_event>(reshape_ev.get()));
    OV_ASSERT_NO_THROW(downcast<ocl::ocl_base_event>(reorder_ev.get()));
    OV_ASSERT_NO_THROW(downcast<ocl::ocl_base_event>(activation_ev.get()));

    // Check if we have real underlying OpenCL events
    ASSERT_TRUE(downcast<ocl::ocl_base_event>(concat_ev.get())->get().get() != nullptr);
    ASSERT_TRUE(downcast<ocl::ocl_base_event>(reshape_ev.get())->get().get() != nullptr);
    ASSERT_TRUE(downcast<ocl::ocl_base_event>(reorder_ev.get())->get().get() != nullptr);
    ASSERT_TRUE(downcast<ocl::ocl_base_event>(activation_ev.get())->get().get() != nullptr);
}