
@andrew-k-park
Contributor

Details:

  • When an FP16 dynamic convolution has few input channels (≤ 4) and many output channels (e.g., 1024), the current format selection logic chooses bfyx → fsv16, which triggers oneDNN's reference kernel instead of an optimized JIT kernel, causing significant performance degradation.
  • Fix: override the output format to planar (bfyx) when the input channel count is small (≤ 16) and the output channel count is large (≥ 32).
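The proposed override can be sketched roughly as follows. This is an illustrative model of the heuristic described above, not the actual plugin code; the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only.
enum class Format { bfyx, b_fs_yx_fsv16 };

// Proposed heuristic: in channel-expansion cases (few input features,
// many output features), writing an fsv16 output falls back to oneDNN's
// reference kernel, so prefer the planar bfyx layout instead.
Format select_conv_output_format(std::int64_t input_channels,
                                 std::int64_t output_channels,
                                 Format preferred) {
    if (input_channels <= 16 && output_channels >= 32)
        return Format::bfyx;  // override: planar output
    return preferred;         // otherwise keep the preferred format
}
```

For the case in this PR, a 3 → 1024 convolution would now get a planar (bfyx) output instead of fsv16.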

Current behavior:

  • Input: 3 channels → converted to bfyx
  • Output: 1024 channels → remains fsv16 (the output format is only switched to planar when it has ≤ 4 channels)
  • Result: the bfyx → fsv16 combination falls back to the reference kernel (slow)
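The existing behavior described above can be sketched as follows (illustrative names, not the actual plugin code): the output format is only switched to planar when the output channel count itself is tiny, so a 1024-channel output keeps fsv16 regardless of how few input channels there are.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only.
enum class Format { bfyx, b_fs_yx_fsv16 };

// Current behavior: the input channel count is not considered at all;
// the output stays fsv16 unless the output itself is very small (<= 4).
Format current_output_format(std::int64_t output_channels, Format preferred) {
    if (output_channels <= 4)
        return Format::bfyx;  // tiny outputs fall back to planar
    return preferred;         // 1024 channels: fsv16 is kept
}
```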

Root cause:

The fsv16 blocked format is optimized for reading many channels but introduces overhead when used for writing outputs in channel-expansion scenarios (small input → large output). oneDNN's reference kernel is selected because:

  1. Inefficient write pattern: an fsv16 output requires interleaved writes every 16 elements (non-contiguous)
  2. No optimized implementation: oneDNN does not provide a JIT-optimized kernel that produces fsv16 output from a small number of input channels
  3. Scatter-write overhead: writing 1024 channels in fsv16 format requires complex block-strided access
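The write-pattern difference can be made concrete with simplified offset formulas for the two layouts (single batch, strides assumed for b = 1; this is a sketch of the layout math, not plugin code). In bfyx, each output channel's spatial plane is contiguous; in fsv16, consecutive spatial elements of one channel are 16 elements apart, and crossing a 16-channel block boundary jumps by a full Y·X·16 block.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative spatial size (assumed, not from the PR).
constexpr std::int64_t X = 7, Y = 7;

// Planar layout: feature planes are contiguous in memory.
std::int64_t offset_bfyx(std::int64_t f, std::int64_t y, std::int64_t x) {
    return f * Y * X + y * X + x;
}

// Blocked layout: features are grouped into blocks of 16 (fsv = f % 16),
// so spatial neighbors of one channel sit 16 elements apart.
std::int64_t offset_fsv16(std::int64_t f, std::int64_t y, std::int64_t x) {
    const std::int64_t fs = f / 16, fsv = f % 16;
    return ((fs * Y + y) * X + x) * 16 + fsv;
}
```

Writing one channel's row in bfyx touches consecutive addresses, while the same row in fsv16 strides by 16; with 1024 output channels that means 64 separate fsv blocks, each a large stride apart, which matches the block-strided access pattern described above.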

Tickets:

…ge channel expansion

Signed-off-by: Andrew Park <andrew.park@intel.com>
@andrew-k-park andrew-k-park requested review from a team as code owners December 5, 2025 07:08
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Dec 5, 2025
