Op4dTensorGeneric kernel upgrade #3458

Draft
wants to merge 11 commits into
base: develop

@novakovicdj novakovicdj commented Jan 3, 2025

This PR adds a new, upgraded Op4dTensorGeneric kernel as part of porting kernels from OCL to HIP.

Below is a performance comparison (speed-ups and performance drops) between the new Op4dTensorGeneric kernel and the other OpTensor kernels used for 4D tensors.

This PR is opened as a draft for now. If everyone is OK with the new Op4dTensorGeneric kernel, I will update the PR and replace the old kernel with the new one.

Test cases were generated and run from the tensor_4d_generic_ocl_hip.cpp file.

New Op4dTensorGeneric - Old OpTensorFwdBias (B - 1C11 case)

  • 47502 test runs, float data type
  • Over the whole test set, the average speed-up is 15.06x

| Tensor size | Speed-up |
|---|---|
| size <= 8192 B | 1.31 |
| 8192 B < size <= 1048576 B | 8.5 |
| size > 1048576 B | 19.86 |

| Performance drop | % of test runs |
|---|---|
| more than 5% | 24.4 |
| more than 10% | 15.1 |
| more than 20% | 6.8 |
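
The PR does not spell out how the table figures were derived, so the following is only a sketch of one plausible way to compute them from per-run timings. It assumes that speed-up means old time divided by new time, that averages are arithmetic means, and that a "performance drop of more than p%" is a run where the new kernel takes more than p% longer than the old one. The names `Run`, `Summary`, and `summarize` are hypothetical and do not come from the PR or from MIOpen.

```cpp
#include <cstddef>
#include <vector>

// One benchmark run: tensor size in bytes and both kernel times
// (any unit, as long as old_time and new_time use the same one).
struct Run {
    std::size_t bytes;
    double old_time;
    double new_time;
};

// Size buckets used in the tables: <= 8192 B, <= 1048576 B, larger.
inline int size_bucket(std::size_t bytes) {
    if (bytes <= 8192) return 0;
    if (bytes <= 1048576) return 1;
    return 2;
}

struct Summary {
    double mean_speedup[3];  // mean speed-up per size bucket
    double overall_mean;     // mean speed-up over all runs
    double pct_drop_over[3]; // % of runs slower by more than 5, 10, 20 %
};

Summary summarize(const std::vector<Run>& runs) {
    const double thresholds[3] = {5.0, 10.0, 20.0};
    double sum[3] = {0.0, 0.0, 0.0};
    std::size_t count[3] = {0, 0, 0};
    std::size_t drops[3] = {0, 0, 0};
    double total = 0.0;

    for (const Run& r : runs) {
        // Assumption: speed-up = old_time / new_time (> 1 means new is faster).
        const double s = r.old_time / r.new_time;
        const int b = size_bucket(r.bytes);
        sum[b] += s;
        ++count[b];
        total += s;
        // Assumption: a drop of more than p% means new_time > old_time * (1 + p/100).
        for (int t = 0; t < 3; ++t)
            if (r.new_time > r.old_time * (1.0 + thresholds[t] / 100.0))
                ++drops[t];
    }

    Summary out{};
    for (int i = 0; i < 3; ++i) {
        out.mean_speedup[i] = count[i] ? sum[i] / count[i] : 0.0;
        out.pct_drop_over[i] = runs.empty() ? 0.0 : 100.0 * drops[i] / runs.size();
    }
    out.overall_mean = runs.empty() ? 0.0 : total / runs.size();
    return out;
}
```

Under these assumptions, a large average speed-up and a non-trivial share of regressed runs can coexist: a few very large tensors with big speed-ups dominate the mean, while many small tensors may still run slower.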

New Op4dTensorGeneric - Old OpTensorLeadingOnes (B - N111, NC11, NCH1, 1111)

  • 190009 test runs, float data type
  • Over the whole test set, the average speed-up is 26.12x

| Tensor size | Speed-up |
|---|---|
| size <= 8192 B | 1.39 |
| 8192 B < size <= 1048576 B | 12.69 |
| size > 1048576 B | 35.49 |

| Performance drop | % of test runs |
|---|---|
| more than 5% | 12.1 |
| more than 10% | 9.3 |
| more than 20% | 5.3 |

New Op4dTensorGeneric - Old Op4dTensorLite (B - NCHW)

  • Tried on 2750 and 7280 test runs, float data type
  • Over the whole test set, the average speed-up is below 1 (~0.75x), i.e. the new kernel is slower than Op4dTensorLite in this case

New Op4dTensorGeneric - Old Op4dTensorGeneric (B - all cases)

  • 760032 test runs, float data type
  • Over the whole test set, the average speed-up is 29.58x

| Tensor size | Speed-up |
|---|---|
| size <= 8192 B | 1.95 |
| 8192 B < size <= 1048576 B | 15.94 |
| size > 1048576 B | 39.39 |

| Performance drop | % of test runs |
|---|---|
| more than 5% | 3.1 |
| more than 10% | 1.8 |
| more than 20% | 0.4 |
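
For readers unfamiliar with the B shapes named in the comparison headings (1C11, N111, NC11, NCH1, 1111, NCHW): as I read them, each gives the lengths of tensor B, where a 1 means B is broadcast along that dimension of A. The sketch below is a plain CPU reference for that broadcasting rule, not the HIP kernel from this PR. It assumes packed NCHW layout, an add operation only, and that every B length is either 1 or equal to the matching A length. The names `Dims`, `packed_strides`, and `add_broadcast` are hypothetical. The real OpTensor operation also supports other operators and scaling factors, which this sketch leaves out.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Dims = std::array<std::size_t, 4>; // lengths or strides in N, C, H, W order

// Strides of a packed (contiguous) NCHW tensor with the given lengths.
inline Dims packed_strides(const Dims& len) {
    return {len[1] * len[2] * len[3], len[2] * len[3], len[3], 1};
}

// Reference C = A + B, broadcasting B along every dimension where b_len is 1.
// Assumes each b_len[d] is either 1 or equal to a_len[d].
std::vector<float> add_broadcast(const std::vector<float>& a, const Dims& a_len,
                                 const std::vector<float>& b, const Dims& b_len) {
    const Dims as = packed_strides(a_len);
    Dims bs = packed_strides(b_len);
    // A zero stride makes the index along a broadcast dimension irrelevant for B.
    for (std::size_t d = 0; d < 4; ++d)
        if (b_len[d] == 1) bs[d] = 0;

    std::vector<float> out(a.size());
    for (std::size_t n = 0; n < a_len[0]; ++n)
        for (std::size_t c = 0; c < a_len[1]; ++c)
            for (std::size_t h = 0; h < a_len[2]; ++h)
                for (std::size_t w = 0; w < a_len[3]; ++w) {
                    const std::size_t ia = n * as[0] + c * as[1] + h * as[2] + w * as[3];
                    const std::size_t ib = n * bs[0] + c * bs[1] + h * bs[2] + w * bs[3];
                    out[ia] = a[ia] + b[ib];
                }
    return out;
}
```

For example, with A of lengths {1, 2, 1, 2} holding {1, 2, 3, 4} and B of lengths {1, 2, 1, 1} (the 1C11 case) holding {10, 20}, each channel of A gets its own bias value added. The zero-stride trick lets one loop body cover every B shape listed above, which is roughly what a "generic" kernel has to do.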
