Op4dTensorGeneric kernel upgrade #3458
Draft
+82
−0
This PR adds a new, upgraded Op4dTensorGeneric kernel as part of porting kernels from OCL to HIP.
Below is a performance comparison (speed-ups and slowdowns) between the new Op4dTensorGeneric kernel and the other OpTensor kernels used for 4D tensors.
This PR is opened as a draft for now. If everyone is OK with the new Op4dTensorGeneric kernel, I will update the PR to replace the old kernel with the new one.
Test cases were generated and run from the tensor_4d_generic_ocl_hip.cpp file.
New Op4dTensorGeneric - Old OpTensorFwdBias (B - 1C11 case)
New Op4dTensorGeneric - Old OpTensorLeadingOnes (B - N111, NC11, NCH1, 1111)
New Op4dTensorGeneric - Old Op4dTensorLite (B - NCHW)
New Op4dTensorGeneric - Old Op4dTensorGeneric (B - all cases)
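For readers unfamiliar with the B shapes listed above (1C11, N111, NC11, NCH1, 1111, NCHW), here is a minimal host-side sketch of what a generic 4D tensor op has to handle. It is not the kernel in this PR. It assumes that each dimension of B is either 1 (broadcast) or equal to the matching dimension of A, that both tensors are packed NCHW, and that the op is an add. The names `Dims` and `add4dBroadcast` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::array<std::size_t, 4>; // N, C, H, W

// Hypothetical reference implementation (an assumption, not the PR's kernel):
// out = A + B, where every dimension of B is 1 or equal to A's dimension.
std::vector<float> add4dBroadcast(const std::vector<float>& a, const Dims& aDims,
                                  const std::vector<float>& b, const Dims& bDims)
{
    // Packed NCHW strides for B; a broadcast dimension gets stride 0,
    // so the same B element is reused along that axis.
    Dims bStride{};
    std::size_t bSize = 1;
    for (int d = 3; d >= 0; --d) {
        assert(bDims[d] == 1 || bDims[d] == aDims[d]);
        bStride[d] = (bDims[d] == 1) ? 0 : bSize;
        bSize *= bDims[d];
    }
    assert(a.size() == aDims[0] * aDims[1] * aDims[2] * aDims[3]);
    assert(b.size() == bSize);

    std::vector<float> out(a.size());
    std::size_t idx = 0; // linear index into packed A and out
    for (std::size_t n = 0; n < aDims[0]; ++n)
        for (std::size_t c = 0; c < aDims[1]; ++c)
            for (std::size_t h = 0; h < aDims[2]; ++h)
                for (std::size_t w = 0; w < aDims[3]; ++w) {
                    const std::size_t bIdx = n * bStride[0] + c * bStride[1] +
                                             h * bStride[2] + w * bStride[3];
                    out[idx] = a[idx] + b[bIdx];
                    ++idx;
                }
    return out;
}
```

With this stride-0 convention, a 1C11 B behaves like a per-channel bias and a 1111 B like a scalar, so one code path covers every shape in the list. Specialized kernels can presumably skip part of the index arithmetic for their particular shape.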