WIP: More Optimizations and SIMD fixes for MSVC & ARM #413

Merged: 24 commits merged into master on Feb 12, 2025

@recp (Owner) commented Apr 6, 2024

  • [WIP] More SIMD optimizations
    • Matrix invert
    • Non-Square matrices
    • Transforms
    • AABB
    • Frustum
    • simd for int types
    • ...
  • Fix compiling on MSVC + ARM32 (don't align types on MSVC + ARM32, because of error C2719: "formal parameter with requested alignment of 16 won't be aligned")
  • msvc, simd: fix simd headers for _M_ARM64EC
  • arm, neon: fix neon support on GCC ARM
  • Try to interleave independent instructions to take advantage of ILP where possible (compilers may already do this, but giving the hint manually is nice)
  • Try to reduce port pressure where possible, e.g. use _mm_blend_ps in place of some of the many _mm_shuffle_ps calls (this step may take time and needs to be profiled, e.g. Intel VTune can be used to find the bottleneck, plus speed tests). Maybe in other PRs...
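
To make the last bullet concrete, here is a minimal sketch of one lane selection written with a single `_mm_blend_ps` and, as the fallback, with two `_mm_shuffle_ps`. The helper name `demo_select_odd_from_b` is hypothetical and not part of this PR. The assumption that blends are cheaper rests on many recent x86 cores being able to issue them on more execution ports than shuffles, so it still needs profiling on the target CPU.

```c
#include <stddef.h>

#if defined(__SSE4_1__)
#  include <smmintrin.h>   /* SSE4.1: _mm_blend_ps */
#elif defined(__SSE__)
#  include <xmmintrin.h>   /* SSE: _mm_shuffle_ps */
#endif

/* Hypothetical helper (not from this PR): out = { a[0], b[1], a[2], b[3] } */
void demo_select_odd_from_b(const float a[4], const float b[4], float out[4]) {
#if defined(__SSE4_1__)
  __m128 va = _mm_loadu_ps(a);
  __m128 vb = _mm_loadu_ps(b);
  /* One blend: mask bit i set means lane i is taken from vb (0xA = 1010b). */
  _mm_storeu_ps(out, _mm_blend_ps(va, vb, 0xA));
#elif defined(__SSE__)
  __m128 va = _mm_loadu_ps(a);
  __m128 vb = _mm_loadu_ps(b);
  /* Two shuffles for the same result: first gather { a0, a2, b1, b3 } ... */
  __m128 t = _mm_shuffle_ps(va, vb, _MM_SHUFFLE(3, 1, 2, 0));
  /* ... then reorder the lanes to { a0, b1, a2, b3 }. */
  _mm_storeu_ps(out, _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 1, 2, 0)));
#else
  /* Scalar fallback for targets without SSE. */
  size_t i;
  for (i = 0; i < 4; i++) out[i] = (i & 1) ? b[i] : a[i];
#endif
}
```

Built with SSE4.1 enabled (e.g. `-msse4.1` on GCC/Clang) the blend path is used; otherwise the two-shuffle or scalar path produces the same lanes.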

EDIT: I will do the remaining tasks later, as soon as possible.
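
For the MSVC + ARM32 item in the list above, the idea of not aligning types on that one target can be sketched with a conditional alignment macro. `DEMO_ALIGN`, `demo_vec4s` and `demo_vec4s_sum` are hypothetical names for illustration only, and this is not necessarily how the library's own macros are written.

```c
/* Hypothetical macro and type names for illustration; not the library's own. */
#if defined(_MSC_VER) && defined(_M_ARM)
/* MSVC targeting 32-bit ARM: request no extra alignment, so the compiler
   has no over-aligned formal parameters to complain about. */
#  define DEMO_ALIGN(X)
#elif defined(_MSC_VER)
#  define DEMO_ALIGN(X) __declspec(align(X))
#else
#  define DEMO_ALIGN(X) __attribute__((aligned(X)))
#endif

/* 16-byte aligned where requested, naturally aligned on MSVC + ARM32. */
typedef struct DEMO_ALIGN(16) demo_vec4s {
  float raw[4];
} demo_vec4s;

/* Taking a pointer keeps this sketch independent of how a given
   compiler treats over-aligned by-value parameters. */
float demo_vec4s_sum(const demo_vec4s *v) {
  return v->raw[0] + v->raw[1] + v->raw[2] + v->raw[3];
}
```

The trade-off is that code on the unaligned target can no longer assume 16-byte alignment, so it would have to use unaligned loads and stores there.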

@recp added the unfinished-postponed should be finished later label Feb 9, 2025
@recp marked this pull request as ready for review February 9, 2025 12:13
@recp self-assigned this Feb 9, 2025
@recp merged commit fb4eac2 into master Feb 12, 2025
178 of 191 checks passed