SIMD-based blitting using ARM Neon and Intel AVX #5428

Open
nfeske opened this issue Jan 21, 2025 · 2 comments

nfeske commented Jan 21, 2025

At the moment, we use SIMD instructions for 2D-pixel copying only on the x86 platform. I'd like to foster the use of SIMD on ARMv8 as well, and also leverage more modern SIMD variants (AVX) on x86_64.

@nfeske nfeske added the feature label Jan 21, 2025
nfeske added a commit to nfeske/genode that referenced this issue Jan 21, 2025
This is a precondition for using Blit::back2front at the driver side.

Issue genodelabs#5428

nfeske commented Jan 21, 2025

Implemented for ARM Neon and SSE on my simd branch https://github.com/nfeske/genode/commits/simd/.

nfeske added a commit to nfeske/genode-allwinner that referenced this issue Jan 21, 2025
nfeske added a commit to nfeske/genode-rpi that referenced this issue Jan 21, 2025
nfeske added a commit to nfeske/genode-imx that referenced this issue Jan 21, 2025
nfeske added a commit to nfeske/genode that referenced this issue Jan 23, 2025
cnuke pushed a commit to cnuke/genode that referenced this issue Jan 23, 2025

nfeske commented Jan 23, 2025

My updated simd branch contains quite a few changes made after benchmarking on the PinePhone. The non-SIMD (slow) variant reads sequentially now, instead of writing sequentially. This is useful because the source is usually cached memory whereas the destination is uncached in most display drivers.
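
To illustrate the read-sequential loop order described above, here is a minimal scalar sketch of a 90-degree clockwise rotated copy. The function name `rotate90_read_sequential`, the 32-bit pixel type, and the buffer layout are assumptions made for illustration. This is not the code from the simd branch or the `Blit::back2front` interface.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Genode's Blit API): copy a w x h source into an
// h x w destination, rotated by 90 degrees clockwise.
// The loops walk the source row by row, so reads are sequential, while
// writes stride through the destination with a step of h pixels.
static void rotate90_read_sequential(const uint32_t *src, uint32_t *dst,
                                     size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        const uint32_t *s = src + y*w;          // current source row
        for (size_t x = 0; x < w; x++) {
            // source pixel (x, y) lands at destination column (h-1-y), row x
            dst[x*h + (h - 1 - y)] = s[x];
        }
    }
}
```

Swapping the loops so that the destination index advances by one per iteration would give the write-sequential order instead, at the cost of strided reads from the source.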

The Neon variant has been improved for cache locality and, most importantly, cache prefetching in the rotation case. On the PinePhone, the rotation adds about 40% overhead compared to the non-rotated back2front copy. A complete screen update (1440x720) takes about 5.4 ms without rotation, and 7.7 ms with rotation.
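
As a quick cross-check of the figures above, the following sketch derives the relative overhead and the pixel throughput from the quoted timings. The constant and function names are made up for illustration, and only the numbers stated in this comment are used.

```cpp
// Figures quoted above (PinePhone, full 1440x720 screen update).
constexpr double pixels     = 1440.0 * 720.0;   // 1,036,800 pixels
constexpr double ms_plain   = 5.4;              // without rotation
constexpr double ms_rotated = 7.7;              // with rotation

// Relative overhead of 'other' over 'base' in percent.
constexpr double overhead_percent(double base, double other)
{
    return (other - base) / base * 100.0;
}

// Throughput in megapixels per second for 'px' pixels copied in 'ms' ms.
constexpr double mpixels_per_second(double px, double ms)
{
    return px / (ms * 1000.0);
}
```

With these inputs, `overhead_percent(ms_plain, ms_rotated)` comes out at roughly 43%, in line with the "about 40%" stated above. `mpixels_per_second` gives about 192 Mpixel/s without rotation and about 135 Mpixel/s with rotation.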
