DiamonDinoia
Contributor

@DiamonDinoia DiamonDinoia commented Sep 24, 2025

  1. Adding a stream API for non-temporal data transfers
  2. Adding xsimd::fence as a wrapper around std atomic for cache coherence
  3. Adding tests

Draft because I need to double-check the API levels (i.e., that I am not using AVX2 functions in the AVX code path, and so on). I just wanted some feedback while I put on the finishing touches.
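To illustrate the API-level concern: with raw x86 intrinsics, the 256-bit non-temporal store (`_mm256_stream_ps`) is available from AVX, while the 256-bit non-temporal load (`_mm256_stream_load_si256`) requires AVX2. The sketch below is not the PR's implementation; `stream_copy` is a hypothetical helper that only shows how such a split might be guarded at compile time, assuming 32-byte-aligned buffers on the vector path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not xsimd API): copy n floats using non-temporal
// accesses where the enabled instruction set provides them.
inline void stream_copy(const float* src, float* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // The 256-bit streaming instructions require 32-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(src) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 32 == 0);
    for (; i + 8 <= n; i += 8) {
#if defined(__AVX2__)
        // The 256-bit non-temporal load is an AVX2 intrinsic.
        __m256i bits = _mm256_stream_load_si256(
            reinterpret_cast<const __m256i*>(src + i));
        __m256 v = _mm256_castsi256_ps(bits);
#else
        // Plain AVX: fall back to a regular aligned load.
        __m256 v = _mm256_load_ps(src + i);
#endif
        // The 256-bit non-temporal store is already available in AVX.
        _mm256_stream_ps(dst + i, v);
    }
    // Order the non-temporal stores before any later stores.
    _mm_sfence();
#endif
    // Scalar tail; also the whole copy when AVX is not enabled.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

Compiled without `-mavx`, the function reduces to the scalar loop, so the same source builds at every API level.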

@serge-sans-paille
Contributor

Some generic thoughts:

  • I'm unsure the fence belongs in xsimd, but I like being proven wrong; maybe show us a code example that uses it?
  • load_stream or stream_load or streaming_load?

On arm64, there's no support for non-temporal loads (https://developer.arm.com/documentation/100048/0100/level-1-memory-system/memory-prefetching/non-temporal-loads); the corresponding instructions do exist (LDNP/STNP), but I failed to find the related intrinsics.

There seems to be something equivalent in RISC-V (see riscv-non-isa/riscv-c-api-doc#47).

I couldn't find anything for WebAssembly or Power. So that's quite a niche, but I'm fine with adding those.
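Where an instruction such as STNP exists but no intrinsic exposes it, one option is to reach it through inline assembly. The following is a minimal sketch for AArch64 with GCC or Clang, not something taken from the PR; `store_pair_nontemporal` and `copy_nontemporal` are hypothetical names, and every other target falls back to a plain scalar copy.

```cpp
#include <cstddef>

#if defined(__aarch64__) && defined(__GNUC__)
#include <arm_neon.h>

// Hypothetical wrapper: store two 128-bit vectors (8 floats) to dst
// with a single STNP (store pair, non-temporal hint).
inline void store_pair_nontemporal(float* dst, float32x4_t lo, float32x4_t hi) {
    // %q prints the operand as a Q register; the "memory" clobber tells
    // the compiler that the asm statement writes to memory.
    __asm__ volatile("stnp %q[lo], %q[hi], [%[dst]]"
                     :
                     : [lo] "w"(lo), [hi] "w"(hi), [dst] "r"(dst)
                     : "memory");
}
#endif

// Hypothetical helper: copy n floats, using STNP where it is available.
inline void copy_nontemporal(const float* src, float* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__aarch64__) && defined(__GNUC__)
    for (; i + 8 <= n; i += 8) {
        store_pair_nontemporal(dst + i, vld1q_f32(src + i), vld1q_f32(src + i + 4));
    }
#endif
    // Scalar tail; also the whole copy on other targets.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

The non-temporal part is only a hint to the memory system, so whether it helps depends on the core; the sketch shows the wrapping technique and nothing more.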

@DiamonDinoia
Contributor Author

DiamonDinoia commented Sep 24, 2025

  1. I went for load_stream and store_stream so that they are consistent with [load|store]_[un]aligned... (Also, load_non_temporal was too long and load_nta is not clear.)
  2. I added fence for convenience. I have no strong feelings about it; we can always think about adding it in the future. I was recently made aware that, on x86, it is not needed in a single-core application. In parallel applications, <atomic> is likely to be included anyway.
  3. About ARM and RISC-V: what about making our own intrinsics by wrapping inline assembly? Sadly, I do not know ARM well enough to promise that I can help.
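As an illustration of where a fence matters in the parallel case, here is a minimal sketch written with raw intrinsics rather than the PR's API. `store_fence`, `Mailbox`, `publish` and `try_consume` are hypothetical names, and `store_fence` is only a stand-in for what a fence wrapper might do, not the actual xsimd::fence. On x86, a release store is an ordinary store and does not by itself order earlier non-temporal stores, so the producer issues `_mm_sfence` before setting the flag.

```cpp
#include <atomic>
#include <cstddef>
#if defined(__SSE2__)
#include <immintrin.h>
#endif

// Hypothetical stand-in for a fence wrapper (an assumption, not the PR's code).
inline void store_fence() {
#if defined(__SSE2__)
    _mm_sfence();  // orders earlier non-temporal stores before later stores
#endif
    std::atomic_thread_fence(std::memory_order_release);
}

// Hypothetical shared buffer with a "data is ready" flag.
struct Mailbox {
    alignas(16) float data[16];
    std::atomic<bool> ready{false};
};

// Producer: fill the buffer with non-temporal stores, fence, then publish.
inline void publish(Mailbox& box, float value) {
    for (std::size_t i = 0; i < 16; i += 4) {
#if defined(__SSE2__)
        _mm_stream_ps(box.data + i, _mm_set1_ps(value));
#else
        for (std::size_t j = 0; j < 4; ++j) box.data[i + j] = value;
#endif
    }
    store_fence();  // without this, the flag could become visible before the data
    box.ready.store(true, std::memory_order_release);
}

// Consumer: read the data only after observing the flag.
inline bool try_consume(const Mailbox& box, float& out) {
    if (!box.ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = box.data[0];
    return true;
}
```

In a single-threaded program the fence can be dropped, since a core always observes its own stores in program order.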

Cheers,
Marco

PS: SSE2 adds APIs for non-temporal stores of 32/64-bit scalars. I am not sure they fit within xsimd, though.
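For reference, the scalar intrinsics in question are `_mm_stream_si32` and `_mm_stream_si64` (the latter only on x86-64). A minimal sketch of how they could be wrapped follows; the overload name `store_scalar_nontemporal` is hypothetical and only meant to show the shape of such an API.

```cpp
#if defined(__SSE2__)
#include <immintrin.h>
#endif

// Hypothetical overloads (not xsimd API) wrapping the SSE2 scalar
// non-temporal stores, with a plain store as the fallback.
inline void store_scalar_nontemporal(int* dst, int value) {
#if defined(__SSE2__)
    _mm_stream_si32(dst, value);  // MOVNTI, 32-bit
#else
    *dst = value;
#endif
}

inline void store_scalar_nontemporal(long long* dst, long long value) {
#if defined(__SSE2__) && defined(__x86_64__)
    _mm_stream_si64(dst, value);  // MOVNTI, 64-bit; x86-64 only
#else
    *dst = value;
#endif
}
```

Since these operate on plain integers rather than batches, they would sit outside the usual batch-based load/store interface.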

@DiamonDinoia DiamonDinoia marked this pull request as ready for review September 24, 2025 20:59