Commit e2f8cc7

Docs: Describe memory algorithms
1 parent 696797d commit e2f8cc7

1 file changed: +17 -0 lines changed

README.md

@@ -1367,6 +1367,23 @@ With that solved, the SIMD implementation will become 5x faster than the serial
[faq-dipeptide]: https://en.wikipedia.org/wiki/Dipeptide
[faq-titin]: https://en.wikipedia.org/wiki/Titin

### Memory Copying, Fills, and Moves
1372+
A lot has been written about the time computers spend copying memory and how that operation is implemented in LibC.
Interestingly, the operation can still be improved, as most Assembly implementations use outdated instructions.
Even performance-oriented STL replacements, like Meta's [Folly v2024.09.23](https://github.com/facebook/folly/blob/main/folly/memset.S), focus on AVX2 and don't take advantage of the new masked instructions in AVX-512 or SVE.
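As an illustration of what masked instructions enable, here is a minimal `memset`-style sketch, assuming an x86 target compiled with AVX-512BW (e.g. `-mavx512bw`). `fill_masked` is a hypothetical name, and this is not StringZilla's or Folly's implementation. Whole 64-byte blocks use ordinary stores, while the final 0 to 63 bytes are written by a single masked store, with no scalar loop or overlapping writes. Without AVX-512BW the sketch falls back to a byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

// Hypothetical helper: fills `length` bytes at `target` with `value`.
void fill_masked(void *target, uint8_t value, size_t length) {
    uint8_t *ptr = (uint8_t *)target;
#if defined(__AVX512BW__)
    __m512i pattern = _mm512_set1_epi8((char)value);
    // Body: whole 64-byte blocks with regular unaligned stores.
    for (; length >= 64; length -= 64, ptr += 64)
        _mm512_storeu_si512(ptr, pattern);
    // Tail: 0..63 bytes remain, so this shift is always well defined.
    // Only lanes whose mask bit is set are written; other bytes stay untouched.
    __mmask64 mask = ((__mmask64)1 << length) - 1;
    _mm512_mask_storeu_epi8(ptr, mask, pattern);
#else
    // Portable fallback when AVX-512BW is not enabled at compile time.
    for (size_t i = 0; i != length; ++i) ptr[i] = value;
#endif
}
```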
1376+
In the AVX-512 backend, StringZilla uses non-temporal stores when dealing with very large strings, to avoid cache pollution.
Moreover, it handles the unaligned head and tail of the `target` buffer separately, ensuring that writes in large copies are always aligned to cache-line boundaries.
That alignment handling is used in both the AVX2 and AVX-512 backends.
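The head/body/tail split can be sketched as follows, assuming a 64-byte cache line and a hypothetical 1 MiB cutoff `NT_THRESHOLD` for switching to non-temporal stores. To stay portable, the sketch swaps in 16-byte SSE2 streaming stores (`_mm_stream_si128`) as a stand-in for the AVX-512 ones and falls back to `memcpy` elsewhere. `copy_aligned_body` is a hypothetical name, and the buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CACHE_LINE 64                  // assumed cache-line size
#define NT_THRESHOLD ((size_t)1 << 20) // hypothetical cutoff for streaming stores

// Hypothetical helper: copies `length` bytes between non-overlapping buffers.
void copy_aligned_body(void *target, const void *source, size_t length) {
    uint8_t *dst = (uint8_t *)target;
    const uint8_t *src = (const uint8_t *)source;

    // Head: bytes needed to bring `dst` up to the next cache-line boundary.
    size_t head = (size_t)(-(uintptr_t)dst & (CACHE_LINE - 1));
    if (head > length) head = length;
    memcpy(dst, src, head);
    dst += head; src += head; length -= head;

    // Body: whole cache lines; every store now starts on a 64-byte boundary.
    size_t body = length & ~(size_t)(CACHE_LINE - 1);
    int streamed = 0;
#if defined(__SSE2__)
    if (body >= NT_THRESHOLD) {
        for (size_t i = 0; i != body; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(src + i)); // source may be unaligned
            _mm_stream_si128((__m128i *)(dst + i), chunk);                // aligned store that bypasses the cache
        }
        _mm_sfence(); // order the weakly-ordered streaming stores before later stores
        streamed = 1;
    }
#endif
    if (!streamed) memcpy(dst, src, body);
    dst += body; src += body; length -= body;

    // Tail: fewer than 64 bytes remain.
    memcpy(dst, src, length);
}
```

The `_mm_sfence` call matters because streaming stores are weakly ordered; without it, later stores could become visible before the streamed data.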
1380+
StringZilla also contains "drafts" of smarter, but currently less efficient, algorithms that minimize the number of unaligned loads by performing shuffles and permutations.
That's a topic for future research, as the performance gains are not yet satisfactory.
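For intuition, here is a scalar analogue of that idea, not the StringZilla drafts themselves. Once `target` is 8-byte aligned, every full-word source load in the main loop comes from an aligned address, and each output word is stitched together from two neighboring aligned words with shifts, playing the role that byte shuffles and permutations play in SIMD registers. It assumes a little-endian target, and `copy_shifted` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "This sketch assumes a little-endian target"
#endif

// Reads 8 bytes; callers only pass 8-byte aligned addresses.
static uint64_t load_aligned_word(const uint8_t *p) {
    uint64_t word;
    memcpy(&word, p, sizeof word);
    return word;
}

// Hypothetical helper: copies between non-overlapping buffers, using only
// aligned full-word loads in the main loop.
void copy_shifted(void *target, const void *source, size_t length) {
    uint8_t *dst = (uint8_t *)target;
    const uint8_t *src = (const uint8_t *)source;

    // Head: byte copies until `dst` is 8-byte aligned.
    while (length != 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = *src++;
        --length;
    }

    size_t offset = (size_t)((uintptr_t)src & 7);
    if (offset == 0) {
        // Source is aligned too: copy whole words directly.
        for (; length >= 8; length -= 8, src += 8, dst += 8) memcpy(dst, src, 8);
    } else if (length >= 16) {
        unsigned right = (unsigned)(8 * offset); // bits dropped from the lower word: 8..56
        unsigned left = 64 - right;              // bits taken from the upper word: 8..56
        // Lower word: only its in-bounds bytes are filled; the zeroed low
        // bytes are shifted out below, so nothing before `src` is read.
        uint64_t lower = 0;
        memcpy((uint8_t *)&lower + offset, src, 8 - offset);
        const uint8_t *next = src + (8 - offset); // first 8-byte aligned source address
        // Loading [next, next + 8) stays in bounds while at least 16 bytes remain.
        for (; length >= 16; length -= 8, src += 8, dst += 8, next += 8) {
            uint64_t upper = load_aligned_word(next);
            uint64_t out = (lower >> right) | (upper << left); // equals bytes src[0..7]
            memcpy(dst, &out, 8);                              // aligned store
            lower = upper;
        }
    }
    // Tail: whatever is left, one byte at a time.
    while (length-- != 0) *dst++ = *src++;
}
```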
1383+
> § Reading materials.
> [`memset` benchmarks](https://github.com/nadavrot/memset_benchmark?tab=readme-ov-file) by Nadav Rotem.
> [Cache Associativity](https://en.algorithmica.org/hpc/cpu-cache/associativity/) by Sergey Slotin.
### Random Generation

Generating random strings from different alphabets is a very common operation.
