- Patch 1 (small-vector unit-stride loads/stores).
  - COMPLETE.
- Patch 2 (large-vector unit-stride loads/stores).
  - Minor fixes posted to address Richard Henderson's review.
  - Updated patch posted, no further responses.
  - COMPLETE.
- Generate TCG Ops for vector whole word load/store.
  - IN PROGRESS.
  - Latest version of the patch.
  - Performance improves in all cases except large data sizes with large vectors.
  - We are identifying the cut-off point beyond which we fall back to the helper functions, and will then submit the final version of the patch.
  - The key limit on performance is maintaining the vstart CSR.
- Improve first-fault handling for vector load/store helper functions.
  - IN PROGRESS.
  - No new work to report this week.
- Improve strided load/store helper functions.
  - IN PROGRESS.
  - No new work to report this week.
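As a rough illustration of the two constraints in the TCG Ops item above, here is a toy Python model, not QEMU code. It follows the RVV rule that a vector instruction begins at element `vstart`, leaves `vstart` at the faulting element index if it traps, and resets it to 0 on completion; any inline code path has to preserve this, which is one plausible reason it limits performance. `FaultAt`, `load_elements`, `choose_path` and the `INLINE_CUTOFF` value are all hypothetical; the real cut-off is what is still being measured.

```python
# Toy model only (hypothetical names); not QEMU's implementation.

class FaultAt(Exception):
    """Raised by the toy memory model when element `index` faults."""

    def __init__(self, index):
        super().__init__(index)
        self.index = index


def load_elements(mem, addr, nelems, state, fault_at=None):
    """Byte-element load that honours vstart as the RVV spec requires."""
    vreg = state["vreg"]
    for i in range(state["vstart"], nelems):
        if i == fault_at:
            # Trap: record the faulting element so execution can resume there.
            state["vstart"] = i
            raise FaultAt(i)
        vreg[i] = mem[addr + i]
    # Instruction completed: vstart is reset to zero.
    state["vstart"] = 0
    return vreg


# Hypothetical threshold in bytes; the real cut-off is still being measured.
INLINE_CUTOFF = 64


def choose_path(vlenb, nregs):
    """Pick inline TCG code for small register groups, a helper otherwise."""
    total_bytes = vlenb * nregs
    return "inline" if total_bytes <= INLINE_CUTOFF else "helper"
```

For example, a 16-element load that faults at element 5 leaves `vstart` at 5 with elements 0-4 written; re-executing it loads elements 5-15 and leaves `vstart` at 0.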
We used a simple assembler benchmark to time each instruction, and hence to obtain the per-instruction speedup from generating TCG Ops. The graphs show the results of 10 separate runs, with standard error bars.
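The report does not say exactly how the error bars were computed. Assuming they are the usual standard error of the mean (sample standard deviation divided by √n, here n = 10 runs) and that speedup is the ratio of mean helper-function time to mean TCG Ops time, a minimal Python sketch would be the following; `mean_and_stderr` and `speedup` are illustrative names, not part of the benchmark.

```python
import statistics
from math import sqrt


def mean_and_stderr(samples):
    """Mean and standard error of the mean (sample std dev / sqrt(n))."""
    if len(samples) < 2:
        raise ValueError("need at least two runs for a standard error")
    return statistics.mean(samples), statistics.stdev(samples) / sqrt(len(samples))


def speedup(helper_times, tcg_times):
    """Assumed definition: mean helper time over mean TCG Ops time."""
    return statistics.mean(helper_times) / statistics.mean(tcg_times)
```

Taking the ratio of means keeps the sketch simple; error bars on the speedup itself would need the two standard errors to be propagated through the ratio.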
This uses the variant implementation of memcpy built on whole word load/store (see the report from 18 Dec 2024). There is no update since last week's report.
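Assuming "whole word load/store" here refers to the RVV whole-register instructions (`vl<n>r.v` / `vs<n>r.v`, with n = 1, 2, 4 or 8), the general shape of such a memcpy variant is to move full register groups and handle the remainder separately. The Python model below is hypothetical and only shows that chunking; `vlenb` (VLEN in bytes) and `nregs` are assumed parameters, and the actual variant is the one described in the 18 Dec 2024 report.

```python
def memcpy_whole_regs(dst, src, n, vlenb=16, nregs=8):
    """Copy n bytes as full register-group chunks plus a tail (toy model)."""
    # Bytes moved by one whole-register load/store pair, e.g. vl8r.v/vs8r.v.
    chunk = vlenb * nregs
    i = 0
    while n - i >= chunk:
        # Stands in for one whole-register load followed by a store.
        dst[i:i + chunk] = src[i:i + chunk]
        i += chunk
    # Tail shorter than a register group: e.g. vle8.v/vse8.v with vl = n - i.
    dst[i:n] = src[i:n]
    return dst
```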
Next meeting 22 January 2025.