forked from JakeStevens/RISCVBusiness
Vector Extension #35
Open
1fahadaloufi wants to merge 222 commits into stage3 from rvv-testing
Changes from 216 commits
Commits
e596d2e
Copy relevant rv32v files from previous work
OxyMagnesium 31419fc
Add interface for vector register banks
OxyMagnesium a7015c6
Add vector exec lane interface and redo vector FU control enums
OxyMagnesium 78f9588
Add operand alignment unit interface
OxyMagnesium 0b607d2
load-store controller init + python scripts change
maxmichalec d7a2375
updated elf2hex path
maxmichalec a0c3330
Move rv32v include files to dedicated subfolder
OxyMagnesium ccf2a7f
initial uop queue integration done. still need to address misa.S file
2541508
testing dispatch_size=queue_length with extra NOPs
b20529c
Add register class identifier bits to select signals
OxyMagnesium 8d0b609
modified epc handling and ran other tests
e8b1a15
Add initial vector decode logic
OxyMagnesium a5d8e39
Add vector control unit interface and update existing interfaces and …
OxyMagnesium b1da13a
moved decode from execute to the uop stage
f4b5156
Save work for end of semester
OxyMagnesium 84715ad
init mdbook doc
maxmichalec a67c4e8
Create rv32v_overview.md
OxyMagnesium d5ca18c
Locally host pipeline diagram image
OxyMagnesium 8778968
Create uop queue documentation
1fahadaloufi 5336409
change hazard unit if
d3377f9
added initial documentation for uop queue
194f507
clean up queue files for PR
4360867
fix scalar decode block comments
c76a96e
add uop_queue.sv file
6d9946f
create a new folder for uop arch
1c9d4d8
create new folder for uop arch
155d5dd
merge uop_queue
74691aa
checkout old stage3 folder
1b59dd9
start some blocks in the execute stage
ccadf25
finished needed blocks in ex stage & write xbar
477cd15
setup vector testing infra
df9b55e
cleaned testing folders to setup vector testing
cb34985
changes to struct
9046b62
organize folder structure
da21e4c
fix build errors
7f2a55a
setup folder structure
fe687cf
Fix false verilator comb loop warnings in decode
OxyMagnesium 9cbcb85
compile w/ memory stage
f4cf0dd
verif info
a4cc5ba
Merge branch 'rvv-f23' of github.com:Purdue-SoCET/RISCVBusiness into …
f821e14
added vcsr_addr enum, fixed mem stage so scalar tests pass
maxmichalec 28e087f
Add handling for vset* instructions
OxyMagnesium b811d14
Merge branch 'rvv-f23' of github.com:Purdue-SoCET/RISCVBusiness into …
OxyMagnesium 296afec
create branch
cf5e49c
adding ex logic
c91b878
fixed circular logic on hazard unit exception from invalid_csr assign…
maxmichalec 254a32b
signal for stalling due to vect mem acc serialization, vect-scal move…
maxmichalec 314834e
more integration work for running vector config and mem instructions
af6aea2
updated logic for determining vlmax based on lmul and sew
maxmichalec 19cc7d5
fixed using non-RV32I arch for tests
maxmichalec 34a30c6
fixed using non-RV32I arch for verilator testing
maxmichalec f39407e
add tb to dump vcd
maxmichalec f879810
workingish
65fbdca
Actually output vector decode valid flag
OxyMagnesium 21a515b
create branch
598a9e7
adding ex logic
50218e5
more integration work for running vector config and mem instructions
3442a6a
workingish
f57fc2f
Actually output vector decode valid flag
OxyMagnesium 68f5f6c
Add scripts for running a single test with vsim
OxyMagnesium b93a701
passing simple vsetivli, vvalid clears scalar illegal_isn
maxmichalec d05ab23
delete some files
f46b3a1
fix merge conflicts
685ea25
vcsrs mostly passing. edge case with vtype accesses needs fix
f3aaae8
vcsrs passing
fb19e15
vle32, vse32 passing for vl=4
maxmichalec 549dd1c
Add scalar rf to waveforms
OxyMagnesium 34957ae
output test failed number
66fbbf9
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium f6fbe89
simple vle test passing
619f94a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
03e2ada
passing simple vmem tests
95cf694
fixed bug in write xbar
b3c3ee7
vlse8 passing, fixed read xbar for SEW8
maxmichalec f90c220
vluexi8 passing, indexed not using rs2
maxmichalec 7b1a2a6
Add script to generate vector load/store tests and some generated uni…
OxyMagnesium 5a9568a
Add option to look in subdirectory for tests
OxyMagnesium 900ba3a
Improve test generation and add more tests, fix incorrect LMUL encoding
OxyMagnesium 3417f70
Add tests for non-register aligned load/stores
OxyMagnesium eade15b
Fix incorrect check offsets
OxyMagnesium 5ef097d
initial logic for arth instructions
d49fcaa
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
00fa59a
edit serializer stall logic
003253c
Initial support for mask load/store instructions
OxyMagnesium aa43cca
Initial whole reg load/store support
OxyMagnesium 0d59f49
consider vstart when masking elements in mem/wb
maxmichalec 6db140a
mask load and store instructions pass
d92d3bd
Add tests for strided load/stores
OxyMagnesium c201d9a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium 7a5cf1c
Fix accidental intentional errors in strided tests
OxyMagnesium bd26b4b
fixed hazard tracking and added more implementation for the arithmeti…
54bc3e2
fix conflict
36e12da
fix make errors
ae3fb54
implement arith instructions up until 11.6
66c2afa
fix indexed load store test
00eb791
add rest of arithmetic instructions except mul and divide
6937ae1
vstart checking in mem, setting on precise exception, inital mult imp…
maxmichalec 81c8798
implement rv32v divider, vdiv passing
maxmichalec dfd3592
Regenerate load/store unit tests
OxyMagnesium df5972d
Initial work on reduction decoding
OxyMagnesium 3b3508f
Changes for reduction logic
OxyMagnesium f75a761
add code for setting mask bits
428b66b
edit vcontrol for mask setting
89950e4
simple add benchmark
45013b7
edit add benchmark
7d54dfd
Save work on reduction integration
OxyMagnesium 4a52a30
Merge branch 'rvv-decode' into fahad
OxyMagnesium aafee16
kill me
OxyMagnesium fc999a0
Implement scratch register
OxyMagnesium 7759414
Add cross-lane reduction unit
OxyMagnesium bed583f
debug arithm instructions
2655fc8
fixed conflicts
9f106c7
Improve reduction
OxyMagnesium f5529a7
rebased
maxmichalec 0f6815c
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium 5d37a6d
Fix merge error
OxyMagnesium 5c4e878
fixed vmv
maxmichalec 1309240
fixed bugs with mask set instructions
8bfa068
handle mixed width operations
40c56a9
Basic reduction working
OxyMagnesium 342781b
Fix reductions for longer lengths
OxyMagnesium c5b2616
Add more reduction tests
OxyMagnesium 26cedcb
Fix reductions with odd lens and masks
OxyMagnesium ca1df48
Add basic reduction benchmark
OxyMagnesium 8b896e8
Improve handling of small reductions
OxyMagnesium fe1c749
Add unit tests for smaller EEW reductions
OxyMagnesium 328aa2a
Fix stall issue with scratch reg, add reduction min/max test
OxyMagnesium db5e7b2
Improve reduction unit tests more
OxyMagnesium 088aff9
initial implementation of mask instr
ae20715
edit test
3b086fa
small bug fix
578134b
merge reduction and mask calc
046c441
tested and debugged vfirst
530845b
testing and debugging mask instructions
b2742fd
Fix widening reductions
OxyMagnesium 5daf794
Fix masked reduction operations
OxyMagnesium 7a29446
reset vstart on last vector uop
maxmichalec 0a17ba1
vslideup/down
maxmichalec 17b9810
fixed vres in ex datapath
maxmichalec d938006
Implement whole register moves
OxyMagnesium 0d14416
Matrix multiplication self tests
OxyMagnesium 1d8df60
enable rv32m
maxmichalec 05bf92b
Pull in fix for missing scalar multiplier
OxyMagnesium 98730a2
Switch to gcc for compiling matmul kernel, fix vsetvl issue
OxyMagnesium a39d5c2
Move benchmark self-tests into dedicated folder
OxyMagnesium 41d0ac1
Add maxpool test (correctness issue)
OxyMagnesium abd867a
Fix issue with vector-scalar moves
OxyMagnesium 92f6479
Increase max pool input and window size
OxyMagnesium 6490b7b
Add fir filter test (multiplier issue)
OxyMagnesium 1fe6bd9
FIR filter passing
OxyMagnesium cff26c9
Add FIR filter benchmarks
OxyMagnesium d32a9ac
simple segment instructions working
cb8e101
segmented load store instructions tested
fda3a07
merge in seg mem instructions
ccef0ce
vrgather and vslide tests
maxmichalec edd09f9
merge permutation isns
maxmichalec 1c8bf25
use explicit vbank_offset
maxmichalec 2810108
fix vrgather, add support for .v{i,x}
maxmichalec 514a563
Fix bug with branch not flushing fetch-decode latch
OxyMagnesium 921e399
Add hand-tuned matrix multiplication benchmark
OxyMagnesium cf5f0af
Add relu benchmarks
OxyMagnesium d52034d
bug fixes w/ arith instr
c2bb2d0
write more tests
1ae1580
fix add benchmark
87083c8
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
ac9f351
minor edit
4a008b1
different data width benchmarks
939c78e
interrupt handling infrastructure, fix vslide
maxmichalec 219d0c7
Add linear interpolation benchmark
OxyMagnesium a8b50b7
more tests
222af2f
Add doc for RVV compiler information
OxyMagnesium dce2edf
Loop interchange optimization for matmul, add additional tricky test
OxyMagnesium 84db25d
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium de92e03
small typo
dc5bf6a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
642f5e3
Update overview and compilers docs
OxyMagnesium ee3370b
add microarchitecture file
8a142b8
add results doc
086f8cb
Fix vmerge bug in division version of lininterp
OxyMagnesium 0e6f442
modify arch.md file
84af025
fix hazard tracking when instr depends on 3 operand reads
637358d
add mem stage and permutation documentation
maxmichalec 9861a98
add ex documentation & adjust hazard tracking
6121638
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
e87ef6b
added separate RVV core build target, integration with build system (…
maxmichalec b1e9646
Merge remote-tracking branch 'origin/stage3' into rvv-testing
maxmichalec e5d4af7
update README for new make targets
maxmichalec 6c93581
revert to just one core file (pipeline selection uses macro defines f…
maxmichalec 0d95b88
move mask unit verif files
maxmichalec 507ef8c
fix rv32m div-by-0 return value and rv32v divider to work with update…
maxmichalec cbc134e
tb_core.cc outputs .fst
maxmichalec f395c74
remove sim_build dir
maxmichalec d1b1026
spec compliance for extreme values in rvv divider, load-use fix for v…
maxmichalec 4a5bc0f
edited L1 and bus if + init. mem coalescer
wrcunnin e94ee85
updated logic for mcu
wrcunnin 131e9da
lsc able to select words per vlane from rdata_wide
wrcunnin 95b4fd2
got reads mostly working, needed to checkout to main branch for testing
wrcunnin c2c2adc
fixed bug where next strided addr would not update if vuop == 0
wrcunnin b9e511d
added minor write logic, major read debugging, passes most selftest b…
wrcunnin 1c17878
various bug fixes, strided + unit strided + benchmarks all pass, fail…
wrcunnin 9c64ebc
wide writes work for some tests, switching to rvv-testing to debug
wrcunnin 00554a7
minor bug fixes to vector lane masks
wrcunnin 426cf45
fixed e16 storing issues, every test minus uninterruptable passes
wrcunnin badd086
shared data in repo for SURF purposes
wrcunnin f3e0907
revised pmp matching, added configurable granularity
wrcunnin af9e033
Revert "revised pmp matching, added configurable granularity"
wrcunnin 7802cd8
moved my waveform into sim_scripts
wrcunnin c54a36b
rebasing deleted coalescer interface and altered test config, fixing …
wrcunnin c695d7c
fixed bug in mcu where different back to back strided loads would loa…
wrcunnin 339f1f6
fixed bug where coalesced accesses may not work if the addressing lan…
wrcunnin 117f798
fixed bug where strided addresses did not update if no lanes were masked
wrcunnin 8b4d72f
cleaning up
wrcunnin 2eb4265
merge multicore & vector
maxmichalec 7d4fc4e
atomics support in stage4 pipe (serialized mem)
maxmichalec 865ef4d
removed *_wide signals from LSC-D$ interface, accesses >= noncacheabl…
maxmichalec 5afa225
fixed passthrough bug, reads were being treated as writebacks
maxmichalec 3916c03
cleanup + fixed vmadd/vnmsub bug
maxmichalec 882061e
more cleanup
maxmichalec ab68e3f
coherency_unit: put busy low when aborting so cache can properly tran…
devins2518 a1aa9cc
added vectorized multicore test (matmul_vector.s), Devin see build_al…
wrcunnin 989bfa3
verification/multicore: fix tests and add multicore vector test
devins2518 11a5b6c
bus/caches: fix issues with improperly handled ccabort signals
devins2518
@@ -0,0 +1,61 @@
# RISC-V "V" extension compiler support

## Introduction
Despite the relative user-friendliness of the RISC-V "V" extension, good compiler support is still critical for most practical purposes, so that programs can be accelerated without being hand-optimized for the vector core. Fortunately, both GCC and Clang support auto-vectorization for the "V" extension and can generate quite performant vector code when given well-written data-parallel programs.

A good tool for trying out and verifying auto-vectorization is [Compiler Explorer](https://godbolt.org/). It offers RISC-V toolchains, including trunk builds, which may matter depending on how far "V" extension support has progressed by the time you read this. You can paste in C code, select the compiler and flags, and inspect the generated assembly.

## General notes
### Writing code
The compilers will generally do a pretty good job of vectorizing data-parallel code, but you may occasionally have to nudge them to do what you want. Take the following code as an example:
|
||
[Matrix Multiplication](https://gcc.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYhAAVgjSAAdUBUJbBmc3Dz04hJsBX38gllDwqItMKyyGIQImYgIU908uYtKkiqqCHMCQsMjohUrq2rSG3tb2vILugEoLVFdiZHYOAHpFgGoAUgAmAGYsGn8VxwAJbEcAaTWNAEFl9e3dv0wVgBUASQBZbAvLr82tvCpds93p8rr8/Mg3Fh1ltHL18KgAHQINZbEHfbbgyGPFGOGxsJEotG/Bz/FYrG6vD4/UF3TB7R4AJQA8gB1IRcFZcABsP1p9IOTO87M5PJpOzpDxWzLZGxFvPF/McgqEsu51MufgIKxYTAIxDwqi4awiACFpeyAFRKoVGiIAEWhDrWAHYTRsuFtSCttlyvRsABzOr3OrjRFZcjZhrhcDZeiJcjRxhorZ0ATljnNTXv9Gn92a2Ya5kl9K1TEWk4a5Qe9gez/uTkn9GedXLzpeTW2dbbLGakJcjJa4GizKwiOezqf7GZ7Xsk7q9Hs9K2LGa2yaLba4/qXjb9qYrqcnC4Lda9a%2BnXa9XNT1bLXsPzY0Hed1fdya4zozrZH153S7XS5RNW/pcmGkgaBW3KJimGwVgGFb1rORThmmC7cnGyE5kuqbjuGqZLu6UYen6GjTlsJaptGV7EeGGxftswYaCWXLJk22FbCOEQ0c6Y6zhszFHt6PFxqm0H%2BoJaYUTR/rbnGtbesWwb%2BlGlFocx84rDJYYbBo0GiSO/Fts6WwVpIVHhjRnYVly5Fxi%2BZ5bkhS5btBNkjqJBGdveGw/jGs5bDuzrQdG3YRAZFYttB/FLgm/7IaJm7IVwh5nvhV4hn6LZXqBSklhEqGjv6B5%2BSsWzKfeGg7iBSEdmG6YqRmGZhSO24lkVBm8d6JY8Rm4HBZI7klcZwHGZlFYRBot4BguglbFFk3ZjZ96hn6E2zgVCa9cJpVcoOS48WG%2BXJjeX4RmeYVnkWwazuBwamVs06OeGXF%2BoOBkfutUVrUJnHoaOw4idBB3BmuV5zXGPnnXlkb3mdpUPdm30hoOGWcgRTEORR14LjDmmibOMlnrJy5BX6dFE22ja3iWZkjpIo3hrZKYnozI7OpIYZYQu4n3tZBVlpzqnPdhGNddWkgI8u8beYWkMpjeqWcajAbJtsvY5ZpGsRCZcZM7tf43e58mdthgkTYBV2ch9IptiZr6o5OYak5p5mdhmMkaC6dooiaXyatqur6qoGzGmarIqlayoh/ajrrK6AbOWZV6rrt3N1cdqNSLbTbBgVOYVgF0FeZyc4LqXy4QdmyZptpBEMZpqNduuUizqL9P9hzZesRs4ukXGZ4aaGhYJtmkiQU%2BwYaZOzmVWhutIUTIObgNZ596Whaz6WZWzhrHOQfzGtVquiHMyOnbV5LCHb8u1VCU133KcxTPbIB2s3bXRc66Vq5Pfl1bs%2BtYq3VFIoX7CtUqSdSxjwJg2BswFO6lTqgzfiFYopf0DO7QSOZqxrgPJ1IK2ke5g1MoTUsWVlz8SvBoJ2LEFwAJ2olP041JYBWfMBP6HpXzQKgWfFhuZUpRhXqVNu5Mt5th0jdSClDwwTjZgzIK1NewaWvLbfGtE4reVVqLD8qsQGWwmlFBm%2B4vyX22lEHc5kgqq2vkWMMJljr1x7t2BB24OypwbtBMc1ZozuxHl1CiulZx3hdquFKy4aL9m7MTEyX5xb015nBDWJki4AzxmJBWeEvr3nIQmNma9yLjUCRXOxu5mYRQZslDsAUmHATfuGHOmkhHsyLjhDCeV9p%2BIGv2dM60rzBJDARdxZYpEHgordWi3jN5zntpEuR7tt
pzWCuAuibUqx0OrvXUSUZ5KhmhsxOpzUHIGQWszXqHEgl1SEa0ihvVaGcnITmKM9CJJyM5lc/csMWrxPhuw4C4M0l%2Bj%2Bq2Ge3jznqJmiHZ03sti%2ByuP7TAqgYiYGsKHc0XBI5Cmjk6LYTpXRlQgkOAuoltiV1Ki%2BScBUOL4V0nBSidFYKziHIGUG4Sohj20k2G81TvSdiKvZLqRUxycTHiuVW252ZrNKspci19yLbmHLbIsZVJXTPpSWcik1DymRpZ%2Bd624yy9xwl2V82t8KytEnOIRyr3T0JQe6O57oIJcQLi%2BYyxdtYTRjDg/iOl8qr0DJ%2BW2uZozljPA9CIdFTIRqrN8h6rZSIp3rDeVWOl6aUrHEPYKZZjGmQ3FIIuPkArcJ7oS0RdEuI2RwZNVNW19zJQLnRQMwS1yPxZuRKQLbMo5kmnYtMLYEEPT5c268Ply78RfDGOKCVXYxnEshOc2s5yNRzIGUWPlYI2Wcjhal4iExPgZRQhN1yfJcW1gRfcpFxnJPwt9Huc1V3nRbDGnexlSJ03wgGHm4TWES3WrswSS6Ixw1NY7QKgrglzk7DZVctMdL/g4rpMF8YtyRsuluMyoSCESJvvlBM4ttg4S/o2AKylKZmQDaZIKc53FmRlXc8s8Yog01Ivla%2B5ZwIRpaW%2B5KjKF0NKLfhIRZlA31z/qBTq4Fr3/kPN2uBnH%2BHLlAhBSWEa43NuMuWIc/l9yTltslR14tdpnrschtySYqPS3CRa0p7bjJPQDJJZ2AUcKNgbDeclz4hxMTppVFtmGTJFVSo2GSNNYIhks2uahUhxYxi80G1qV66IBS/j69M5V4bsqWoguaUhGqTkqo46MlFb6wTKhqwB4lLaFvEwXD09VvWhjlXuYsOZeohTKsfLC3yBrxgDUTfLEGwrDm65VaLvVF1VkM0xHuZ8J1xtXoR42kWux5RYiBXepEWIlXItefTjKArIz4tRy2HN3QytbmPD8%2BGbLiRwSKoKNMeJMrPh%2BAK0ix66XZndjmN4z4ZrKrSj0TiB6dmob1eqc1vUAQQeunNN0Y27x7j3KZg65qrifOG22rrGxwS0vTF1dXUmRbHGCjrbqooYPIujgaRUCIflzGCnShEWUYZ%2B3BeMw5sNrjTOViBuZhwdjVp%2BdVD5pVnnZUxAu9MCwvWEbpbYiqyofg7r13e96mwxci%2BMny9assDnZkLAs7MxzQUbCxZNq9htc7oiGW%2ByWtzROocM7OtG5Zve1llsqyljJwXZoXBsLHVPnQHKk2juZcJlXLHW8X8ZPz/h1wOPc%2B4HqFOJWEgs97Y1O8gw5YyRV1UfejA2T1HMab08mqZAaWyi7fg5mBCMLYGb3ojBkoV%2B5xngWGWCvqVYnpvY4ttgs24NLlgDGFHBjyzKm%2BV3NcaQ4gf/jMlBP84Etxn0PFSmm3OTIFwjeJJmHMeJFiLluU/lHyz51bnVpbRYLf%2BK1Tgqx544yTbYmeKjTcBEKOYz46hYb3PfTFgsRlZoY6T8aUQZqrgvg05kYxg%2Bp8Q5jch0whpAEDzKSNYDxdhvavQeina54G7VjawQTsri6PxpSlShiSa2wS7fQcTjqkoLppjKRew%2Bx%2ByCArDECYAKCuC0AECorhzorWgqjGjYq4q%2BxQqsGghXCmC6h6h4DBCuAECYCmBQAMCoB%2BC0APATATBfAABu6h6AAcBALAPBEAmoFoHBXBch1gRhyY5hlhvQ%2BoNhOoBAGY9hnBjheANhHhXo/sxAdh7ByAARWoxAbhgRGwEwccsKlwZIZI/AxAKwZh7BeAscnsMKKwKROIHBRo6ReAmwvsGwJokRLo0RsRZRKw8RiR/sAA1qkT7CsLUVkUEfUdUfkfkcUa6F8OUd0RUSQFUewdoHUekYMU0SHMMW0YUR0aUT0TMR4aHHgBaMgLKPkSsNoCIbcL7DirYfMYsRyCsa0THBYS
4dHCaNUYscsYUasSIZITETMbESwVcDMQ8bcWSM8W8XCuwTqH4BAFMfKP8ICJSGiGURCKgMgNUcYFqMMNULHCCWCT8TccSAwPgFQOqGUS4SYbQBAC4UHMmNiQaBmB4TwQQF6Gil6EIcmOaBmEIREQiWKP8XSECFSI8bEbCeCVqA4DCbQKCdUfCTCl0bEegDMMEPQCsCUEwDEEoIYSiA6BAIKa4MKZgBMBAByQALQrBQkECRGrCODeBMhnBCDGDKDYAMjGBCAnA3FlExD6iCBUAQCbAbB2gCCPB%2BC3ARAokRCOAMD2lehikSmYDoA6F8ligkgolSEvEZHsFhDEAkAKBDHRF/EAgMlHAnDnDMlxF9FJFaiZFbFpEmgZHQiOBSgCEYrCG5ETFFFRH8ndGkgQCEm8HzHrFgBgDSmimIrIp8Gmh5H2i/Fpl3FRkxkbEtk5HTHdHPFlFvHbAhlkg3DJlnComxGcEECzAMCinEDRnEAKAIkSFXAcBTC0CcARC8CeAcBaCkCoCcCFkKAzBzDYjbA8CkAECaC7lTDVGRCJj7kcCSBHlPlnmcC8AKAgCJiPknm7mkBwCwAwCIAgAIrIqKEkDkCUBVDAAKDKCGAlBCAICoAADux595aALAMQdAuoSQqF/gtAGF2Fx5p5%2BFhF9A4QyAwAQOpANFdAYQAQrACwvALFdFTIihFFOFP5MFyAlwxAyFf5pAQlFQ%2BAx5vA/AggIgYg7ALccl8gSgagP5ugDQBgRgIApgxg5gmhwQAFkAUwqAMQZQAFHAvAqAuhUZeAWAxlPxpAxArgggeAbATwqALgjlUwV5sw8wegcIfgpF6FmFAl3AvAWFxA4pnAPAe5B535IFv5HA2AqgsFRACRqgIEKpRYKwwAyAyAaMCIsoEAjgXouAhAfRvwXAEwvAwFWg2hpAr5a0H5X5pALAIAa4CIRUT4%2BEVGe6OOpAVF1l4lAFQFT5Uw4FUF3FYQCFEAM14QxAuhD0KpDFBVeorl1RfAdAShG5lAwQP5wQfgVQAAnrFbwEdcwMQCdUyMENoO2edcxagCwGwIIEyAwLQGdUlVgAocAI4GILQJZfeVgDqEYOIN9XgJwdYHgLZZZaeUJYoQsPeZqCUD%2BYZdFddc4FgD%2BXIR1RFaQLZcQMEPEJgHaJgKDcAJoTpSBVMFQAYMhQAGp4CYBYVMhIq4WyWyAKXiDKWyCKAqDqBJWaX6CGAmBmD6DyE%2BVnnmVJCWUqlMiygqk6hzDIg4pLUPTuXGAABetlD0qgKwStTAR10pdAMQD0BtLAS1uhhtqgKptAGJ0p6AJ1zALAXhKw1lhN%2BoDl8AvlJQ7ZSQ9gSJAw9QElSJownQ4QDQGQiQAgwd6Q8QMdDA4d%2BQXQQwft0NAgLQ/QLgdQgV6dZQWdbQDwYwqdFgfQNQOdgwZdIwxdEdEgvl15AVNV%2BgCVQ1P555HAUoLwQgjgKpDNiRK1R1BACgkR%2BVhVEAG1DA1RkREAFVGVtwa4tVD5E1UwCAmATAWA4QTlrVvAHVrYCIoYOE02ZYVKb8w1yV/5gFy9NNL5b5LdHAWwiVp5HddVK999GwT9I1Vl19DVUwhNCQdgkgQAA%3D%3D) | ||
|
||
There are two things worth noting here:
* This is a pretty simple implementation of matrix multiplication. Apart from the basic loop interchange optimization, it needs no intrinsics or other special constructs to vectorize. As long as you write clean, independent loops, there's a good chance the compilers will be able to vectorize them.
* The input pointers are marked `restrict` for a reason. Pointer aliasing is the bane of auto-vectorization, and if you're not careful, you can end up with fully scalar inner loops. For instance, the Embench matrix multiplication implementation suffers from this problem and generates fully scalar code, which is why we see no uplift.
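
As a rough illustration of the two points above, a loop-interchanged matrix multiplication with `restrict`-qualified pointers might look like the sketch below. This is not the code behind the Compiler Explorer link; the function name `matmul`, the flat row-major layout, and the `int` element type are assumptions made for the example.

```c
#include <stddef.h>

/* Sketch only: C = A * B for n x n row-major int matrices.
 * The restrict qualifiers promise the compiler that a, b and c
 * never overlap, so the inner loop is free of aliasing concerns. */
void matmul(size_t n, const int *restrict a, const int *restrict b,
            int *restrict c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0;

    /* i-k-j order (loop interchange): the innermost loop walks b and c
     * contiguously, which is the access pattern vectorizers like. */
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            int aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Dropping the `restrict` qualifiers from this sketch is a quick way to see on Compiler Explorer how aliasing affects the generated inner loop.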

### Compiler flags
To enable auto-vectorization, you will need to specify a vector extension as part of your `-march=` flag, e.g. `-march=rv32im_zve32x`. It's a good idea to also specify `-mabi=ilp32` to make sure that hardware floating-point instructions are not generated.

Additionally, you will want to use `-O2` to generate good-quality vector code. In general, `-O3` seems to make both compilers generate a lot of unrolled loops and extra code, which will probably not work well with the limited instruction cache our CPU has. `-Os` may also work, but it makes GCC more reluctant to generate vector instructions. Clang seems happy to vectorize with `-Os` though, so that's probably the best option if you're using it.

Finally, a useful flag to add for GCC is `-mrvv-max-lmul=dynamic`. This lets GCC actually use an LMUL greater than `m1`, which is good because higher LMUL generally means lower overhead and higher performance. Clang also has some similar flags you can use to tune vector code generation, but I haven't been able to find good documentation on those.
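
To make the flag combination concrete, here is a small kernel with one possible GCC invocation in its header comment. The file name `vadd.c` and the function name `vadd` are made up for this example, and the command simply strings together the flags discussed above; treat it as a starting point rather than a verified recipe.

```c
/* Hypothetical file: vadd.c
 *
 * One possible invocation using the flags from this section (GCC),
 * stopping after assembly generation so the output can be inspected:
 *
 *   riscv32-unknown-elf-gcc -march=rv32im_zve32x -mabi=ilp32 -O2 -mrvv-max-lmul=dynamic -S vadd.c -o vadd.s
 */
#include <stddef.h>

/* Element-wise add over non-overlapping arrays: a simple, independent
 * loop of the kind auto-vectorizers usually handle well. */
void vadd(size_t n, const int *restrict x, const int *restrict y,
          int *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}
```

If vectorization kicked in, you would expect the resulting assembly to contain a `vsetvli` followed by vector loads, an add, and a vector store; a plain scalar loop suggests one of the flags is missing.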

### Code quality
As mentioned before, the vector code generated by the compilers is generally of decent quality. However, there are a couple of quirks to be aware of:

* While GCC does a good job of using features specific to the RISC-V "V" extension, Clang generates code that looks more like what you'd see for traditional vector ISAs. This often leads to worse code size and performance compared to GCC.
* If you write code with a lot of weird memory access patterns or conditionals, the compilers may end up outputting code that actually runs slower than scalar. This is mostly because some instructions (like `vrgather` and the segmented load/store instructions) have low-performance implementations, and using them in a hot loop may not work well.

It's a good idea to always inspect the generated assembly and make sure there's nothing too strange going on in there.
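
One common source of awkward access patterns is data layout. The sketch below computes the same result from an array-of-structs layout, where the fields are interleaved in memory and a compiler may reach for segmented or strided loads, and from a struct-of-arrays layout, where every access is unit-stride. The type and function names are hypothetical, and which instructions are actually emitted depends on the compiler and version, so check the assembly for your own case.

```c
#include <stddef.h>

/* Array-of-structs: memory holds x0 y0 x1 y1 ..., so reading all the
 * x values (or all the y values) is a stride-2 access. */
struct point {
    int x;
    int y;
};

void norm2_aos(size_t n, const struct point *restrict p, int *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = p[i].x * p[i].x + p[i].y * p[i].y;
}

/* Struct-of-arrays: xs and ys are separate contiguous arrays, so every
 * load and store in the loop is unit-stride. */
void norm2_soa(size_t n, const int *restrict xs, const int *restrict ys,
               int *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = xs[i] * xs[i] + ys[i] * ys[i];
}
```

Compiling both functions and comparing the inner loops is a quick way to tell whether the interleaved layout is costing you anything on the vector core.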

## Setting up
### GCC
GCC is set to integrate auto-vectorization into its mainline builds with the [GCC 14 release](https://gcc.gnu.org/gcc-14/changes.html). As of May 2024, auto-vectorization already seems to work pretty well on the trunk builds, and as long as you're careful when setting up the toolchain, you should be able to build the trunk version locally and compile with that.

To build trunk GCC, you will need a system with the appropriate dependencies. As of this writing, `asicfab` does not appear to have recent enough tool versions for this, so you may have to try `asicfabu` or, ideally, a Linux (or Windows with WSL) machine where you can install additional packages.

First, clone the [RISC-V GNU toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain) and install the dependencies listed there:

    git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git

Next, configure the toolchain with the appropriate options. For some reason, the instructions in the repository tell you to run the configure script from the toolchain directory, but I've had better luck creating an adjacent directory and running the script from there:

    mkdir riscv32-gcc
    cd riscv32-gcc
    ../riscv-gnu-toolchain/configure --prefix=`pwd` --with-arch=rv32im_zve32x_zicsr_zifencei --with-abi=ilp32

Here, the `--with-arch` flag needs to list the extensions supported by the core and ONLY the extensions supported by the core. If you are not specific enough (e.g. you use `g` or `v`), the standard libraries may get built with incompatible instructions (like floating point) that will cause faults even if you later provide the correct flags during compilation.

Now you're ready to run the build (expect this to take some time):

    make -j `nproc`

Once the build is done, add the `bin` directory to your PATH so that you can easily access the compiler. It won't conflict with your native GCC installation, since all the binaries are prefixed with `riscv32-unknown-elf-`.
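
A quick sanity check for a freshly built toolchain is to look at the preprocessor macros it defines for your flags. The sketch below assumes the compiler follows the RISC-V C API convention of defining `__riscv_vector` when a vector extension is enabled and `__riscv_xlen` as the register width; the function names are invented for this example.

```c
/* Returns 1 if this translation unit was compiled with a RISC-V vector
 * extension enabled (assumed to be signalled by __riscv_vector), else 0. */
int built_with_vector(void)
{
#ifdef __riscv_vector
    return 1;
#else
    return 0;
#endif
}

/* Returns the target register width reported by the compiler
 * (32 for the toolchain configured above), or 0 on non-RISC-V targets. */
int target_xlen(void)
{
#ifdef __riscv_xlen
    return __riscv_xlen;
#else
    return 0;
#endif
}
```

Compiled with `riscv32-unknown-elf-gcc` and a `-march=` string that includes `zve32x`, you would expect `built_with_vector()` to report 1 and `target_xlen()` to report 32; on a native host compiler both fall back to 0.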

### Clang
Clang already has auto-vectorization support for the RISC-V "V" extension integrated into the latest release. In theory, setting up Clang for the vector core should be as simple as getting the appropriate release and adding it to your PATH. However, this has not been tested, so be warned that you may run into snags.

## Contributors
- Om Gupta (guptao@purdue.edu)