Vector Extension #35

Open
wants to merge 222 commits into
base: stage3
Changes from 216 commits
e596d2e
Copy relevant rv32v files from previous work
OxyMagnesium Nov 4, 2023
31419fc
Add interface for vector register banks
OxyMagnesium Nov 4, 2023
a7015c6
Add vector exec lane interface and redo vector FU control enums
OxyMagnesium Nov 7, 2023
78f9588
Add operand alignment unit interface
OxyMagnesium Nov 7, 2023
0b607d2
load-store controller init + python scripts change
maxmichalec Nov 7, 2023
d7a2375
updated elf2hex path
maxmichalec Nov 8, 2023
a0c3330
Move rv32v include files to dedicated subfolder
OxyMagnesium Nov 14, 2023
ccf2a7f
initial uop queue integration done. still need to address misa.S file
Nov 19, 2023
2541508
testing dispatch_size=queue_length with extra NOPs
Nov 20, 2023
b20529c
Add register class identifier bits to select signals
OxyMagnesium Nov 20, 2023
8d0b609
modified epc handling and ran other tests
Nov 28, 2023
e8b1a15
Add initial vector decode logic
OxyMagnesium Dec 4, 2023
a5d8e39
Add vector control unit interface and update existing interfaces and …
OxyMagnesium Dec 4, 2023
b1da13a
moved decode from execute to the uop stage
Dec 5, 2023
f4b5156
Save work for end of semester
OxyMagnesium Dec 11, 2023
84715ad
init mdbook doc
maxmichalec Dec 7, 2023
a67c4e8
Create rv32v_overview.md
OxyMagnesium Dec 11, 2023
d5ca18c
Locally host pipeline diagram image
OxyMagnesium Dec 11, 2023
8778968
Create uop queue documentation
1fahadaloufi Dec 14, 2023
5336409
change hazard unit if
Dec 15, 2023
d3377f9
added initial documentation for uop queue
Dec 15, 2023
194f507
clean up queue files for PR
Jan 18, 2024
4360867
fix scalar decode block comments
Jan 18, 2024
c76a96e
add uop_queue.sv file
Jan 18, 2024
6d9946f
create a new folder for uop arch
Jan 21, 2024
1c9d4d8
create new folder for uop arch
Jan 21, 2024
155d5dd
merge uop_queue
Feb 1, 2024
74691aa
checkout old stage3 folder
Feb 1, 2024
1b59dd9
start some blocks in the execute stage
Feb 5, 2024
ccadf25
finished needed blocks in ex stage & write xbar
Feb 11, 2024
477cd15
setup vector testing infra
Feb 12, 2024
df9b55e
cleaned testing folders to setup vector testing
Feb 12, 2024
cb34985
changes to struct
Feb 12, 2024
9046b62
organize folder structure
Feb 14, 2024
da21e4c
fix build errors
Feb 15, 2024
7f2a55a
setup folder structure
Feb 15, 2024
fe687cf
Fix false verilator comb loop warnings in decode
OxyMagnesium Feb 15, 2024
9cbcb85
compile w/ memory stage
Feb 15, 2024
f4cf0dd
verif info
Feb 15, 2024
a4cc5ba
Merge branch 'rvv-f23' of github.com:Purdue-SoCET/RISCVBusiness into …
Feb 15, 2024
f821e14
added vcsr_addr enum, fixed mem stage so scalar tests pass
maxmichalec Feb 15, 2024
28e087f
Add handling for vset* instructions
OxyMagnesium Feb 15, 2024
b811d14
Merge branch 'rvv-f23' of github.com:Purdue-SoCET/RISCVBusiness into …
OxyMagnesium Feb 15, 2024
296afec
create branch
Feb 17, 2024
cf5e49c
adding ex logic
Feb 17, 2024
c91b878
fixed circular logic on hazard unit exception from invalid_csr assign…
maxmichalec Feb 17, 2024
254a32b
signal for stalling due to vect mem acc serialization, vect-scal move…
maxmichalec Feb 18, 2024
314834e
more integration work for running vector config and mem instructions
Feb 19, 2024
af6aea2
updated logic for determining vlmax based on lmul and sew
maxmichalec Feb 19, 2024
19cc7d5
fixed using non-RV32I arch for tests
maxmichalec Feb 19, 2024
34a30c6
fixed using non-RV32I arch for verilator testing
maxmichalec Feb 19, 2024
f39407e
add tb to dump vcd
maxmichalec Feb 19, 2024
f879810
workingish
Feb 19, 2024
65fbdca
Actually output vector decode valid flag
OxyMagnesium Feb 19, 2024
21a515b
create branch
Feb 17, 2024
598a9e7
adding ex logic
Feb 17, 2024
50218e5
more integration work for running vector config and mem instructions
Feb 19, 2024
3442a6a
workingish
Feb 19, 2024
f57fc2f
Actually output vector decode valid flag
OxyMagnesium Feb 19, 2024
68f5f6c
Add scripts for running a single test with vsim
OxyMagnesium Feb 19, 2024
b93a701
passing simple vsetivli, vvalid clears scalar illegal_isn
maxmichalec Feb 20, 2024
d05ab23
delete some files
Feb 21, 2024
f46b3a1
fix merge conflicts
Feb 21, 2024
685ea25
vcsrs mostly passing. edge case with vtype accesses needs fix
Feb 21, 2024
f3aaae8
vcsrs passing
Feb 21, 2024
fb19e15
vle32, vse32 passing for vl=4
maxmichalec Feb 21, 2024
549dd1c
Add scalar rf to waveforms
OxyMagnesium Feb 21, 2024
34957ae
output test failed number
Feb 21, 2024
66fbbf9
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium Feb 21, 2024
f6fbe89
simple vle test passing
Feb 21, 2024
619f94a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
Feb 21, 2024
03e2ada
passing simple vmem tests
Feb 21, 2024
95cf694
fixed bug in write xbar
Feb 22, 2024
b3c3ee7
vlse8 passing, fixed read xbar for SEW8
maxmichalec Feb 22, 2024
f90c220
vluexi8 passing, indexed not using rs2
maxmichalec Feb 22, 2024
7b1a2a6
Add script to generate vector load/store tests and some generated uni…
OxyMagnesium Feb 22, 2024
5a9568a
Add option to look in subdirectory for tests
OxyMagnesium Feb 22, 2024
900ba3a
Improve test generation and add more tests, fix incorrect LMUL encoding
OxyMagnesium Feb 23, 2024
3417f70
Add tests for non-register aligned load/stores
OxyMagnesium Feb 23, 2024
eade15b
Fix incorrect check offsets
OxyMagnesium Feb 23, 2024
5ef097d
initial logic for arth instructions
Feb 23, 2024
d49fcaa
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
Feb 23, 2024
00fa59a
edit serializer stall logic
Feb 23, 2024
003253c
Initial support for mask load/store instructions
OxyMagnesium Feb 28, 2024
aa43cca
Initial whole reg load/store support
OxyMagnesium Feb 28, 2024
0d59f49
consider vstart when masking elements in mem/wb
maxmichalec Feb 28, 2024
6db140a
mask load and store instructions pass
Feb 28, 2024
d92d3bd
Add tests for strided load/stores
OxyMagnesium Feb 28, 2024
c201d9a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium Feb 28, 2024
7a5cf1c
Fix accidental intentional errors in strided tests
OxyMagnesium Feb 28, 2024
bd26b4b
fixed hazard tracking and added more implementation for the arithmeti…
Feb 29, 2024
54bc3e2
fix conflict
Feb 29, 2024
36e12da
fix make errors
Feb 29, 2024
ae3fb54
implement arith instructions up until 11.6
Mar 1, 2024
66c2afa
fix indexed load store test
Mar 1, 2024
00eb791
add rest of arithmetic instructions except mul and divide
Mar 3, 2024
6937ae1
vstart checking in mem, setting on precise exception, inital mult imp…
maxmichalec Mar 3, 2024
81c8798
implement rv32v divider, vdiv passing
maxmichalec Mar 4, 2024
dfd3592
Regenerate load/store unit tests
OxyMagnesium Mar 5, 2024
df5972d
Initial work on reduction decoding
OxyMagnesium Mar 6, 2024
3b3508f
Changes for reduction logic
OxyMagnesium Mar 6, 2024
f75a761
add code for setting mask bits
Mar 8, 2024
428b66b
edit vcontrol for mask setting
Mar 15, 2024
89950e4
simple add benchmark
Mar 15, 2024
45013b7
edit add benchmark
Mar 15, 2024
7d54dfd
Save work on reduction integration
OxyMagnesium Mar 16, 2024
4a52a30
Merge branch 'rvv-decode' into fahad
OxyMagnesium Mar 16, 2024
aafee16
kill me
OxyMagnesium Mar 16, 2024
fc999a0
Implement scratch register
OxyMagnesium Mar 17, 2024
7759414
Add cross-lane reduction unit
OxyMagnesium Mar 17, 2024
bed583f
debug arithm instructions
Mar 18, 2024
2655fc8
fixed conflicts
Mar 18, 2024
9f106c7
Improve reduction
OxyMagnesium Mar 18, 2024
f5529a7
rebased
maxmichalec Mar 18, 2024
0f6815c
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium Mar 18, 2024
5d37a6d
Fix merge error
OxyMagnesium Mar 18, 2024
5c4e878
fixed vmv
maxmichalec Mar 18, 2024
1309240
fixed bugs with mask set instructions
Mar 19, 2024
8bfa068
handle mixed width operations
Mar 19, 2024
40c56a9
Basic reduction working
OxyMagnesium Mar 20, 2024
342781b
Fix reductions for longer lengths
OxyMagnesium Mar 20, 2024
c5b2616
Add more reduction tests
OxyMagnesium Mar 20, 2024
26cedcb
Fix reductions with odd lens and masks
OxyMagnesium Mar 20, 2024
ca1df48
Add basic reduction benchmark
OxyMagnesium Mar 20, 2024
8b896e8
Improve handling of small reductions
OxyMagnesium Mar 20, 2024
fe1c749
Add unit tests for smaller EEW reductions
OxyMagnesium Mar 20, 2024
328aa2a
Fix stall issue with scratch reg, add reduction min/max test
OxyMagnesium Mar 21, 2024
db5e7b2
Improve reduction unit tests more
OxyMagnesium Mar 21, 2024
088aff9
initial implementation of mask instr
Mar 24, 2024
ae20715
edit test
Mar 24, 2024
3b086fa
small bug fix
Mar 24, 2024
578134b
merge reduction and mask calc
Mar 25, 2024
046c441
tested and debugged vfirst
Mar 26, 2024
530845b
testing and debugging mask instructions
Mar 30, 2024
b2742fd
Fix widening reductions
OxyMagnesium Mar 31, 2024
5daf794
Fix masked reduction operations
OxyMagnesium Apr 1, 2024
7a29446
reset vstart on last vector uop
maxmichalec Mar 19, 2024
0a17ba1
vslideup/down
maxmichalec Apr 1, 2024
17b9810
fixed vres in ex datapath
maxmichalec Apr 1, 2024
d938006
Implement whole register moves
OxyMagnesium Apr 3, 2024
0d14416
Matrix multiplication self tests
OxyMagnesium Apr 3, 2024
1d8df60
enable rv32m
maxmichalec Apr 3, 2024
05bf92b
Pull in fix for missing scalar multiplier
OxyMagnesium Apr 5, 2024
98730a2
Switch to gcc for compiling matmul kernel, fix vsetvl issue
OxyMagnesium Apr 5, 2024
a39d5c2
Move benchmark self-tests into dedicated folder
OxyMagnesium Apr 5, 2024
41d0ac1
Add maxpool test (correctness issue)
OxyMagnesium Apr 5, 2024
abd867a
Fix issue with vector-scalar moves
OxyMagnesium Apr 5, 2024
92f6479
Increase max pool input and window size
OxyMagnesium Apr 5, 2024
6490b7b
Add fir filter test (multiplier issue)
OxyMagnesium Apr 6, 2024
1fe6bd9
FIR filter passing
OxyMagnesium Apr 6, 2024
cff26c9
Add FIR filter benchmarks
OxyMagnesium Apr 6, 2024
d32a9ac
simple segment instructions working
Apr 11, 2024
cb8e101
segmented load store instructions tested
Apr 15, 2024
fda3a07
merge in seg mem instructions
Apr 15, 2024
ccef0ce
vrgather and vslide tests
maxmichalec Apr 15, 2024
edd09f9
merge permutation isns
maxmichalec Apr 15, 2024
1c8bf25
use explicit vbank_offset
maxmichalec Apr 16, 2024
2810108
fix vrgather, add support for .v{i,x}
maxmichalec Apr 16, 2024
514a563
Fix bug with branch not flushing fetch-decode latch
OxyMagnesium Apr 17, 2024
921e399
Add hand-tuned matrix multiplication benchmark
OxyMagnesium Apr 17, 2024
cf5f0af
Add relu benchmarks
OxyMagnesium Apr 19, 2024
d52034d
bug fixes w/ arith instr
Apr 21, 2024
c2bb2d0
write more tests
Apr 21, 2024
1ae1580
fix add benchmark
Apr 21, 2024
87083c8
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
Apr 21, 2024
ac9f351
minor edit
Apr 21, 2024
4a008b1
different data width benchmarks
Apr 21, 2024
939c78e
interrupt handling infrastructure, fix vslide
maxmichalec Apr 22, 2024
219d0c7
Add linear interpolation benchmark
OxyMagnesium Apr 23, 2024
a8b50b7
more tests
Apr 23, 2024
222af2f
Add doc for RVV compiler information
OxyMagnesium Apr 26, 2024
dce2edf
Loop interchange optimization for matmul, add additional tricky test
OxyMagnesium Apr 26, 2024
84db25d
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
OxyMagnesium Apr 26, 2024
de92e03
small typo
Apr 27, 2024
dc5bf6a
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
Apr 27, 2024
642f5e3
Update overview and compilers docs
OxyMagnesium Apr 28, 2024
ee3370b
add microarchitecture file
Apr 28, 2024
8a142b8
add results doc
Apr 28, 2024
086f8cb
Fix vmerge bug in division version of lininterp
OxyMagnesium Apr 29, 2024
0e6f442
modify arch.md file
Apr 29, 2024
84af025
fix hazard tracking when instr depends on 3 operand reads
May 2, 2024
637358d
add mem stage and permutation documentation
maxmichalec May 2, 2024
9861a98
add ex documentation & adjust hazard tracking
May 2, 2024
6121638
Merge branch 'rvv-testing' of github.com:Purdue-SoCET/RISCVBusiness i…
May 2, 2024
e87ef6b
added separate RVV core build target, integration with build system (…
maxmichalec Jul 16, 2024
b1e9646
Merge remote-tracking branch 'origin/stage3' into rvv-testing
maxmichalec Jul 16, 2024
e5d4af7
update README for new make targets
maxmichalec Jul 16, 2024
6c93581
revert to just one core file (pipeline selection uses macro defines f…
maxmichalec Jul 16, 2024
0d95b88
move mask unit verif files
maxmichalec Jul 18, 2024
507ef8c
fix rv32m div-by-0 return value and rv32v divider to work with update…
maxmichalec Jul 18, 2024
cbc134e
tb_core.cc outputs .fst
maxmichalec Jul 24, 2024
f395c74
remove sim_build dir
maxmichalec Jul 25, 2024
d1b1026
spec compliance for extreme values in rvv divider, load-use fix for v…
maxmichalec Jul 25, 2024
4a5bc0f
edited L1 and bus if + init. mem coalescer
wrcunnin Jun 10, 2024
e94ee85
updated logic for mcu
wrcunnin Jun 12, 2024
131e9da
lsc able to select words per vlane from rdata_wide
wrcunnin Jun 12, 2024
95b4fd2
got reads mostly working, needed to checkout to main branch for testing
wrcunnin Jun 12, 2024
c2c2adc
fixed bug where next strided addr would not update if vuop == 0
wrcunnin Jun 12, 2024
b9e511d
added minor write logic, major read debugging, passes most selftest b…
wrcunnin Jun 14, 2024
1c17878
various bug fixes, strided + unit strided + benchmarks all pass, fail…
wrcunnin Jun 15, 2024
9c64ebc
wide writes work for some tests, switching to rvv-testing to debug
wrcunnin Jun 19, 2024
00554a7
minor bug fixes to vector lane masks
wrcunnin Jun 19, 2024
426cf45
fixed e16 storing issues, every test minus uninterruptable passes
wrcunnin Jun 24, 2024
badd086
shared data in repo for SURF purposes
wrcunnin Jul 2, 2024
f3e0907
revised pmp matching, added configurable granularity
wrcunnin Jul 22, 2024
af9e033
Revert "revised pmp matching, added configurable granularity"
wrcunnin Jul 30, 2024
7802cd8
moved my waveform into sim_scripts
wrcunnin Jul 31, 2024
c54a36b
rebasing deleted coalescer interface and altered test config, fixing …
wrcunnin Jul 31, 2024
c695d7c
fixed bug in mcu where different back to back strided loads would loa…
wrcunnin Aug 2, 2024
339f1f6
fixed bug where coalesced accesses may not work if the addressing lan…
wrcunnin Aug 6, 2024
117f798
fixed bug where strided addresses did not update if no lanes were masked
wrcunnin Aug 7, 2024
8b4d72f
cleaning up
wrcunnin Aug 7, 2024
2eb4265
merge multicore & vector
maxmichalec Aug 9, 2024
7d4fc4e
atomics support in stage4 pipe (serialized mem)
maxmichalec Aug 9, 2024
865ef4d
removed *_wide signals from LSC-D$ interface, accesses >= noncacheabl…
maxmichalec Aug 15, 2024
5afa225
fixed passthrough bug, reads were being treated as writebacks
maxmichalec Aug 15, 2024
3916c03
cleanup + fixed vmadd/vnmsub bug
maxmichalec Aug 24, 2024
882061e
more cleanup
maxmichalec Aug 26, 2024
ab68e3f
coherency_unit: put busy low when aborting so cache can properly tran…
devins2518 Aug 27, 2024
a1aa9cc
added vectorized multicore test (matmul_vector.s), Devin see build_al…
wrcunnin Sep 22, 2024
989bfa3
verification/multicore: fix tests and add multicore vector test
devins2518 Sep 28, 2024
11a5b6c
bus/caches: fix issues with improperly handled ccabort signals
devins2518 Sep 28, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# ignore the build and run files
build/
sim_out/
sim_build/
obj_dir/
rvb_out/
# ignore memory files from test cases
@@ -39,6 +40,8 @@ run_tests_cache.json
# Ignore log files and Verilator outputs
*.log
*.fst
*.vcd
*.wlf
*.gtkw
memsim.dump
meminit.bin
64 changes: 63 additions & 1 deletion Makefile
@@ -1,5 +1,8 @@
ROOT := $(shell pwd)

TEST_FILE_NAME ?= add


RISCV := $(ROOT)/source_code
RISCV_CORE := $(RISCV)/standard_core
PIPELINE := $(RISCV)/pipelines
@@ -24,19 +27,26 @@ CACHE_FILES := $(CACHES)/caches_wrapper.sv $(CACHES)/pass_through/pass_through_c
SPARCE_FILES := $(SPARCE)/sparce_wrapper.sv $(SPARCE)/sparce_disabled/sparce_disabled.sv $(SPARCE)/sparce_enabled/sparce_cfid.sv $(SPARCE)/sparce_enabled/sparce_enabled.sv $(SPARCE)/sparce_enabled/sparce_psru.sv $(SPARCE)/sparce_enabled/sparce_sasa_table.sv $(SPARCE)/sparce_enabled/sparce_sprf.sv $(SPARCE)/sparce_enabled/sparce_svc.sv
RISCV_BUS_FILES := $(RISCV_BUS)/generic_nonpipeline.sv $(RISCV_BUS)/ahb.sv
TRACKER_FILES := $(RISCV)/trackers/cpu_tracker.sv $(RISCV)/trackers/branch_tracker.sv

COMPONENT_FILES_SV := $(CORE_PKG_FILES) $(RISC_MGMT_FILES) $(RISC_EXT_FILES) $(CORE_FILES) $(RV32C_FILES) $(PIPELINE_FILES) $(SPARCE_FILES) $(PREDICTOR_FILES) $(PRIV_FILES) $(CACHE_FILES) $(RISCV_BUS_FILES) $(TRACKER_FILES)

TOP_ENTITY := RISCVBusiness

HEADER_FILES := -I$(RISCV)/include

# Default config
CFG_FILE := example.yml
CORE := RISCVBusiness


define USAGE
@echo "----------------------------------------------------------------------"
@echo " Build Targets:"
@echo " config: config core with example.yml"
@echo " verilate: Invoke 'FuseSoC run --build' to build Verilator target"
@echo " verilate.%: Invoke 'FuseSoC run --build' to build specific"
@echo " Verilator target:"
@echo " - 's' for 3-stage scalar pipeline"
@echo " - 'v' for 4-stage vector pipeline"
@echo " xcelium: Invoke 'FuseSoC run --build' to build Xcelium target"
@echo " lint: Invoke 'FuseSoC run --build' to run the Verilator lint target"
@echo " clean: Remove build directories"
@@ -50,19 +60,63 @@ endef
default:
$(USAGE)

##
# Define config (varset.%) here
varset.s:
$(eval CFG_FILE := example.yml)

varset.v:
$(eval CFG_FILE := rvv.rvbcfg.yml)
##

config:
@echo "----------------------"
@echo " Running config_core"
@echo "----------------------"
@python3 scripts/config_core.py example.yml

config.%: varset.%
@echo "----------------------"
@echo " Running config_core"
@echo "----------------------"
@python3 scripts/config_core.py $(CFG_FILE)

test_asm_file:
python3 compile_asm_for_self.py -a RV32V verification/self-tests/RV32V/$(TEST_FILE_NAME).S
riscv64-unknown-elf-objcopy -O binary sim_out/RV32V/$(TEST_FILE_NAME)/$(TEST_FILE_NAME).elf sim_out/RV32V/$(TEST_FILE_NAME)/$(TEST_FILE_NAME).bin
./rvb_out/sim-verilator/Vtop_core sim_out/RV32V/$(TEST_FILE_NAME)/$(TEST_FILE_NAME).bin


# test_verilog_file: $(VERILOG_FILE) $(VERILOG_TB_FILE)
# @echo "----------------------------------------------------------------"
# @echo "Creating executable for source compilation ....."
# @echo "----------------------------------------------------------------\n\n"
# @mkdir -p ./sim_build/
# @ iverilog -g2012 -gspecify -Tmax -v -o ./sim_build/sim_file.vvp $(VERILOG_FILE) $(VERILOG_TB_FILE)
# @echo "\n\n"
# @echo "Compilation complete\n\n"

# @echo "----------------------------------------------------------------"
# @echo "Simulating source ....."
# @echo "----------------------------------------------------------------"
# @vvp ./sim_build/sim_file.vvp
# @ gtkwave dump.vcd


verilate: config
@fusesoc --cores-root . run --setup --build --build-root rvb_out --target sim --tool verilator socet:riscv:RISCVBusiness --make_options='-j'
@echo "------------------------------------------------------------------"
@echo "Build finished, you can run with 'fusesoc run', or by navigating"
@echo "to the build directory created by FuseSoC and using the Makefile there."
@echo "------------------------------------------------------------------"

verilate.%: config.%
@fusesoc --cores-root . run --setup --build --build-root rvb_out --target sim --tool verilator socet:riscv:$(CORE) --make_options='-j'
@echo "------------------------------------------------------------------"
@echo "Build finished, you can run with 'fusesoc run', or by navigating"
@echo "to the build directory created by FuseSoC and using the Makefile there."
@echo "------------------------------------------------------------------"

no_mem: config
@fusesoc --cores-root . run --setup --build --build-root rvb_out --target no_mc --tool verilator socet:riscv:RISCVBusiness --make_options='-j'
@echo "------------------------------------------------------------------"
@@ -82,7 +136,15 @@ lint: config
clean:
rm -rf build
rm -rf rvb_out
rm *.vcd
rm *.wlf

clean_waveforms:
rm *.wlf
rm *.vcd

veryclean:
rm -rf fusesoc_libraries
rm fusesoc.conf

.PHONY: test_verilog_file clean_waveforms
11 changes: 8 additions & 3 deletions README.md
@@ -26,11 +26,16 @@ make config
# or python3 scripts/config_core.py <custom>.yml
# if you want to use a config other than example.yml

-make verilate # build with Verilator, or...
-make xcelium # build with Xcelium
+# Build the core
+make verilate # build default config with Verilator, or...
+make xcelium # build default config with Xcelium, or...
+make verilate.% # build '%' config with Verilator
+# % configs: 's' for 3-stage scalar, 'v' for 4-stage vector
```

-> Congrats! All dependencies are now set up. Now you can run simulations/tests:
+> Congrats! All dependencies are now set up. Based on the configuration used (the extensions it supports), you may need to modify the `"march"` string in `run_tests_config.json` which is used by the compiler.
+>
+> Now you can run simulations/tests:


```bash
6 changes: 4 additions & 2 deletions RISCVBusiness.core
@@ -1,5 +1,5 @@
CAPI=2:
-name: socet:riscv:RISCVBusiness:0.1.1
+name: socet:riscv:RISCVBusiness:0.2.0
description: RISC-V Core for AFTx series

filesets:
@@ -10,6 +10,7 @@ filesets:
- "socet:bus-components:apb_if"
- "socet:riscv:packages"
- "socet:riscv:stage3"
+- "socet:riscv:stage4"
- "socet:riscv:priv"
- "socet:riscv:caches"
#- "socet:riscv:risc_mgmt"
@@ -18,6 +19,7 @@ filesets:
- "socet:riscv:rv32a"
- "socet:riscv:rv32c"
- "socet:riscv:rv32m"
+- "socet:riscv:rv32v"
- "socet:riscv:rv32b"
files:
- source_code/branch_predictors/branch_predictor_wrapper.sv
@@ -111,7 +113,7 @@ targets:
- "tool_verilator? (top_core)"
tools:
verilator:
-verilator_options: ["-Wno-SYMRSVDWORD", "-Wno-lint", "--trace", "--trace-fst", "--trace-structs"]
+verilator_options: ["-Wno-SYMRSVDWORD", "-Wno-lint", "--trace", "--trace-structs"]

fpga:
<<: *default
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added doc/src/rv32v/images/decode_stage_design.jpeg
Binary file added doc/src/rv32v/images/execute_stage.jpg
Binary file added doc/src/rv32v/images/mask_set_layer.jpg
Binary file added doc/src/rv32v/images/rvv_mem.jpg
3 changes: 3 additions & 0 deletions doc/src/rv32v/images/rvv_naive_detailed_f23.svg
Binary file added doc/src/rv32v/images/rvv_pipeline.jpg
Binary file added doc/src/rv32v/images/unit_seg_load.jpg
Binary file added doc/src/rv32v/images/uop_queue_design.jpeg
61 changes: 61 additions & 0 deletions doc/src/rv32v/rv32v_compilers.md
@@ -0,0 +1,61 @@
# RISC-V "V" extension compiler support

## Introduction
Despite the relative user-friendliness of the RISC-V "V" extension, for most practical purposes it's still critical to have good compiler support so that programs can be generally accelerated without needing specific optimization for the vector core. Fortunately, both GCC and Clang have support for auto-vectorization with the "V" extension and can generate quite performant vector code if given well-written data-parallel programs.

A good tool to test out and verify auto-vectorization is [Compiler Explorer](https://godbolt.org/). It has toolchains for RISC-V, including trunk builds, which may matter depending on how far compiler support for the RISC-V "V" extension has progressed by the time you read this. You can simply paste C code, set the appropriate compiler and flags, and inspect the generated assembly.

## General notes
### Writing code
The compilers will generally do a pretty good job of vectorizing data-parallel code, but you may occasionally have to nudge them to do what you want. Take the following code as an example:

[Matrix Multiplication](https://gcc.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYhAAVgjSAAdUBUJbBmc3Dz04hJsBX38gllDwqItMKyyGIQImYgIU908uYtKkiqqCHMCQsMjohUrq2rSG3tb2vILugEoLVFdiZHYOAHpFgGoAUgAmAGYsGn8VxwAJbEcAaTWNAEFl9e3dv0wVgBUASQBZbAvLr82tvCpds93p8rr8/Mg3Fh1ltHL18KgAHQINZbEHfbbgyGPFGOGxsJEotG/Bz/FYrG6vD4/UF3TB7R4AJQA8gB1IRcFZcABsP1p9IOTO87M5PJpOzpDxWzLZGxFvPF/McgqEsu51MufgIKxYTAIxDwqi4awiACFpeyAFRKoVGiIAEWhDrWAHYTRsuFtSCttlyvRsABzOr3OrjRFZcjZhrhcDZeiJcjRxhorZ0ATljnNTXv9Gn92a2Ya5kl9K1TEWk4a5Qe9gez/uTkn9GedXLzpeTW2dbbLGakJcjJa4GizKwiOezqf7GZ7Xsk7q9Hs9K2LGa2yaLba4/qXjb9qYrqcnC4Lda9a%2BnXa9XNT1bLXsPzY0Hed1fdya4zozrZH153S7XS5RNW/pcmGkgaBW3KJimGwVgGFb1rORThmmC7cnGyE5kuqbjuGqZLu6UYen6GjTlsJaptGV7EeGGxftswYaCWXLJk22FbCOEQ0c6Y6zhszFHt6PFxqm0H%2BoJaYUTR/rbnGtbesWwb%2BlGlFocx84rDJYYbBo0GiSO/Fts6WwVpIVHhjRnYVly5Fxi%2BZ5bkhS5btBNkjqJBGdveGw/jGs5bDuzrQdG3YRAZFYttB/FLgm/7IaJm7IVwh5nvhV4hn6LZXqBSklhEqGjv6B5%2BSsWzKfeGg7iBSEdmG6YqRmGZhSO24lkVBm8d6JY8Rm4HBZI7klcZwHGZlFYRBot4BguglbFFk3ZjZ96hn6E2zgVCa9cJpVcoOS48WG%2BXJjeX4RmeYVnkWwazuBwamVs06OeGXF%2BoOBkfutUVrUJnHoaOw4idBB3BmuV5zXGPnnXlkb3mdpUPdm30hoOGWcgRTEORR14LjDmmibOMlnrJy5BX6dFE22ja3iWZkjpIo3hrZKYnozI7OpIYZYQu4n3tZBVlpzqnPdhGNddWkgI8u8beYWkMpjeqWcajAbJtsvY5ZpGsRCZcZM7tf43e58mdthgkTYBV2ch9IptiZr6o5OYak5p5mdhmMkaC6dooiaXyatqur6qoGzGmarIqlayoh/ajrrK6AbOWZV6rrt3N1cdqNSLbTbBgVOYVgF0FeZyc4LqXy4QdmyZptpBEMZpqNduuUizqL9P9hzZesRs4ukXGZ4aaGhYJtmkiQU%2BwYaZOzmVWhutIUTIObgNZ596Whaz6WZWzhrHOQfzGtVquiHMyOnbV5LCHb8u1VCU133KcxTPbIB2s3bXRc66Vq5Pfl1bs%2BtYq3VFIoX7CtUqSdSxjwJg2BswFO6lTqgzfiFYopf0DO7QSOZqxrgPJ1IK2ke5g1MoTUsWVlz8SvBoJ2LEFwAJ2olP041JYBWfMBP6HpXzQKgWfFhuZUpRhXqVNu5Mt5th0jdSClDwwTjZgzIK1NewaWvLbfGtE4reVVqLD8qsQGWwmlFBm%2B4vyX22lEHc5kgqq2vkWMMJljr1x7t2BB24OypwbtBMc1ZozuxHl1CiulZx3hdquFKy4aL9m7MTEyX5xb015nBDWJki4AzxmJBWeEvr3nIQmNma9yLjUCRXOxu5mYRQZslDsAUmHATfuGHOmkhHsyLjhDCeV9p%2BIGv2dM60rzBJDARdxZYpEHgordWi3jN5zntpEuR7tt
pzWCuAuibUqx0OrvXUSUZ5KhmhsxOpzUHIGQWszXqHEgl1SEa0ihvVaGcnITmKM9CJJyM5lc/csMWrxPhuw4C4M0l%2Bj%2Bq2Ge3jznqJmiHZ03sti%2ByuP7TAqgYiYGsKHc0XBI5Cmjk6LYTpXRlQgkOAuoltiV1Ki%2BScBUOL4V0nBSidFYKziHIGUG4Sohj20k2G81TvSdiKvZLqRUxycTHiuVW252ZrNKspci19yLbmHLbIsZVJXTPpSWcik1DymRpZ%2Bd624yy9xwl2V82t8KytEnOIRyr3T0JQe6O57oIJcQLi%2BYyxdtYTRjDg/iOl8qr0DJ%2BW2uZozljPA9CIdFTIRqrN8h6rZSIp3rDeVWOl6aUrHEPYKZZjGmQ3FIIuPkArcJ7oS0RdEuI2RwZNVNW19zJQLnRQMwS1yPxZuRKQLbMo5kmnYtMLYEEPT5c268Ply78RfDGOKCVXYxnEshOc2s5yNRzIGUWPlYI2Wcjhal4iExPgZRQhN1yfJcW1gRfcpFxnJPwt9Huc1V3nRbDGnexlSJ03wgGHm4TWES3WrswSS6Ixw1NY7QKgrglzk7DZVctMdL/g4rpMF8YtyRsuluMyoSCESJvvlBM4ttg4S/o2AKylKZmQDaZIKc53FmRlXc8s8Yog01Ivla%2B5ZwIRpaW%2B5KjKF0NKLfhIRZlA31z/qBTq4Fr3/kPN2uBnH%2BHLlAhBSWEa43NuMuWIc/l9yTltslR14tdpnrschtySYqPS3CRa0p7bjJPQDJJZ2AUcKNgbDeclz4hxMTppVFtmGTJFVSo2GSNNYIhks2uahUhxYxi80G1qV66IBS/j69M5V4bsqWoguaUhGqTkqo46MlFb6wTKhqwB4lLaFvEwXD09VvWhjlXuYsOZeohTKsfLC3yBrxgDUTfLEGwrDm65VaLvVF1VkM0xHuZ8J1xtXoR42kWux5RYiBXepEWIlXItefTjKArIz4tRy2HN3QytbmPD8%2BGbLiRwSKoKNMeJMrPh%2BAK0ix66XZndjmN4z4ZrKrSj0TiB6dmob1eqc1vUAQQeunNN0Y27x7j3KZg65qrifOG22rrGxwS0vTF1dXUmRbHGCjrbqooYPIujgaRUCIflzGCnShEWUYZ%2B3BeMw5sNrjTOViBuZhwdjVp%2BdVD5pVnnZUxAu9MCwvWEbpbYiqyofg7r13e96mwxci%2BMny9assDnZkLAs7MxzQUbCxZNq9htc7oiGW%2ByWtzROocM7OtG5Zve1llsqyljJwXZoXBsLHVPnQHKk2juZcJlXLHW8X8ZPz/h1wOPc%2B4HqFOJWEgs97Y1O8gw5YyRV1UfejA2T1HMab08mqZAaWyi7fg5mBCMLYGb3ojBkoV%2B5xngWGWCvqVYnpvY4ttgs24NLlgDGFHBjyzKm%2BV3NcaQ4gf/jMlBP84Etxn0PFSmm3OTIFwjeJJmHMeJFiLluU/lHyz51bnVpbRYLf%2BK1Tgqx544yTbYmeKjTcBEKOYz46hYb3PfTFgsRlZoY6T8aUQZqrgvg05kYxg%2Bp8Q5jch0whpAEDzKSNYDxdhvavQeina54G7VjawQTsri6PxpSlShiSa2wS7fQcTjqkoLppjKRew%2Bx%2ByCArDECYAKCuC0AECorhzorWgqjGjYq4q%2BxQqsGghXCmC6h6h4DBCuAECYCmBQAMCoB%2BC0APATATBfAABu6h6AAcBALAPBEAmoFoHBXBch1gRhyY5hlhvQ%2BoNhOoBAGY9hnBjheANhHhXo/sxAdh7ByAARWoxAbhgRGwEwccsKlwZIZI/AxAKwZh7BeAscnsMKKwKROIHBRo6ReAmwvsGwJokRLo0RsRZRKw8RiR/sAA1qkT7CsLUVkUEfUdUfkfkcUa6F8OUd0RUSQFUewdoHUekYMU0SHMMW0YUR0aUT0TMR4aHHgBaMgLKPkSsNoCIbcL7DirYfMYsRyCsa0THBYS
4dHCaNUYscsYUasSIZITETMbESwVcDMQ8bcWSM8W8XCuwTqH4BAFMfKP8ICJSGiGURCKgMgNUcYFqMMNULHCCWCT8TccSAwPgFQOqGUS4SYbQBAC4UHMmNiQaBmB4TwQQF6Gil6EIcmOaBmEIREQiWKP8XSECFSI8bEbCeCVqA4DCbQKCdUfCTCl0bEegDMMEPQCsCUEwDEEoIYSiA6BAIKa4MKZgBMBAByQALQrBQkECRGrCODeBMhnBCDGDKDYAMjGBCAnA3FlExD6iCBUAQCbAbB2gCCPB%2BC3ARAokRCOAMD2lehikSmYDoA6F8ligkgolSEvEZHsFhDEAkAKBDHRF/EAgMlHAnDnDMlxF9FJFaiZFbFpEmgZHQiOBSgCEYrCG5ETFFFRH8ndGkgQCEm8HzHrFgBgDSmimIrIp8Gmh5H2i/Fpl3FRkxkbEtk5HTHdHPFlFvHbAhlkg3DJlnComxGcEECzAMCinEDRnEAKAIkSFXAcBTC0CcARC8CeAcBaCkCoCcCFkKAzBzDYjbA8CkAECaC7lTDVGRCJj7kcCSBHlPlnmcC8AKAgCJiPknm7mkBwCwAwCIAgAIrIqKEkDkCUBVDAAKDKCGAlBCAICoAADux595aALAMQdAuoSQqF/gtAGF2Fx5p5%2BFhF9A4QyAwAQOpANFdAYQAQrACwvALFdFTIihFFOFP5MFyAlwxAyFf5pAQlFQ%2BAx5vA/AggIgYg7ALccl8gSgagP5ugDQBgRgIApgxg5gmhwQAFkAUwqAMQZQAFHAvAqAuhUZeAWAxlPxpAxArgggeAbATwqALgjlUwV5sw8wegcIfgpF6FmFAl3AvAWFxA4pnAPAe5B535IFv5HA2AqgsFRACRqgIEKpRYKwwAyAyAaMCIsoEAjgXouAhAfRvwXAEwvAwFWg2hpAr5a0H5X5pALAIAa4CIRUT4%2BEVGe6OOpAVF1l4lAFQFT5Uw4FUF3FYQCFEAM14QxAuhD0KpDFBVeorl1RfAdAShG5lAwQP5wQfgVQAAnrFbwEdcwMQCdUyMENoO2edcxagCwGwIIEyAwLQGdUlVgAocAI4GILQJZfeVgDqEYOIN9XgJwdYHgLZZZaeUJYoQsPeZqCUD%2BYZdFddc4FgD%2BXIR1RFaQLZcQMEPEJgHaJgKDcAJoTpSBVMFQAYMhQAGp4CYBYVMhIq4WyWyAKXiDKWyCKAqDqBJWaX6CGAmBmD6DyE%2BVnnmVJCWUqlMiygqk6hzDIg4pLUPTuXGAABetlD0qgKwStTAR10pdAMQD0BtLAS1uhhtqgKptAGJ0p6AJ1zALAXhKw1lhN%2BoDl8AvlJQ7ZSQ9gSJAw9QElSJownQ4QDQGQiQAgwd6Q8QMdDA4d%2BQXQQwft0NAgLQ/QLgdQgV6dZQWdbQDwYwqdFgfQNQOdgwZdIwxdEdEgvl15AVNV%2BgCVQ1P555HAUoLwQgjgKpDNiRK1R1BACgkR%2BVhVEAG1DA1RkREAFVGVtwa4tVD5E1UwCAmATAWA4QTlrVvAHVrYCIoYOE02ZYVKb8w1yV/5gFy9NNL5b5LdHAWwiVp5HddVK999GwT9I1Vl19DVUwhNCQdgkgQAA%3D%3D)

There are two things worth noting here:
* This is a pretty simple implementation of matrix multiplication. Other than the basic loop interchange optimization, there are no other weird intrinsics or anything like that required for it to vectorize. As long as you write clean, independent loops, there's a good chance the compilers will be able to vectorize it.
* The input pointers are marked `restrict` for a reason. Pointer aliasing is the bane of auto-vectorization, and if you're not careful, you can end up with fully-scalar inner loops. For instance, the Embench matrix multiplication implementation suffers from this problem and generates fully scalar code, which is why we see no uplift.
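
As a minimal sketch of the two points above (this is not the code behind the Compiler Explorer link; the function name `matmul_i32` and the row-major `n` x `n` layout are assumptions made for illustration), a matrix multiplication with `restrict`-qualified pointers and an interchanged loop order might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: C = A * B for row-major n x n int32 matrices.
 * `restrict` promises the compiler that a, b and c never overlap. */
void matmul_i32(size_t n,
                const int32_t *restrict a,
                const int32_t *restrict b,
                int32_t *restrict c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Clear the output row before accumulating into it. */
        for (size_t j = 0; j < n; j++) {
            c[i * n + j] = 0;
        }
        /* Loop interchange: k sits outside j, so the innermost loop
         * walks b and c with unit stride and a[i][k] is loop-invariant. */
        for (size_t k = 0; k < n; k++) {
            const int32_t a_ik = a[i * n + k];
            for (size_t j = 0; j < n; j++) {
                c[i * n + j] += a_ik * b[k * n + j];
            }
        }
    }
}
```

Removing the `restrict` qualifiers and comparing the generated assembly is a quick way to see how much the aliasing guarantee matters for a given compiler version.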

### Compiler flags
To enable auto-vectorization, you will need to specify a vector extension as part of your `-march=` flag, e.g. `-march=rv32im_zve32x`. It's a good idea to also specify `-mabi=ilp32` to make sure that hardware floating point instructions are not generated.

Additionally, you will want to use `-O2` to generate good quality vector code. It generally seems that `-O3` makes both compilers generate a lot of unrolled loops and extra code, which is likely a poor fit for the limited instruction cache on our CPU. `-Os` may also work, but it makes GCC more reluctant to generate vector instructions. Clang seems happy to vectorize with `-Os` though, so that's probably the best option if you're using it.

Finally, a useful flag to add is `-mrvv-max-lmul=dynamic` for GCC. This makes GCC actually use an LMUL greater than `m1`, which is good because higher LMUL generally means lower overhead and higher performance. Clang also has some similar flags you can use to tune the vector code generation, but I haven't been able to find good documentation on those.
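
One way to check what these flags do is to compile a tiny kernel to assembly and look for vector instructions. The sketch below assumes a file called `vadd.c` and a function called `vec_add`, both made up for illustration; the command lines in the comment combine the flags discussed above and assume the compilers are on your PATH under those names.

```c
/* vadd.c (hypothetical file name)
 *
 * Assumed invocations, using the flags from this section:
 *   riscv32-unknown-elf-gcc -march=rv32im_zve32x -mabi=ilp32 -O2 \
 *       -mrvv-max-lmul=dynamic -S vadd.c -o vadd.s
 *   clang --target=riscv32-unknown-elf -march=rv32im_zve32x -mabi=ilp32 \
 *       -Os -S vadd.c -o vadd.s
 */
#include <stddef.h>
#include <stdint.h>

/* Element-wise add over non-overlapping arrays: a simple unit-stride
 * loop that is a good candidate for auto-vectorization. */
void vec_add(int32_t *restrict dst, const int32_t *restrict a,
             const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

If vectorization kicked in, the resulting `.s` file should contain instructions such as `vsetvli`, `vle32.v`, `vadd.vv`, and `vse32.v` in the loop body. With GCC, the LMUL chosen shows up in the `vsetvli` operands (`m1`, `m2`, and so on), so comparing the output with and without `-mrvv-max-lmul=dynamic` is a quick way to see its effect.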

### Code quality
As mentioned before, generally the vector code generated by the compilers is of decent quality. However, there are a couple of quirks to be aware of:

* While GCC does a good job of using RISC-V "V" extension-specific features, Clang generates code that looks more like traditional vector ISAs. This leads to it often having worse code size and performance compared to GCC.
* If you write code with a lot of weird memory access patterns or conditionals, the compilers may end up outputting code that actually runs slower than scalar. This is mostly because some instructions (like `vrgather` and the segmented load/store instructions) have low-performance implementations, and using them in a hot loop may not work well.

It's a good idea to always inspect the generated assembly and make sure there's nothing too strange going on in there.
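
As a rough illustration of the memory access point, the sketch below computes the same sums from two data layouts. The `struct point` type and both function names are hypothetical. With the interleaved layout, a vectorizing compiler may reach for segmented or strided loads to separate the fields, while the split layout only needs plain unit-stride loads. Whether either compiler actually does this depends on its version and flags, so treat this as something to check in the assembly rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array-of-structs layout: x and y are interleaved in memory. */
struct point {
    int32_t x;
    int32_t y;
};

/* Interleaved fields: vectorizing this loop requires de-interleaving x
 * and y, which may be done with segmented or strided loads. */
void sum_aos(int32_t *restrict out, const struct point *restrict pts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = pts[i].x + pts[i].y;
}

/* Struct-of-arrays layout: each input is contiguous, so the loop maps
 * onto ordinary unit-stride vector loads and stores. */
void sum_soa(int32_t *restrict out, const int32_t *restrict xs,
             const int32_t *restrict ys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = xs[i] + ys[i];
}
```

If the assembly for a hot loop shows the slow instructions mentioned above, restructuring the data along these lines is one option to try.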

## Setting up
### GCC
GCC is set to integrate auto-vectorization into the mainline builds in the [GCC 14 release](https://gcc.gnu.org/gcc-14/changes.html). As of May 2024, auto-vectorization already seems to be working pretty well on the trunk builds, and as long as you're careful when setting up the toolchain you should be able to build the trunk version locally and compile stuff that way.

To build trunk GCC, you will need a system with the appropriate dependencies. As of this writing, `asicfab` does not appear to have recent enough tool versions to be able to do this, so you may have to try `asicfabu` or ideally a Linux (or Windows with WSL) machine where you have the ability to install additional packages.

First, go ahead and clone the [RISC-V GNU toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain) and install the relevant dependencies mentioned there:

    git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git

Next, you will need to configure the toolchain with the appropriate options. For some reason, the instructions in the repository tell you to run the configure script from the toolchain directory, but I've had better luck creating an adjacent directory and running the script from there:

    mkdir riscv32-gcc
    cd riscv32-gcc
    ../riscv-gnu-toolchain/configure --prefix=`pwd` --with-arch=rv32im_zve32x_zicsr_zifencei --with-abi=ilp32

Here, the `--with-arch` flag needs to list the extensions supported by the core and ONLY the extensions supported by the core. If you are not specific enough (i.e. you use `g` or `v`), the standard libraries may get built with incompatible instructions (like floating point) that will cause the core to fault even if you later provide the correct flags during compilation.

Now, you're ready to run the build (expect this to take some time):

    make -j `nproc`

Once the build is done, you'll want to add the `bin` directory to your PATH so that you can easily access the compiler. This will not conflict with your native GCC installation, since all the binaries are prefixed with `riscv32-unknown-elf-`.

### Clang
Clang already has auto-vectorization support for the RISC-V "V" extension integrated into the latest release. In theory, the process of setting up Clang for the vector core should be as simple as getting the appropriate release and adding it to your PATH. However, this has not been tested so be warned that you may run into snags.

## Contributors
- Om Gupta (guptao@purdue.edu)