# LatteMetal
LatteMetal is my home-brewed RISC-based ISA and micro-architecture, written in Java.
It is a pipelined, out-of-order (OoO) superscalar implementation that aims to approximate the behaviour and limitations of a real RISC processor.
# Startup & running guide
This requires javac and java, both at version 17.0.2 or later.
The commands below are written in the form "$ command", where "$" represents the shell prompt.
Navigate to the './src' folder before you run any of them.
- To run with the default arguments plus quiet (note that this script does not itself take any arguments), please run $ chmod 700 "compile_and_run" && ./compile_and_run
or, if you do not have zsh installed as a shell environment, please run $ javac -d ./class Main.java && java -cp ./class Main quiet
- The program accepts all of the following arguments in any order (either <key>=<value> or, where there is no equals sign, a bare flag):
    prog=<assembly_file_path> (program to run)
        (default "./kernels_benchmark/collatz.latte")
    testing (run test kernels quietly to inspect correctness)
        (default off)
    width=<num_columns> (prints memory at end of program as a table)
        (default 1)
    quiet (pipeline view is hidden)
        (default off)
    predictor=<"fixedTaken" | "fixedNotTaken" | "bckTknFwdNTkn" | "bckNTknFwdTkn" | "oneBit" | "twoBit">
        (default "twoBit")
    btb_size=<num_entries> (how many branches we can hold the prediction of)
        (default 32)
    ss_width=<num_n_way> (n way superscalar)
        (default 8)
    alus=<num_alus> (up to eight)
        (default 4)
    lsus=<num_lsus> (up to four)
        (default 2)
    brus=<num_brus> (up to four branch units)
        (default 2)
    alu_rss=<num_n> (any number n >= 1 alu reservation stations)
        (default 8)
    lsu_rss=<num_n> (any number n >= 1 lsu reservation stations)
        (default 4)
    bru_rss=<num_n> (any number n >= 1 bru reservation stations)
        (default 4)
    dp_acc=<num_decimal_places> (stat decimal-point accuracy in trailing digits)
        (default 4)
    rob_size=<num_entries>
        (default 64)
    phys_regs=<size_prf>
        (default 128)
    align_fetch (fetch is aligned)
        (default off)
    show_commit (print all instructions committed in order)
        (default off)
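To illustrate the "<key>=<value> or flag, in any order" convention described above, here is a minimal sketch of how such arguments could be parsed. It is not taken from the LatteMetal sources: the class name ArgSketch and the method parse are hypothetical, and only three of the options (prog, width, quiet) are read, using the defaults listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of LatteMetal. */
public class ArgSketch {

    /**
     * Splits each argument at its first '='. Arguments without '=' are
     * treated as flags and stored with the value "true".
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq < 0) {
                opts.put(arg, "true");                                  // bare flag, e.g. quiet
            } else {
                opts.put(arg.substring(0, eq), arg.substring(eq + 1));  // key=value
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(args);
        // Defaults below are copied from the argument list in this readme.
        String prog = opts.getOrDefault("prog", "./kernels_benchmark/collatz.latte");
        int width = Integer.parseInt(opts.getOrDefault("width", "1"));
        boolean quiet = opts.containsKey("quiet");
        System.out.println("prog=" + prog + " width=" + width + " quiet=" + quiet);
    }
}
```

Because every argument is stored under its own key, the order on the command line does not matter: "quiet width=10" and "width=10 quiet" produce the same map.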
- For example, to run my vector dot product test kernel and display 10 integers on each line of memory, please run
$ javac -d ./class Main.java && java -cp ./class Main prog="./kernels_test/vec_dot.latte" width=10
...
...(pipeline view)
...
registers (dirty): t0:10 s0:10 s1:20 s2:30 s3:40 s4:-2400 s5:120 s6:-8550 s11:1
memory:
[00] 10 [01] 20 [02] 30 [03] 40 [04] 0 [05] 0 [06] 0 [07] 0 [08] 0 [09] 0
[10] 92 [11] 84 [12] 76 [13] 68 [14] 50 [15] 42 [16] 34 [17] 26 [18] 18 [19] 0
[20] -2 [21] -4 [22] -6 [23] -8 [24] -10 [25] -12 [26] -14 [27] -16 [28] -18 [29] -20
[30] 12 [31] 24 [32] 36 [33] 48 [34] 51 [35] 62 [36] 74 [37] 86 [38] 98 [39] 120
[40] -8550 [41] -1 [42] -1 [43] -1 [44] -1 [45] -1 [46] -1 [47] -1 [48] -1 [49] -1
[50] [51] [52] [53] [54] [55] [56] [57] [58] [59]
[60] [61] [62] [63]
settings: CLOCK_SPEED_MHZ=500.0
settings: PREDICTOR=twoBit
settings: BTB_CACHE_SIZE=32
settings: SUPERSCALAR_WIDTH=8
settings: ALU_COUNT=4
settings: LSU_COUNT=2
settings: BRU_COUNT=2
settings: ALU_RS_COUNT=8
settings: LSU_RS_COUNT=4
settings: BRU_RS_COUNT=4
settings: DP_ACC=4
settings: ROB_ENTRIES=64
settings: PHYSICAL_REGISTER_COUNT=128
run: program finished in 63 cycles
run: program finished after committing 118 instructions
run: program incorrectly speculated and thereby flushed 27 instructions
run: instructions per cycle 1.873
run: cpu time 0.126μs @ 500.0MHz
run: percentage mispredicted instructions added to rob 18.6207%
run: percentage mispredicted branches 27.2727%
mem: [10, 20, 30, 40, 0, 0, 0, 0, 0, 0, 92, 84, 76, 68, 50, 42, 34, 26, 18, 0, -2, -4, -6, -8, -10, -12, -14, -16, -18, -20, 12, 24, 36, 48, 51, 62, 74, 86, 98, 120, -8550, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
registers (dirty): t0:10 s0:10 s1:20 s2:30 s3:40 s4:-2400 s5:120 s6:-8550 s11:1
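Three of the "run:" figures in the sample output above are consistent with simple arithmetic on the cycle, commit and flush counts. The sketch below reproduces them. It is an assumption inferred from the numbers rather than LatteMetal's actual code: the class name RunStatsSketch and its methods are hypothetical, and the flush percentage is assumed to be flushed / (flushed + committed).

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of LatteMetal. */
public class RunStatsSketch {

    /** Instructions per cycle: committed instructions divided by cycles. */
    static double ipc(long committed, long cycles) {
        return (double) committed / cycles;
    }

    /** CPU time in microseconds: cycles divided by the clock speed in MHz. */
    static double cpuTimeMicros(long cycles, double clockMhz) {
        return cycles / clockMhz;
    }

    /** Assumed formula: flushed instructions as a share of everything added to the ROB. */
    static double flushedPercent(long flushed, long committed) {
        return 100.0 * flushed / (flushed + committed);
    }

    public static void main(String[] args) {
        // Values copied from the sample output above.
        long cycles = 63, committed = 118, flushed = 27;
        double clockMhz = 500.0;
        System.out.println(String.format(Locale.ROOT, "ipc %.4f", ipc(committed, cycles)));
        System.out.println(String.format(Locale.ROOT, "cpu time %.4f us", cpuTimeMicros(cycles, clockMhz)));
        System.out.println(String.format(Locale.ROOT, "flushed %.4f%%", flushedPercent(flushed, committed)));
    }
}
```

With four decimal places this prints 1.8730, 0.1260 and 18.6207, which matches the sample output apart from trailing zeros.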