-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathunaligned_memory_access.html
225 lines (196 loc) · 10.3 KB
/
unaligned_memory_access.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<title>Unaligned Memory Access</title>
<link rel="stylesheet" type="text/css" href="/htmlize.css"/>
<link rel="stylesheet" type="text/css" href="./htmlize.css"/>
<link rel="stylesheet" type="text/css" href="../htmlize.css"/>
<link rel="stylesheet" type="text/css" href="../../htmlize.css"/>
<link rel="stylesheet" type="text/css" href="/readtheorg.css"/>
<link rel="stylesheet" type="text/css" href="./readtheorg.css"/>
<link rel="stylesheet" type="text/css" href="../readtheorg.css"/>
<link rel="stylesheet" type="text/css" href="../../readtheorg.css"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_HTML"></script>
<link rel="stylesheet" type="text/css" href="/main.css" media="screen" />
<link rel="stylesheet" type="text/css" href="../main.css" media="screen" />
<link rel="stylesheet" type="text/css" href="../../main.css" media="screen" />
<link rel="stylesheet" type="text/css" href="./main.css" media="screen" />
<link rel = "icon" href = "/icon.png" type = "image/x-icon">
</head>
<body>
<div id="content" class="content">
<h1 class="title">Unaligned Memory Access</h1>
<div id="table-of-contents" role="doc-toc">
<h2>Table of Contents</h2>
<div id="text-table-of-contents" role="doc-toc">
<ul>
<li><a href="#org0000004">1. Unaligned Memory Access</a>
<ul>
<li><a href="#org0000001">1.1. 通过 trap handler 模拟 UMA</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-org0000004" class="outline-2">
<h2 id="org0000004"><span class="section-number-2">1.</span> Unaligned Memory Access</h2>
<div class="outline-text-2" id="text-1">
<p>
cpu 对 unaligned memory access (UMA) 的支持分为几种情况:
</p>
<ol class="org-ol">
<li>不支持, 会触发 trap</li>
<li>支持, 但比较慢</li>
<li>支持且不慢</li>
</ol>
<p>
如果编译器能判断出当前指令是 UMA, 则会根据 <a href="toolchain/gcc_mtune.html#ID-98f35525-3f8c-4f76-8478-6cab3f3182be">slow_unaligned_access</a> 和
`-mstrict-align` 参数决定对于 UMA 是直接生成 load/store 指令还是通过其它多条指令.
例如, 为了支持非对齐的 sw, 编译器可能会使用 4 条 sb 指令
</p>
</div>
<div id="outline-container-org0000001" class="outline-3">
<h3 id="org0000001"><span class="section-number-3">1.1.</span> 通过 trap handler 模拟 UMA</h3>
<div class="outline-text-3" id="text-1-1">
<p>
有些时候编译器能判断出是否是 UMA, 例如
</p>
<div class="org-src-container">
<pre class="src src-c"><span class="org-keyword">struct</span> <span class="org-keyword">__attribute__</span>((packed)) <span class="org-type">X</span> {
<span class="org-type">short</span> <span class="org-variable-name">a</span>;
<span class="org-type">int</span> <span class="org-variable-name">b</span>;
};
</pre>
</div>
<p>
编译器知道访问 b 时不是 4 字节对齐的.
</p>
<p>
但有时编译器无法判断出是否对齐, 例如:
</p>
<div class="org-src-container">
<pre class="src src-c"><span class="org-type">short</span> *<span class="org-variable-name">y</span> = ((<span class="org-type">char</span> *)&x) + k;
*y = 1;
</pre>
</div>
<p>
可见如果硬件不支持 UMA, 试图通过编译器来完全避免 UMA 是不可行的, 这时需要用户自己提供 trap handler 通过其它方式 (例如使用支持 UMA 的指令或转换为以字节为单位的指令例如 riscv 的 sb 指令) 模拟当前指令.
</p>
<p>
以 riscv 为例:
</p>
<div class="org-src-container">
<pre class="src src-c"><span class="org-type">int</span> <span class="org-function-name">main</span>(<span class="org-type">int</span> <span class="org-variable-name">argc</span>, <span class="org-type">char</span> *<span class="org-variable-name">argv</span>[]) {
<span class="org-type">int</span> <span class="org-variable-name">x</span> = 10;
<span class="org-type">short</span> *<span class="org-variable-name">y</span> = ((<span class="org-type">char</span> *)&x) + 1;
printf(<span class="org-string">"test:%p\n"</span>, y);
*y = 1;
<span class="org-keyword">return</span> 0;
}
</pre>
</div>
<p>
spike 对应 store 指令的代码:
</p>
<div class="org-src-container">
<pre class="src src-c"><span class="org-type">void</span> <span class="org-variable-name">mmu_t</span>::store_slow_path(
<span class="org-type">reg_t</span> <span class="org-variable-name">addr</span>, <span class="org-type">reg_t</span> <span class="org-variable-name">len</span>, <span class="org-keyword">const</span> <span class="org-type">uint8_t</span>* <span class="org-variable-name">bytes</span>, <span class="org-type">uint32_t</span> <span class="org-variable-name">xlate_flags</span>,
<span class="org-type">bool</span> <span class="org-variable-name">actually_store</span>, bool UNUSED require_alignment) {
<span class="org-keyword">if</span> (actually_store)
check_triggers(
triggers::OPERATION_STORE, addr, reg_from_bytes(len, bytes));
printf(<span class="org-string">"ssp:1:%x %d\n"</span>, addr, len);
<span class="org-keyword">if</span> (addr & (len - 1)) {
<span class="org-type">bool</span> <span class="org-variable-name">gva</span> = ((proc) ? proc->state.v : <span class="org-constant">false</span>) ||
(RISCV_XLATE_VIRT & xlate_flags);
<span class="org-keyword">if</span> (<span class="org-negation-char">!</span>is_misaligned_enabled()) {
printf(<span class="org-string">"ssp:2:%x\n"</span>, addr);
<span class="org-type">throw</span> <span class="org-function-name">trap_store_address_misaligned</span>(gva, addr, 0, 0);
}
<span class="org-keyword">if</span> (require_alignment) <span class="org-type">throw</span> <span class="org-function-name">trap_store_access_fault</span>(gva, addr, 0, 0);
<span class="org-type">reg_t</span> <span class="org-variable-name">len_page0</span> = std::min(len, PGSIZE - addr % PGSIZE);
store_slow_path_intrapage(
addr, len_page0, bytes, xlate_flags, actually_store);
<span class="org-keyword">if</span> (len_page0 != len)
store_slow_path_intrapage(
addr + len_page0, len - len_page0, bytes + len_page0,
xlate_flags, actually_store);
} <span class="org-keyword">else</span> {
store_slow_path_intrapage(
addr, len, bytes, xlate_flags, actually_store);
}
printf(<span class="org-string">"ssp:3:%x\n"</span>, addr);
}
</pre>
</div>
<p>
执行的 log 为:
</p>
<pre class="example" id="org0000000">
#> spike pk ./a.out
test:0x3ffffff9f5
ssp:1:fffff9f5 2
ssp:2:fffff9f5
...
...
ssp:1:fffff9f5 1
ssp:2:fffff9f5
ssp:3:fffff9f5
ssp:1:fffff9f6 1
ssp:2:fffff9f6
ssp:3:fffff9f6
// 做为对比, spike 的 `--misaligned` 参数表示硬件支持 UMA
#> spike --misaligned pk ./a.out
test:0x3ffffff9f5 2
ssp:1:fffff9f5 2
ssp:3:fffff9f5
</pre>
<p>
上面例子中发生 trap 后由 pk 的 misaligned_store_trap 负责把指令转换为两次一字节的 store
</p>
<div class="org-src-container">
<pre class="src src-c"><span class="org-type">void</span> <span class="org-function-name">misaligned_store_trap</span>(<span class="org-type">uintptr_t</span>* <span class="org-variable-name">regs</span>, <span class="org-type">uintptr_t</span> <span class="org-variable-name">mcause</span>, <span class="org-type">uintptr_t</span> <span class="org-variable-name">mepc</span>) {
<span class="org-keyword">union</span> <span class="org-type">byte_array</span> <span class="org-variable-name">val</span>;
<span class="org-type">uintptr_t</span> <span class="org-variable-name">mstatus</span>;
<span class="org-type">insn_t</span> <span class="org-variable-name">insn</span> = get_insn(mepc, &mstatus);
<span class="org-type">uintptr_t</span> <span class="org-variable-name">npc</span> = mepc + insn_len(insn);
<span class="org-type">int</span> <span class="org-variable-name">len</span>;
val.intx = GET_RS2(insn, regs);
<span class="org-keyword">if</span> ((insn & MASK_SW) == MATCH_SW)
len = 4;
<span class="org-keyword">else</span> <span class="org-keyword">if</span> ((insn & MASK_SH) == MATCH_SH)
len = 2;
<span class="org-keyword">else</span> {
mcause = CAUSE_STORE_ACCESS;
write_csr(mcause, mcause);
<span class="org-keyword">return</span> truly_illegal_insn(regs, mcause, mepc, mstatus, insn);
}
<span class="org-comment-delimiter">/* </span><span class="org-comment">NOTE: misaligned_store 异常时 mtval 为要 store 的地址</span><span class="org-comment-delimiter"> */</span>
<span class="org-comment-delimiter">/* </span><span class="org-comment">NOTE: val 是 RS2, 即要 store 的寄存器</span><span class="org-comment-delimiter"> */</span>
<span class="org-type">uintptr_t</span> <span class="org-variable-name">addr</span> = read_csr(mtval);
<span class="org-type">intptr_t</span> <span class="org-variable-name">offs</span> = 0;
<span class="org-comment-delimiter">/* </span><span class="org-comment">NOTE: 循环多次, 每次写一个 byte</span><span class="org-comment-delimiter"> */</span>
<span class="org-keyword">for</span> (<span class="org-type">int</span> <span class="org-variable-name">i</span> = 0; i < len; i++)
store_uint8_t((<span class="org-type">void</span>*)(addr + i), val.bytes[offs + i], mepc);
write_csr(mepc, npc);
}
</pre>
</div>
<p>
另外, 上述只针对 unaligned store/load, 对于其它要求对齐的情况例如 branch 地址对齐, 原子指令内存对齐等并不适用.
</p>
</div>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: <a href="mailto:sunway@dogdog.run">sunway@dogdog.run</a><br />
Date: 2023-08-17 Thu 13:43<br />
Last updated: 2023-08-17 Thu 15:44</p>
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
</div>
</body>
</html>