\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{courier}
\usepackage[pdfborder=0 0 0]{hyperref}
\usepackage{url}
\usepackage[cm]{fullpage}
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}}
\setlength{\parindent}{0cm}
\setlength{\parskip}{0.3cm}
\begin{document}
\title{DRAMSim2}
\author{Elliott Cooper-Balis \\
Paul Rosenfeld \\
Bruce Jacob \\
University of Maryland \\
\texttt{\footnotesize dramninjas \textit{[at]} gmail \textit{[dot]} com}
}
\date{}
\maketitle
\HRule
\tableofcontents
\HRule
\lstset{basicstyle={\scriptsize\ttfamily},tabsize=2,frame=single}
\section{Why Do We Need to Simulate DRAM Systems This Accurately?}
Modern computer system performance is increasingly limited
by the performance of DRAM-based memory systems. As a result, there is great
interest in accurate simulation of DRAM-based memory systems as part
of architectural research. Unfortunately, modern DRAM memory systems are
difficult to study because their performance depends on many
independent variables: workload characteristics (memory access rate
and request sequence), memory-system architecture, memory-system configuration, DRAM
access protocol, and DRAM device timing parameters. Consequently, system
architects and design engineers often disagree on the usefulness of a given
performance-enhancing feature, since the performance impact of that feature typically
depends on those same variables.
\subsection{DRAM Scheduling Complexity is Growing}
\begin{figure}[h]
\begin{center}
\includegraphics[width=\linewidth]{docs/why1.gif}
\caption{Timing diagram showing complexity of DRAM scheduling}
\label{timingcomplex}
\end{center}
\end{figure}
Figure~\ref{timingcomplex} shows the pipelined scheduling of a DDR2 SDRAM device. Although
the simulated memory system uses a closed-page policy and
rotates through the available banks on the DRAM device, which should simplify
scheduling considerably, scheduling this system is actually more
complex than scheduling earlier DRAM systems. For instance, new timing parameters
such as $t_{RRD}$ and $t_{FAW}$ add to the growing
set of timing constraints placed on each successive generation of DRAM
devices.
\subsection{DRAM Performance Characteristics Change Every Generation}
DRAM-based memory systems are impacted primarily by two attributes:
row cycle time and device data rate. Presently, DRAM row cycle times are
decreasing at a rate of approximately 7\% per year, while DRAM device
data rates are increasing with each new generation of DRAM devices at a rate
of 100\% every three years.
\begin{figure}[h]
\begin{center}
\includegraphics[width=\linewidth]{docs/why2.gif}
\caption{DRAM row cycle time trends}
\label{rowcycle}
\end{center}
\end{figure}
\begin{figure}[h]
\begin{center}
\includegraphics[width=\linewidth]{docs/why3.gif}
\caption{DRAM device data rate trends}
\label{datarate}
\end{center}
\end{figure}
The difference in these scaling trends means that
fundamental DRAM device performance characteristics change
with every generation, and the changes
cannot be accurately predicted by linear extrapolation. As a result,
no computer architect can rest easy knowing that he or she has
obtained X\% of performance improvement with a set of microarchitectural
techniques on the current-generation memory system, because the same
techniques may not be as effective in a
future memory system due to the differences in the scaling attributes
of DRAM devices.
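
As a rough worked example, assume (hypothetically) that both quoted rates hold steady for $n$ years. Writing $t_{RC}$ for the row cycle time and $D$ for the device data rate (notation introduced here for illustration), the two trends are
\[
t_{RC}(n) = t_{RC}(0)\cdot 0.93^{\,n}, \qquad D(n) = D(0)\cdot 2^{\,n/3},
\]
so the number of data transfers that fit within one row cycle scales as
\[
\frac{t_{RC}(n)\,D(n)}{t_{RC}(0)\,D(0)} = 0.93^{\,n}\cdot 2^{\,n/3}.
\]
For $n = 3$ this gives $0.93^{3}\cdot 2 \approx 0.804\cdot 2 \approx 1.61$. Under these assumptions, a row cycle shrinks by about 20\% in absolute time over three years, yet grows by about 61\% when measured in data transfers, which illustrates how the balance between the two attributes shifts from one generation to the next.
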
\subsection{The Sales Pitch}
Our DRAM-system simulation work enables system architects not
only to explore the impact of a set of microarchitectural techniques on a
given memory system but also to examine the effectiveness of those
microarchitectural techniques on a future generation memory system with
future generations of DRAM devices.
\section{About DRAMSim2}
DRAMSim2 is a cycle-accurate model of a DRAM memory controller, the DRAM
modules that comprise system storage, and the buses by which they
communicate.

The overarching goal is to have a simulator that is small,
portable, and accurate. The simulator core has a simple interface
that makes it CPU-simulator agnostic, so it should work with any simulator (see section~\ref{library}). The core has no external
run-time or build-time dependencies and has been tested with g++ on Linux
as well as g++ on Cygwin on Windows.
\section{Getting DRAMSim2}
DRAMSim2 is available on \href{http://github.com/dramninjasUMD}{GitHub}. If you have git installed, you can clone our repository by typing:\\
\texttt{\$ git clone git://github.com/dramninjasUMD/DRAMSim2.git }
\section{Building DRAMSim2}
To build an optimized standalone trace-based simulator called \texttt{DRAMSim}, simply type:

\texttt{\$ make}

For a debug build, which contains debugging symbols and verbose output, run:

\texttt{\$ make DEBUG=1}

To build the DRAMSim2 library, type:

\texttt{\$ make libdramsim.so }
\section{Running DRAMSim2}
\begin{minipage}{\textwidth}
\subsection{Trace-Based Simulation}
In standalone mode, DRAMSim2 can simulate memory system traces. While traces are not as accurate
as a real CPU model driving the memory model, they are convenient since they can be generated in a number of different
ways (instrumentation, hardware traces, CPU simulation, etc.) and reused.
We've provided a few small sample traces in the \texttt{traces/} directory. These gzipped
traces should first be pre-processed before running through the simulator.
To run the preprocessor (which requires Python):
\begin{lstlisting}
cd traces/
./traceParse.py k6_aoe_02_short.trc.gz
\end{lstlisting}
This should produce the file \texttt{traces/k6\_aoe\_02\_short.trc}. Then, go back to the DRAMSim2 directory and run the trace-based simulator:
\begin{lstlisting}
cd ..
./DRAMSim -t traces/k6_aoe_02_short.trc -s system.ini -d ini/DDR3_micron_64M_8B_x4_sg15.ini -c 1000
\end{lstlisting}
This will run a 1000-cycle simulation of the \texttt{k6\_aoe\_02\_short} trace using
the specified DDR3 part. The \texttt{-s}, \texttt{-d}, and \texttt{-t} flags are required to run a simulation.
A full list of the command line arguments can be obtained by typing:
\begin{lstlisting}
$ ./DRAMSim --help
DRAMSim2 Usage:
DRAMSim -t tracefile -s system.ini -d ini/device.ini [-c #] [-p pwd] -q
-t, --tracefile=FILENAME specify a tracefile to run
-s, --systemini=FILENAME specify an ini file that describes the memory system parameters
-d, --deviceini=FILENAME specify an ini file that describes the device-level parameters
-c, --numcycles=# specify number of cycles to run the simulation for [default=30]
-q, --quiet flag to suppress simulation output (except final stats) [default=no]
-o, --option=OPTION_A=234 overwrite any ini file option from the command line
-p, --pwd=DIRECTORY Set the working directory
\end{lstlisting}
Some traces include timing information, which can be used
by the simulator or ignored. The benefit of ignoring timing information is that requests
stream into the memory system as fast as possible, which makes a good stress
test. To toggle the use of clock cycles, change the \texttt{useClockCycle} flag in \texttt{TraceBasedSim.cpp}.

If you have a custom trace format you'd like to use, you can modify the \texttt{parseTraceFileLine()} function to add
support for it.

The prefix of the filename determines which trace format \texttt{parseTraceFileLine()} will use
(e.g., \texttt{k6\_foo.trc} will be parsed using the k6 format).
\end{minipage}
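
The filename-prefix convention described above can be illustrated with a short Python sketch. This is not code from DRAMSim2: \texttt{trace\_format} is a hypothetical helper, and it assumes the prefix is the part of the file's base name before the first underscore, as in the \texttt{k6\_foo.trc} example.
\begin{lstlisting}[language=Python]
import os

def trace_format(path):
    """Return the trace-format prefix of a trace file name.

    Assumption (not taken from the DRAMSim2 source): the prefix is the
    text before the first underscore of the base name.
    """
    name = os.path.basename(path)
    prefix, separator, _rest = name.partition("_")
    if not separator:
        raise ValueError("no format prefix in file name: " + name)
    return prefix

print(trace_format("traces/k6_aoe_02_short.trc"))
\end{lstlisting}
For the sample trace shipped in \texttt{traces/}, this sketch reports the k6 format.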
\subsection{Library Interface}\label{library}
In addition to simulating memory traces, DRAMSim2 can also be built as a dynamic
shared library, which is convenient for connecting it to CPU simulators or other
custom front ends. A \texttt{MemorySystem} object encapsulates the
functionality of the memory system (i.e., the memory controller and DIMMs). The
classes that comprise DRAMSim2 can be seen in Figure~\ref{classes}. A
simple example application is provided in the \texttt{example\_app/} directory.
At this time we have plans to provide code to integrate DRAMSim2 into
\href{http://www.marss86.org/index.php/Home}{MARSSx86},
\href{http://www.cs.sandia.gov/sst/}{SST}, and (eventually)
\href{http://www.m5sim.org/}{M5}.
\begin{figure}[h]
\begin{center}
\includegraphics[width=\linewidth]{docs/classes.png}
\caption{Block diagram of DRAMSim2. The \texttt{\footnotesize recv()} functions are actually called
\texttt{\footnotesize receiveFromBus()} but were abbreviated to save space.}
\label{classes}
\end{center}
\end{figure}
\section{Example Output}
\noindent\begin{minipage}{\textwidth}
The verbosity of DRAMSim2 can be customized in the \texttt{system.ini} file by turning the
various debug flags on or off.

Below, we have provided a detailed explanation of the simulator output. With
all DEBUG flags enabled, the following output is displayed for each cycle
executed.

\textbf{NOTE}: BP = Bus Packet, T = Transaction \\
MC = MemoryController, R\# = Rank (index \#)
\begin{lstlisting}
----------------- Memory System Update ------------------
---------- Memory Controller Update Starting ------------ [8]
-- R0 Receiving On Bus : BP [ACT] pa[0x5dec7f0] r[0] b[3] row[1502] col[799]
-- MC Issuing On Data Bus : BP [DATA] pa[0x7edc7e0] r[0] b[2] row[2029] col[799] data[0]=
++ Adding Read energy to total energy
-- MC Issuing On Command Bus : BP [READ_P] pa[0x5dec7f8] r[1] b[3] row[1502] col[799]
== New Transaction - Mapping Address [0x5dec800] (read)
Rank : 0
Bank : 0
Row : 1502
Col : 800
++ Adding IDD3N to total energy [from rank 0]
++ Adding IDD3N to total energy [from rank 1]
== Printing transaction queue
8]T [Read] [0x45bbfa4]
9]T [Write] [0x55fbfa0] [5439E]
10]T [Write] [0x55fbfa8] [1111]
== Printing bank states (According to MC)
[idle] [idle] [2029] [1502]
[idle] [idle] [1502] [1502]
== Printing Per Rank, Per Bank Queue
= Rank 0
Bank 0 size : 2
0]BP [ACT] pa[0x5dec800] r[0] b[0] row[1502] col[800]
1]BP [READ_P] pa[0x5dec800] r[0] b[0] row[1502] col[800]
Bank 1 size : 2
0]BP [ACT] pa[0x5dec810] r[0] b[1] row[1502] col[800]
1]BP [READ_P] pa[0x5dec810] r[0] b[1] row[1502] col[800]
Bank 2 size : 2
0]BP [ACT] pa[0x5dec7e0] r[0] b[2] row[1502] col[799]
1]BP [READ_P] pa[0x5dec7e0] r[0] b[2] row[1502] col[799]
Bank 3 size : 1
0]BP [READ_P] pa[0x5dec7f0] r[0] b[3] row[1502] col[799]
= Rank 1
Bank 0 size : 2
0]BP [ACT] pa[0x5dec808] r[1] b[0] row[1502] col[800]
1]BP [READ_P] pa[0x5dec808] r[1] b[0] row[1502] col[800]
Bank 1 size : 2
0]BP [ACT] pa[0x5dec818] r[1] b[1] row[1502] col[800]
1]BP [READ_P] pa[0x5dec818] r[1] b[1] row[1502] col[800]
Bank 2 size : 1
0]BP [READ_P] pa[0x5dec7e8] r[1] b[2] row[1502] col[799]
Bank 3 size : 0
\end{lstlisting}
\end{minipage}
\begin{minipage}{\textwidth}
Anything sent on the bus is encapsulated in a \texttt{BusPacket} (BP) object. When
printed, a packet displays the following information:
\begin{lstlisting}
BP [ACT] pa[0x5dec818] r[1] b[1] row[1502] col[800]
\end{lstlisting}
The information displayed is (in order): command type, physical address, rank
\#, bank \#, row \#, and column \#.
\end{minipage}
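
When post-processing a debug log, the packet fields can be pulled out with a regular expression. The Python sketch below is only an illustration: \texttt{parse\_bus\_packet} and \texttt{BP\_PATTERN} are hypothetical names, and the pattern assumes the single-space layout shown in the listing above, so it may need adjusting if the real output is spaced differently.
\begin{lstlisting}[language=Python]
import re

# Matches one BusPacket in the printed form shown above (assumed layout).
BP_PATTERN = re.compile(
    r"BP \[(?P<cmd>\w+)\] pa\[(?P<pa>0x[0-9a-fA-F]+)\] "
    r"r\[(?P<rank>\d+)\] b\[(?P<bank>\d+)\] "
    r"row\[(?P<row>\d+)\] col\[(?P<col>\d+)\]"
)

def parse_bus_packet(line):
    """Return the packet fields as a dict, or None if no packet is found."""
    match = BP_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cmd": fields["cmd"],          # command type, e.g. ACT or READ_P
        "pa": int(fields["pa"], 16),   # physical address
        "rank": int(fields["rank"]),
        "bank": int(fields["bank"]),
        "row": int(fields["row"]),
        "col": int(fields["col"]),
    }

packet = parse_bus_packet("BP [ACT] pa[0x5dec818] r[1] b[1] row[1502] col[800]")
print(packet)
\end{lstlisting}
Because the function uses \texttt{search()} rather than \texttt{match()}, it also accepts full bus-traffic lines in which the packet is preceded by a sender or receiver tag.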
\begin{minipage}{\textwidth}
Lines beginning with ``\texttt{-{}-}'' indicate bus traffic, i.e.,
\begin{lstlisting}
-- R0 Receiving On Bus : BP [ACT] pa[0x5dec7f0] r[0] b[3] row[1502] col[799]
-- MC Issuing On Data Bus : BP [DATA] pa[0x7edc7e0] r[0] b[2] row[2029] col[799] data[0]=
-- MC Issuing On Command Bus : BP [READ_P] pa[0x5dec7f8] r[1] b[3] row[1502] col[799]
\end{lstlisting}
Sender and receiver are indicated and the packet being sent is detailed.
\end{minipage}
Lines beginning with ``\texttt{++}'' indicate power calculations, i.e.,
\begin{lstlisting}
++ Adding Read energy to total energy
++ Adding IDD3N to total energy [from rank 0]
++ Adding IDD3N to total energy [from rank 1]
\end{lstlisting}
The state of the system and the actions taken determine which current draw is used. For further detail about each current value, see the Micron datasheet.

If a pending transaction is in the transaction queue, it will be printed, as seen below:
\begin{lstlisting}
== Printing transaction queue
1]T [Read] [0x45bbfa4]
2]T [Write] [0x55fbfa0] [5439E]
3]T [Write] [0x55fbfa8] [1111]
\end{lstlisting}
Currently, at the start of every cycle, the head of the transaction
queue is removed, broken up into DRAM commands and placed in the
appropriate command queues. To do this, an address mapping scheme
is applied to the transaction's physical address, the output of
which is seen below:
\begin{lstlisting}
== New Transaction - Mapping Address [0x5dec800] (read)
Rank : 0
Bank : 0
Row : 1502
Col : 800
\end{lstlisting}
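
The mapping printed above can be reproduced with a few shifts and masks. The Python sketch below is only an illustration: the bit layout (three byte-offset bits, one rank bit, two bank bits, ten column bits, and the row in the remaining upper bits) is inferred from the sample output in this document rather than taken from the DRAMSim2 source, and \texttt{map\_address} is a hypothetical helper. The scheme a given configuration uses may differ.
\begin{lstlisting}[language=Python]
def map_address(addr):
    """Split a physical address into rank, bank, row and column.

    Assumed layout, inferred from the sample output only:
      bits 0-2   byte offset (ignored)
      bit  3     rank
      bits 4-5   bank
      bits 6-15  column
      bits 16+   row
    """
    return {
        "rank": (addr >> 3) & 0x1,
        "bank": (addr >> 4) & 0x3,
        "col": (addr >> 6) & 0x3FF,
        "row": addr >> 16,
    }

for addr in (0x5dec800, 0x5dec818, 0x7edc7e0):
    print(hex(addr), map_address(addr))
\end{lstlisting}
Under this assumed layout, \texttt{0x5dec800} decodes to rank 0, bank 0, row 1502, column 800, matching the listing above, and the other two addresses match the bus packets shown in the example output.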
If there are pending commands in the command queue, they will be
printed. The output is dependent on the designated structure for
the command queue. For example, per-rank/per-bank queues are
shown below:
\begin{lstlisting}
= Rank 1
Bank 0 size : 2
0]BP [ACT] pa[0x5dec808] r[1] b[0] row[1502] col[800]
1]BP [READ_P] pa[0x5dec808] r[1] b[0] row[1502] col[800]
Bank 1 size : 2
0]BP [ACT] pa[0x5dec818] r[1] b[1] row[1502] col[800]
1]BP [READ_P] pa[0x5dec818] r[1] b[1] row[1502] col[800]
Bank 2 size : 1
0]BP [READ_P] pa[0x5dec7e8] r[1] b[2] row[1502] col[799]
Bank 3 size : 0
\end{lstlisting}
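
A per-rank/per-bank queue structure like the one printed above can be modeled as a nested list of queues. The Python sketch below is a toy illustration, not DRAMSim2's implementation: the names are hypothetical, the rank and bank counts are taken from the sample output, and it assumes each read is split into an \texttt{ACT} followed by a \texttt{READ\_P}, as the sample output suggests.
\begin{lstlisting}[language=Python]
from collections import deque

NUM_RANKS = 2  # assumption: two ranks, as in the sample output
NUM_BANKS = 4  # assumption: four banks per rank, as in the sample output

def make_queues():
    """Create one empty command queue per (rank, bank) pair."""
    return [[deque() for _ in range(NUM_BANKS)] for _ in range(NUM_RANKS)]

def enqueue_read(queues, rank, bank, row, col):
    """Split a read into ACT + READ_P and append both to the bank's queue."""
    queue = queues[rank][bank]
    queue.append(("ACT", row, col))
    queue.append(("READ_P", row, col))

queues = make_queues()
enqueue_read(queues, rank=1, bank=0, row=1502, col=800)
enqueue_read(queues, rank=1, bank=1, row=1502, col=800)

for bank, queue in enumerate(queues[1]):
    print("Bank %d size : %d" % (bank, len(queue)))
\end{lstlisting}
Keeping a separate queue per bank makes it easy to see which banks have outstanding work.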
The state of each bank in the system is also displayed:
\begin{lstlisting}
== Printing bank states (According to MC)
[idle] [idle] [2029] [1502]
[idle] [idle] [1502] [1502]
\end{lstlisting}
Banks can be in many states, including idle, row active (shown
with the row that is active), refreshing, or precharging. These
states will update based on the commands being sent by the
controller.
\section{Results Output}
In addition to printing memory statistics and debug information to standard output, DRAMSim2 also produces
a `vis' file in the \texttt{results/} directory. A vis file is essentially a summary of relevant statistics, generated
once per epoch (the number of cycles per epoch can be set by changing the \texttt{EPOCH\_COUNT} parameter in the \texttt{system.ini} file).

We are currently working on DRAMVis, a cross-platform viewer that parses the vis file and generates graphs that can be used
to analyze and compare results.
\end{document}