
Commit 710de54

L27
1 parent 03c8615 commit 710de54

File tree

3 files changed: +34 -34 lines changed


lectures/459.bib

Lines changed: 1 addition & 1 deletion

@@ -1086,7 +1086,7 @@ @misc{expbackoff
 
 @misc{rustflamegraph,
 author = {Adam Perry},
-title = {Rust Performance: A story featuring perf and flamegraph on Linux},
+title = {Rust {Performance}: A story featuring perf and flamegraph on {Linux}},
 year = {2016},
 url = {https://blog.anp.lol/rust/2016/07/24/profiling-rust-perf-flamegraph/},
 note = {Online; accessed 2020-11-01}
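The braces added in this change are the standard BibTeX idiom for protecting capitalization: many bibliography styles lowercase mid-title words, and wrapping a word in an extra level of braces exempts it from that case-folding. A minimal sketch of the pattern (hypothetical entry, not from the commit):

```bibtex
@misc{example,
  title = {A {BibTeX} title where {Linux} keeps its capital letter},
}
```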

lectures/L27-slides.tex

Lines changed: 14 additions & 14 deletions

@@ -331,7 +331,7 @@ \part{Profiling}
 \begin{frame}[fragile]
 \frametitle{Previous Offering Profiling}
 
-\begin{lstlisting}[basicstyle=\scriptsize]
+\begin{lstlisting}[basicstyle=\ttfamily\tiny]
 [plam@lynch nm-morph]$ perf stat ./test_harness
 
 Performance counter stats for './test_harness':
@@ -356,7 +356,7 @@ \part{Profiling}
 \begin{frame}[fragile]
 \frametitle{Do it with Rust}
 
-The first thing to do is to compile with debugging info, go to your \texttt{Cargo.toml} file and add:
+The first thing to do is to compile with debugging info. Go to your \texttt{Cargo.toml} file and add:
 \begin{verbatim}
 [profile.release]
 debug = true
@@ -402,13 +402,13 @@ \part{Profiler Guided Optimization}
 Example: branch prediction.
 
 \begin{lstlisting}[language=Rust]
-fn which_branch(a: i32, b: i32) {
-if a < b {
-println!("Case one.");
-} else {
-println!("Case two.");
+fn which_branch(a: i32, b: i32) {
+if a < b {
+println!("Case one.");
+} else {
+println!("Case two.");
+}
 }
-}
 \end{lstlisting}
 
 \end{frame}
@@ -456,13 +456,13 @@ \part{Profiler Guided Optimization}
 \frametitle{Match}
 
 \begin{lstlisting}[language=Rust]
-fn match_thing(x: i32) -> i32 {
-match x {
-0..10 => 1,
-11..100 => 2,
-_ => 0
+fn match_thing(x: i32) -> i32 {
+match x {
+0..10 => 1,
+11..100 => 2,
+_ => 0
+}
 }
-}
 \end{lstlisting}
 
 Same thing with \texttt{x}: what is its typical value? If we know that, it is our prediction.
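An aside on the match example in this diff: exclusive range patterns such as `0..10` in match arms are a feature-gated, unstable Rust feature, so the slide's code will not compile on stable. A hedged adaptation using stable inclusive ranges (my sketch, not code from the commit):

```rust
// Stable-Rust adaptation of the slides' match_thing example.
// Inclusive range patterns (0..=9) compile on stable Rust;
// the exclusive form (0..10) on the slide is feature-gated.
fn match_thing(x: i32) -> i32 {
    match x {
        0..=9 => 1,   // corresponds to 0..10 on the slide
        11..=99 => 2, // corresponds to 11..100 on the slide
        _ => 0,       // fallback; note x == 10 lands here, as in the original
    }
}

fn main() {
    println!("{}", match_thing(5));
    println!("{}", match_thing(42));
    println!("{}", match_thing(-3));
}
```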

lectures/L27.tex

Lines changed: 19 additions & 19 deletions

@@ -133,7 +133,7 @@ \section*{Userspace per-process profiling}
 
 \noindent
 Here's a usage example on some old assignment code from a previous offering of the course:
-\begin{lstlisting}[basicstyle=\scriptsize]
+\begin{lstlisting}[basicstyle=\ttfamily\scriptsize]
 [plam@lynch nm-morph]$ perf stat ./test_harness
 
 Performance counter stats for './test_harness':
@@ -170,21 +170,21 @@ \section*{Userspace per-process profiling}
 
 \section*{Profiler Guided Optimization (POGO)}
 
-In 2015 we were fortunate enough to have a guest lecture from someone at Microsoft actually in the room to give the guest lecture on the subject of Profile Guided Optimization (or POGO). In subsequent years, I was not able to convince him to fly in just for the lecture. Anyway, let's talk about the subject, which is by no means restricted to Rust.
+A few years ago, we were fortunate enough to have a guest lecture from someone at Microsoft actually in the room to give the guest lecture on the subject of Profile Guided Optimization (or POGO). In subsequent years, I was not able to convince him to fly in just for the lecture. Anyway, let's talk about the subject, which is by no means restricted to Rust.
 
 The compiler does static analysis of the code you've written and makes its best guesses about what is likely to happen. The canonical example for this is branch prediction: there is an if-else block and the compiler will then guess about which is more likely and optimize for that version. Consider three examples, originally from~\cite{pogo} but replaced with some Rust equivalents:
 
 \begin{lstlisting}[language=Rust]
-fn which_branch(a: i32, b: i32) {
-if a < b {
-println!("Case one.");
-} else {
-println!("Case two.");
+fn which_branch(a: i32, b: i32) {
+if a < b {
+println!("Case one.");
+} else {
+println!("Case two.");
+}
 }
-}
 \end{lstlisting}
 
-Just looking at this, which is more likely, \texttt{a < b} or \texttt{a >= b}? Assuming there's no other information in the system the compiler can believe that one is more likely than the other, or having no real information, use a fallback rule. This works, but what if we are wrong? Suppose the compiler decides it is likely that \texttt{a} is the larger value and it optimizes for that version. However, it is only the case 5\% of the time, so most of the time the prediction is wrong. That's unpleasant. But the only way to know is to actually run the program.
+Just looking at this, which is more likely, \texttt{a < b} or \texttt{a >= b}? Assuming there's no other information in the system, the compiler can believe that one is more likely than the other, or having no real information, use a fallback rule. This works, but what if we are wrong? Suppose the compiler decides it is likely that \texttt{a} is the larger value and it optimizes for that version. However, it is only the case 5\% of the time, so most of the time the prediction is wrong. That's unpleasant. But the only way to know is to actually run the program.
 
 \begin{multicols}{2}
 \begin{lstlisting}[language=Rust]
@@ -224,13 +224,13 @@ \section*{Profiler Guided Optimization (POGO)}
 There are similar questions raised for the other two examples. What is the ``normal'' type for some reference \texttt{thing}? It could be of either type \texttt{Kenobi} or \texttt{Grievous}. If we do not know, the compiler cannot do devirtualization (replace this virtual call with a real one). If there was exactly one type that implements the \texttt{Polite} trait we wouldn't have to guess. But are we much more likely to see \texttt{Kenobi} than \texttt{Grievous}?
 
 \begin{lstlisting}[language=Rust]
-fn match_thing(x: i32) -> i32 {
-match x {
-0..10 => 1,
-11..100 => 2,
-_ => 0
+fn match_thing(x: i32) -> i32 {
+match x {
+0..10 => 1,
+11..100 => 2,
+_ => 0
+}
 }
-}
 \end{lstlisting}
 
 Same thing with \texttt{x}: what is its typical value? If we know that, it is our prediction. Actually, in a match block with many options, could we rank them in descending order of likelihood?
@@ -241,7 +241,7 @@ \section*{Profiler Guided Optimization (POGO)}
 
 Step one is to generate an executable with instrumentation. Ask to compile with instrumentation enabled, which also says what directory to put it in: \texttt{-Cprofile-generate=/tmp/pgo-data}. The compiler inserts a bunch of probes into the generated code that are used to record data. Three types of probe are inserted: function entry probes, edge probes, and value probes. A function entry probe, obviously, counts how many times a particular function is called. An edge probe is used to count the transitions (which tells us whether an if branch is taken or the else condition). Value probes are interesting; they are used to collect a histogram of values. Thus, we can have a small table that tells us the frequency of what is given in to a \texttt{match} statement. When this phase is complete, there is an instrumented executable and an empty database file where the training data goes~\cite{pogo}.
 
-Step two is training day: run the instrumented executable through real-world scenarios. Ideally you will spend the training time on the performance-critical sections. It does not have to be a single training run, of course, data can be collected from as many runs as desired. Keep in mind that the program will run a lot slower when there's the instrumentation present.
+Step two is training day: run the instrumented executable through real-world scenarios. Ideally you will spend the training time on the performance-critical sections. It does not have to be a single training run, of course. Data can be collected from as many runs as desired. Keep in mind that the program will run a lot slower when there's the instrumentation present.
 
 Still, it is important to note that you are not trying to exercise every part of the program (this is not unit testing); instead it should be as close to real-world-usage as can be accomplished. In fact, trying to use every bell and whistle of the program is counterproductive; if the usage data does not match real world scenarios then the compiler has been given the wrong information about what is important. Or you might end up teaching it that almost nothing is important...
 
@@ -285,11 +285,11 @@ \section*{Profiler Guided Optimization (POGO)}
 
 What does it mean for the executable to be better? We have already looked at an example about how to predict branches. Predicting it correctly will be faster than predicting it incorrectly, but this is not the only thing. The algorithms will aim for speed in the areas that are ``hot'' (performance critical and/or common scenarios). The algorithms will alternatively aim to minimize the size of code of areas that are ``cold'' (not heavily used). It is recommended in~\cite{pogo} that less than 5\% of methods should be compiled for speed.
 
-It is possible that we can combine multiple training runs and we can manually give some suggestions of what scenarios are important. Obviously the more a scenario runs in the training data, the more important it will be, as far as the POGO optimization routine is concerned, but multiple runs can be merged with user assigned weightings.
+It is possible that we can combine multiple training runs and we can manually give some suggestions of what scenarios are important. The more a scenario runs in the training data, the more important it will be, as far as the POGO optimization routine is concerned, but also, multiple runs can be merged with user assigned weightings.
 
 \subsection*{Behind the Scenes}
 
-In the optimize phase, the training data is used to do the following optimizations( which I will point out are based on C and \CPP~ programs and not necessarily Rust, but the principles should work because the Rust compiler's approach to this is based on that of LLVM/Clang)~\cite{pogo2}:
+In the optimize phase, the training data is used to do the following optimizations (which I will point out are based on C and \CPP~ programs and not necessarily Rust, but the principles should work because the Rust compiler's approach to this is based on that of LLVM/Clang)~\cite{pogo2}:
 
 \begin{multicols}{2}
 \begin{enumerate}
@@ -343,7 +343,7 @@ \subsection*{Behind the Scenes}
 
 \subsection*{Benchmark Results}
 
-This table, condensed from~\cite{pogo2} summarizes the gains to be made. The application under test is a standard benchmark suite (Spec2K) (admittedly, C rather than Rust, but the goal is to see if the principle of POGO works and not just a specific implementation):
+This table, condensed from~\cite{pogo2}, summarizes the gains achieved. The application under test is a standard benchmark suite (Spec2K) (admittedly, C rather than Rust, but the goal is to see if the principle of POGO works and not just a specific implementation):
 
 \begin{center}
 \begin{tabular}{l|l|l|l|l|l}
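The instrument/train/optimize workflow described in the L27.tex changes above can be sketched as a command sequence. The `-Cprofile-generate` flag appears in the lecture text; the `llvm-profdata` merge step and the `-Cprofile-use` flag are the standard LLVM/rustc counterparts, and the binary name `test_harness` is only a placeholder:

```shell
# Step 1: build an instrumented executable; probes write *.profraw
# files into the given directory as the program runs.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# Step 2: training day -- run realistic, performance-critical
# scenarios. Multiple runs are fine; each adds more profile data.
./target/release/test_harness

# Merge the raw profiles into a single file the compiler can consume.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Step 3: rebuild, letting the training data guide optimization.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```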
