Merge branch 'master' of git@github.com:jzarnett/ece459.git

jzarnett · Apr 3, 2024 · 170e3f6
2 parents b18f4e0 + 0306d02
commit 170e3f6
Showing 52 changed files with 945 additions and 497 deletions.
diff --git a/lectures/L26-slides.tex b/lectures/L26-slides.tex
@@ -19,6 +19,10 @@
 \begin{frame}
 \frametitle{Who Can It Be Now?}
 
+\begin{center}
+	\includegraphics[width=0.6\textwidth]{images/n-body-go-brrrrr.png}
+\end{center}
+
 We usually assume that CPU is the problem... but is that true?
 
 Let's also relate this to scalability: max users or transactions.

diff --git a/lectures/compiled/L26-slides-Profiling_and_Scalability.pdf b/lectures/compiled/L26-slides-Profiling_and_Scalability.pdf
diff --git a/lectures/compiled/notebook.pdf b/lectures/compiled/notebook.pdf
diff --git a/lectures/flipped/L04.md b/lectures/flipped/L04.md
@@ -208,10 +208,11 @@ fails to compile, so although as a human, we see the `result` is still valid
 when we print it since it always points to `string1`, the compiler cannot see
 that. The compiler will consider that `result` points to an invalid value
 whenever either `string1` or `string2` is dropped. (I see references in pairs,
-of course here are three pairs: `x,y`; `x,result`; `y,result`. For any pair, if
-the two references are with the same lifetime, say `'a`. Then it means we tell
-the compiler that if one reference becomes invalid, then it can just assume
-another also becomes invalid)
+of course there are three pairs: `x,y`; `x,returned result`; `y,returned result`.
+For any pair, if the two references have the same lifetime, say `'a`, then we
+are telling the compiler that if one reference becomes invalid, it can assume
+the other also becomes invalid; or, if one is alive, then the other has to be
+alive)
 
 ```rust
 fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {

diff --git a/lectures/flipped/L17.md b/lectures/flipped/L17.md
@@ -168,3 +168,9 @@ Gave out cookies.
 Showed an alias analysis/side effect analysis example on the board.
 Lectured about link-time optimization. Did a discussion of inlining
 with some audience participation. Did not talk about tail recursion elimination.
+
+# After-action report, huanyi, 12Feb24
+
+I went through everything, but only by lecturing. I didn't do the live
+coding since RustExplorer does not work for it anymore. I didn't do the
+Stream VByte activity, either.
diff --git a/lectures/flipped/L18.md b/lectures/flipped/L18.md
@@ -4,12 +4,16 @@
 
 Compilation takes time. Let's take a look at how the compiler gets improved.
 
+https://rustc-dev-guide.rust-lang.org
+
 ## Measurement Infrastructure
 
 The optimization workflow is just to make a potential change to the compiler
 (ideally hotspots) and then benchmark and profile it. Let's check out
 <https://perf.rust-lang.org> and see how `rustc` gets improved.
 
+Glossary: https://github.com/rust-lang/rustc-perf/blob/master/docs/glossary.md
+
 ### Benchmark selection
 
 Question: How do you select appropriate benchmarks?
@@ -30,11 +34,14 @@ In addition to mention the cases in the lecture notes, use <https://godbolt.org>
 to show how LLVM backend will emit inline code rather than a call to `memcpy`.
 
 ```rust
+// with `-C opt-level=3`
+
 pub struct Var {
     a: [i128; 9]
     // ^ change 9 to a smaller number and see what happens
 }
 
+#[no_mangle]
 pub fn bar(x: Var) -> Var {
     let y = x;
     y
@@ -76,7 +83,14 @@ Using an appropriate linker can reduce the build time.
 
 Good to read: <https://nnethercote.github.io/perf-book/title-page.html>
 
+`cargo build --timings`:
+<https://doc.rust-lang.org/nightly/cargo/reference/timings.html>
+
 # After-action report, plam, 6 Mar 2023
 
 Mostly lecture/with audience participation. Showed slides for micro-opts
-and architecture-level changes.
+and architecture-level changes.
+
+# After-action report, huanyi, 16Feb24
+
+Went over everything except the "Other" section.
diff --git a/lectures/flipped/L19.md b/lectures/flipped/L19.md
@@ -1,5 +1,9 @@
 # Lecture 19 — Query Optimization
 
+## Roadmap
+
+We will talk about optimizing database queries.
+
 ## Intro
 
 Context; high-level steps of executing an assignment/query.
@@ -9,8 +13,8 @@ Context; high-level steps of executing an assignment/query.
 Activity: inside `flipped/L19`, there are two tables, `customer` and `orders`. I
 would like to know
 
-* Names of customers who live in New York and who have at least one order with over
-  5000 dollars.
+* Names of customers who live in New York and who have at least one order with
+  over 5000 dollars.
 
 (Can students express this in SQL-like language? Do we use a subquery or join?)
 
@@ -33,16 +37,17 @@ Question: what is the overhead of optimization?
 
 ## Measurement and estimation
 
-We have had an activity where we estimate travel time from Waterloo to
-Montreal and to SF before, so we won't do it again.
+We have had an activity where we estimate travel time from Waterloo to Montreal
+and to SF before, so we won't do it again.
 
-Anyway, same thing for estimating query execution time. What are
-the numbers of interest here?
+Anyway, same thing for estimating query execution time. What are the numbers of
+interest here?
 
 * disk access time
 * CPU time (ignored for simplicity but actually important)
 
 Complicating factors:
+
 * system load
 * buffer contents
 * data layout
@@ -65,10 +70,9 @@ Talk about tradeoff in maintaining metadata, and histograms.
 SELECT c.* FROM customer AS c JOIN address AS a ON c.address_id = a.address_id;
 ```
 
-The above `join` can be eliminated if there is a foreign key from
-customer's address_id to the address id field and nulls are not
-permitted; this means that there is always an address for each
-customer.
+The above `join` can be eliminated if there is a foreign key from customer's
+address_id to the address id field and nulls are not permitted; this means that
+there is always an address for each customer.
 
 (Go through the other examples in lecture notes too).
 
@@ -84,7 +88,15 @@ customer.
 * Set limits
 * Plan caching
 
+## Other
+
+https://wizardzines.com/comics/explain/
+
 # After-action report, plam, 10 Mar 2023
 
-Just did this lecture today; did not reach L20. Mostly lecture plus the interactive activities
-here. Did not do the "Alternative Routes" activity.
+Just did this lecture today; did not reach L20. Mostly lecture plus the
+interactive activities here. Did not do the "Alternative Routes" activity.
+
+# After-action report, huanyi, 16Feb24
+
+Mainly did the query optimization activity. Briefly covered the other material.
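
The join-elimination note in the L19 hunk above can be illustrated with a small in-memory sketch. Everything here is hypothetical and not part of the repository: the `Customer` and `Address` structs, the sample rows, and the function names are assumptions made only for illustration, and a real optimizer reasons from the schema constraints rather than by scanning data.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory row standing in for the `customer` table.
#[derive(Debug, Clone, PartialEq)]
struct Customer {
    name: String,
    address_id: u32, // non-nullable foreign key into `address`
}

/// Hypothetical in-memory row standing in for the `address` table.
struct Address {
    address_id: u32, // primary key, so values are unique
}

/// Naive nested-loop inner join that keeps only the customer columns,
/// mirroring `SELECT c.* FROM customer AS c JOIN address AS a ON ...`.
fn join_then_project(customers: &[Customer], addresses: &[Address]) -> Vec<Customer> {
    let mut out = Vec::new();
    for c in customers {
        for a in addresses {
            if c.address_id == a.address_id {
                out.push(c.clone());
            }
        }
    }
    out
}

/// The plan after join elimination: scan `customer` only.
fn scan_only(customers: &[Customer]) -> Vec<Customer> {
    customers.to_vec()
}

/// Checks the two conditions the note relies on: address ids are unique,
/// and every customer refers to an existing address.
fn foreign_key_holds(customers: &[Customer], addresses: &[Address]) -> bool {
    let ids: HashSet<u32> = addresses.iter().map(|a| a.address_id).collect();
    ids.len() == addresses.len() && customers.iter().all(|c| ids.contains(&c.address_id))
}

/// Made-up sample data that satisfies the foreign key.
fn sample_tables() -> (Vec<Customer>, Vec<Address>) {
    let customers = vec![
        Customer { name: "Ann".to_string(), address_id: 1 },
        Customer { name: "Bob".to_string(), address_id: 2 },
        Customer { name: "Cy".to_string(), address_id: 1 },
    ];
    let addresses = vec![Address { address_id: 1 }, Address { address_id: 2 }];
    (customers, addresses)
}
```

When `foreign_key_holds` is true, each customer matches exactly one address, so the join neither drops nor duplicates rows and `scan_only` gives the same result. Adding a customer whose `address_id` has no matching address makes the two plans disagree, which is why the constraint is required.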
diff --git a/lectures/flipped/L19_answers.md b/lectures/flipped/L19_answers.md
@@ -15,6 +15,6 @@ index (assuming tables have similar number of records). Of course, only if the
 statistics are up-to-date.
 
 * The overhead is that the query optimization is applied at runtime. If the
-original query is fast, then the optimization process may slow it
-down. Longer discussions can be about situations with insufficient memory,
-caching enabled, etc. Some optimizations may not work as expected.
+original query is fast, then the optimization process may slow it down. Longer
+discussions can be about situations with insufficient memory, caching enabled,
+etc. Some optimizations may not work as expected.
diff --git a/lectures/flipped/L20.md b/lectures/flipped/L20.md
@@ -13,8 +13,8 @@ The quick summary is:
 * observe and change: modify configuration/data layout on the fly
 * genetic algorithms: one way of improving configurations/data layouts
 * just-in-time compilation/optimization
- + lock optimization
- + on-stack replacement
+  * lock optimization
+  * on-stack replacement
 * binary rewriting
 
 ## Observe and change: experimentation [live coding 5 minutes, experimentation 10 minutes]
@@ -23,7 +23,7 @@ Exercise: can we think of real-life software examples where observe-and-change
 makes sense? (Talk about the ones in the lecture notes: data structure
 selection; database query processing; indexes; external services).
 
-Consider the [im docs](https://docs.rs/im/14.3.0/im/):
+Consider the [im docs](https://docs.rs/im/15.1.0/im/index.html):
 
 > For instance, Vec beats everything at memory usage, indexing and operations
 > that happen at the back of the list, but is terrible at insertion and removal,
@@ -48,7 +48,7 @@ Consider the [im docs](https://docs.rs/im/14.3.0/im/):
 > of a single chunk.
 
 Exercise: Look at `lectures/live-coding/L20a-skel`, experiment with different
-data structures and operation mixes, and try to understand what is fast when.
+data structures and operations, and try to understand what is fast and when.
 
 Exercise: Look at `lectures/live-coding/L20c-skel`, choose a good initial size.
 
@@ -59,22 +59,43 @@ given the workload.
 
 ## Rewriting the binary [30 minutes]
 
-note: L20c-skel basically does this already.
+A real example is in <https://github.com/aengelke/binopt>. Based on `func`, it
+generates a new version, `new_func`, which can be optimized later. Although the
+call is written as `new_func(8, 16)`, the second parameter `16` is replaced
+with `42` when it executes. Note that this change happens at runtime, using
+`binopt_cfg_set_parami`. (Question: why not `if-else`?)
 
-Have the program modify itself to add inline annotations, recompile itself and
-re-invoke the new version of itself. You may find the [build script
+Rewriting the binary is difficult, so we will try rewriting the source code
+instead. The steps are similar, though.
+
+Have the program (also in `L20c-skel`) modify itself to add inline annotations,
+recompile itself and re-invoke the new version of itself. You may find the
+[build script
 examples](https://doc.rust-lang.org/cargo/reference/build-script-examples.html)
-useful.
+useful, especially how it uses `Command::new`.
 
 # After-action report, plam, 13 Mar 2023
 
-Did all the things, pretty much, and did not get to L21 (which was the plan).
-To fix for next time: L20c-skel needs longer runs; to be able to handle changes in whitespace
-in the target line because VS Code changes whitespace; and should not
+Did all the things, pretty much, and did not get to L21 (which was the plan). To
+fix for next time: L20c-skel needs longer runs; to be able to handle changes in
+whitespace in the target line because VS Code changes whitespace; and should not
 put extra blank lines.
 
-We also experimented with lockopt-skel but changed the locks to atomics and
-saw how that worked. We should be consistent with using the flipped/ and live-coding
+We also experimented with lockopt-skel but changed the locks to atomics and saw
+how that worked. We should be consistent with using the flipped/ and live-coding
 directories for things here.
 
 We did not talk about *why* VecDeque worked better.
+
+# After-action report, huanyi, 26Feb24
+
+Discussed each strategy briefly. Did all exercises except the binary-rewriting
+one. With the fixed L20a code, the elapsed times now make much more sense. It
+turns out `Vec` is extremely efficient when inserting at the end, but it is
+terrible when removing from the front. `VecDeque` performs reasonably well on
+every workload. I did not try `Vector`, which no longer seems to be
+maintained.
+
+# After-action report, huanyi, 01Mar24
+
+Finished the rest.
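
The `Vec` versus `VecDeque` observation in the report above can be reproduced with a minimal sketch. This is not the `L20a-skel` code: the function names and the workload (push `n` items at the back, then remove them all from the front) are assumptions chosen only to isolate the front-removal cost.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Push `n` items at the back of a `Vec`, then remove them all from the front.
/// `Vec::remove(0)` shifts every remaining element left, so the drain is
/// quadratic in `n` overall.
fn drain_front_vec(n: u32) -> (Vec<u32>, Duration) {
    let start = Instant::now();
    let mut v: Vec<u32> = Vec::new();
    for i in 0..n {
        v.push(i);
    }
    let mut seen = Vec::with_capacity(n as usize);
    while !v.is_empty() {
        seen.push(v.remove(0));
    }
    (seen, start.elapsed())
}

/// Same workload on a `VecDeque`, whose ring buffer makes `pop_front`
/// a constant-time operation.
fn drain_front_vecdeque(n: u32) -> (Vec<u32>, Duration) {
    let start = Instant::now();
    let mut d: VecDeque<u32> = VecDeque::new();
    for i in 0..n {
        d.push_back(i);
    }
    let mut seen = Vec::with_capacity(n as usize);
    while let Some(x) = d.pop_front() {
        seen.push(x);
    }
    (seen, start.elapsed())
}
```

Both functions return the items in the same order along with the elapsed time, so calling them with the same `n` from a `main` and printing the two durations gives a rough comparison. For large `n` the `Vec` version is typically much slower, but exact timings depend on the machine and build profile, so measure with `--release` before drawing conclusions.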
diff --git a/lectures/flipped/L20/lockopt-skel/Cargo.lock b/lectures/flipped/L20/lockopt-skel/Cargo.lock
diff --git a/lectures/flipped/L20/lockopt-workload/Cargo.lock b/lectures/flipped/L20/lockopt-workload/Cargo.lock
diff --git a/lectures/flipped/L20/lockopt-workload/src/main.rs b/lectures/flipped/L20/lockopt-workload/src/main.rs