Merge branch 'master' of git@github.com:jzarnett/ece459.git
jzarnett committed Apr 3, 2024
2 parents b18f4e0 + 0306d02 commit 170e3f6
Showing 52 changed files with 945 additions and 497 deletions.
4 changes: 4 additions & 0 deletions lectures/L26-slides.tex
Expand Up @@ -19,6 +19,10 @@
\begin{frame}
\frametitle{Who Can It Be Now?}

\begin{center}
\includegraphics[width=0.6\textwidth]{images/n-body-go-brrrrr.png}
\end{center}

We usually assume that the CPU is the problem... but is that true?

Let's also relate this to scalability: max users or transactions.
Expand Down
Binary file modified lectures/compiled/L26-slides-Profiling_and_Scalability.pdf
Binary file not shown.
Binary file modified lectures/compiled/notebook.pdf
Binary file not shown.
9 changes: 5 additions & 4 deletions lectures/flipped/L04.md
Expand Up @@ -208,10 +208,11 @@ fails to compile, so although as a human, we see the `result` is still valid
when we print it since it always points to `string1`, the compiler cannot see
that. The compiler will assume that `result` points to an invalid value
whenever either `string1` or `string2` is dropped. (I see references in pairs,
of course here are three pairs: `x,y`; `x,result`; `y,result`. For any pair, if
the two references are with the same lifetime, say `'a`. Then it means we tell
the compiler that if one reference becomes invalid, then it can just assume
another also becomes invalid)
of course here there are three pairs: `x,y`; `x,returned result`; `y,returned result`.
For any pair, if the two references have the same lifetime, say `'a`, we are
telling the compiler that if one reference becomes invalid, it can assume the
other becomes invalid too; equivalently, if one is alive then the other has to
be alive.)
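
As a hedged illustration of the pair-wise reading, here is a minimal sketch of a case that does compile. The function body and the variable names `string1` and `string2` are assumptions chosen to match the discussion, not necessarily the exact code used in class.

```rust
// Both inputs and the returned reference share the single lifetime `'a`.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("a long string");
    {
        let string2 = String::from("short");
        // Accepted: `string1` and `string2` are both alive at every point
        // where `result` is used.
        let result = longest(string1.as_str(), string2.as_str());
        println!("longest: {result}");
    }
    // If `result` were declared outside the inner block and printed here,
    // the compiler would reject the program: `string2` has been dropped, and
    // since `y` and the returned reference share `'a`, the result must be
    // assumed invalid too, even though at runtime it points into `string1`.
}
```

The design choice is that the signature alone decides what the caller may do; the compiler never looks inside `longest` to see which argument is actually returned.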

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
Expand Down
6 changes: 6 additions & 0 deletions lectures/flipped/L17.md
Expand Up @@ -168,3 +168,9 @@ Gave out cookies.
Showed an alias analysis/side effect analysis example on the board.
Lectured about link-time optimization. Did a discussion of inlining
with some audience participation. Did not talk about tail recursion elimination.

# After-action report, huanyi, 12Feb24

I went through everything, but only by lecturing. I didn't do the live coding
since RustExplorer does not work for it anymore. I didn't do the Stream VByte
activity, either.
16 changes: 15 additions & 1 deletion lectures/flipped/L18.md
Expand Up @@ -4,12 +4,16 @@

Compilation takes time. Let's take a look at how the compiler gets improved.

https://rustc-dev-guide.rust-lang.org

## Measurement Infrastructure

The optimization workflow is just to make a potential change to the compiler
(ideally hotspots) and then benchmark and profile it. Let's check out
<https://perf.rust-lang.org> and see how `rustc` gets improved.

Glossary: https://github.com/rust-lang/rustc-perf/blob/master/docs/glossary.md

### Benchmark selection

Question: How do you select appropriate benchmarks?
Expand All @@ -30,11 +34,14 @@ In addition to mentioning the cases in the lecture notes, use <https://godbolt.org>
to show how the LLVM backend will emit inline code rather than a call to `memcpy`.

```rust
// with `-C opt-level=3`

pub struct Var {
    a: [i128; 9]
    // ^ change 9 to a smaller number and see what happens
}

#[no_mangle]
pub fn bar(x: Var) -> Var {
    let y = x;
    y
Expand Down Expand Up @@ -76,7 +83,14 @@ Using an appropriate linker can reduce the build time.

Good to read: <https://nnethercote.github.io/perf-book/title-page.html>

`cargo build --timings`:
<https://doc.rust-lang.org/nightly/cargo/reference/timings.html>

# After-action report, plam, 6 Mar 2023

Mostly lecture/with audience participation. Showed slides for micro-opts
and architecture-level changes.

# After-action report, huanyi, 16Feb24

Went over everything except the "Other" section.
36 changes: 24 additions & 12 deletions lectures/flipped/L19.md
@@ -1,5 +1,9 @@
# Lecture 19 — Query Optimization

## Roadmap

We will talk about optimizing database queries.

## Intro

Context; high-level steps of executing an assignment/query.
Expand All @@ -9,8 +13,8 @@ Context; high-level steps of executing an assignment/query.
Activity: inside `flipped/L19`, there are two tables, `customer` and `orders`. I
would like to know

* Names of customers who live in New York and who have at least one order with over
5000 dollars.
* Names of customers who live in New York and who have at least one order with
over 5000 dollars.

(Can students express this in SQL-like language? Do we use a subquery or join?)

Expand All @@ -33,16 +37,17 @@ Question: what is the overhead of optimization?

## Measurement and estimation

We have had an activity where we estimate travel time from Waterloo to
Montreal and to SF before, so we won't do it again.
We have had an activity where we estimate travel time from Waterloo to Montreal
and to SF before, so we won't do it again.

Anyway, same thing for estimating query execution time. What are
the numbers of interest here?
Anyway, same thing for estimating query execution time. What are the numbers of
interest here?

* disk access time
* CPU time (ignored for simplicity but actually important)

Complicating factors:

* system load
* buffer contents
* data layout
Expand All @@ -65,10 +70,9 @@ Talk about tradeoff in maintaining metadata, and histograms.
SELECT c.* FROM customer AS c JOIN address AS a ON c.address_id = a.address_id;
```

The above `join` can be eliminated if there is a foreign key from
customer's address_id to the address id field and nulls are not
permitted; this means that there is always an address for each
customer.
The above `join` can be eliminated if there is a foreign key from customer's
address_id to the address id field and nulls are not permitted; this means that
there is always an address for each customer.
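
To see why the join is redundant under those constraints, here is a minimal in-memory sketch. The `Customer` and `Address` structs, their fields, and the sample rows are hypothetical stand-ins for the tables, and the `HashSet` lookup assumes `address_id` is unique in `address` (as a primary key would be).

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct Customer {
    name: String,
    address_id: u32, // NOT NULL foreign key into `Address::address_id`
}

struct Address {
    address_id: u32, // primary key, so values are unique
}

/// SELECT c.* FROM customer c JOIN address a ON c.address_id = a.address_id
fn with_join(customers: &[Customer], addresses: &[Address]) -> Vec<Customer> {
    let known: HashSet<u32> = addresses.iter().map(|a| a.address_id).collect();
    customers
        .iter()
        .filter(|c| known.contains(&c.address_id))
        .cloned()
        .collect()
}

/// SELECT c.* FROM customer c
fn without_join(customers: &[Customer]) -> Vec<Customer> {
    customers.to_vec()
}

/// Hypothetical rows in which every customer references an existing address.
fn sample_data() -> (Vec<Customer>, Vec<Address>) {
    let addresses = vec![Address { address_id: 1 }, Address { address_id: 2 }];
    let customers = vec![
        Customer { name: "Ada".to_string(), address_id: 2 },
        Customer { name: "Bob".to_string(), address_id: 1 },
        Customer { name: "Cy".to_string(), address_id: 2 },
    ];
    (customers, addresses)
}

fn main() {
    let (customers, addresses) = sample_data();
    // With the foreign key enforced, the join filters nothing out.
    assert_eq!(with_join(&customers, &addresses), without_join(&customers));
    println!("same {} rows with or without the join", customers.len());
}
```

If a customer could carry an `address_id` that is missing from `address` (no foreign key) or no address at all (nulls allowed), the two functions would return different rows and the optimizer could not drop the join.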

(Go through the other examples in lecture notes too).

Expand All @@ -84,7 +88,15 @@ customer.
* Set limits
* Plan caching

## Other

https://wizardzines.com/comics/explain/

# After-action report, plam, 10 Mar 2023

Just did this lecture today; did not reach L20. Mostly lecture plus the interactive activities
here. Did not do the "Alternative Routes" activity.
Just did this lecture today; did not reach L20. Mostly lecture plus the
interactive activities here. Did not do the "Alternative Routes" activity.

# After-action report, huanyi, 16Feb24

Mainly the query optimization activity. Very briefly talked about the other material.
6 changes: 3 additions & 3 deletions lectures/flipped/L19_answers.md
Expand Up @@ -15,6 +15,6 @@ index (assuming tables have similar number of records). Of course, only if the
statistics are up-to-date.

* The overhead is that the query optimization is applied at runtime. If the
original query is fast, then the optimization process may slow it
down. Longer discussions can be about situations with insufficient memory,
caching enabled, etc. Some optimizations may not work as expected.
original query is fast, then the optimization process may slow it down. Longer
discussions can be about situations with insufficient memory, caching enabled,
etc. Some optimizations may not work as expected.
47 changes: 34 additions & 13 deletions lectures/flipped/L20.md
Expand Up @@ -13,8 +13,8 @@ The quick summary is:
* observe and change: modify configuration/data layout on the fly
* genetic algorithms: one way of improving configurations/data layouts
* just-in-time compilation/optimization
+ lock optimization
+ on-stack replacement
* lock optimization
* on-stack replacement
* binary rewriting

## Observe and change: experimentation [live coding 5 minutes, experimentation 10 minutes]
Expand All @@ -23,7 +23,7 @@ Exercise: can we think of real-life software examples where observe-and-change
makes sense? (Talk about the ones in the lecture notes: data structure
selection; database query processing; indexes; external services).

Consider the [im docs](https://docs.rs/im/14.3.0/im/):
Consider the [im docs](https://docs.rs/im/15.1.0/im/index.html):

> For instance, Vec beats everything at memory usage, indexing and operations
> that happen at the back of the list, but is terrible at insertion and removal,
Expand All @@ -48,7 +48,7 @@ Consider the [im docs](https://docs.rs/im/14.3.0/im/):
> of a single chunk.
Exercise: Look at `lectures/live-coding/L20a-skel`, experiment with different
data structures and operation mixes, and try to understand what is fast when.
data structures and operations, and try to understand what is fast and when.
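
As a starting point for this kind of experiment, here is a hedged sketch that is not the `L20a-skel` code: it times one hypothetical workload (push at the back, then drain from the front) on `Vec` and on `VecDeque`. The function names and the workload size are assumptions, and absolute timings will vary by machine and optimization level.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Push `n` items at the back of a `Vec`, then remove them all from the front.
fn vec_workload(n: u64) -> (u64, Duration) {
    let start = Instant::now();
    let mut v: Vec<u64> = Vec::new();
    for i in 0..n {
        v.push(i);
    }
    let mut sum = 0;
    while !v.is_empty() {
        // `remove(0)` shifts every remaining element left: O(len) per call.
        sum += v.remove(0);
    }
    (sum, start.elapsed())
}

/// Same workload on a `VecDeque`, where removing from the front is O(1).
fn deque_workload(n: u64) -> (u64, Duration) {
    let start = Instant::now();
    let mut d: VecDeque<u64> = VecDeque::new();
    for i in 0..n {
        d.push_back(i);
    }
    let mut sum = 0;
    while let Some(x) = d.pop_front() {
        sum += x;
    }
    (sum, start.elapsed())
}

fn main() {
    let n = 20_000;
    let (vec_sum, vec_time) = vec_workload(n);
    let (deque_sum, deque_time) = deque_workload(n);
    // Both structures must compute the same answer; only the cost differs.
    assert_eq!(vec_sum, deque_sum);
    println!("Vec:      {vec_time:?}");
    println!("VecDeque: {deque_time:?}");
}
```

Changing the operation mix (for example, popping from the back instead of the front) is a quick way to see the ranking change.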

Exercise: Look at `lectures/live-coding/L20c-skel`, choose a good initial size.

Expand All @@ -59,22 +59,43 @@ given the workload.

## Rewriting the binary [30 minutes]

note: L20c-skel basically does this already.
A real example is in <https://github.com/aengelke/binopt>. Based on `func`, it
generates a new version `new_func`, which can later be optimized. Although the
call is written as `new_func(8, 16)`, the second argument `16` is replaced with
`42` when it executes. Note that this change happens at runtime using
`binopt_cfg_set_parami`. (Question: why not `if-else`?)

Have the program modify itself to add inline annotations, recompile itself and
re-invoke the new version of itself. You may find the [build script
Rewriting the binary is difficult, so we will try rewriting the source code
instead. The steps are similar, though.

Have the program (also in `L20c-skel`) modify itself to add inline annotations,
recompile itself and re-invoke the new version of itself. You may find the
[build script
examples](https://doc.rust-lang.org/cargo/reference/build-script-examples.html)
useful.
useful, especially how it uses `Command::new`.
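
A minimal sketch of the source-rewriting step follows, under stated assumptions: `add_inline_annotation` and `rebuild_command` are hypothetical helpers (not the `L20c-skel` code), the target function `hot` is made up, and the rebuild is assumed to be `cargo build --release`. The sketch only builds the command; a full version would read and write the source file with `std::fs`, call `.status()` on the command, and then launch the new binary with another `Command::new`.

```rust
use std::process::Command;

/// Insert `#[inline(always)]` above the first line that declares `fn_name`.
/// Matching uses the trimmed line, so editor re-indentation does not break it.
fn add_inline_annotation(source: &str, fn_name: &str) -> String {
    let needle = format!("fn {fn_name}(");
    let pub_needle = format!("pub {needle}");
    let mut out = String::new();
    let mut done = false;
    let mut prev_was_inline = false;
    for line in source.lines() {
        let trimmed = line.trim_start();
        let is_target = trimmed.starts_with(&needle) || trimmed.starts_with(&pub_needle);
        if !done && is_target {
            if !prev_was_inline {
                // Reuse the target line's own indentation for the attribute.
                let indent = &line[..line.len() - trimmed.len()];
                out.push_str(indent);
                out.push_str("#[inline(always)]\n");
            }
            done = true;
        }
        prev_was_inline = trimmed.starts_with("#[inline");
        out.push_str(line);
        out.push('\n');
    }
    out
}

/// Build (but do not run) the command that would recompile the program.
fn rebuild_command() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release"]);
    cmd
}

fn main() {
    let source = "fn main() {}\n    fn hot(x: u32) -> u32 { x + 1 }\n";
    print!("{}", add_inline_annotation(source, "hot"));
    println!("would run: {:?}", rebuild_command());
}
```

Skipping the insertion when the previous line already starts with `#[inline` keeps repeated runs from stacking annotations.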

# After-action report, plam, 13 Mar 2023

Did all the things, pretty much, and did not get to L21 (which was the plan).
To fix for next time: L20c-skel needs longer runs; to be able to handle changes in whitespace
in the target line because VS Code changes whitespace; and should not
Did all the things, pretty much, and did not get to L21 (which was the plan). To
fix for next time: L20c-skel needs longer runs; needs to handle whitespace
changes in the target line, because VS Code changes whitespace; and should not
put extra blank lines.

We also experimented with lockopt-skel but changed the locks to atomics and
saw how that worked. We should be consistent with using the flipped/ and live-coding
We also experimented with lockopt-skel but changed the locks to atomics and saw
how that worked. We should be consistent with using the flipped/ and live-coding
directories for things here.

We did not talk about *why* VecDeque worked better.

# After-action report, huanyi, 26Feb24

Discussed each strategy briefly. Did all exercises except the rewriting binary
one. With the fixed L20a code, the elapsed times now make much more sense. It
turns out `Vec` is extremely efficient when inserting at the end, but it is
terrible when removing from the front. `VecDeque` performs generally well for
every workload. I did not try `Vector`, which no longer seems to be
maintained.

# After-action report, huanyi, 01Mar24

Finished the rest
7 changes: 0 additions & 7 deletions lectures/flipped/L20/lockopt-skel/Cargo.lock

This file was deleted.

86 changes: 0 additions & 86 deletions lectures/flipped/L20/lockopt-workload/Cargo.lock

This file was deleted.

46 changes: 0 additions & 46 deletions lectures/flipped/L20/lockopt-workload/src/main.rs

This file was deleted.
