Prepare L16 & L17 flipped note
h365chen committed Feb 9, 2024
1 parent cd1652d commit ce07811
Showing 2 changed files with 28 additions and 14 deletions.
34 changes: 21 additions & 13 deletions lectures/flipped/L16.md
@@ -1,5 +1,11 @@
# Lecture 16 — Mostly Data Parallelism

+## Roadmap
+
+We will talk about SIMD and a case study of it.
+
+## Two ideas
+
Data parallelism: performing *the same* operations on different input.

Task parallelism: performing *different* operations on different input.
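
The two definitions can be contrasted in a small sketch. This is an illustration only and is not taken from the lecture notes: the function names `data_parallel` and `task_parallel` are hypothetical, and it uses threads from the standard library (`std::thread::scope`, stable since Rust 1.63) rather than SIMD, simply to make the difference visible.

```rust
use std::thread;

// Data parallelism: the SAME operation (squaring) applied to
// different pieces of the input, one piece per thread.
fn data_parallel(input: &[i64]) -> Vec<i64> {
    let (left, right) = input.split_at(input.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().map(|x| x * x).collect::<Vec<_>>());
        let r = s.spawn(|| right.iter().map(|x| x * x).collect::<Vec<_>>());
        // Reassemble the halves in their original order.
        let mut out = l.join().unwrap();
        out.extend(r.join().unwrap());
        out
    })
}

// Task parallelism: DIFFERENT operations (a sum and a maximum)
// applied to different inputs, one operation per thread.
fn task_parallel(a: &[i64], b: &[i64]) -> (i64, Option<i64>) {
    thread::scope(|s| {
        let sum = s.spawn(|| a.iter().sum::<i64>());
        let max = s.spawn(|| b.iter().copied().max());
        (sum.join().unwrap(), max.join().unwrap())
    })
}
```

SIMD, the topic of this lecture, is data parallelism within a single instruction: one instruction applies the same operation to several lanes of a vector register.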
@@ -23,22 +29,24 @@ pub fn foo(a: &[f64], b: &[f64], c: &mut [f64]) {
Without optimization

```asm
-movsd xmm0, qword ptr [rcx]
-addsd xmm0, qword ptr [rdx]
-movsd qword ptr [rax], xmm0
+; rustc 1.75.0
+; line 758-760 on godbolt
+movsd xmm0, qword ptr [rdx]
+addsd xmm0, qword ptr [rcx]
+movsd qword ptr [rax], xmm0
```

With optimization (opt-level=3)

```asm
-movupd xmm0, xmmword ptr [rdi + 8*rcx]
-movupd xmm1, xmmword ptr [rdi + 8*rcx + 16]
-movupd xmm2, xmmword ptr [rdx + 8*rcx]
-addpd xmm2, xmm0
-movupd xmm0, xmmword ptr [rdx + 8*rcx + 16]
-addpd xmm0, xmm1
-movupd xmmword ptr [r8 + 8*rcx], xmm2
-movupd xmmword ptr [r8 + 8*rcx + 16], xmm0
+movupd xmm0, xmmword ptr [rdi + 8*rsi]
+movupd xmm1, xmmword ptr [rdi + 8*rsi + 16] ; second load hoisted early (loop unrolled by two; x86 has no delay slots)
+movupd xmm2, xmmword ptr [rdx + 8*rsi]
+addpd xmm2, xmm0
+movupd xmm0, xmmword ptr [rdx + 8*rsi + 16]
+addpd xmm0, xmm1
+movupd xmmword ptr [r8 + 8*rsi], xmm2
+movupd xmmword ptr [r8 + 8*rsi + 16], xmm0
```
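
The hunk header shows only the signature of `foo`; its body is not part of this diff. As a minimal sketch, assuming the body is a plain element-wise add over the three slices, something like the following can be used to compare the scalar (`addsd`) and packed (`addpd`) output. The loop shape here is a guess, and the exact registers and line numbers will vary with the compiler version.

```rust
// Hypothetical body: only the signature appears in the diff above.
pub fn foo(a: &[f64], b: &[f64], c: &mut [f64]) {
    // zip stops at the shortest of the three slices, so the loop body
    // contains no indexing and therefore no bounds checks.
    for ((x, y), z) in a.iter().zip(b.iter()).zip(c.iter_mut()) {
        *z = *x + *y;
    }
}
```

Compiling with `-C opt-level=0` and then `-C opt-level=3` should show the switch from one `f64` per add to two `f64` values per 128-bit `xmm` register.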

Exercise: try to use <https://godbolt.org/> to explore the `foo` function. You
@@ -60,7 +68,7 @@ are in the lecture note)
```rust
/*
-// runnable on rust explorer
+// not runnable on rust explorer
[dependencies]
simdeez = "1.0.8"
@@ -131,4 +139,4 @@ then have a scalar result.

Tried again with poor-man's SIMD on the board, then worked through
Stream VByte example. Did not have students work through examples but
-was somewhat interactive. Did do the SSE example as live coding.
+was somewhat interactive. Did do the SSE example as live coding.
8 changes: 7 additions & 1 deletion lectures/flipped/L17.md
@@ -1,6 +1,12 @@
# Lecture 17 — Compiler Optimizations

-Question
+## Roadmap
+
+We will talk about compiler optimizations, specifically, scalar optimizations,
+loop optimizations, and link-time optimizations.
+
+## Question

* How do you enable compiler optimization in `cargo`?
* Which profile will it use when you call `cargo build` or `cargo build
--release`?