Commit 6fdab39 (1 parent: 851ad81)

Prepare L24 flip note

1 file changed: lectures/flipped/L24.md (+72, -0 lines)
# Lecture 24 — Large Language Models

## Roadmap

We will try the OpenAI website and some Hugging Face demos to see how powerful
current AI is, and then we will try two Hugging Face tutorials to see how model
performance can be optimized.

## Let's try them

Activity: let's see whether GPT-3.5 or GPT-4 can fix Rust code. Check out some
Rust questions here: <https://practice.course.rs>

Activity: let's try image generation, say "please draw me a cute kitten", using
DALL·E and also Stable Diffusion:
<https://huggingface.co/spaces/stabilityai/stable-diffusion>

Activity: we can also see whether GPT-3.5 or GPT-4 can decipher passwords if we
give it the rainbow table that we created in L23.

## Hugging Face Hub

The Hub works as a central place (like GitHub, but for AI) where anyone can
explore, experiment, collaborate, and build technology with machine learning.
In addition to hosting code like GitHub does, it also imposes some restrictions
on formats, interfaces, etc., so that people working on different things can
collaborate conveniently. Currently it has four big categories: Repositories,
Models, Datasets, and Spaces.
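
For example, here is a small sketch (not part of the lecture materials; it
assumes the `huggingface_hub` client library is installed) of exploring the Hub
programmatically:

```python
# Query the Hub for a few model repositories matching a search term.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(search="translation", limit=5):
    print(model.id)  # each result is a model repository hosted on the Hub
```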

### Natural Language Processing (NLP)

A simple NLP task is translation. For example, given a sequence of words as
input, you would like to get a sequence of words in a different language as
output. Before Transformers, the words in the output sequence could only be
generated one at a time, a process that does not parallelize well. The
Transformer architecture, however, allows the sequence positions to be
processed in parallel (notably during training).
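
As a concrete example, here is a minimal sketch (assuming the `transformers`
library with a PyTorch backend; `t5-small` is just one example model from the
Hub, not the one used in class) of running a translation model:

```python
from transformers import pipeline

# Downloads the model on first use, then translates English to French.
translator = pipeline("translation_en_to_fr", model="t5-small")

result = translator("Machine learning is fun.")
print(result[0]["translation_text"])
```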

There are three main groups of operations in a Transformer, grouped by compute
intensity (a short illustration follows this list):

- Tensor Contractions: batched matrix-matrix multiplications; the most
  compute-intensive.
- Statistical Normalizations: softmax and layer normalization; less
  compute-intensive.
- Element-wise Operators: the remaining operators, such as activations; the
  least compute-intensive.
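
A rough illustration of the three groups, using plain PyTorch tensors (assumed
shapes; this is not code from `live-coding/L24`):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 128, 512)   # (batch, sequence length, hidden size)
w = torch.randn(512, 512)

y = x @ w                      # tensor contraction: batched matrix multiplication
p = F.softmax(y, dim=-1)       # statistical normalization: softmax
n = F.layer_norm(y, (512,))    # statistical normalization: layer norm
a = F.gelu(y)                  # element-wise operator: activation
```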

And there are two targets:

- the first is how to make the model give answers or predictions (inference)
  efficiently;
- the second is how to train the model efficiently.

Let's try some techniques (the code in `live-coding/L24` was tested on
ecetesla0, but it takes a while to run, so maybe just refer to the lecture
notes):

(See more on backpropagation at
<https://www.3blue1brown.com/topics/neural-networks>)

- Batch size choice ([Original
  tutorial](<https://huggingface.co/docs/transformers/v4.38.2/en/model_memory_anatomy>))
- Gradient Accumulation ([Original
  tutorial](<https://huggingface.co/docs/transformers/main/training#prepare-a-dataset>));
  both techniques are sketched right after this list
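
Both show up as plain `Trainer` settings; here is a minimal configuration
sketch (assumed values; not the exact `live-coding/L24` script):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,   # batch size choice: what fits in GPU memory
    gradient_accumulation_steps=8,   # 4 * 8 = effective batch size of 32
)
# Trainer(model=..., args=args, train_dataset=...).train()
```
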
For broader ideas:
<https://huggingface.co/docs/transformers/perf_train_gpu_one#batch-size-choice>
## Other

Worth trying: [NLP From Scratch: Translation with a Sequence to Sequence
Network and
Attention](<https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>)

Also:
<https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html#case-studies>
