# Lecture 24 — Large Language Models

## Roadmap

We will try the OpenAI website and some Hugging Face demos to see how powerful
current AI is, and then we will work through two Hugging Face tutorials to see
how model performance can be optimized.

## Let's try them

Activity: let's see whether GPT-3.5 or GPT-4 can fix Rust code. Check out some
Rust questions here: <https://practice.course.rs>

Activity: let's try image generation, say "please draw me a cute kitten", using
DALL·E and also Stable Diffusion:
<https://huggingface.co/spaces/stabilityai/stable-diffusion>

Activity: we can also try whether GPT-3.5 or GPT-4 can crack passwords if we
give it the rainbow table that we created in L23.

## Hugging Face Hub

The Hub works as a central place (like GitHub, but for AI) where anyone can
explore, experiment, collaborate, and build technology with machine learning.
In addition to hosting code like GitHub, it also standardizes formats,
interfaces, etc., so that people working on different things can collaborate
conveniently. Currently it has four big categories: Repositories, Models,
Datasets, and Spaces.
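
The Hub is also scriptable. As a rough illustration (assuming a recent
`huggingface_hub` release; the filter, sort, and limit values here are
arbitrary), the following sketch lists a few translation models hosted on the
Hub:

```python
# Rough sketch: query the Hub for translation models (assumes a recent
# `huggingface_hub` release; filter/sort/limit values are arbitrary).
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="translation", sort="downloads", limit=5):
    print(model.id)  # repository id, e.g. "org-name/model-name"
```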

### Natural Language Processing (NLP)

A simple NLP task is translation. For example, given a sequence of words as the
input, you would like to get a sequence of words in a different language as the
output. Before Transformers, sequence-to-sequence models (typically recurrent
networks) processed tokens one at a time, so the computation could not be
parallelized well across the sequence. The Transformer architecture replaces
recurrence with attention, which allows the positions of a sequence to be
processed in parallel (especially during training).
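
To make the translation task concrete, here is a minimal sketch using the
Transformers `pipeline` API; the choice of `t5-small` is just an assumption,
and any English-to-French translation model on the Hub would do:

```python
# Minimal translation sketch; t5-small is an arbitrary model choice.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("Large language models are trained on a lot of text.")
print(result)  # e.g. [{'translation_text': '...'}]
```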

There are three main groups of operations in a Transformer, grouped by compute
intensity (see the sketch right after this list):

- Tensor Contractions: batched matrix-matrix multiplications. Most
  compute-intensive;
- Statistical Normalizations: softmax and layer normalization. Less
  compute-intensive;
- Element-wise Operators: the remaining operators, such as activations. Least
  compute-intensive.
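
As a rough, plain-PyTorch sketch (the shapes are arbitrary, and this is not a
full Transformer layer), the three groups show up in a single attention step
roughly as follows:

```python
# Rough sketch of where the three groups of operations appear; shapes arbitrary.
import torch
import torch.nn.functional as F

batch, seq, d = 8, 128, 512
q = torch.randn(batch, seq, d)
k = torch.randn(batch, seq, d)
v = torch.randn(batch, seq, d)

# Tensor contraction: batched matrix-matrix multiplication (most compute-intensive).
scores = torch.bmm(q, k.transpose(1, 2)) / d ** 0.5
# Statistical normalizations: softmax over key positions, then layer normalization.
weights = F.softmax(scores, dim=-1)
out = torch.bmm(weights, v)          # another tensor contraction
out = F.layer_norm(out, normalized_shape=(d,))
# Element-wise operator: an activation such as GELU (least compute-intensive).
out = F.gelu(out)
print(out.shape)  # torch.Size([8, 128, 512])
```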

And two targets:

- how to train the model efficiently, and
- how to make the model give answers or predictions (inference) efficiently.

Let's try some techniques (the code in `live-coding/L24` was tested on
ecetesla0, but it takes time to run, so maybe just refer to the lecture notes):

(See more info on backpropagation at
<https://www.3blue1brown.com/topics/neural-networks>)

- Batch size choice ([Original
  tutorial](<https://huggingface.co/docs/transformers/v4.38.2/en/model_memory_anatomy>))
- Gradient Accumulation ([Original
  tutorial](<https://huggingface.co/docs/transformers/main/training#prepare-a-dataset>));
  a configuration sketch for both techniques follows this list.
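
Here is a minimal configuration sketch of both techniques using the `Trainer`
API from the tutorials; the output directory and the numbers are placeholders,
and the point is that the effective batch size equals the per-device batch size
times the number of accumulation steps:

```python
# Minimal sketch: batch size choice plus gradient accumulation.
# The directory name and the numbers below are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="l24-demo",
    per_device_train_batch_size=4,   # chosen so a single step fits in GPU memory
    gradient_accumulation_steps=8,   # accumulate gradients over 8 forward/backward passes
    # Effective batch size per device: 4 * 8 = 32, without the memory cost of
    # running batch size 32 in a single step.
)
```

These arguments are then passed to a `Trainer` together with a model and a
dataset, as in the tutorials linked above.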

For broader ideas:
<https://huggingface.co/docs/transformers/perf_train_gpu_one#batch-size-choice>

## Other

Worth trying: [NLP From Scratch: Translation with a Sequence to Sequence
Network and
Attention](<https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>)

Also see:
<https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html#case-studies>