# Lecture 24 — Large Language Models

## Roadmap

We will try the OpenAI website and some Hugging Face demos to see how powerful
current AI is, and then we will work through two Hugging Face tutorials to see
how model performance can be optimized.

## Let's try them

Activity: let's see whether GPT-3.5 or GPT-4 can fix Rust code. Check out some
Rust questions here: <https://practice.course.rs>

Activity: let's try image generation, say "please draw me a cute kitten", using
DALL·E and also Stable Diffusion:
<https://huggingface.co/spaces/stabilityai/stable-diffusion>

Activity: we can also try whether GPT-3.5 or GPT-4 can crack passwords if we
give it the rainbow table that we created in L23.

## Hugging Face Hub

The Hub works as a central place (like GitHub, but for AI) where anyone can
explore, experiment, collaborate, and build technology with machine learning.
In addition to hosting code like GitHub, it also standardizes formats,
interfaces, etc., so that people working on different things can collaborate
conveniently. Currently it has four big categories: Repositories, Models,
Datasets, and Spaces.
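
The Hub is also scriptable. As a rough illustration (assuming a recent
`huggingface_hub` release; the filter, sort, and limit values here are
arbitrary), the following sketch lists a few translation models hosted on the
Hub:

```python
# Rough sketch: query the Hub for translation models (assumes a recent
# `huggingface_hub` release; filter/sort/limit values are arbitrary).
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="translation", sort="downloads", limit=5):
    print(model.id)  # repository id, e.g. "org-name/model-name"
```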

### Natural Language Processing (NLP)

A simple NLP task is translation. For example, given a sequence of words as the
input, you would like to get a sequence of words in a different language as the
output. Before Transformers, sequence-to-sequence models (typically recurrent
networks) processed tokens one at a time, so the computation could not be
parallelized well across the sequence. The Transformer architecture replaces
recurrence with attention, which allows the positions of a sequence to be
processed in parallel (especially during training).
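
To make the translation task concrete, here is a minimal sketch using the
Transformers `pipeline` API; the choice of `t5-small` is just an assumption,
and any English-to-French translation model on the Hub would do:

```python
# Minimal translation sketch; t5-small is an arbitrary model choice.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("Large language models are trained on a lot of text.")
print(result)  # e.g. [{'translation_text': '...'}]
```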

There are three main groups of operations in a Transformer, grouped by compute
intensity (see the sketch right after this list):

- Tensor Contractions: batched matrix-matrix multiplications. Most
  compute-intensive;
- Statistical Normalizations: softmax and layer normalization. Less
  compute-intensive;
- Element-wise Operators: the remaining operators, such as activations. Least
  compute-intensive.
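
As a rough, plain-PyTorch sketch (the shapes are arbitrary, and this is not a
full Transformer layer), the three groups show up in a single attention step
roughly as follows:

```python
# Rough sketch of where the three groups of operations appear; shapes arbitrary.
import torch
import torch.nn.functional as F

batch, seq, d = 8, 128, 512
q = torch.randn(batch, seq, d)
k = torch.randn(batch, seq, d)
v = torch.randn(batch, seq, d)

# Tensor contraction: batched matrix-matrix multiplication (most compute-intensive).
scores = torch.bmm(q, k.transpose(1, 2)) / d ** 0.5
# Statistical normalizations: softmax over key positions, then layer normalization.
weights = F.softmax(scores, dim=-1)
out = torch.bmm(weights, v)          # another tensor contraction
out = F.layer_norm(out, normalized_shape=(d,))
# Element-wise operator: an activation such as GELU (least compute-intensive).
out = F.gelu(out)
print(out.shape)  # torch.Size([8, 128, 512])
```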

And two targets:

- how to train the model efficiently, and
- how to make the model give answers or predictions (inference) efficiently.

Let's try some techniques (the code in `live-coding/L24` was tested on
ecetesla0, but it takes time to run, so maybe just refer to the lecture notes):

(See more info on backpropagation at
<https://www.3blue1brown.com/topics/neural-networks>)

- Batch size choice ([Original
  tutorial](<https://huggingface.co/docs/transformers/v4.38.2/en/model_memory_anatomy>))
- Gradient Accumulation ([Original
  tutorial](<https://huggingface.co/docs/transformers/main/training#prepare-a-dataset>));
  a configuration sketch for both techniques follows this list.
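
Here is a minimal configuration sketch of both techniques using the `Trainer`
API from the tutorials; the output directory and the numbers are placeholders,
and the point is that the effective batch size equals the per-device batch size
times the number of accumulation steps:

```python
# Minimal sketch: batch size choice plus gradient accumulation.
# The directory name and the numbers below are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="l24-demo",
    per_device_train_batch_size=4,   # chosen so a single step fits in GPU memory
    gradient_accumulation_steps=8,   # accumulate gradients over 8 forward/backward passes
    # Effective batch size per device: 4 * 8 = 32, without the memory cost of
    # running batch size 32 in a single step.
)
```

These arguments are then passed to a `Trainer` together with a model and a
dataset, as in the tutorials linked above.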

For broader ideas:
<https://huggingface.co/docs/transformers/perf_train_gpu_one#batch-size-choice>

## Other

Worth trying: [NLP From Scratch: Translation with a Sequence to Sequence
Network and
Attention](<https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>)

Also see:
<https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html#case-studies>