Commit f99376f

add a thought to L23
1 parent a193a61 commit f99376f

2 files changed: +18 -4 lines changed

lectures/L23-slides.tex

Lines changed: 14 additions & 2 deletions
@@ -520,7 +520,7 @@ \part{Large Language Models}
 
 We can focus on how to generate a model that gives answers quickly...
 
-Or we can focus on how to generate or train the model quickly.
+Or we can focus on how to generate or train the model quickly---this will be our focus.
 
 \end{frame}
 
@@ -542,12 +542,24 @@ \part{Large Language Models}
 
 Why would we customize some LLM?
 
-Don't send your data to OpenAI...
+Don't send your data to OpenAI\ldots
 
 Specialize for your workload.
 
 \end{frame}
 
+\begin{frame}
+\frametitle{Configuration Spaces}
+
+There are a lot of knobs we can tweak with respect to training the model.
+
+We'll explore the configuration space: \\
+\hspace*{2em} see what the effects of changing resource limits are.
+
+``Don't guess, measure,'' but you also have to measure something meaningful\ldots
+
+\end{frame}
+
 \begin{frame}
 \frametitle{Batch Size}
 
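To make the new slide's point concrete: below is a minimal, hypothetical sketch of a configuration-space sweep, not code from the lecture. The train_step function is a stand-in for real training work; the point is that the reported metric (samples per second) is something meaningful, not just a number that goes up.

import time

def train_step(batch_size: int) -> None:
    # Hypothetical stand-in for one training step; cost grows with batch size.
    _ = sum(i * i for i in range(batch_size * 10_000))

def measure(batch_size: int, steps: int = 20) -> float:
    # Time a fixed number of steps, then convert to a meaningful metric.
    start = time.perf_counter()
    for _ in range(steps):
        train_step(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * steps) / elapsed  # samples per second

for bs in (1, 2, 4, 8, 16, 32):
    print(f"batch size {bs:3d}: {measure(bs):10.1f} samples/s")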
lectures/L23.tex

Lines changed: 4 additions & 2 deletions
@@ -160,13 +160,15 @@ \subsection*{Optimizing LLMs}
 
 \paragraph{Optimizing.} There are two kinds of optimizations that are worth talking about. The first one is the idea of model performance: how do we generate a model that gives answers or predictions quickly? The second is how we can generate or train the model efficiently.
 
-The first one is easy to motivate and we have learned numerous techniques that could be applied here. Examples: Use more space to reduce CPU usage, optimize for common cases, speculate, et cetera. Some of these are more fun than others: given a particular question, can you guess what the follow-up might be?
+The first one is easy to motivate and we have learned numerous techniques that could be applied here. Examples: Use more space to reduce CPU usage, optimize for common cases, speculate, et cetera. Some of these are more fun than others: given a particular question, can you guess what the follow-up might be? Mostly, though, we'll look at how.
 
 Before we get into the subject of how, we should address the question of why you would wish to generate or customize an LLM rather than use an existing one. To start with, you might not want to send your (sensitive) data to a third party for analysis. Still, you can download and use some existing models. So generating a model or refining an existing one may make sense in a situation where you will get better results by creating a more specialized model than the generic one. To illustrate what I mean, ChatGPT will gladly make you a Dungeons \& Dragons campaign setting, but you don't need it to have that capability if you want it to analyze your customer behaviours to find the ones who are most likely to be open to upgrading their plan. That extra capability (parameters) takes up space and computational time, and a smaller model that gives better answers is more efficient.
 
+What we are going to do is explore the configuration space for training the model. There are a lot of knobs that we can tweak with respect to which resources to consume, so we'll try to measure the effects of changing resource limits. One challenge, which we'll touch on, is that measurement only works if there is something useful to measure. (Yes, ``don't guess, measure'', but you also need to measure something meaningful. ``Number goes up'' is not, in itself, useful.)
+
 Our first major optimization, and perhaps the easiest to do, is the batch size. The batch size just tells the GPU how much to do at once. It's a little like when we discussed creating more threads to increase performance; you may see an improvement from having more workers active, but you may also get no additional benefit from worker $N+1$ over $N$, since there may not be enough work, or there may be resource conflicts.
 
-I've used an example from Hugging Face~\cite{hf2} with some light modifications to see what we can do with a very simply example using dummy data. Let's go over and look at that example now. It's in Python (a lot of LLM, machine learning, etc. content is) but it shouldn't be too difficult to understand as we walk through it.
+I've used an example from Hugging Face~\cite{hf2} with some light modifications to see what we can do with a very simple example using dummy data. Let's go over and look at that example now. It's in Python (a lot of LLM, machine learning, etc. content is) but it shouldn't be too difficult to understand as we walk through it.
 
 \begin{lstlisting}[language=python]
 import numpy as np

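The listing is truncated in this diff and continues in the source file. As a hedged sketch of what such a batch-size experiment could look like, assuming the Hugging Face transformers Trainer API and the datasets library (the model name, data sizes, and arguments below are illustrative assumptions, not necessarily the lecture's actual example):

import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# Dummy data: random token ids with binary labels (illustrative sizes).
seq_len, dataset_size = 128, 256
dummy = Dataset.from_dict({
    "input_ids": np.random.randint(100, 30000, (dataset_size, seq_len)).tolist(),
    "labels": np.random.randint(0, 2, (dataset_size,)).tolist(),
})

# Sweep the batch-size knob; re-create the model each time so every run
# starts from the same initial weights.
for batch_size in (1, 4, 8, 16):
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
    args = TrainingArguments(
        output_dir="tmp",
        per_device_train_batch_size=batch_size,  # the knob under test
        num_train_epochs=1,
        report_to="none",
    )
    result = Trainer(model=model, args=args, train_dataset=dummy).train()
    print(batch_size, result.metrics["train_runtime"])

On a GPU you would expect throughput to improve with batch size up to the point where the device saturates or memory runs out, which is exactly the kind of knob-versus-resource tradeoff the new paragraph describes.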