Commit f99376f

add a thought to L23
1 parent a193a61 commit f99376f

2 files changed: +18 -4 lines changed

lectures/L23-slides.tex

Lines changed: 14 additions & 2 deletions
@@ -520,7 +520,7 @@ \part{Large Language Models}
 
 We can focus on how to generate a model that gives answers quickly...
 
-Or we can focus on how to generate or train the model quickly.
+Or we can focus on how to generate or train the model quickly---this will be our focus.
 
 \end{frame}
 
@@ -542,12 +542,24 @@ \part{Large Language Models}
 
 Why would we customize some LLM?
 
-Don't send your data to OpenAI...
+Don't send your data to OpenAI\ldots
 
 Specialize for your workload.
 
 \end{frame}
 
+\begin{frame}
+\frametitle{Configuration Spaces}
+
+There are a lot of knobs we can tweak with respect to training the model.
+
+We'll explore the configuration space: \\
+\hspace*{2em} see what the effects of changing resource limits are.
+
+``Don't guess, measure,'' but you also have to measure something meaningful\ldots
+
+\end{frame}
+
 \begin{frame}
 \frametitle{Batch Size}
 
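To make the new slide's point concrete: below is a minimal, hypothetical sketch of a configuration-space sweep, not code from the lecture. The train_step function is a stand-in for real training work; the point is that the reported metric (samples per second) is something meaningful, not just a number that goes up.

import time

def train_step(batch_size: int) -> None:
    # Hypothetical stand-in for one training step; cost grows with batch size.
    _ = sum(i * i for i in range(batch_size * 10_000))

def measure(batch_size: int, steps: int = 20) -> float:
    # Time a fixed number of steps, then convert to a meaningful metric.
    start = time.perf_counter()
    for _ in range(steps):
        train_step(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * steps) / elapsed  # samples per second

for bs in (1, 2, 4, 8, 16, 32):
    print(f"batch size {bs:3d}: {measure(bs):10.1f} samples/s")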
lectures/L23.tex

Lines changed: 4 additions & 2 deletions
@@ -160,13 +160,15 @@ \subsection*{Optimizing LLMs}
 
 \paragraph{Optimizing.} There are two kinds of optimizations that are worth talking about. The first one is the idea of model performance: how do we generate a model that gives answers or predictions quickly? The second is how we can generate or train the model efficiently.
 
-The first one is easy to motivate and we have learned numerous techniques that could be applied here. Examples: Use more space to reduce CPU usage, optimize for common cases, speculate, et cetera. Some of these are more fun than others: given a particular question, can you guess what the follow-up might be?
+The first one is easy to motivate and we have learned numerous techniques that could be applied here. Examples: Use more space to reduce CPU usage, optimize for common cases, speculate, et cetera. Some of these are more fun than others: given a particular question, can you guess what the follow-up might be? Mostly, though, we'll look at how.
 
 Before we get into the subject of how, we should address the question of why you would wish to generate or customize an LLM rather than use an existing one. To start with, you might not want to send your (sensitive) data to a third party for analysis. Still, you can download and use some existing models. So generating a model or refining an existing one may make sense in a situation where you will get better results by creating a more specialized model than the generic one. To illustrate what I mean, ChatGPT will gladly make you a Dungeons \& Dragons campaign setting, but you don't need it to have that capability if you want it to analyze your customer behaviours to find the ones who are most likely to be open to upgrading their plan. That extra capability (parameters) takes up space and computational time, and a smaller model that gives better answers is more efficient.
 
+What we are going to do is explore the configuration space for training the model. There are a lot of knobs that we can tweak with respect to which resources to consume, so we'll try to measure the effects of changing resource limits. One challenge, which we'll touch on, is that measurement only works if there is something useful to measure. (Yes, ``don't guess, measure'', but you also need to measure something meaningful. ``Number goes up'' is not, in itself, useful.)
+
 Our first major optimization, and perhaps the easiest to do, is the batch size. The batch size just tells the GPU how much to do at once. It's a little like when we discussed creating more threads to increase performance; you may see an improvement from having more workers active, but you may also get no additional benefit from worker $N+1$ over $N$, since there may not be enough work, or there may be resource conflicts.
 
-I've used an example from Hugging Face~\cite{hf2} with some light modifications to see what we can do with a very simply example using dummy data. Let's go over and look at that example now. It's in Python (a lot of LLM, machine learning, etc. content is) but it shouldn't be too difficult to understand as we walk through it.
+I've used an example from Hugging Face~\cite{hf2} with some light modifications to see what we can do with a very simple example using dummy data. Let's go over and look at that example now. It's in Python (a lot of LLM, machine learning, etc. content is) but it shouldn't be too difficult to understand as we walk through it.
 
 \begin{lstlisting}[language=python]
 import numpy as np

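The listing is truncated in this diff and continues in the source file. As a hedged sketch of what such a batch-size experiment could look like, assuming the Hugging Face transformers Trainer API and the datasets library (the model name, data sizes, and arguments below are illustrative assumptions, not necessarily the lecture's actual example):

import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# Dummy data: random token ids with binary labels (illustrative sizes).
seq_len, dataset_size = 128, 256
dummy = Dataset.from_dict({
    "input_ids": np.random.randint(100, 30000, (dataset_size, seq_len)).tolist(),
    "labels": np.random.randint(0, 2, (dataset_size,)).tolist(),
})

# Sweep the batch-size knob; re-create the model each time so every run
# starts from the same initial weights.
for batch_size in (1, 4, 8, 16):
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
    args = TrainingArguments(
        output_dir="tmp",
        per_device_train_batch_size=batch_size,  # the knob under test
        num_train_epochs=1,
        report_to="none",
    )
    result = Trainer(model=model, args=args, train_dataset=dummy).train()
    print(batch_size, result.metrics["train_runtime"])

On a GPU you would expect throughput to improve with batch size up to the point where the device saturates or memory runs out, which is exactly the kind of knob-versus-resource tradeoff the new paragraph describes.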