Ordinary least squares (OLS). Suppose you have a dataset of $n$ examples $(x_i, y_i)$, with inputs $x_i \in \mathbb{R}^d$ and targets $y_i \in \mathbb{R}$.
For ease, we put the examples into a [[design matrix]] $X \in \mathbb{R}^{n \times d}$, whose $i$-th row is $x_i^\mathsf{T}$, and collect the targets into a vector $y \in \mathbb{R}^n$.
We can find the optimal weights $w \in \mathbb{R}^d$ by setting the gradient of the squared error $\langle y - Xw, y - Xw \rangle$ to zero:
\begin{gather*} \nabla_w \langle y - Xw, y - Xw \rangle = 0 \\ \nabla_w \left[ {\langle y, y \rangle} - 2 \langle Xw, y \rangle + \langle Xw, Xw \rangle \right] = 0 \\ -2 X^\mathsf{T}y + 2(X^\mathsf{T}X)w = 0 \end{gather*}
Rearranging gives the normal equations $X^\mathsf{T}Xw = X^\mathsf{T}y$, and hence the closed-form solution $w = (X^\mathsf{T}X)^{-1}X^\mathsf{T}y$.
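As a quick sanity check of the closed form, here is a minimal NumPy sketch (the synthetic data and variable names are illustrative, not from the note): it solves the normal equations directly and compares against NumPy's built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                  # design matrix, rows are examples
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)    # noisy targets

# Closed-form OLS: solve the normal equations X^T X w = X^T y.
# Solving the linear system is preferred over forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's dedicated least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_hat, w_lstsq))           # True
```

Note the use of `np.linalg.solve` rather than `np.linalg.inv`: it is both faster and numerically more stable than explicitly inverting $X^\mathsf{T}X$.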
This trick for making an "inverse" out of a non-square matrix comes up often enough that it has its own name, the Moore-Penrose pseudoinverse: $X^+ = (X^\mathsf{T}X)^{-1}X^\mathsf{T}$. In that notation, we just write $w = X^+y$.
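NumPy exposes the pseudoinverse directly as `np.linalg.pinv`. A small sketch (random data, illustrative only) confirming that, for a full-rank $X$, it agrees with the closed form above:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))   # tall full-rank matrix
y = rng.normal(size=50)

# For full-rank X, the pseudoinverse X^+ = (X^T X)^{-1} X^T,
# so pinv reproduces the closed-form OLS solution.
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y
w_pinv = np.linalg.pinv(X) @ y
print(np.allclose(w_closed, w_pinv))  # True
```

`pinv` is computed via the SVD, so unlike the explicit formula it also behaves sensibly when $X^\mathsf{T}X$ is singular or ill-conditioned.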
- The matrix $X^\mathsf{T}X$ is only invertible when the rank of $X$ is $d$, i.e. each of the columns is linearly independent.
- [[Ridge regression]] is also an important extension to least squares.
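These two points are connected: ridge regression replaces $X^\mathsf{T}X$ with $X^\mathsf{T}X + \lambda I$, which is invertible for any $\lambda > 0$ even when the columns of $X$ are linearly dependent. A sketch (synthetic data, with a deliberately duplicated column to break full rank):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
X = rng.normal(size=(n, 3))
X = np.column_stack([X, X[:, 0]])    # duplicate a column: rank(X) < 4
y = rng.normal(size=n)

# X^T X is singular, so the plain OLS closed form breaks down...
rank = np.linalg.matrix_rank(X.T @ X)
print(rank)                          # 3, but the matrix is 4 x 4

# ...while the ridge system X^T X + lam*I is invertible for any lam > 0.
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(np.isfinite(w_ridge).all())    # True
```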