From 2c3e4c544cb7a5951f3010f249aa90861b422bd5 Mon Sep 17 00:00:00 2001
From: ishani07
Date: Tue, 30 Apr 2024 11:26:22 -0700
Subject: [PATCH] note 11 fix
---
.../loss_transformations.qmd | 10 +-
.../loss_transformations.html | 129 +++++++++---------
.../figure-pdf/cell-10-output-2.pdf | Bin 0 -> 103496 bytes
.../figure-pdf/cell-12-output-1.pdf | Bin 0 -> 11239 bytes
.../figure-pdf/cell-13-output-1.pdf | Bin 0 -> 9752 bytes
.../figure-pdf/cell-18-output-2.pdf | Bin 0 -> 9193 bytes
.../figure-pdf/cell-19-output-1.pdf | Bin 0 -> 15000 bytes
.../figure-pdf/cell-20-output-2.pdf | Bin 0 -> 8394 bytes
.../figure-pdf/cell-5-output-1.pdf | Bin 0 -> 14938 bytes
.../figure-pdf/cell-7-output-1.pdf | Bin 0 -> 16000 bytes
.../figure-pdf/cell-9-output-1.pdf | Bin 0 -> 11041 bytes
11 files changed, 65 insertions(+), 74 deletions(-)
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-10-output-2.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-12-output-1.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-13-output-1.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-18-output-2.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-19-output-1.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-20-output-2.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-5-output-1.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-7-output-1.pdf
create mode 100644 docs/constant_model_loss_transformations/loss_transformations_files/figure-pdf/cell-9-output-1.pdf
diff --git a/constant_model_loss_transformations/loss_transformations.qmd b/constant_model_loss_transformations/loss_transformations.qmd
index f8654141..0ab34966 100644
--- a/constant_model_loss_transformations/loss_transformations.qmd
+++ b/constant_model_loss_transformations/loss_transformations.qmd
@@ -735,12 +735,10 @@ $$\hat{z} = \theta_0 + \theta_1 x$$
It turns out that this linearized relationship can help us understand the underlying relationship between $x$ and $y$. If we rearrange the relationship above, we find:
-$$
-\log{(y)} = \theta_0 + \theta_1 x \\
-y = e^{\theta_0 + \theta_1 x} \\
-y = (e^{\theta_0})e^{\theta_1 x} \\
-y_i = C e^{k x}
-$$
+$$\log{(y)} = \theta_0 + \theta_1 x$$
+$$y = e^{\theta_0 + \theta_1 x}$$
+$$y = (e^{\theta_0})e^{\theta_1 x}$$
+$$y = C e^{k x}$$
For some constants $C$ and $k$.
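+
+To make the rearrangement concrete, here is a minimal sketch (an added illustration, not part of the original notebook) of recovering $C = e^{\theta_0}$ and $k = \theta_1$ by fitting a line to $(x, \log{y})$; the `x` and `y` arrays below are synthetic stand-ins for the `"Length"` and `"Age"` data.
+
+```{python}
+#| code-fold: true
+# Minimal sketch: recover C and k from a linear fit on (x, log(y)).
+# `x` and `y` are synthetic stand-ins, not the lecture's data.
+import numpy as np
+
+rng = np.random.default_rng(0)
+x = np.linspace(1, 10, 50)
+y = 2.0 * np.exp(0.3 * x) * rng.lognormal(sigma=0.05, size=x.size)
+
+theta_1, theta_0 = np.polyfit(x, np.log(y), deg=1)  # fit log(y) = theta_0 + theta_1 x
+C, k = np.exp(theta_0), theta_1                     # so y is approximately C e^(k x)
+print(C, k)                                         # approximately 2.0 and 0.3
+```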
diff --git a/docs/constant_model_loss_transformations/loss_transformations.html b/docs/constant_model_loss_transformations/loss_transformations.html
index 96fbd15f..9c11bf05 100644
--- a/docs/constant_model_loss_transformations/loss_transformations.html
+++ b/docs/constant_model_loss_transformations/loss_transformations.html
@@ -1165,12 +1165,7 @@
, rather than the untransformed "Age". In other words, we are applying the transformation \(z_i = \log{(y_i)}\). Notice that the resulting model is still linear in the parameters \(\theta = [\theta_0, \theta_1]\). The SLR model becomes:
It turns out that this linearized relationship can help us understand the underlying relationship between \(x\) and \(y\). If we rearrange the relationship above, we find:
\(y\) is an exponential function of \(x\). Applying an exponential fit to the untransformed variables corroborates this finding.
@@ -2277,88 +2272,86 @@
It turns out that this linearized relationship can help us understand the underlying relationship between $x$ and $y$. If we rearrange the relationship above, we find:
-$$
-\log{(y)} = \theta_0 + \theta_1 x \\
-y = e^{\theta_0 + \theta_1 x} \\
-y = (e^{\theta_0})e^{\theta_1 x} \\
-y_i = C e^{k x}
-$$
+$$\log{(y)} = \theta_0 + \theta_1 x$$
+$$y = e^{\theta_0 + \theta_1 x}$$
+$$y = (e^{\theta_0})e^{\theta_1 x}$$
+$$y = C e^{k x}$$
+
+For some constants $C$ and $k$.
-For some constants $C$ and $k$.
+$y$ is an *exponential* function of $x$. Applying an exponential fit to the untransformed variables corroborates this finding.
-$y$ is an *exponential* function of $x$. Applying an exponential fit to the untransformed variables corroborates this finding.
-
-```{python}
-#| code-fold: true
-#| vscode: {languageId: python}
-plt.figure(dpi=120, figsize=(4, 3))
-
-plt.scatter(x, y)
-plt.plot(x, np.exp(theta_0) * np.exp(theta_1 * x), "tab:red")
-plt.xlabel("Length")
-plt.ylabel("Age")
-```
+```{python}
+#| code-fold: true
+#| vscode: {languageId: python}
+plt.figure(dpi=120, figsize=(4, 3))
+
+plt.scatter(x, y)
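+# overlay the exponential fit: y = e^(theta_0) * e^(theta_1 * x), i.e., C e^(kx)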
+plt.plot(x, np.exp(theta_0) * np.exp(theta_1 * x), "tab:red")
+plt.xlabel("Length")
+plt.ylabel("Age")
+```
+
+You may wonder: why did we choose to apply a log transformation specifically? Why not some other function to linearize the data?
-You may wonder: why did we choose to apply a log transformation specifically? Why not some other function to linearize the data?
+Practically, many other mathematical operations that modify the relative scales of `"Age"` and `"Length"` could have worked here.
-Practically, many other mathematical operations that modify the relative scales of `"Age"` and `"Length"` could have worked here.
+## Multiple Linear Regression
-## Multiple Linear Regression
+Multiple linear regression is an extension of simple linear regression that adds additional features to the model. The multiple linear regression model takes the form:
-Multiple linear regression is an extension of simple linear regression that adds additional features to the model. The multiple linear regression model takes the form:
+$$\hat{y} = \theta_0\:+\:\theta_1x_{1}\:+\:\theta_2 x_{2}\:+\:...\:+\:\theta_p x_{p}$$
-$$\hat{y} = \theta_0\:+\:\theta_1x_{1}\:+\:\theta_2 x_{2}\:+\:...\:+\:\theta_p x_{p}$$
+Our predicted value of $y$, $\hat{y}$, is a linear combination of a single **observation's** features, $x_i$, and the parameters, $\theta_i$.
-Our predicted value of $y$, $\hat{y}$, is a linear combination of the single **observations** (features), $x_i$, and the parameters, $\theta_i$.
+We'll dive deeper into Multiple Linear Regression in the next lecture.
-We'll dive deeper into Multiple Linear Regression in the next lecture.
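+
+As a quick illustration of this form (scikit-learn and the synthetic data below are assumptions for this sketch, not part of this lecture's code), fitting $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ might look like:
+
+```{python}
+#| code-fold: true
+# Illustrative sketch only: synthetic data, scikit-learn assumed available.
+import numpy as np
+from sklearn.linear_model import LinearRegression
+
+rng = np.random.default_rng(42)
+X = rng.normal(size=(100, 2))                     # two features: x_1, x_2
+y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
+
+model = LinearRegression().fit(X, y)
+print(model.intercept_)  # estimate of theta_0 (close to 1.0)
+print(model.coef_)       # estimates of [theta_1, theta_2] (close to [2.0, -3.0])
+```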
+## Bonus: Calculating Constant Model MSE Using an Algebraic Trick
-## Bonus: Calculating Constant Model MSE Using an Algebraic Trick
+Earlier, we calculated the constant model MSE using calculus. It turns out that there is a much more elegant way of performing this same minimization algebraically, without using calculus at all.
-Earlier, we calculated the constant model MSE using calculus. It turns out that there is a much more elegant way of performing this same minimization algebraically, without using calculus at all.
+In this calculation, we use the fact that the **sum of deviations from the mean is $0$**, that is, $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$.
-In this calculation, we use the fact that the **sum of deviations from the mean is $0$** or that $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$.
-
-Let's quickly walk through the proof for this:
-$$
-\begin{align}
-\sum_{i=1}^{n} (y_i - \bar{y}) &= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y} \\
- &= \sum_{i=1}^{n} y_i - n\bar{y} \\
- &= \sum_{i=1}^{n} y_i - n\frac{1}{n}\sum_{i=1}^{n}y_i \\
- &= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n}y_i \\
- & = 0
-\end{align}
-$$
+Let's quickly walk through the proof for this:
+$$
+\begin{align}
+\sum_{i=1}^{n} (y_i - \bar{y}) &= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y} \\
+ &= \sum_{i=1}^{n} y_i - n\bar{y} \\
+ &= \sum_{i=1}^{n} y_i - n\frac{1}{n}\sum_{i=1}^{n}y_i \\
+ &= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n}y_i \\
+ & = 0
+\end{align}
+$$
+
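+A one-line numeric check of this fact (the sample below is an arbitrary illustration, not course data):
+
+```{python}
+#| code-fold: true
+# Deviations from the mean always sum to zero (up to floating-point error).
+import numpy as np
+
+y = np.array([3.0, 7.0, 1.0, 9.0, 5.0])
+print(np.sum(y - np.mean(y)))  # ~0.0
+```
+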
+In our calculations, we'll also be using the definition of the sample variance. As a refresher:
-In our calculations, we'll also be using the definition of the variance as a sample. As a refresher:
+$$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
-$$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
+Getting into our calculation for MSE minimization:
-Getting into our calculation for MSE minimization:
-
-$$
-\begin{align}
-R(\theta) &= {\frac{1}{n}}\sum^{n}_{i=1} (y_i - \theta)^2
-\\ &= \frac{1}{n}\sum^{n}_{i=1} [(y_i - \bar{y}) + (\bar{y} - \theta)]^2\quad \quad \text{using trick that a-b can be written as (a-c) + (c-b) } \\
-&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \space \space \text{where a, b, and c are any numbers}
-\\ &= \frac{1}{n}\sum^{n}_{i=1} [(y_i - \bar{y})^2 + 2(y_i - \bar{y})(\bar{y} - \theta) + (\bar{y} - \theta)^2]
-\\ &= \frac{1}{n}[\sum^{n}_{i=1}(y_i - \bar{y})^2 + 2(\bar{y} - \theta)\sum^{n}_{i=1}(y_i - \bar{y}) + n(\bar{y} - \theta)^2] \quad \quad \text{distribute sum to individual terms}
-\\ &= \frac{1}{n}\sum^{n}_{i=1}(y_i - \bar{y})^2 + \frac{2}{n}(\bar{y} - \theta)\cdot0 + (\bar{y} - \theta)^2 \quad \quad \text{sum of deviations from mean is 0}
-\\ &= \sigma_y^2 + (\bar{y} - \theta)^2
-\end{align}
-$$
+$$
+\begin{align}
+R(\theta) &= {\frac{1}{n}}\sum^{n}_{i=1} (y_i - \theta)^2
+\\ &= \frac{1}{n}\sum^{n}_{i=1} [(y_i - \bar{y}) + (\bar{y} - \theta)]^2\quad \quad \text{using the trick that a-b can be written as (a-c) + (c-b)} \\
+&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \space \space \text{where a, b, and c are any numbers}
+\\ &= \frac{1}{n}\sum^{n}_{i=1} [(y_i - \bar{y})^2 + 2(y_i - \bar{y})(\bar{y} - \theta) + (\bar{y} - \theta)^2]
+\\ &= \frac{1}{n}[\sum^{n}_{i=1}(y_i - \bar{y})^2 + 2(\bar{y} - \theta)\sum^{n}_{i=1}(y_i - \bar{y}) + n(\bar{y} - \theta)^2] \quad \quad \text{distribute sum to individual terms}
+\\ &= \frac{1}{n}\sum^{n}_{i=1}(y_i - \bar{y})^2 + \frac{2}{n}(\bar{y} - \theta)\cdot0 + (\bar{y} - \theta)^2 \quad \quad \text{sum of deviations from mean is 0}
+\\ &= \sigma_y^2 + (\bar{y} - \theta)^2
+\end{align}
+$$
+
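+The decomposition can also be checked numerically; here is a small sketch using an arbitrary sample (an illustration, not course data):
+
+```{python}
+#| code-fold: true
+# Sanity check: R(theta) = sigma_y^2 + (ybar - theta)^2 for any theta,
+# and the theta-dependent term vanishes exactly at theta = ybar.
+import numpy as np
+
+y = np.array([3.0, 7.0, 1.0, 9.0, 5.0])  # arbitrary sample
+ybar, var_y = np.mean(y), np.var(y)      # np.var uses the 1/n definition above
+
+for theta in [0.0, 2.5, ybar, 8.0]:
+    mse = np.mean((y - theta) ** 2)
+    print(np.isclose(mse, var_y + (ybar - theta) ** 2))  # True each time
+```
+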
+Since variance can't be negative, we know that our first term, $\sigma_y^2$, is greater than or equal to $0$. Also note that **the first term doesn't involve $\theta$ at all**, meaning changing our model won't change this value. For the purposes of determining $\hat{\theta}$, we can then essentially ignore this term.
-Since variance can't be negative, we know that our first term, $\sigma_y^2$ is greater than or equal to $0$. Also note, that **the first term doesn't involve $\theta$ at all**, meaning changing our model won't change this value. For the purposes of determining $\hat{\theta}#, we can then essentially ignore this term.
+Looking at the second term, $(\bar{y} - \theta)^2$: since it is squared, we know it must be greater than or equal to $0$. As this term does involve $\theta$, picking the value of $\theta$ that minimizes this term will allow us to minimize our average loss. For the second term to equal $0$, we need $\theta = \bar{y}$, or in other words, $\hat{\theta} = \bar{y} = \text{mean}(y)$.
-Looking at the second term, $(\bar{y} - \theta)^2$, since it is squared, we know it must be greater than or equal to $0$. As this term does involve $\theta$, picking the value of $\theta$ that minimizes this term will allow us to minimize our average loss. For the second term to equal $0$, $\theta = \bar{y}$, or in other words, $\hat{\theta} = \bar{y} = mean(y)$.
+##### Note
-##### Note
+In the derivation above, we decompose the average loss, $R(\theta)$, into two key components: the variance of the data, $\sigma_y^2$, and the square of the bias, $(\bar{y} - \theta)^2$. This decomposition is insightful for understanding the behavior of estimators in statistical models.
-In the derivation above, we decompose the expected loss, $R(\theta)$, into two key components: the variance of the data, $\sigma_y^2$, and the square of the bias, $(\bar{y} - \theta)^2$. This decomposition is insightful for understanding the behavior of estimators in statistical models.
+- **Variance, $\sigma_y^2$**: This term represents the spread of the data points around their mean, $\bar{y}$, and is a measure of the data's inherent variability. Importantly, it does not depend on the choice of $\theta$, meaning it's a fixed property of the data. Variance serves as an indicator of the data's dispersion and is crucial in understanding the dataset's structure, but it remains constant regardless of how we adjust our model parameter $\theta$.
-- **Variance, $\sigma_y^2$**: This term represents the spread of the data points around their mean, $\bar{y}$, and is a measure of the data's inherent variability. Importantly, it does not depend on the choice of $\theta$, meaning it's a fixed property of the data. Variance serves as an indicator of the data's dispersion and is crucial in understanding the dataset's structure, but it remains constant regardless of how we adjust our model parameter $\theta$.
-
-- **Bias Squared, $(\bar{y} - \theta)^2$**: This term captures the bias of the estimator, defined as the square of the difference between the mean of the data points, $\bar{y}$, and the parameter $\theta$. The bias quantifies the systematic error introduced when estimating $\theta$. Minimizing this term is essential for improving the accuracy of the estimator. When $\theta = \bar{y}$, the bias is $0$, indicating that the estimator is unbiased for the parameter it estimates. This highlights a critical principle in statistical estimation: choosing $\theta$ to be the sample mean, $\bar{y}$, minimizes the average loss, rendering the estimator both efficient and unbiased for the population mean.
+- **Bias Squared, $(\bar{y} - \theta)^2$**: This term captures the squared bias of the estimator, where the bias is the difference between the mean of the data points, $\bar{y}$, and the parameter $\theta$. The bias quantifies the systematic error introduced when estimating $\theta$. Minimizing this term is essential for improving the accuracy of the estimator. When $\theta = \bar{y}$, the bias is $0$, indicating that the estimator is unbiased for the parameter it estimates. This highlights a critical principle in statistical estimation: choosing $\theta$ to be the sample mean, $\bar{y}$, minimizes the average loss, rendering the estimator both efficient and unbiased for the population mean.