Commit

minor note 11 fix
nsreddy16 committed Oct 20, 2024
1 parent 0f55e1e commit 3ca65a3
Showing 85 changed files with 899 additions and 927 deletions.
18 changes: 2 additions & 16 deletions constant_model_loss_transformations/loss_transformations.qmd
@@ -21,7 +21,7 @@ jupyter:
format_version: '1.0'
jupytext_version: 1.16.1
kernelspec:
-display_name: Python 3 (ipykernel)
+display_name: ds100env
language: python
name: python3
---
@@ -116,7 +116,7 @@ $$
\begin{align}
0 &= {\frac{-2}{n}}\sum^{n}_{i=1} (y_i - \hat{\theta_0})
\\ &= \sum^{n}_{i=1} (y_i - \hat{\theta_0}) \quad \quad \text{divide both sides by} \frac{-2}{n}
-\\ &= \left(\sum^{n}_{i=1} y_i\right) - \left(\sum^{n}_{i=1} \theta_0\right) \quad \quad \text{separate sums}
+\\ &= \left(\sum^{n}_{i=1} y_i\right) - \left(\sum^{n}_{i=1} \hat{\theta_0}\right) \quad \quad \text{separate sums}
\\ &= \left(\sum^{n}_{i=1} y_i\right) - (n \cdot \hat{\theta_0}) \quad \quad \text{c + c + … + c = nc}
\\ n \cdot \hat{\theta_0} &= \sum^{n}_{i=1} y_i
\\ \hat{\theta_0} &= \frac{1}{n} \sum^{n}_{i=1} y_i
@@ -159,7 +159,6 @@ The code for generating the graphs and models is included below, but we won't go

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
@@ -174,7 +173,6 @@ data_linear = dugongs[["Length", "Age"]]

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# Big font helper
def adjust_fontsize(size=None):
SMALL_SIZE = 8
@@ -196,7 +194,6 @@ plt.style.use("default") # Revert style to default mpl

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# Constant Model + MSE
plt.style.use('default') # Revert style to default mpl
adjust_fontsize(size=16)
@@ -222,7 +219,6 @@ plt.legend();

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# SLR + MSE
def mse_linear(theta_0, theta_1, data_linear):
data_x, data_y = data_linear.iloc[:, 0], data_linear.iloc[:, 1]
@@ -278,7 +274,6 @@ ax.set_zlabel("MSE");

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# Predictions
yobs = data_linear["Age"] # The true observations y
xs = data_linear["Length"] # Needed for linear predictions
@@ -290,7 +285,6 @@ yhats_linear = [theta_0_hat + theta_1_hat * x for x in xs]

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# Constant Model Rug Plot
# In case we're in a weird style state
sns.set_theme()
@@ -307,7 +301,6 @@ plt.yticks([]);

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# SLR model scatter plot
# In case we're in a weird style state
sns.set_theme()
@@ -421,7 +414,6 @@ Let's consider a dataset where each entry represents the number of drinks sold a

```{python}
#| code-fold: false
-#| vscode: {languageId: python}
drinks = np.array([20, 21, 22, 29, 33])
drinks
```
@@ -430,7 +422,6 @@ From our derivations above, we know that the optimal model parameter under MSE c

```{python}
#| code-fold: false
-#| vscode: {languageId: python}
np.mean(drinks), np.median(drinks)
```
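
The claim that the mean minimizes MSE and the median minimizes MAE for a constant model can be checked numerically. The following is a minimal sketch, not part of the original notes: it runs a brute-force grid search over candidate constants for the `drinks` data. The names `candidates`, `residuals`, `best_mse`, and `best_mae`, as well as the grid range and step, are illustrative assumptions.

```python
import numpy as np

drinks = np.array([20, 21, 22, 29, 33])

# Candidate constant predictions on a fine grid (range and step are arbitrary choices)
candidates = np.linspace(15, 40, 2501)  # spacing of 0.01

# Residual of every observation against every candidate: shape (2501, 5)
residuals = drinks[None, :] - candidates[:, None]

# Average loss of each candidate constant under the two cost functions
mse = np.mean(residuals ** 2, axis=1)
mae = np.mean(np.abs(residuals), axis=1)

# The grid minimizers should sit next to the mean and the median, respectively
best_mse = candidates[np.argmin(mse)]
best_mae = candidates[np.argmin(mae)]
print("MSE minimizer:", best_mse, "| mean:", np.mean(drinks))
print("MAE minimizer:", best_mae, "| median:", np.median(drinks))
```

A grid search is only accurate up to the grid spacing, so the comparison should be read as approximate agreement rather than exact equality.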

@@ -444,7 +435,6 @@ How do outliers affect each cost function? Imagine we replace the largest value

```{python}
#| code-fold: false
-#| vscode: {languageId: python}
drinks_with_outlier = np.append(drinks, 1033)
display(drinks_with_outlier)
np.mean(drinks_with_outlier), np.median(drinks_with_outlier)
@@ -458,7 +448,6 @@ Let's try another experiment. This time, we'll add an additional, non-outlying d

```{python}
#| code-fold: false
-#| vscode: {languageId: python}
drinks_with_additional_observation = np.append(drinks, 35)
drinks_with_additional_observation
```
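
One thing an even-sized dataset makes visible is that the MAE of a constant model is flat between the two middle observations, so its minimizer is not unique. The sketch below is an illustration under that reading of the experiment, not code from the original notes; `mae_constant` is a hypothetical helper, and the array is written out literally so the block runs on its own.

```python
import numpy as np

# Same values as np.append(drinks, 35), written out so the block is self-contained
drinks_with_additional_observation = np.array([20, 21, 22, 29, 33, 35])

def mae_constant(theta, data):
    """Mean absolute error of the constant prediction `theta` on `data`."""
    return np.mean(np.abs(data - theta))

# Candidates outside, on the edges of, and inside the middle interval [22, 29]
for theta in [21, 22, 25.5, 29, 30]:
    print(theta, mae_constant(theta, drinks_with_additional_observation))
```

Every candidate between 22 and 29 gives the same MAE, while candidates outside that interval do worse. `np.median` reports the midpoint of the two middle values, which is one of many minimizers in this case.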
@@ -502,7 +491,6 @@ Let's revisit our dugongs example. The lengths and ages are plotted below:

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
# `corrcoef` computes the correlation coefficient between two variables
# `std` finds the standard deviation
x = dugongs["Length"]
@@ -530,7 +518,6 @@ An important word on $\log$: in Data 100 (and most upper-division STEM courses),

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
z = np.log(y)
r = np.corrcoef(x, z)[0, 1]
@@ -568,7 +555,6 @@ $y$ is an *exponential* function of $x$. Applying an exponential fit to the untr

```{python}
#| code-fold: true
-#| vscode: {languageId: python}
plt.figure(dpi=120, figsize=(4, 3))
plt.scatter(x, y)
902 changes: 444 additions & 458 deletions docs/constant_model_loss_transformations/loss_transformations.html

Large diffs are not rendered by default.

Binary file not shown.
156 changes: 78 additions & 78 deletions docs/eda/eda.html

Large diffs are not rendered by default.

Binary file modified docs/eda/eda_files/figure-pdf/cell-62-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-67-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-68-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-69-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-71-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-75-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-76-output-1.pdf
Binary file modified docs/eda/eda_files/figure-pdf/cell-77-output-1.pdf
24 changes: 12 additions & 12 deletions docs/feature_engineering/feature_engineering.html

Large diffs are not rendered by default.

Binary file not shown.
48 changes: 24 additions & 24 deletions docs/gradient_descent/gradient_descent.html

Large diffs are not rendered by default.

Binary file not shown.
16 changes: 8 additions & 8 deletions docs/intro_to_modeling/intro_to_modeling.html
@@ -418,7 +418,7 @@ <h2 data-number="10.2" class="anchored" data-anchor-id="simple-linear-regression
<li><span class="math inline">\(\text{regression estimate} = y\text{-intercept} + \text{slope}\cdot\text{}x\)</span></li>
<li><span class="math inline">\(\text{residual} =\text{observed }y - \text{regression estimate}\)</span></li>
</ul>
-<div id="1ab7ac5a" class="cell" data-execution_count="1">
+<div id="d6f418ab" class="cell" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
@@ -465,7 +465,7 @@ <h4 data-number="10.2.1.2" class="anchored" data-anchor-id="correlation"><span c
<li>Correlations range between -1 and 1: <span class="math inline">\(|r| \leq 1\)</span>, with <span class="math inline">\(r=1\)</span> indicating perfect positive linear association, and <span class="math inline">\(r=-1\)</span> indicating perfect negative association. The closer <span class="math inline">\(r\)</span> is to <span class="math inline">\(0\)</span>, the weaker the linear association is.</li>
<li>Correlation says nothing about causation and non-linear association. Correlation does <strong>not</strong> imply causation. When <span class="math inline">\(r = 0\)</span>, the two variables are uncorrelated. However, they could still be related through some non-linear relationship.</li>
</ol>
-<div id="638e733e" class="cell" data-execution_count="2">
+<div id="456e69eb" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> plot_and_get_corr(ax, x, y, title):</span>
@@ -689,7 +689,7 @@ <h2 data-number="10.7" class="anchored" data-anchor-id="evaluating-the-slr-model
<section id="four-mysterious-datasets-anscombes-quartet" class="level3" data-number="10.7.1">
<h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datasets-anscombes-quartet"><span class="header-section-number">10.7.1</span> Four Mysterious Datasets (Anscombe’s quartet)</h3>
<p>Let’s take a look at four different datasets.</p>
-<div id="f06c2852" class="cell" data-execution_count="3">
+<div id="c8a8a30a" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
@@ -701,7 +701,7 @@ <h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datase
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> mpl_toolkits.mplot3d <span class="im">import</span> Axes3D</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
-<div id="0a263548" class="cell" data-execution_count="4">
+<div id="58981354" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Big font helper</span></span>
@@ -755,7 +755,7 @@ <h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datase
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a>plt.style.use(<span class="st">"default"</span>) <span class="co"># Revert style to default mpl</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
-<div id="7bec2a87" class="cell" data-execution_count="5">
+<div id="f0c62211" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>plt.style.use(<span class="st">"default"</span>) <span class="co"># Revert style to default mpl</span></span>
@@ -794,7 +794,7 @@ <h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datase
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> fig</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
-<div id="c6b48559" class="cell" data-execution_count="6">
+<div id="af3f30cd" class="cell" data-execution_count="6">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load in four different datasets: I, II, III, IV</span></span>
@@ -837,7 +837,7 @@ <h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datase
</div>
</div>
<p>While these four sets of datapoints look very different, they actually all have identical means <span class="math inline">\(\bar x\)</span>, <span class="math inline">\(\bar y\)</span>, standard deviations <span class="math inline">\(\sigma_x\)</span>, <span class="math inline">\(\sigma_y\)</span>, correlation <span class="math inline">\(r\)</span>, and RMSE! If we only look at these statistics, we would probably be inclined to say that these datasets are similar.</p>
-<div id="2fbfddc5" class="cell" data-execution_count="7">
+<div id="d764b493" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> dataset <span class="kw">in</span> [<span class="st">"I"</span>, <span class="st">"II"</span>, <span class="st">"III"</span>, <span class="st">"IV"</span>]:</span>
@@ -884,7 +884,7 @@ <h3 data-number="10.7.1" class="anchored" data-anchor-id="four-mysterious-datase
</div>
<p>We may also wish to visualize the model’s <strong>residuals</strong>, defined as the difference between the observed and predicted <span class="math inline">\(y_i\)</span> value (<span class="math inline">\(e_i = y_i - \hat{y}_i\)</span>). This gives a high-level view of how “off” each prediction is from the true observed value. Recall that you explored this concept in <a href="https://inferentialthinking.com/chapters/15/5/Visual_Diagnostics.html?highlight=heteroscedasticity#detecting-heteroscedasticity">Data 8</a>: a good regression fit should display no clear pattern in its plot of residuals. The residual plots for Anscombe’s quartet are displayed below. Note how only the first plot shows no clear pattern to the magnitude of residuals. This is an indication that SLR is not the best choice of model for the remaining three sets of points.</p>
<!-- <img src="images/residual.png" alt='residual' width='600'> -->
-<div id="fd64711b" class="cell" data-execution_count="8">
+<div id="1d07f513" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Residual visualization</span></span>
Binary file not shown.
6 changes: 3 additions & 3 deletions docs/ols/ols.html
@@ -356,15 +356,15 @@ <h3 data-number="12.1.1" class="anchored" data-anchor-id="multiple-linear-regres
<p><span class="math display">\[\hat{y} = \theta_0\:+\:\theta_1x_{1}\:+\:\theta_2 x_{2}\:+\:...\:+\:\theta_p x_{p}\]</span></p>
<p>Our predicted value of <span class="math inline">\(y\)</span>, <span class="math inline">\(\hat{y}\)</span>, is a linear combination of the single <strong>observations</strong> (features), <span class="math inline">\(x_i\)</span>, and the parameters, <span class="math inline">\(\theta_i\)</span>.</p>
<p>We can explore this idea further by looking at a dataset containing aggregate per-player data from the 2018-19 NBA season, downloaded from <a href="https://www.kaggle.com/schmadam97/nba-regular-season-stats-20182019">Kaggle</a>.</p>
-<div id="9de286c9" class="cell" data-execution_count="1">
+<div id="ad9425e6" class="cell" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>nba <span class="op">=</span> pd.read_csv(<span class="st">'data/nba18-19.csv'</span>, index_col<span class="op">=</span><span class="dv">0</span>)</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>nba.index.name <span class="op">=</span> <span class="va">None</span> <span class="co"># Drops name of index (players are ordered by rank)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
-<div id="6945b254" class="cell" data-execution_count="2">
+<div id="cc375914" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>nba.head(<span class="dv">5</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -535,7 +535,7 @@ <h3 data-number="12.1.1" class="anchored" data-anchor-id="multiple-linear-regres
<li><code>AST</code>, the average number of assists per game</li>
<li><code>3PA</code>, the average number of 3-point field goals attempted per game</li>
</ul>
-<div id="e39b5cde" class="cell" data-execution_count="3">
+<div id="03f081f7" class="cell" data-execution_count="3">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>nba[[<span class="st">'FG'</span>, <span class="st">'AST'</span>, <span class="st">'3PA'</span>, <span class="st">'PTS'</span>]].head()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
0 comments on commit 3ca65a3
