
probabiliy1 first draft
lillianw101 committed Oct 23, 2023
1 parent 30c506c commit 33ad5a9
Showing 4 changed files with 472 additions and 267 deletions.
Binary file added probability_1/images/SD_change.png
189 changes: 124 additions & 65 deletions probability_1/probability_1.ipynb
"$$\\mathbb{E}[X^2] = \\sum_{x} x^2 P(X = x)$$ \n",
"\n",
"### Example: Dice\n",
"Let $X$ be the outcome of a single fair die roll. $X$ is a random variable with distribution \n",
"$$P(X = x) = \\begin{cases} \n",
"    \\frac{1}{6}, & \\text{if } x \\in \\{1,2,3,4,5,6\\} \\\\\n",
"    0, & \\text{otherwise} \n",
"  \\end{cases}$$\n",
"\n",
"::: {.callout-caution collapse=\"true\"}\n",
"## What's the expectation $\\mathbb{E}[X]?$\n",
"\n",
"$$ \\begin{align} \n",
"  \\mathbb{E}[X] &= 1(\\frac{1}{6}) + 2(\\frac{1}{6}) + 3(\\frac{1}{6}) + 4(\\frac{1}{6}) + 5(\\frac{1}{6}) + 6(\\frac{1}{6}) \\\\\n",
"  &= (\\frac{1}{6}) ( 1 + 2 + 3 + 4 + 5 + 6) \\\\\n",
"  &= \\frac{7}{2}\n",
"\\end{align}$$\n",
":::\n",
"\n",
"::: {.callout-caution collapse=\"true\"}\n",
"## What's the variance $\\text{Var}(X)?$\n",
"\n",
"Using approach 1: \n",
"$$\\begin{align} \n",
"  \\text{Var}(X) &= (\\frac{1}{6})((1 - \\frac{7}{2})^2 + (2 - \\frac{7}{2})^2 + (3 - \\frac{7}{2})^2 + (4 - \\frac{7}{2})^2 + (5 - \\frac{7}{2})^2 + (6 - \\frac{7}{2})^2) \\\\\n",
"  &= \\frac{35}{12}\n",
"\\end{align}$$\n",
"\n",
"Using approach 2: \n",
"$$\\mathbb{E}[X^2] = \\sum_{x} x^2 P(X = x) = \\frac{91}{6}$$\n",
"$$\\text{Var}(X) = \\frac{91}{6} - (\\frac{7}{2})^2 = \\frac{35}{12}$$\n",
":::"
]
},
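As a quick sanity check on the dice example above (a sketch, not part of the original notebook; the use of `numpy` is an assumption), we can simulate many fair die rolls and compare the empirical mean and variance to $\frac{7}{2}$ and $\frac{35}{12}$:

```python
import numpy as np

# Simulate a large number of fair die rolls and check that the empirical
# mean and variance approach E[X] = 7/2 and Var(X) = 35/12.
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=1_000_000)  # integers in {1, ..., 6}

print(np.mean(rolls))  # close to 3.5
print(np.var(rolls))   # close to 35/12 ≈ 2.9167
```

Note that `np.var` uses the population convention (`ddof=0`), which matches the definition of $\text{Var}(X)$ used here.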
{
"<p align=\"center\">\n",
"<img src=\"images/yz_distribution.png\" alt='distribution' width='400'>\n",
"</p>\n",
"However, $Y = X_1$ has a larger variance\n",
"<p align=\"center\">\n",
"<img src=\"images/yz.png\" alt='distribution' width='400'>\n",
"</p>\n",
"\n",
"$$\\mathbb{E}[aX+b] = a\\mathbb{E}[X] + b$$\n",
"\n",
"::: {.callout-tip collapse=\"true\"}\n",
"## Proof\n",
"$$\\begin{align}\n",
"    \\mathbb{E}[aX+b] &= \\sum_{x} (ax + b) P(X=x) \\\\\n",
"    &= \\sum_{x} (ax P(X=x) + bP(X=x)) \\\\\n",
"    &= a\\sum_{x}xP(X=x) + b\\sum_{x}P(X=x)\\\\\n",
"    &= a\\mathbb{E}[X] + b \\cdot 1 = a\\mathbb{E}[X] + b\n",
"\\end{align}$$\n",
":::\n",
"\n",
"2. Expectation is also linear in *sums* of random variables. \n",
"\n",
"$$\\mathbb{E}[X+Y] = \\mathbb{E}[X] + \\mathbb{E}[Y]$$\n",
"\n",
"::: {.callout-tip collapse=\"true\"}\n",
"## Proof\n",
"$$\\begin{align}\n",
"    \\mathbb{E}[X+Y] &= \\sum_{s} (X+Y)(s) P(s) \\\\\n",
"    &= \\sum_{s} (X(s)P(s) + Y(s)P(s)) \\\\\n",
"    &= \\sum_{s} X(s)P(s) + \\sum_{s} Y(s)P(s)\\\\\n",
"    &= \\mathbb{E}[X] + \\mathbb{E}[Y]\n",
"\\end{align}$$\n",
":::\n",
"\n",
"3. If $g$ is a non-linear function, then in general, \n",
"$$\\mathbb{E}[g(X)] \\neq g(\\mathbb{E}[X])$$\n",
"* For example, if $X$ is -1 or 1 with equal probability, then $\\mathbb{E}[X] = 0$ but $\\mathbb{E}[X^2] = 1 \\neq 0$\n",
"\n",
"### Properties of Variance\n",
"Recall the definition of variance: \n",
"$$\\text{Var}(X) = \\mathbb{E}[(X-\\mathbb{E}[X])^2]$$\n",
"\n",
"1. Unlike expectation, variance is *non-linear*. The variance of the linear transformation $aX+b$ is:\n",
"$$\\text{Var}(aX+b) = a^2 \\text{Var}(X)$$\n",
"\n",
"* Subsequently, $$\\text{SD}(aX+b) = |a| \\text{SD}(X)$$\n",
"* The full proof of this fact can be found using the definition of variance. As general intuition, consider that $aX+b$ scales the variable $X$ by a factor of $a$, then shifts the distribution of $X$ by $b$ units. \n",
"\n",
"::: {.callout-tip collapse=\"true\"}\n",
"## Full Proof\n",
"We know that $$\\mathbb{E}[aX+b] = a\\mathbb{E}[X] + b$$\n",
"\n",
"In order to compute $\\text{Var}(aX+b)$, consider that a shift by $b$ units does not affect spread, so $\\text{Var}(aX+b) = \\text{Var}(aX)$.\n",
"\n",
"Then, \n",
"$$\\begin{align}\n",
"    \\text{Var}(aX+b) &= \\text{Var}(aX) \\\\\n",
"    &= \\mathbb{E}[(aX)^2] - (\\mathbb{E}[aX])^2 \\\\\n",
"    &= \\mathbb{E}[a^2 X^2] - (a\\mathbb{E}[X])^2\\\\\n",
"    &= a^2 (\\mathbb{E}[X^2] - (\\mathbb{E}[X])^2) \\\\\n",
"    &= a^2 \\text{Var}(X)\n",
"\\end{align}$$\n",
":::\n",
"\n",
"* Shifting the distribution by $b$ *does not* impact the *spread* of the distribution. Thus, $\\text{Var}(aX+b) = \\text{Var}(aX)$.\n",
"* Scaling the distribution by $a$ *does* impact the spread of the distribution.\n",
"\n",
"<p align=\"center\">\n",
"<img src=\"images/transformation.png\" alt='transformation' width='600'>\n",
"</p>\n",
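The scale-and-shift behavior above can be checked empirically (a sketch, not part of the original notebook; `numpy` and the particular $a$, $b$ values are assumptions):

```python
import numpy as np

# Empirical check that Var(aX + b) = a^2 * Var(X): the shift b drops out,
# while the scale a enters squared.
rng = np.random.default_rng(0)
X = rng.integers(1, 7, size=500_000)  # fair die rolls
a, b = 3, 10

print(np.var(a * X + b))     # matches a^2 * Var(X)
print(a**2 * np.var(X))
```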
"\n",
"2. Variance of sums of RVs is affected by the (in)dependence of the RVs\n",
"$$\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y) \\qquad \\text{if } X, Y \\text{ independent}$$\n",
"\n",
"$$\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y) + 2\\mathbb{E}[(X-\\mathbb{E}[X])(Y-\\mathbb{E}[Y])] \\qquad \\text{in general}$$\n",
"\n",
"::: {.callout-tip collapse=\"true\"}\n",
"## Proof\n",
"The variance of a sum is affected by the dependence between the two random variables that are being added. Let’s expand out the definition of $\\text{Var}(X + Y)$ to see what’s going on.\n",
"\n",
"To simplify the math, let $\\mu_x = \\mathbb{E}[X]$ and $\\mu_y = \\mathbb{E}[Y]$.\n",
"\n",
"$$ \\begin{align}\n",
"\\text{Var}(X + Y) &= \\mathbb{E}[(X+Y- \\mathbb{E}[X+Y])^2] \\\\\n",
"&= \\mathbb{E}[((X - \\mu_x) + (Y - \\mu_y))^2] \\\\\n",
"&= \\mathbb{E}[(X - \\mu_x)^2 + 2(X - \\mu_x)(Y - \\mu_y) + (Y - \\mu_y)^2] \\\\\n",
"&= \\mathbb{E}[(X - \\mu_x)^2] + \\mathbb{E}[(Y - \\mu_y)^2] + 2\\mathbb{E}[(X - \\mu_x)(Y - \\mu_y)] \\\\\n",
"&= \\text{Var}(X) + \\text{Var}(Y) + 2\\mathbb{E}[(X - \\mu_x)(Y - \\mu_y)] \n",
"\\end{align}$$\n",
":::\n",
"\n",
"### Covariance and Correlation\n",
"We define the **covariance** of two random variables as the expected product of deviations from expectation. Put more simply, covariance is a generalization of variance to *two* random variables: $\\text{Cov}(X, X) = \\mathbb{E}[(X - \\mathbb{E}[X])^2] = \\text{Var}(X)$.\n",
" * $X_i$ is the indicator of success on trial $i$. $X_i = 1$ if trial $i$ is a success, else 0.\n",
" * all $X_i$ are i.i.d. and Bernoulli(p)\n",
" * $\\mathbb{E}[Y] = \\sum_{i=1}^n \\mathbb{E}[X_i] = np$\n",
" * $\\text{Var}(Y) = \\sum_{i=1}^n \\text{Var}(X_i) = np(1-p)$ \n",
" * $X_i$'s are independent, so $\\text{Cov}(X_i, X_j) = 0$ for all $i \\neq j$.\n",
"* Uniform on a finite set of values\n",
" * Probability of each value is 1 / (number of possible values).\n",
" * For example, a standard/fair die.\n",
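The binomial facts above ($\mathbb{E}[Y] = np$, $\text{Var}(Y) = np(1-p)$) can be verified by simulation (a sketch, not part of the original notebook; `numpy` and the choice $n = 20$, $p = 0.3$ are assumptions):

```python
import numpy as np

# A Binomial(n, p) variable is a sum of n i.i.d. Bernoulli(p) indicators,
# so its mean is n*p and its variance is n*p*(1-p).
rng = np.random.default_rng(7)
n, p = 20, 0.3
Y = rng.binomial(n, p, size=200_000)

print(np.mean(Y))  # ≈ n * p = 6.0
print(np.var(Y))   # ≈ n * p * (1 - p) = 4.2
```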
"metadata": {},
"source": [
"## Populations and Samples \n",
"<img src=\"images/transformation.png\" alt='transformation' width='600'>\n"
"Today, we've talked extensively about populations; if we know the distribution of a random variable, we can reliably compute expectation, variance, functions of the random variable, etc. \n",
"\n",
"In Data Science, however, we often do not have access to the whole population, so we don’t know its distribution. As such, we need to collect a sample and use its distribution to estimate or infer properties of the population. \n",
"\n",
"When sampling, we make the (big) assumption that we sample uniformly at random with replacement from the population; each observation in our sample is a random variable drawn i.i.d from our population distribution. \n",
"\n",
"### Sample Mean \n",
"Consider an i.i.d. sample $X_1, X_2, ..., X_n$ drawn from a population with mean $\\mu$ and SD $\\sigma$.\n",
"We define the sample mean as $$\\bar{X_n} = \\frac{1}{n} \\sum_{i=1}^n X_i$$\n",
"\n",
"The expectation of the sample mean is given by: \n",
"$$\\begin{align} \n",
" \\mathbb{E}[\\bar{X_n}] &= \\frac{1}{n} \\sum_{i=1}^n \\mathbb{E}[X_i] \\\\\n",
" &= \\frac{1}{n} (n \\mu) \\\\\n",
" &= \\mu \n",
"\\end{align}$$\n",
"\n",
"The variance is given by: \n",
"$$\\begin{align} \n",
" \\text{Var}(\\bar{X_n}) &= \\frac{1}{n^2} \\text{Var}( \\sum_{i=1}^n X_i) \\\\\n",
" &= \\frac{1}{n^2} \\left( \\sum_{i=1}^n \\text{Var}(X_i) \\right) \\\\\n",
" &= \\frac{1}{n^2} (n \\sigma^2) = \\frac{\\sigma^2}{n}\n",
"\\end{align}$$\n",
" \n",
"By the Central Limit Theorem (CLT), $\\bar{X_n}$ is approximately normally distributed when $n$ is large.\n",
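The two derivations above ($\mathbb{E}[\bar{X_n}] = \mu$, $\text{Var}(\bar{X_n}) = \sigma^2/n$) can be checked empirically (a sketch, not part of the original notebook; `numpy` and the exponential population, which has $\mu = \sigma = 1$, are assumptions):

```python
import numpy as np

# Draw many i.i.d. samples of size n from an exponential population
# (mean 1, SD 1) and look at the distribution of the sample mean:
# its mean should be ≈ mu and its SD should be ≈ sigma / sqrt(n).
rng = np.random.default_rng(3)
n, reps = 25, 100_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

print(np.mean(means))  # ≈ mu = 1
print(np.std(means))   # ≈ sigma / sqrt(n) = 1 / 5 = 0.2
```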
"\n",
"### Central Limit Theorem\n",
"The CLT states that no matter what population you are drawing from, if an i.i.d. sample of size $n$ is large, the probability distribution of the sample mean is roughly normal with mean $\\mu$ and SD $\\sigma/\\sqrt{n}$.\n",
"\n",
"Any theorem that provides the rough distribution of a statistic and doesn’t need the distribution of the population is valuable to data scientists because we rarely know a lot about the population!\n",
"\n",
"For a more in-depth demo check out [onlinestatbook](https://onlinestatbook.com/stat_sim/sampling_dist/). \n",
"\n",
"The CLT applies if the sample size $n$ is large, but how large does $n$ have to be for the normal approximation to be good? It depends on the shape of the distribution of the population.\n",
"\n",
"* If the population is roughly symmetric and unimodal/uniform, we could need as few as $n = 20$.\n",
"* If the population is very skewed, you will need a bigger $n$.\n",
"* If in doubt, you can bootstrap the sample mean and see if the bootstrapped distribution is bell-shaped.\n",
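The guidance above can be seen numerically (a sketch, not part of the original notebook; `numpy`, the exponential population, and the `skewness` helper are assumptions). The skewness of the sample mean of a skewed population shrinks as $n$ grows, so its distribution looks progressively more bell-shaped:

```python
import numpy as np

def skewness(a):
    # Standardized third moment: 0 for a symmetric (e.g. normal) distribution.
    a = a - a.mean()
    return np.mean(a**3) / np.std(a)**3

rng = np.random.default_rng(5)
for n in (1, 20, 200):
    # Distribution of the sample mean for samples of size n from an
    # exponential population (skewness 2); skewness of the mean is 2 / sqrt(n).
    means = rng.exponential(1.0, size=(50_000, n)).mean(axis=1)
    print(n, round(skewness(means), 2))  # shrinks roughly like 2 / sqrt(n)
```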
"\n",
"### Using the Sample Mean to Estimate the Population Mean\n",
"Our goal with sampling is often to estimate some characteristic of a population. When we collect a single sample, it has just one average. Since our sample was random, it *could* have come out differently. The CLT helps us understand this difference. We should consider the average value and spread of all possible sample means, and what this means for how big $n$ should be.\n",
"\n",
"For every sample size, the expected value of the sample mean is the population mean: $\\mathbb{E}[\\bar{X_n}] = \\mu$. We call the sample mean an **unbiased estimator** of the population mean, and we'll cover this more in the next lecture. \n",
"\n",
"The square root law ([Data 8](https://inferentialthinking.com/chapters/14/5/Variability_of_the_Sample_Mean.html#the-square-root-law)) states that if you increase the sample size by a factor, the SD decreases by the square root of that factor: $\\text{SD}(\\bar{X_n}) = \\frac{\\sigma}{\\sqrt{n}}$. The sample mean is more likely to be close to the population mean if we have a larger sample size.\n",
"<p align=\"center\">\n",
"<img src=\"images/SD_change.png\" alt='transformation' width='400'>\n",
"</p>"
]
}
],