diff --git a/probability_1/probability_1.ipynb b/probability_1/probability_1.ipynb
index a9bfa30e..dcaad5da 100644
--- a/probability_1/probability_1.ipynb
+++ b/probability_1/probability_1.ipynb
@@ -117,22 +117,6 @@
     "distribution\n",
     "\n",
     "\n",
-    "### Common Random Variables\n",
-    "There are several cases of random variables that appear often and have useful properties. Below are the ones we will explore further in this course. The numbers in parentheses are the parameters of a random variable, which are constants. Parameters define a random variable’s shape (i.e., distribution) and its values.\n",
-    "\n",
-    "* Bernoulli(p)\n",
-    "  * Takes on value 1 with probability p, and 0 with probability 1 - p.\n",
-    "  * AKA the “indicator” random variable.\n",
-    "* Binomial(n, p)\n",
-    "  * Number of 1s in 'n' independent Bernoulli(p) trials.\n",
-    "* Uniform on a finite set of values\n",
-    "  * Probability of each value is 1 / (number of possible values).\n",
-    "  * For example, a standard/fair die.\n",
-    "* Uniform on the unit interval (0, 1)\n",
-    "  * Density is flat at 1 on (0, 1) and 0 elsewhere.\n",
-    "* Normal($\\mu, \\sigma^2$)\n",
-    "  * $$f(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} \\exp\\left( -\\frac{1}{2}\\left(\\frac{x-\\mu}{\\sigma}\\right)^{\\!2}\\,\\right)$$\n",
-    "\n",
     "### Simulation\n",
     "Given a random variable $X$’s distribution, how could we **generate/simulate** a population? To do so, we can randomly pick values of $X$ according to its distribution using `np.random.choice` or `df.sample`. "
    ]
   },
@@ -365,24 +349,56 @@
     "\n",
     "$$r(X, Y) = \\mathbb{E}\\left[\\left(\\frac{X-\\mathbb{E}[X]}{\\text{SD}(X)}\\right)\\left(\\frac{Y-\\mathbb{E}[Y]}{\\text{SD}(Y)}\\right)\\right] = \\frac{\\text{Cov}(X, Y)}{\\text{SD}(X)\\text{SD}(Y)}$$\n",
     "\n",
-    "It turns out we've been quietly using covariance for some time now! If $X$ and $Y$ are independent, then $\\text{Cov}(X, Y) =0$ and $r(X, Y) = 0$. Note, however, that the converse is not always true: $X$ and $Y$ could have $\\text{Cov}(X, Y) = r(X, Y) = 0$ but not be independent. This means that the variance of a sum of independent random variables is the sum of their variances:\n",
-    "$$\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y) \\qquad \\text{if } X, Y \\text{ independent}$$\n",
-    "\n",
-    "\n",
-    "### Standard Deviation\n",
-    "Notice that the units of variance are the *square* of the units of $X$. For example, if the random variable $X$ was measured in meters, its variance would be measured in meters$^2$. The **standard deviation** of a random variable converts things back to the correct scale by taking the square root of variance.\n",
-    "\n",
-    "$$\\text{SD}(X) = \\sqrt{\\text{Var}(X)}$$\n",
-    "\n",
-    "To find the standard deviation of a linear transformation $aX+b$, take the square root of the variance:\n",
-    "\n",
-    "$$\\text{SD}(aX+b) = \\sqrt{\\text{Var}(aX+b)} = \\sqrt{a^2 \\text{Var}(X)} = |a|\\text{SD}(X)$$"
+    "It turns out we've been quietly using covariance for some time now! If $X$ and $Y$ are independent, then $\\text{Cov}(X, Y) = 0$ and $r(X, Y) = 0$. Note, however, that the converse is not always true: $X$ and $Y$ could have $\\text{Cov}(X, Y) = r(X, Y) = 0$ but not be independent.\n",
+    "\n",
+    "### Summary\n",
+    "* Let $X$ be a random variable with distribution $P(X=x)$.\n",
+    "  * $\\mathbb{E}[X] = \\sum_{x} x P(X=x)$\n",
+    "  * $\\text{Var}(X) = \\mathbb{E}[(X-\\mathbb{E}[X])^2] = \\mathbb{E}[X^2] - (\\mathbb{E}[X])^2$\n",
+    "* Let $a$ and $b$ be scalar values.\n",
+    "  * $\\mathbb{E}[aX+b] = a\\mathbb{E}[X] + b$\n",
+    "  * $\\text{Var}(aX+b) = a^2 \\text{Var}(X)$\n",
+    "* Let $Y$ be another random variable.\n",
+    "  * $\\mathbb{E}[X+Y] = \\mathbb{E}[X] + \\mathbb{E}[Y]$\n",
+    "  * $\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y) + 2\\text{Cov}(X, Y)$"
    ]
   },
\n", + " * $\\mathbb{E}[X+Y] = \\mathbb{E}[X] + \\mathbb{E}[Y]$\n", + " * $\\text{Var}(X + Y) = \\text{Var}(X) + \\text{Var}(Y) 2\\text{cov}(X,Y)$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Common Random Variables\n", + "There are several cases of random variables that appear often and have useful properties. Below are the ones we will explore further in this course. The numbers in parentheses are the parameters of a random variable, which are constants. Parameters define a random variable’s shape (i.e., distribution) and its values.\n", "\n", - "$$\\text{SD}(aX+b) = \\sqrt{\\text{Var}(aX+b)} = \\sqrt{a^2 \\text{Var}(X)} = |a|\\text{SD}(X)$$" + "* Bernoulli(p)\n", + " * Takes on value 1 with probability p, and 0 with probability 1 - p.\n", + " * AKA the “indicator” random variable.\n", + " * Let X be a Bernoulli(p) random variable\n", + " * $\\mathbb{E}[X] = 1 * p + 0 * (1-p) = p$\n", + " * $\\mathbb{E}[X^2] = 1^2 * p + 0 * (1-p) = p$\n", + " * $\\text{Var}(X) = \\mathbb{E}[X^2] - (\\mathbb{E}[X])^2 = p - p^2 = p(1-p)$\n", + "* Binomial(n, p)\n", + " * Number of 1s in 'n' independent Bernoulli(p) trials.\n", + " * Let $Y$ be a Binomial(n, p) random variable\n", + " * the distribution of $Y$ is given by the binomial formula, and we can write $Y = \\sum_{i=1}^n X_i$ where\n", + " * $X_i$ s the indicator of success on trial i. $X_i = 1$ if trial i is a success, else 0.\n", + " * all $X_i$ are i.i.d. and Bernoulli(p)\n", + " * $\\mathbb{E}[Y] = \\sum_{i=1}^n \\mathbb{E}[X_i] = np$\n", + " * $\\text{Var}(X) = \\sum_{i=1}^n \\text{Var}(X_i) = np(1-p)$ because $X_i$'s are independent, so $\\text{Cov}(X_i, X_j) = 0$ for all i, j.\n", + "* Uniform on a finite set of values\n", + " * Probability of each value is 1 / (number of possible values).\n", + " * For example, a standard/fair die.\n", + "* Uniform on the unit interval (0, 1)\n", + " * Density is flat at 1 on (0, 1) and 0 elsewhere.\n", + "* Normal($\\mu, \\sigma^2$)\n", + " * $$f(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} \\exp\\left( -\\frac{1}{2}\\left(\\frac{x-\\mu}{\\sigma}\\right)^{\\!2}\\,\\right)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "## Populations and Samples \n", "transformation\n" ] }