diff --git a/_freeze/14-foundations-errors/execute-results/html.json b/_freeze/14-foundations-errors/execute-results/html.json index f6092e62..885f3a12 100644 --- a/_freeze/14-foundations-errors/execute-results/html.json +++ b/_freeze/14-foundations-errors/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "311909d91c1b1bbea30b17b84bc936d8", + "hash": "e93f2e657716e4a570ecb1e8b523dd7f", "result": { - "markdown": "# Decision Errors {#decerr}\n\n\n\n\n\n::: {.chapterintro data-latex=\"\"}\nUsing data to make inferential decisions about larger populations is not a perfect process.\nAs seen in Chapter \\@ref(foundations-randomization), a small p-value typically leads the researcher to a decision to reject the null claim or hypothesis.\nSometimes, however, data can produce a small p-value when the null hypothesis is actually true and the data are just inherently variable.\nHere we describe the errors which can arise in hypothesis testing, how to define and quantify the different errors, and suggestions for mitigating errors if possible.\n:::\n\n\\index{decision errors}\n\nHypothesis tests are not flawless.\nJust think of the court system: innocent people are sometimes wrongly convicted and the guilty sometimes walk free.\nSimilarly, data can point to the wrong conclusion.\nHowever, what distinguishes statistical hypothesis tests from a court system is that our framework allows us to quantify and control how often the data lead us to the incorrect conclusion.\n\nIn a hypothesis test, there are two competing hypotheses: the null and the alternative.\nWe make a statement about which one might be true, but we might choose incorrectly.\nThere are four possible scenarios in a hypothesis test, which are summarized in Table \\@ref(tab:fourHTScenarios).\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n\n\n\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n\n
Four different scenarios for hypothesis tests. Rows give the truth; columns give the test conclusion.

| Truth                          | Reject null hypothesis | Fail to reject null hypothesis |
|--------------------------------|------------------------|--------------------------------|
| Null hypothesis is true        | Type 1 Error           | Good decision                  |
| Alternative hypothesis is true | Good decision          | Type 2 Error                   |
\n\n`````\n:::\n:::\n\n\nA **Type 1 Error**\\index{Type 1 Error} is rejecting the null hypothesis when $H_0$ is actually true.\nSince we rejected the null hypothesis in the sex discrimination and opportunity cost studies, it is possible that we made a Type 1 Error in one or both of those studies.\nA **Type 2 Error**\\index{Type 2 Error} is failing to reject the null hypothesis when the alternative is actually true.\n\n\n\n\n\n::: {.workedexample data-latex=\"\"}\nIn a US court, the defendant is either innocent $(H_0)$ or guilty $(H_A).$ What does a Type 1 Error represent in this context?\nWhat does a Type 2 Error represent?\nTable \\@ref(tab:fourHTScenarios) may be useful.\n\n------------------------------------------------------------------------\n\nIf the court makes a Type 1 Error, this means the defendant is innocent $(H_0$ true) but wrongly convicted.\nA Type 2 Error means the court failed to reject $H_0$ (i.e., failed to convict the person) when they were in fact guilty $(H_A$ true).\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nConsider the opportunity cost study where we concluded students were less likely to make a DVD purchase if they were reminded that money not spent now could be spent later.\nWhat would a Type 1 Error represent in this context?[^14-foundations-errors-1]\n:::\n\n[^14-foundations-errors-1]: Making a Type 1 Error in this context would mean that reminding students that money not spent now can be spent later does not affect their buying habits, despite the strong evidence (the data suggesting otherwise) found in the experiment.\n Notice that this does *not* necessarily mean something was wrong with the data or that we made a computational mistake.\n Sometimes data simply point us to the wrong conclusion, which is why scientific studies are often repeated to check initial findings.\n\n::: {.workedexample data-latex=\"\"}\nHow could we reduce the Type 1 Error rate in US courts?\nWhat influence would this have on the Type 2 Error rate?\n\n------------------------------------------------------------------------\n\nTo lower the Type 1 Error rate, we might raise our standard for conviction from \"beyond a reasonable doubt\" to \"beyond a conceivable doubt\" so fewer people would be wrongly convicted.\nHowever, this would also make it more difficult to convict the people who are actually guilty, so we would make more Type 2 Errors.\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nHow could we reduce the Type 2 Error rate in US courts?\nWhat influence would this have on the Type 1 Error rate?[^14-foundations-errors-2]\n:::\n\n[^14-foundations-errors-2]: To lower the Type 2 Error rate, we want to convict more guilty people.\n We could lower the standards for conviction from \"beyond a reasonable doubt\" to \"beyond a little doubt\".\n Lowering the bar for guilt will also result in more wrongful convictions, raising the Type 1 Error rate.\n\n\\index{decision errors}\n\nThe example and guided practice above provide an important lesson: if we reduce how often we make one type of error, we generally make more of the other type.\n\n\\clearpage\n\n## Discernibility level\n\n\\index{discernibility level}\n\nThe **discernibility level** provides the cutoff for the p-value which will lead to a decision of \"reject the null hypothesis.\" Choosing a discernibility level for a test is important in many contexts, and the traditional level is 0.05.\nHowever, it is sometimes helpful to adjust the discernibility level based on the application.\nWe may select a level that is smaller or 
larger than 0.05 depending on the consequences of any conclusions reached from the test.\n\nIf making a Type 1 Error is dangerous or especially costly, we should choose a small discernibility level (e.g., 0.01 or 0.001).\nIf we want to be very cautious about rejecting the null hypothesis, we demand very strong evidence favoring the alternative $H_A$ before we would reject $H_0.$\n\nIf a Type 2 Error is relatively more dangerous or much more costly than a Type 1 Error, then we should choose a higher discernibility level (e.g., 0.10).\nHere we want to be cautious about failing to reject $H_0$ when the null is actually false.\n\n\n\n\n\n::: {.tip data-latex=\"\"}\n**Discernibility levels should reflect consequences of errors.**\n\nThe discernibility level selected for a test should reflect the real-world consequences associated with making a Type 1 or Type 2 Error.\n:::\n\n## Two-sided hypotheses\n\n\\index{hypothesis testing}\n\nIn Chapter \\@ref(foundations-randomization) we explored whether women were discriminated against and whether a simple trick could make students a little thriftier.\nIn these two case studies, we have actually ignored some possibilities:\n\n- What if *men* are actually discriminated against?\n- What if the money trick actually makes students *spend more*?\n\nThese possibilities weren't considered in our original hypotheses or analyses.\nThe disregard of the extra alternatives may have seemed natural since the data pointed in the directions in which we framed the problems.\nHowever, there are two dangers if we ignore possibilities that disagree with our data or that conflict with our world view:\n\n1. Framing an alternative hypothesis simply to match the direction that the data point will generally inflate the Type 1 Error rate.\n After all the work we have done (and will continue to do) to rigorously control the error rates in hypothesis tests, careless construction of the alternative hypotheses can disrupt that hard work.\n\n2. 
If we only use alternative hypotheses that agree with our worldview, then we are going to be subjecting ourselves to **confirmation bias**\\index{confirmation bias}, which means we are looking for data that supports our ideas.\n That's not very scientific, and we can do better!\n\nThe original hypotheses we have seen are called **one-sided hypothesis tests**\\index{one-sided hypothesis test} because they only explored one direction of possibilities.\nSuch hypotheses are appropriate when we are exclusively interested in the single direction, but usually we want to consider all possibilities.\nTo do so, let's learn about **two-sided hypothesis tests**\\index{two-sided hypothesis test} in the context of a new study that examines the impact of using blood thinners on patients who have undergone CPR.\n\n\n\n\n\nCardiopulmonary resuscitation (CPR) is a procedure used on individuals suffering a heart attack when other emergency resources are unavailable.\nThis procedure is helpful in providing some blood circulation to keep a person alive, but CPR chest compression can also cause internal injuries.\nInternal bleeding and other injuries that can result from CPR complicate additional treatment efforts.\nFor instance, blood thinners may be used to help release a clot that is causing the heart attack once a patient arrives in the hospital.\nHowever, blood thinners negatively affect internal injuries.\n\nHere we consider an experiment with patients who underwent CPR for a heart attack and were subsequently admitted to a hospital.\nEach patient was randomly assigned to either receive a blood thinner (treatment group) or not receive a blood thinner (control group).\nThe outcome variable of interest was whether the patient survived for at least 24 hours.\n[@Bottiger:2001]\n\n::: {.data data-latex=\"\"}\nThe [`cpr`](http://openintrostat.github.io/openintro/reference/cpr.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n:::\n\n::: {.workedexample data-latex=\"\"}\nForm hypotheses for this study in plain and statistical language.\nLet $p_C$ represent the true survival rate of people who do not receive a blood thinner (corresponding to the control group) and $p_T$ represent the survival rate for people receiving a blood thinner (corresponding to the treatment group).\n\n------------------------------------------------------------------------\n\nWe want to understand whether blood thinners are helpful or harmful.\nWe'll consider both of these possibilities using a two-sided hypothesis test.\n\n- $H_0:$ Blood thinners do not have an overall survival effect, i.e., the survival proportions are the same in each group.\n $p_T - p_C = 0.$\n\n- $H_A:$ Blood thinners have an impact on survival, either positive or negative, but not zero.\n $p_T - p_C \\neq 0.$\n\nNote that if we had done a one-sided hypothesis test, the resulting hypotheses would have been:\n\n- $H_0:$ Blood thinners do not have a positive overall survival effect, i.e., the survival proportions for the blood thinner group is the same or lower than the control group.\n $p_T - p_C \\leq 0.$\n\n- $H_A:$ Blood thinners have a positive impact on survival.\n $p_T - p_C > 0.$\n:::\n\nThere were 50 patients in the experiment who did not receive a blood thinner and 40 patients who did.\nThe study results are shown in Table \\@ref(tab:cpr-summary).\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Results for the CPR study. Patients in the treatment group were given a blood thinner, and patients in the control group were not.

| Group     | Died | Survived | Total |
|-----------|------|----------|-------|
| Control   | 39   | 11       | 50    |
| Treatment | 26   | 14       | 40    |
| Total     | 65   | 25       | 90    |
\n\n`````\n:::\n:::\n\n\n::: {.guidedpractice data-latex=\"\"}\nWhat is the observed survival rate in the control group?\nAnd in the treatment group?\nAlso, provide a point estimate $(\\hat{p}_T - \\hat{p}_C)$ for the true difference in population survival proportions across the two groups: $p_T - p_C.$[^14-foundations-errors-3]\n:::\n\n[^14-foundations-errors-3]: Observed control survival rate: $\\hat{p}_C = \\frac{11}{50} = 0.22.$ Treatment survival rate: $\\hat{p}_T = \\frac{14}{40} = 0.35.$ Observed difference: $\\hat{p}_T - \\hat{p}_C = 0.35 - 0.22 = 0.13.$\n\nAccording to the point estimate, for patients who have undergone CPR outside of the hospital, an additional 13% of these patients survive when they are treated with blood thinners.\nHowever, we wonder if this difference could be easily explainable by chance, if the treatment has no effect on survival.\n\nAs we did in past studies, we will simulate what type of differences we might see from chance alone under the null hypothesis.\nBy randomly assigning each of the patient's files to a \"simulated treatment\" or \"simulated control\" allocation, we get a new grouping.\nIf we repeat this simulation 1,000 times, we can build a **null distribution**\\index{null distribution} of the differences shown in Figure \\@ref(fig:CPR-study-right-tail).\n\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Null distribution of the point estimate for the difference in proportions, $\\hat{p}_T - \\hat{p}_C.$ The shaded right tail shows observations that are at least as large as the observed difference, 0.13.](14-foundations-errors_files/figure-html/CPR-study-right-tail-1.png){width=90%}\n:::\n:::\n\n\nThe right tail area is 0.135.\n(Note: it is only a coincidence that we also have $\\hat{p}_T - \\hat{p}_C=0.13.)$ However, contrary to how we calculated the p-value in previous studies, the p-value of this test is not actually the tail area we calculated, i.e., it's not 0.135!\n\nThe p-value is defined as the probability we observe a result at least as favorable to the alternative hypothesis as the result (i.e., the difference) we observe.\nIn this case, any differences less than or equal to -0.13 would also provide equally strong evidence favoring the alternative hypothesis as a difference of +0.13 did.\nA difference of -0.13 would correspond to 13% higher survival rate in the control group than the treatment group.\nIn Figure \\@ref(fig:CPR-study-p-value) we have also shaded these differences in the left tail of the distribution.\nThese two shaded tails provide a visual representation of the p-value for a two-sided test.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Null distribution of the point estimate for the difference in proportions, $\\hat{p}_T - \\hat{p}_C.$ All values that are at least as extreme as +0.13 but in either direction away from 0 are shaded.](14-foundations-errors_files/figure-html/CPR-study-p-value-1.png){width=90%}\n:::\n:::\n\n\nFor a two-sided test, take the single tail (in this case, 0.131) and double it to get the p-value: 0.262.\nSince this p-value is larger than 0.05, we do not reject the null hypothesis.\nThat is, we do not find convincing evidence that the blood thinner has any influence on survival of patients who undergo CPR prior to arriving at the hospital.\n\n::: {.important data-latex=\"\"}\n**Default to a two-sided test.**\n\nWe want to be rigorous and keep an open mind when we analyze data and evidence.\nUse a one-sided hypothesis test only if you truly have interest in only one direction.\n:::\n\n::: 
{.important data-latex=\"\"}\n**Computing a p-value for a two-sided test.**\n\nFirst compute the p-value for one tail of the distribution, then double that value to get the two-sided p-value.\nThat's it!\n:::\n\n::: {.workedexample data-latex=\"\"}\nConsider the situation of the medical consultant.\nNow that you know about one-sided and two-sided tests, which type of test do you think is more appropriate?\n\n------------------------------------------------------------------------\n\nThe setting has been framed in the context of the consultant being helpful (which is what led us to a one-sided test originally), but what if the consultant actually performed *worse* than the average?\nWould we care?\nMore than ever!\nSince it turns out that we care about a finding in either direction, we should run a two-sided test.\nThe p-value for the two-sided test is double that of the one-sided test, here the simulated p-value would be 0.2444.\n:::\n\nGenerally, to find a two-sided p-value we double the single tail area, which remains a reasonable approach even when the distribution is asymmetric.\nHowever, the approach can result in p-values larger than 1 when the point estimate is very near the mean in the null distribution; in such cases, we write that the p-value is 1.\nAlso, very large p-values computed in this way (e.g., 0.85), may also be slightly inflated.\nTypically, we do not worry too much about the precision of very large p-values because they lead to the same analysis conclusion, even if the value is slightly off.\n\n\\clearpage\n\n## Controlling the Type 1 Error rate\n\nNow that we understand the difference between one-sided and two-sided tests, we must recognize when to use each type of test.\nBecause of the result of increased error rates, it is never okay to change two-sided tests to one-sided tests after observing the data.\nWe explore the consequences of ignoring this advice in the next example.\n\n::: {.workedexample data-latex=\"\"}\nUsing $\\alpha=0.05,$ we show that freely switching from two-sided tests to one-sided tests will lead us to make twice as many Type 1 Errors as intended.\n\n------------------------------------------------------------------------\n\nSuppose we are interested in finding any difference from 0.\nWe've created a smooth-looking **null distribution** representing differences due to chance in Figure \\@ref(fig:type1ErrorDoublingExampleFigure).\n\nSuppose the sample difference was larger than 0.\nThen if we can flip to a one-sided test, we would use $H_A:$ difference $> 0.$ Now if we obtain any observation in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be a the single tail.\nThus, if the null hypothesis is true, we incorrectly reject the null hypothesis about 5% of the time when the sample mean is above the null value, as shown in Figure \\@ref(fig:type1ErrorDoublingExampleFigure).\n\nSuppose the sample difference was smaller than 0.\nThen if we change to a one-sided test, we would use $H_A:$ difference $< 0.$ If the observed difference falls in the lower 5% of the figure, we would reject $H_0.$ That is, if the null hypothesis is true, then we would observe this situation about 5% of the time.\n\nBy examining these two scenarios, we can determine that we will make a Type 1 Error $5\\%+5\\%=10\\%$ of the time if we are allowed to swap to the \"best\" one-sided test for the data.\nThis is twice the error rate we prescribed with our discernibility level: $\\alpha=0.05$ (!).\n:::\n\n\n::: {.cell}\n::: 
{.cell-output-display}\n![The shaded regions represent areas where we would reject $H_0$ under the bad practices considered in when $\\alpha = 0.05.$](14-foundations-errors_files/figure-html/type1ErrorDoublingExampleFigure-1.png){width=90%}\n:::\n:::\n\n\n::: caution\n**Hypothesis tests should be set up *before* seeing the data.**\n\nAfter observing data, it is tempting to turn a two-sided test into a one-sided test.\nAvoid this temptation.\nHypotheses should be set up *before* observing the data.\n:::\n\n\\index{hypothesis testing}\n\n\\clearpage\n\n## Power {#pow}\n\nAlthough we won't go into extensive detail here, power is an important topic for follow-up consideration after understanding the basics of hypothesis testing.\nA good power analysis is a vital preliminary step to any study as it will inform whether the data you collect are sufficient for being able to conclude your research broadly.\n\nOften times in experiment planning, there are two competing considerations:\n\n- We want to collect enough data that we can detect important effects.\n- Collecting data can be expensive, and, in experiments involving people, there may be some risk to patients.\n\nWhen planning a study, we want to know how likely we are to detect an effect we care about.\nIn other words, if there is a real effect, and that effect is large enough that it has practical value, then what is the probability that we detect that effect?\nThis probability is called the **power**\\index{power}, and we can compute it for different sample sizes or different effect sizes.\n\n::: {.important data-latex=\"\"}\n**Power.**\n\nThe power of the test is the probability of rejecting the null claim when the alternative claim is true.\n\nHow easy it is to detect the effect depends on both how big the effect is (e.g., how good the medical treatment is) as well as the sample size.\n:::\n\nWe think of power as the probability that you will become rich and famous from your science.\nIn order for your science to make a splash, you need to have good ideas!\nThat is, you won't become famous if you happen to find a single Type 1 error which rejects the null hypothesis.\nInstead, you'll become famous if your science is very good and important (that is, if the alternative hypothesis is true).\nThe better your science is (i.e., the better the medical treatment), the larger the *effect size* and the easier it will be for you to convince people of your work.\n\nNot only does your science need to be solid, but you also need to have evidence (i.e., data) that shows the effect.\nA few observations (e.g., $n = 2)$ is unlikely to be convincing because of well known ideas of natural variability.\nIndeed, the larger the dataset which provides evidence for your scientific claim, the more likely you are to convince the community that your idea is correct.\n\n\n\n\n\n\\clearpage\n\n## Chapter review {#chp15-review}\n\n### Summary\n\nAlthough hypothesis testing provides a strong framework for making decisions based on data, as the analyst, you need to understand how and when the process can go wrong.\nThat is, always keep in mind that the conclusion to a hypothesis test may not be right!\nSometimes when the null hypothesis is true, we will accidentally reject it and commit a type 1 error; sometimes when the alternative hypothesis is true, we will fail to reject the null hypothesis and commit a type 2 error.\nThe power of the test quantifies how likely it is to obtain data which will reject the null hypothesis when indeed the alternative is true; the power 
of the test is increased when larger sample sizes are taken.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
confirmation bias, discernibility level, null distribution, one-sided hypothesis test, power, significance level, two-sided hypothesis test, type 1 error, type 2 error
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#chp14-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-14].\n\n::: {.exercises data-latex=\"\"}\n1. **Testing for Fibromyalgia.**\nA patient named Diana was diagnosed with Fibromyalgia, a long-term syndrome of body pain, and was prescribed anti-depressants. Being the skeptic that she is, Diana didn't initially believe that anti-depressants would help her symptoms. However, after a couple months of being on the medication she decides that the anti-depressants are working, because she feels like her symptoms are in fact getting better.\n\n a. Write the hypotheses in words for Diana's skeptical position when she started taking the anti-depressants.\n\n b. What is a Type 1 Error in this context?\n\n c. What is a Type 2 Error in this context?\n\n1. **Which is higher?**\nIn each part below, there is a value of interest and two scenarios: (i) and (ii). For each part, report if the value of interest is larger under scenario (i), scenario (ii), or whether the value is equal under the scenarios.\n\n a. The standard error of $\\hat{p}$ when (i) $n = 125$ or (ii) $n = 500$.\n\n b. The margin of error of a confidence interval when the confidence level is (i) 90% or (ii) 80%.\n\n c. The p-value for a Z-statistic of 2.5 calculated based on a (i) sample with $n = 500$ or based on a (ii) sample with $n = 1000$.\n\n d. The probability of making a Type 2 Error when the alternative hypothesis is true and the discernibility level is (i) 0.05 or (ii) 0.10.\n\n1. **Testing for food safety.**\nA food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.\n\n a. Write the hypotheses in words.\n\n b. What is a Type 1 Error in this context?\n\n c. What is a Type 2 Error in this context?\n\n d. Which error is more problematic for the restaurant owner? Why?\n\n e. Which error is more problematic for the diners? Why?\n\n f. As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant's license? Explain your reasoning.\n\n1. **True or false.**\nDetermine if the following statements are true or false, and explain your reasoning. If false, state how it could be corrected.\n\n a. If a given value (for example, the null hypothesized value of a parameter) is within a 95% confidence interval, it will also be within a 99% confidence interval.\n\n b. Decreasing the discernibility level ($\\alpha$) will increase the probability of making a Type 1 Error.\n\n c. Suppose the null hypothesis is $p = 0.5$ and we fail to reject $H_0$. Under this scenario, the true population proportion is 0.5.\n\n d. With large sample sizes, even small differences between the null value and the observed point estimate, a difference often called the effect size, will be identified as statistically discernible.\n \n \\clearpage\n\n1. **Online communication.**\nA study suggests that 60% of college student spend 10 or more hours per week communicating with others online. You believe that this is incorrect and decide to collect your own sample for a hypothesis test. You randomly sample 160 students from your dorm and find that 70% spent 10 or more hours a week communicating with others online. 
A friend of yours, who offers to help you with the hypothesis test, comes up with the following set of hypotheses. Indicate any errors you see. \n\n $$H_0: \\hat{p} < 0.6 \\quad \\quad H_A: \\hat{p} > 0.7$$\n\n1. **Same observation, different sample size.**\nSuppose you conduct a hypothesis test based on a sample where the sample size is $n = 50$, and arrive at a p-value of 0.08. You then refer back to your notes and discover that you made a careless mistake, the sample size should have been $n = 500$. Will your p-value increase, decrease, or stay the same? Explain.\n\n\n:::\n", + "engine": "knitr", + "markdown": "# Decision Errors {#sec-decerr}\n\n\n\n\n\n::: {.chapterintro data-latex=\"\"}\nUsing data to make inferential decisions about larger populations is not a perfect process.\nAs seen in Chapter \\@ref(foundations-randomization), a small p-value typically leads the researcher to a decision to reject the null claim or hypothesis.\nSometimes, however, data can produce a small p-value when the null hypothesis is actually true and the data are just inherently variable.\nHere we describe the errors which can arise in hypothesis testing, how to define and quantify the different errors, and suggestions for mitigating errors if possible.\n:::\n\n\\index{decision errors}\n\nHypothesis tests are not flawless.\nJust think of the court system: innocent people are sometimes wrongly convicted and the guilty sometimes walk free.\nSimilarly, data can point to the wrong conclusion.\nHowever, what distinguishes statistical hypothesis tests from a court system is that our framework allows us to quantify and control how often the data lead us to the incorrect conclusion.\n\nIn a hypothesis test, there are two competing hypotheses: the null and the alternative.\nWe make a statement about which one might be true, but we might choose incorrectly.\nThere are four possible scenarios in a hypothesis test, which are summarized in Table \\@ref(tab:fourHTScenarios).\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n\n\n\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n\n
Four different scenarios for hypothesis tests. Rows give the truth; columns give the test conclusion.

| Truth                          | Reject null hypothesis | Fail to reject null hypothesis |
|--------------------------------|------------------------|--------------------------------|
| Null hypothesis is true        | Type 1 Error           | Good decision                  |
| Alternative hypothesis is true | Good decision          | Type 2 Error                   |
\n\n`````\n:::\n:::\n\n\nA **Type 1 Error**\\index{Type 1 Error} is rejecting the null hypothesis when $H_0$ is actually true.\nSince we rejected the null hypothesis in the sex discrimination and opportunity cost studies, it is possible that we made a Type 1 Error in one or both of those studies.\nA **Type 2 Error**\\index{Type 2 Error} is failing to reject the null hypothesis when the alternative is actually true.\n\n\n\n\n\n::: {.workedexample data-latex=\"\"}\nIn a US court, the defendant is either innocent $(H_0)$ or guilty $(H_A).$ What does a Type 1 Error represent in this context?\nWhat does a Type 2 Error represent?\nTable \\@ref(tab:fourHTScenarios) may be useful.\n\n------------------------------------------------------------------------\n\nIf the court makes a Type 1 Error, this means the defendant is innocent $(H_0$ true) but wrongly convicted.\nA Type 2 Error means the court failed to reject $H_0$ (i.e., failed to convict the person) when they were in fact guilty $(H_A$ true).\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nConsider the opportunity cost study where we concluded students were less likely to make a DVD purchase if they were reminded that money not spent now could be spent later.\nWhat would a Type 1 Error represent in this context?[^14-foundations-errors-1]\n:::\n\n[^14-foundations-errors-1]: Making a Type 1 Error in this context would mean that reminding students that money not spent now can be spent later does not affect their buying habits, despite the strong evidence (the data suggesting otherwise) found in the experiment.\n Notice that this does *not* necessarily mean something was wrong with the data or that we made a computational mistake.\n Sometimes data simply point us to the wrong conclusion, which is why scientific studies are often repeated to check initial findings.\n\n::: {.workedexample data-latex=\"\"}\nHow could we reduce the Type 1 Error rate in US courts?\nWhat influence would this have on the Type 2 Error rate?\n\n------------------------------------------------------------------------\n\nTo lower the Type 1 Error rate, we might raise our standard for conviction from \"beyond a reasonable doubt\" to \"beyond a conceivable doubt\" so fewer people would be wrongly convicted.\nHowever, this would also make it more difficult to convict the people who are actually guilty, so we would make more Type 2 Errors.\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nHow could we reduce the Type 2 Error rate in US courts?\nWhat influence would this have on the Type 1 Error rate?[^14-foundations-errors-2]\n:::\n\n[^14-foundations-errors-2]: To lower the Type 2 Error rate, we want to convict more guilty people.\n We could lower the standards for conviction from \"beyond a reasonable doubt\" to \"beyond a little doubt\".\n Lowering the bar for guilt will also result in more wrongful convictions, raising the Type 1 Error rate.\n\n\\index{decision errors}\n\nThe example and guided practice above provide an important lesson: if we reduce how often we make one type of error, we generally make more of the other type.\n\n\\clearpage\n\n## Discernibility level\n\n\\index{discernibility level}\n\nThe **discernibility level** provides the cutoff for the p-value which will lead to a decision of \"reject the null hypothesis.\" Choosing a discernibility level for a test is important in many contexts, and the traditional level is 0.05.\nHowever, it is sometimes helpful to adjust the discernibility level based on the application.\nWe may select a level that is smaller or 
larger than 0.05 depending on the consequences of any conclusions reached from the test.\n\nIf making a Type 1 Error is dangerous or especially costly, we should choose a small discernibility level (e.g., 0.01 or 0.001).\nIf we want to be very cautious about rejecting the null hypothesis, we demand very strong evidence favoring the alternative $H_A$ before we would reject $H_0.$\n\nIf a Type 2 Error is relatively more dangerous or much more costly than a Type 1 Error, then we should choose a higher discernibility level (e.g., 0.10).\nHere we want to be cautious about failing to reject $H_0$ when the null is actually false.\n\n\n\n\n\n::: {.tip data-latex=\"\"}\n**Discernibility levels should reflect consequences of errors.**\n\nThe discernibility level selected for a test should reflect the real-world consequences associated with making a Type 1 or Type 2 Error.\n:::\n\n## Two-sided hypotheses\n\n\\index{hypothesis testing}\n\nIn Chapter \\@ref(foundations-randomization) we explored whether women were discriminated against and whether a simple trick could make students a little thriftier.\nIn these two case studies, we have actually ignored some possibilities:\n\n- What if *men* are actually discriminated against?\n- What if the money trick actually makes students *spend more*?\n\nThese possibilities weren't considered in our original hypotheses or analyses.\nThe disregard of the extra alternatives may have seemed natural since the data pointed in the directions in which we framed the problems.\nHowever, there are two dangers if we ignore possibilities that disagree with our data or that conflict with our world view:\n\n1. Framing an alternative hypothesis simply to match the direction that the data point will generally inflate the Type 1 Error rate.\n After all the work we have done (and will continue to do) to rigorously control the error rates in hypothesis tests, careless construction of the alternative hypotheses can disrupt that hard work.\n\n2. 
If we only use alternative hypotheses that agree with our worldview, then we are going to be subjecting ourselves to **confirmation bias**\\index{confirmation bias}, which means we are looking for data that supports our ideas.\n That's not very scientific, and we can do better!\n\nThe original hypotheses we have seen are called **one-sided hypothesis tests**\\index{one-sided hypothesis test} because they only explored one direction of possibilities.\nSuch hypotheses are appropriate when we are exclusively interested in the single direction, but usually we want to consider all possibilities.\nTo do so, let's learn about **two-sided hypothesis tests**\\index{two-sided hypothesis test} in the context of a new study that examines the impact of using blood thinners on patients who have undergone CPR.\n\n\n\n\n\nCardiopulmonary resuscitation (CPR) is a procedure used on individuals suffering a heart attack when other emergency resources are unavailable.\nThis procedure is helpful in providing some blood circulation to keep a person alive, but CPR chest compression can also cause internal injuries.\nInternal bleeding and other injuries that can result from CPR complicate additional treatment efforts.\nFor instance, blood thinners may be used to help release a clot that is causing the heart attack once a patient arrives in the hospital.\nHowever, blood thinners negatively affect internal injuries.\n\nHere we consider an experiment with patients who underwent CPR for a heart attack and were subsequently admitted to a hospital.\nEach patient was randomly assigned to either receive a blood thinner (treatment group) or not receive a blood thinner (control group).\nThe outcome variable of interest was whether the patient survived for at least 24 hours.\n[@Bottiger:2001]\n\n::: {.data data-latex=\"\"}\nThe [`cpr`](http://openintrostat.github.io/openintro/reference/cpr.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n:::\n\n::: {.workedexample data-latex=\"\"}\nForm hypotheses for this study in plain and statistical language.\nLet $p_C$ represent the true survival rate of people who do not receive a blood thinner (corresponding to the control group) and $p_T$ represent the survival rate for people receiving a blood thinner (corresponding to the treatment group).\n\n------------------------------------------------------------------------\n\nWe want to understand whether blood thinners are helpful or harmful.\nWe'll consider both of these possibilities using a two-sided hypothesis test.\n\n- $H_0:$ Blood thinners do not have an overall survival effect, i.e., the survival proportions are the same in each group.\n $p_T - p_C = 0.$\n\n- $H_A:$ Blood thinners have an impact on survival, either positive or negative, but not zero.\n $p_T - p_C \\neq 0.$\n\nNote that if we had done a one-sided hypothesis test, the resulting hypotheses would have been:\n\n- $H_0:$ Blood thinners do not have a positive overall survival effect, i.e., the survival proportions for the blood thinner group is the same or lower than the control group.\n $p_T - p_C \\leq 0.$\n\n- $H_A:$ Blood thinners have a positive impact on survival.\n $p_T - p_C > 0.$\n:::\n\nThere were 50 patients in the experiment who did not receive a blood thinner and 40 patients who did.\nThe study results are shown in Table \\@ref(tab:cpr-summary).\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Results for the CPR study. Patients in the treatment group were given a blood thinner, and patients in the control group were not.

| Group     | Died | Survived | Total |
|-----------|------|----------|-------|
| Control   | 39   | 11       | 50    |
| Treatment | 26   | 14       | 40    |
| Total     | 65   | 25       | 90    |
\n\n`````\n:::\n:::\n\n\n::: {.guidedpractice data-latex=\"\"}\nWhat is the observed survival rate in the control group?\nAnd in the treatment group?\nAlso, provide a point estimate $(\\hat{p}_T - \\hat{p}_C)$ for the true difference in population survival proportions across the two groups: $p_T - p_C.$[^14-foundations-errors-3]\n:::\n\n[^14-foundations-errors-3]: Observed control survival rate: $\\hat{p}_C = \\frac{11}{50} = 0.22.$ Treatment survival rate: $\\hat{p}_T = \\frac{14}{40} = 0.35.$ Observed difference: $\\hat{p}_T - \\hat{p}_C = 0.35 - 0.22 = 0.13.$\n\nAccording to the point estimate, for patients who have undergone CPR outside of the hospital, an additional 13% of these patients survive when they are treated with blood thinners.\nHowever, we wonder if this difference could be easily explainable by chance, if the treatment has no effect on survival.\n\nAs we did in past studies, we will simulate what type of differences we might see from chance alone under the null hypothesis.\nBy randomly assigning each of the patient's files to a \"simulated treatment\" or \"simulated control\" allocation, we get a new grouping.\nIf we repeat this simulation 1,000 times, we can build a **null distribution**\\index{null distribution} of the differences shown in Figure \\@ref(fig:CPR-study-right-tail).\n\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Null distribution of the point estimate for the difference in proportions, $\\hat{p}_T - \\hat{p}_C.$ The shaded right tail shows observations that are at least as large as the observed difference, 0.13.](14-foundations-errors_files/figure-html/CPR-study-right-tail-1.png){width=90%}\n:::\n:::\n\n\nThe right tail area is 0.135.\n(Note: it is only a coincidence that we also have $\\hat{p}_T - \\hat{p}_C=0.13.)$ However, contrary to how we calculated the p-value in previous studies, the p-value of this test is not actually the tail area we calculated, i.e., it's not 0.135!\n\nThe p-value is defined as the probability we observe a result at least as favorable to the alternative hypothesis as the result (i.e., the difference) we observe.\nIn this case, any differences less than or equal to -0.13 would also provide equally strong evidence favoring the alternative hypothesis as a difference of +0.13 did.\nA difference of -0.13 would correspond to 13% higher survival rate in the control group than the treatment group.\nIn Figure \\@ref(fig:CPR-study-p-value) we have also shaded these differences in the left tail of the distribution.\nThese two shaded tails provide a visual representation of the p-value for a two-sided test.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Null distribution of the point estimate for the difference in proportions, $\\hat{p}_T - \\hat{p}_C.$ All values that are at least as extreme as +0.13 but in either direction away from 0 are shaded.](14-foundations-errors_files/figure-html/CPR-study-p-value-1.png){width=90%}\n:::\n:::\n\n\nFor a two-sided test, take the single tail (in this case, 0.131) and double it to get the p-value: 0.262.\nSince this p-value is larger than 0.05, we do not reject the null hypothesis.\nThat is, we do not find convincing evidence that the blood thinner has any influence on survival of patients who undergo CPR prior to arriving at the hospital.\n\n::: {.important data-latex=\"\"}\n**Default to a two-sided test.**\n\nWe want to be rigorous and keep an open mind when we analyze data and evidence.\nUse a one-sided hypothesis test only if you truly have interest in only one direction.\n:::\n\n::: 
{.important data-latex=\"\"}\n**Computing a p-value for a two-sided test.**\n\nFirst compute the p-value for one tail of the distribution, then double that value to get the two-sided p-value.\nThat's it!\n:::\n\n::: {.workedexample data-latex=\"\"}\nConsider the situation of the medical consultant.\nNow that you know about one-sided and two-sided tests, which type of test do you think is more appropriate?\n\n------------------------------------------------------------------------\n\nThe setting has been framed in the context of the consultant being helpful (which is what led us to a one-sided test originally), but what if the consultant actually performed *worse* than the average?\nWould we care?\nMore than ever!\nSince it turns out that we care about a finding in either direction, we should run a two-sided test.\nThe p-value for the two-sided test is double that of the one-sided test, here the simulated p-value would be 0.2444.\n:::\n\nGenerally, to find a two-sided p-value we double the single tail area, which remains a reasonable approach even when the distribution is asymmetric.\nHowever, the approach can result in p-values larger than 1 when the point estimate is very near the mean in the null distribution; in such cases, we write that the p-value is 1.\nAlso, very large p-values computed in this way (e.g., 0.85), may also be slightly inflated.\nTypically, we do not worry too much about the precision of very large p-values because they lead to the same analysis conclusion, even if the value is slightly off.\n\n\\clearpage\n\n## Controlling the Type 1 Error rate\n\nNow that we understand the difference between one-sided and two-sided tests, we must recognize when to use each type of test.\nBecause of the result of increased error rates, it is never okay to change two-sided tests to one-sided tests after observing the data.\nWe explore the consequences of ignoring this advice in the next example.\n\n::: {.workedexample data-latex=\"\"}\nUsing $\\alpha=0.05,$ we show that freely switching from two-sided tests to one-sided tests will lead us to make twice as many Type 1 Errors as intended.\n\n------------------------------------------------------------------------\n\nSuppose we are interested in finding any difference from 0.\nWe've created a smooth-looking **null distribution** representing differences due to chance in Figure \\@ref(fig:type1ErrorDoublingExampleFigure).\n\nSuppose the sample difference was larger than 0.\nThen if we can flip to a one-sided test, we would use $H_A:$ difference $> 0.$ Now if we obtain any observation in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be a the single tail.\nThus, if the null hypothesis is true, we incorrectly reject the null hypothesis about 5% of the time when the sample mean is above the null value, as shown in Figure \\@ref(fig:type1ErrorDoublingExampleFigure).\n\nSuppose the sample difference was smaller than 0.\nThen if we change to a one-sided test, we would use $H_A:$ difference $< 0.$ If the observed difference falls in the lower 5% of the figure, we would reject $H_0.$ That is, if the null hypothesis is true, then we would observe this situation about 5% of the time.\n\nBy examining these two scenarios, we can determine that we will make a Type 1 Error $5\\%+5\\%=10\\%$ of the time if we are allowed to swap to the \"best\" one-sided test for the data.\nThis is twice the error rate we prescribed with our discernibility level: $\\alpha=0.05$ (!).\n:::\n\n\n::: {.cell}\n::: 
{.cell-output-display}\n![The shaded regions represent areas where we would reject $H_0$ under the bad practices considered in when $\\alpha = 0.05.$](14-foundations-errors_files/figure-html/type1ErrorDoublingExampleFigure-1.png){width=90%}\n:::\n:::\n\n\n::: caution\n**Hypothesis tests should be set up *before* seeing the data.**\n\nAfter observing data, it is tempting to turn a two-sided test into a one-sided test.\nAvoid this temptation.\nHypotheses should be set up *before* observing the data.\n:::\n\n\\index{hypothesis testing}\n\n\\clearpage\n\n## Power {#pow}\n\nAlthough we won't go into extensive detail here, power is an important topic for follow-up consideration after understanding the basics of hypothesis testing.\nA good power analysis is a vital preliminary step to any study as it will inform whether the data you collect are sufficient for being able to conclude your research broadly.\n\nOften times in experiment planning, there are two competing considerations:\n\n- We want to collect enough data that we can detect important effects.\n- Collecting data can be expensive, and, in experiments involving people, there may be some risk to patients.\n\nWhen planning a study, we want to know how likely we are to detect an effect we care about.\nIn other words, if there is a real effect, and that effect is large enough that it has practical value, then what is the probability that we detect that effect?\nThis probability is called the **power**\\index{power}, and we can compute it for different sample sizes or different effect sizes.\n\n::: {.important data-latex=\"\"}\n**Power.**\n\nThe power of the test is the probability of rejecting the null claim when the alternative claim is true.\n\nHow easy it is to detect the effect depends on both how big the effect is (e.g., how good the medical treatment is) as well as the sample size.\n:::\n\nWe think of power as the probability that you will become rich and famous from your science.\nIn order for your science to make a splash, you need to have good ideas!\nThat is, you won't become famous if you happen to find a single Type 1 error which rejects the null hypothesis.\nInstead, you'll become famous if your science is very good and important (that is, if the alternative hypothesis is true).\nThe better your science is (i.e., the better the medical treatment), the larger the *effect size* and the easier it will be for you to convince people of your work.\n\nNot only does your science need to be solid, but you also need to have evidence (i.e., data) that shows the effect.\nA few observations (e.g., $n = 2)$ is unlikely to be convincing because of well known ideas of natural variability.\nIndeed, the larger the dataset which provides evidence for your scientific claim, the more likely you are to convince the community that your idea is correct.\n\n\n\n\n\n\\clearpage\n\n## Chapter review {#chp15-review}\n\n### Summary\n\nAlthough hypothesis testing provides a strong framework for making decisions based on data, as the analyst, you need to understand how and when the process can go wrong.\nThat is, always keep in mind that the conclusion to a hypothesis test may not be right!\nSometimes when the null hypothesis is true, we will accidentally reject it and commit a type 1 error; sometimes when the alternative hypothesis is true, we will fail to reject the null hypothesis and commit a type 2 error.\nThe power of the test quantifies how likely it is to obtain data which will reject the null hypothesis when indeed the alternative is true; the power 
of the test is increased when larger sample sizes are taken.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
confirmation bias, discernibility level, null distribution, one-sided hypothesis test, power, significance level, two-sided hypothesis test, type 1 error, type 2 error
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#chp14-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-14].\n\n::: {.exercises data-latex=\"\"}\n1. **Testing for Fibromyalgia.**\nA patient named Diana was diagnosed with Fibromyalgia, a long-term syndrome of body pain, and was prescribed anti-depressants. Being the skeptic that she is, Diana didn't initially believe that anti-depressants would help her symptoms. However, after a couple months of being on the medication she decides that the anti-depressants are working, because she feels like her symptoms are in fact getting better.\n\n a. Write the hypotheses in words for Diana's skeptical position when she started taking the anti-depressants.\n\n b. What is a Type 1 Error in this context?\n\n c. What is a Type 2 Error in this context?\n\n1. **Which is higher?**\nIn each part below, there is a value of interest and two scenarios: (i) and (ii). For each part, report if the value of interest is larger under scenario (i), scenario (ii), or whether the value is equal under the scenarios.\n\n a. The standard error of $\\hat{p}$ when (i) $n = 125$ or (ii) $n = 500$.\n\n b. The margin of error of a confidence interval when the confidence level is (i) 90% or (ii) 80%.\n\n c. The p-value for a Z-statistic of 2.5 calculated based on a (i) sample with $n = 500$ or based on a (ii) sample with $n = 1000$.\n\n d. The probability of making a Type 2 Error when the alternative hypothesis is true and the discernibility level is (i) 0.05 or (ii) 0.10.\n\n1. **Testing for food safety.**\nA food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.\n\n a. Write the hypotheses in words.\n\n b. What is a Type 1 Error in this context?\n\n c. What is a Type 2 Error in this context?\n\n d. Which error is more problematic for the restaurant owner? Why?\n\n e. Which error is more problematic for the diners? Why?\n\n f. As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant's license? Explain your reasoning.\n\n1. **True or false.**\nDetermine if the following statements are true or false, and explain your reasoning. If false, state how it could be corrected.\n\n a. If a given value (for example, the null hypothesized value of a parameter) is within a 95% confidence interval, it will also be within a 99% confidence interval.\n\n b. Decreasing the discernibility level ($\\alpha$) will increase the probability of making a Type 1 Error.\n\n c. Suppose the null hypothesis is $p = 0.5$ and we fail to reject $H_0$. Under this scenario, the true population proportion is 0.5.\n\n d. With large sample sizes, even small differences between the null value and the observed point estimate, a difference often called the effect size, will be identified as statistically discernible.\n \n \\clearpage\n\n1. **Online communication.**\nA study suggests that 60% of college student spend 10 or more hours per week communicating with others online. You believe that this is incorrect and decide to collect your own sample for a hypothesis test. You randomly sample 160 students from your dorm and find that 70% spent 10 or more hours a week communicating with others online. 
A friend of yours, who offers to help you with the hypothesis test, comes up with the following set of hypotheses. Indicate any errors you see. \n\n $$H_0: \\hat{p} < 0.6 \\quad \\quad H_A: \\hat{p} > 0.7$$\n\n1. **Same observation, different sample size.**\nSuppose you conduct a hypothesis test based on a sample where the sample size is $n = 50$, and arrive at a p-value of 0.08. You then refer back to your notes and discover that you made a careless mistake, the sample size should have been $n = 500$. Will your p-value increase, decrease, or stay the same? Explain.\n\n\n:::\n", "supporting": [ "14-foundations-errors_files" ], diff --git a/_freeze/16-inference-one-prop/execute-results/html.json b/_freeze/16-inference-one-prop/execute-results/html.json index cb8ed094..df8fe6b2 100644 --- a/_freeze/16-inference-one-prop/execute-results/html.json +++ b/_freeze/16-inference-one-prop/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "3453162b2ab691eae69adb40b7e058ab", + "hash": "a01b7e6412d8aed4572b180d1cab3300", "result": { - "markdown": "\n\n\n# Inference for a single proportion {#inference-one-prop}\n\n::: {.chapterintro data-latex=\"\"}\nFocusing now on statistical inference for categorical data, we will revisit many of the foundational aspects of hypothesis testing from Chapter \\@ref(foundations-randomization).\n\nThe three data structures we detail are one binary variable, summarized using a single proportion; two binary variables, summarized using a difference of two proportions; and two categorical variables, summarized using a two-way table.\nWhen appropriate, each of the data structures will be analyzed using the three methods from Chapters \\@ref(foundations-randomization), \\@ref(foundations-bootstrapping), and \\@ref(foundations-mathematical): randomization test, bootstrapping, and mathematical models, respectively.\n\nAs we build on the inferential ideas, we will visit new foundational concepts in statistical inference.\nFor example, we will cover the conditions for when a normal model is appropriate; the two different error rates in hypothesis testing; and choosing the confidence level for a confidence interval.\n:::\n\nWe encountered inference methods for a single proportion in Chapter \\@ref(foundations-bootstrapping), exploring point estimates and confidence intervals.\nIn this section, we'll do a review of these topics and how to choose an appropriate sample size when collecting data for single proportion contexts.\n\nNote that there is only one variable being measured in a study which focuses on one proportion.\nFor each observational unit, the single variable is measured as either a success or failure (e.g., \"surgical complication\" vs. 
\"no surgical complication\").\nBecause the nature of the research question at hand focuses on only a single variable, there is not a way to randomize the variable across a different (explanatory) variable.\nFor this reason, we will not use randomization as an analysis tool when focusing on a single proportion.\nInstead, we will apply bootstrapping techniques to test a given hypothesis, and we will also revisit the associated mathematical models.\n\n\\vspace{-4mm}\n\n## Bootstrap test for a proportion {#one-prop-null-boot}\n\nThe bootstrap simulation concept when $H_0$ is true is similar to the ideas used in the case studies presented in Chapter \\@ref(foundations-bootstrapping) where we bootstrapped without an assumption about $H_0.$ Because we will be testing a hypothesized value of $p$ (referred to as $p_0),$ the bootstrap simulation for hypothesis testing has a fantastic advantage that it can be used for any sample size (a huge benefit for small samples, a nice alternative for large samples).\n\nWe expand on the medical consultant example, see Section \\@ref(case-study-med-consult), where instead of finding an interval estimate for the true complication rate, we work to test a specific research claim.\n\n\\clearpage\n\n### Observed data\n\nRecall the set-up for the example:\n\nPeople providing an organ for donation sometimes seek the help of a special \"medical consultant\".\nThese consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery.\nPatients might choose a consultant based in part on the historical complication rate of the consultant's clients.\nOne consultant tried to attract patients by noting the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated.\nShe claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).\n\n::: {.workedexample data-latex=\"\"}\nUsing the data, is it possible to assess the consultant's claim that her complication rate is less than 10%?\n\n------------------------------------------------------------------------\n\nNo.\nThe claim is that there is a causal connection, but the data are observational.\nPatients who hire this medical consultant may have lower complication rates for other reasons.\n\nWhile it is not possible to assess this causal claim, it is still possible to test for an association using these data.\nFor this question we ask, could the low complication rate of $\\hat{p} = 0.048$ have simply occurred by chance, if her complication rate does not differ from the US standard rate?\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nWrite out hypotheses in both plain and statistical language to test for the association between the consultant's work and the true complication rate, $p,$ for the consultant's clients.[^16-inference-one-prop-1]\n:::\n\n[^16-inference-one-prop-1]: $H_0:$ There is no association between the consultant's contributions and the clients' complication rate.\n In statistical language, $p = 0.10.$ $H_A:$ Patients who work with the consultant tend to have a complication rate lower than 10%, i.e., $p < 0.10.$\n\nBecause, as it turns out, the conditions of working with the normal distribution are not met (see Section \\@ref(one-prop-norm)), the uncertainty associated with the sample proportion should not be modeled using the normal 
distribution, as doing so would underestimate the uncertainty associated with the sample statistic.\nHowever, we would still like to assess the hypotheses from the previous Guided Practice in absence of the normal framework.\nTo do so, we need to evaluate the possibility of a sample value $(\\hat{p})$ as far below the null value, $p_0 = 0.10$ as what was observed.\nThe deviation of the sample value from the hypothesized parameter is usually quantified with a p-value.\n\nThe p-value is computed based on the null distribution, which is the distribution of the test statistic if the null hypothesis is true.\nSupposing the null hypothesis is true, we can compute the p-value by identifying the probability of observing a test statistic that favors the alternative hypothesis at least as strongly as the observed test statistic.\nHere we will use a bootstrap simulation to calculate the p-value.\n\n\\clearpage\n\n### Variability of the statistic\n\nWe want to identify the sampling distribution of the test statistic $(\\hat{p})$ if the null hypothesis was true.\nIn other words, we want to see the variability we can expect from sample proportions if the null hypothesis was true.\nThen we plan to use this information to decide whether there is enough evidence to reject the null hypothesis.\n\nUnder the null hypothesis, 10% of liver donors have complications during or after surgery.\nSuppose this rate was really no different for the consultant's clients (for *all* the consultant's clients, not just the 62 previously measured).\nIf this was the case, we could *simulate* 62 clients to get a sample proportion for the complication rate from the null distribution.\nSimulating observations using a hypothesized null parameter value is often called a **parametric bootstrap simulation**\\index{parametric bootstrap}.\n\n\n\n\n\nSimilar to the process described in Chapter \\@ref(foundations-bootstrapping), each client can be simulated using a bag of marbles with 10% red marbles and 90% white marbles.\nSampling a marble from the bag (with 10% red marbles) is one way of simulating whether a patient has a complication *if the true complication rate is 10%*.\nIf we select 62 marbles and then compute the proportion of patients with complications in the simulation, $\\hat{p}_{sim1},$ then the resulting sample proportion is a sample from the null distribution.\n\nThere were 5 simulated cases with a complication and 57 simulated cases without a complication, i.e., $\\hat{p}_{sim1} = 5/62 = 0.081.$\n\n::: {.workedexample data-latex=\"\"}\nIs this one simulation enough to determine whether we should reject the null hypothesis?\n\n------------------------------------------------------------------------\n\nNo.\nTo assess the hypotheses, we need to see a distribution of many values of $\\hat{p}_{sim},$ not just a *single* draw from this sampling distribution.\n:::\n\n### Observed statistic vs. 
null statistics\n\nOne simulation isn't enough to get a sense of the null distribution; many simulation studies are needed.\nRoughly 10,000 seems sufficient.\nHowever, paying someone to simulate 10,000 studies by hand is a waste of time and money.\nInstead, simulations are typically programmed into a computer, which is much more efficient.\n\n\n\n\n\nFigure \@ref(fig:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful) shows the results of 10,000 simulated studies.\nThe proportions that are equal to or less than $\hat{p} = 0.048$ are shaded.\nThe shaded areas represent sample proportions under the null distribution that provide at least as much evidence as $\hat{p}$ favoring the alternative hypothesis.\nThere were 420 simulated sample proportions with $\hat{p}_{sim} \leq 0.048.$ We use these to construct the null distribution's left-tail area and find the p-value:\n\n$$\text{left tail area} = \frac{\text{Number of observed simulations with }\hat{p}_{sim} \leq \text{ 0.048}}{10000}$$\n\nOf the 10,000 simulated $\hat{p}_{sim},$ 420 were equal to or smaller than $\hat{p}.$ Since the hypothesis test is one-sided, the estimated p-value is equal to this tail area: 0.042.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![(ref:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-cap)](16-inference-one-prop_files/figure-html/nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-1.png){width=90%}\n:::\n:::\n\n\n(ref:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-cap) The null distribution for $\hat{p},$ created from 10,000 simulated studies. The left tail, representing the p-value for the hypothesis test, contains 4.2% of the simulations.\n\n::: {.guidedpractice data-latex=\"\"}\nBecause the estimated p-value is 0.042, which is smaller than the discernibility level 0.05, we reject the null hypothesis.\nExplain what this means in plain language in the context of the problem.[^16-inference-one-prop-2]\n:::\n\n[^16-inference-one-prop-2]: There is sufficiently strong evidence to reject the null hypothesis in favor of the alternative hypothesis.\n    We would conclude that there is evidence that the consultant's surgery complication rate is lower than the US standard rate of 10%.\n\n::: {.guidedpractice data-latex=\"\"}\nDoes the conclusion in the previous Guided Practice imply the consultant is good at their job?\nExplain.[^16-inference-one-prop-3]\n:::\n\n[^16-inference-one-prop-3]: Not necessarily.\n    The evidence supports the alternative hypothesis that the consultant's complication rate is lower, but it's not a measurement of their performance.\n\n::: {.important data-latex=\"\"}\n**Null distribution of** $\hat{p}$ **with bootstrap simulation.**\n\nRegardless of the statistical method chosen, the p-value is always derived by analyzing the null distribution of the test statistic.\nThe normal model poorly approximates the null distribution for $\hat{p}$ when the success-failure condition is not satisfied.\nAs a substitute, we can generate the null distribution using simulated sample proportions and use this distribution to compute the tail area, i.e., the p-value.\n:::\n\nIn the previous Guided Practice, the p-value is *estimated*.\nIt is not exact because the simulated null distribution itself is only a close approximation of the sampling distribution of the sample statistic.\nAn exact p-value can be generated using the binomial distribution, but that method will not be covered in this text.\n
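\nWhile the simulated studies above have already been carried out for us, the parametric bootstrap itself takes only a few lines of code.\nThe R sketch below is an illustration rather than the exact code behind the figure (the variable names and the seed are ours): it draws 10,000 samples of size 62 assuming the null value $p_0 = 0.10$ and computes the share of simulated proportions at or below the observed 0.048.\nBecause of simulation variability, each run gives a slightly different estimate of this small left-tail area.\n\n```r\n# Parametric bootstrap sketch for the medical consultant example,\n# assuming the null complication rate p0 = 0.10 and n = 62 surgeries.\nset.seed(470)   # arbitrary seed; each run gives a slightly different estimate\nn  <- 62\np0 <- 0.10\n\np_hat_sim <- replicate(10000, {\n  complication <- sample(c(1, 0), size = n, replace = TRUE, prob = c(p0, 1 - p0))\n  mean(complication)\n})\n\n# Estimated p-value: the share of simulated proportions that are 0.048 or smaller\nmean(p_hat_sim <= 0.048)\n```\n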
\n\clearpage\n\n## Mathematical model for a proportion {#one-prop-norm}\n\n### Conditions\n\nIn Section \@ref(normalDist), we introduced the normal distribution and showed how it can be used as a mathematical model to describe the variability of a statistic.\nThere are conditions under which a sample proportion $\hat{p}$ is well modeled using a normal distribution.\nWhen the sample observations are independent and the sample size is sufficiently large, the normal model will describe the sampling distribution of the sample proportion quite well; when the observations violate the conditions, the normal model can be inaccurate.\nIn particular, it can underestimate the variability of the sample proportion.\n\n::: {.important data-latex=\"\"}\n**Sampling distribution of** $\hat{p}.$\n\nThe sampling distribution for $\hat{p}$ based on a sample of size $n$ from a population with a true proportion $p$ is nearly normal when:\n\n1. The sample's observations are independent, e.g., are from a simple random sample.\n2. We expect to see at least 10 successes and 10 failures in the sample, i.e., $np\geq10$ and $n(1-p)\geq10.$ This is called the **success-failure condition**.\n\nWhen these conditions are met, the sampling distribution of $\hat{p}$ is nearly normal with mean $p$ and standard error $SE(\hat{p}) = \sqrt{\frac{\ p(1-p)\ }{n}}.$\n:::\n\nRecall that the margin of error is defined by the standard error.\nThe margin of error for $\hat{p}$ can be directly obtained from $SE(\hat{p}).$\n\n::: {.important data-latex=\"\"}\n**Margin of error for** $\hat{p}.$\n\nThe margin of error is $z^\star \times \sqrt{\frac{\ \hat{p}(1-\hat{p})\ }{n}}$ where $z^\star$ is calculated from a specified percentile on the normal distribution.\n:::\n\n\index{success-failure condition} \index{standard error (SE)!single proportion}\n\n\n\n\n\nTypically we do not know the true proportion $p,$ so we substitute some value to check conditions and estimate the standard error.\nFor confidence intervals, the sample proportion $\hat{p}$ is used to check the success-failure condition and compute the standard error.\nFor hypothesis tests, typically the null value -- that is, the proportion claimed in the null hypothesis -- is used in place of $p.$\n\nThe independence condition is a more nuanced requirement.\nWhen it isn't met, it is important to understand how and why it is violated.\nFor example, no statistical methods are available to truly correct the inherent biases of data from a convenience sample.\nOn the other hand, if we took a cluster sample (see Section \@ref(samp-methods)), the observations wouldn't be independent, but suitable statistical methods are available for analyzing the data (though they are beyond the scope of even most second or third courses in statistics).\n\n::: {.workedexample data-latex=\"\"}\nIn the examples based on large sample theory, we modeled $\hat{p}$ using the normal distribution.\nWhy is this not appropriate for the case study on the medical consultant?\n\n------------------------------------------------------------------------\n\nThe independence assumption may be reasonable if each of the surgeries is from a different surgical team.\nHowever, the success-failure condition is not satisfied.\nUnder the null hypothesis, we would anticipate seeing $62 \times 0.10 = 6.2$ complications, not the 10 required for the normal approximation.\n:::\n\nWhile this book is scoped to well-constrained statistical problems, do remember that this is just the first book in what is a large library of statistical methods that are 
suitable for a very wide range of data and contexts.\n\n### Confidence interval for a proportion\n\n\\index{point estimate!single proportion}\n\nA confidence interval provides a range of plausible values for the parameter $p,$ and when $\\hat{p}$ can be modeled using a normal distribution, the confidence interval for $p$ takes the form $\\hat{p} \\pm z^{\\star} \\times SE.$ We have seen $\\hat{p}$ to be the sample proportion.\nThe value $z^{\\star}$ determines the confidence level (previously set to be 1.96) and will be discussed in detail in the examples following.\nThe value of the standard error, $SE,$ depends heavily on the sample size.\n\n::: {.important data-latex=\"\"}\n**Standard error of one proportion,** $\\hat{p}.$\n\nWhen the conditions are met so that the distribution of $\\hat{p}$ is nearly normal, the **variability** of a single proportion, $\\hat{p}$ is well described by:\n\n$$SE(\\hat{p}) = \\sqrt{\\frac{p(1-p)}{n}}$$\n\nNote that we almost never know the true value of $p.$ A more helpful formula to use is:\n\n$$SE(\\hat{p}) \\approx \\sqrt{\\frac{(\\mbox{best guess of }p)(1 - \\mbox{best guess of }p)}{n}}$$\n\nFor hypothesis testing, we often use $p_0$ as the best guess of $p.$ For confidence intervals, we typically use $\\hat{p}$ as the best guess of $p.$\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nConsider taking many polls of registered voters (i.e., random samples) of size 300 asking them if they support legalized marijuana.\nIt is suspected that about 2/3 of all voters support legalized marijuana.\nTo understand how the sample proportion $(\\hat{p})$ would vary across the samples, calculate the standard error of $\\hat{p}.$[^16-inference-one-prop-4]\n:::\n\n[^16-inference-one-prop-4]: Because the $p$ is unknown but expected to be around 2/3, we will use 2/3 in place of $p$ in the formula for the standard error.\\\n $SE = \\sqrt{\\frac{p(1-p)}{n}} \\approx \\sqrt{\\frac{2/3 (1 - 2/3)} {300}} = 0.027.$\n\n\\clearpage\n\n### Variability of the sample proportion\n\n::: {.workedexample data-latex=\"\"}\nA simple random sample of 826 payday loan borrowers was surveyed to better understand their interests around regulation and costs.\n70% of the responses supported new regulations on payday lenders.\n\n1. Is it reasonable to model the variability of $\\hat{p}$ from sample to sample using a normal distribution?\n\n2. Estimate the standard error of $\\hat{p}.$\n\n3. Construct a 95% confidence interval for $p,$ the proportion of payday borrowers who support increased regulation for payday lenders.\n\n------------------------------------------------------------------------\n\n1. The data are a random sample, so it is reasonable to assume that the observations are independent and representative of the population of interest.\n\nWe also must check the success-failure condition, which we do using $\\hat{p}$ in place of $p$ when computing a confidence interval:\n\n$$\n\\begin{aligned}\n \\text{Support: }\n n p &\n \\approx 826 \\times 0.70\n = 578\\\\\n \\text{Not: }\n n (1 - p) &\n \\approx 826 \\times (1 - 0.70)\n = 248\n\\end{aligned}\n$$\n\nSince both values are at least 10, we can use the normal distribution to model $\\hat{p}.$\n\n2. Because $p$ is unknown and the standard error is for a confidence interval, use $\\hat{p}$ in place of $p$ in the formula.\n\n$$SE = \\sqrt{\\frac{p(1-p)}{n}} \\approx \\sqrt{\\frac{0.70 (1 - 0.70)} {826}} = 0.016.$$\n\n3. 
Using the point estimate 0.70, $z^{\\star} = 1.96$ for a 95% confidence interval, and the standard error $SE = 0.016$ from the previous Guided Practice, the confidence interval is\n\n$$ \n\\begin{aligned}\n\\text{point estimate} \\ &\\pm \\ z^{\\star} \\times \\ SE \\\\\n0.70 \\ &\\pm \\ 1.96 \\ \\times \\ 0.016 \\\\ \n(0.669 \\ &, \\ 0.731)\n\\end{aligned}\n$$\n\nWe are 95% confident that the true proportion of payday borrowers who supported regulation at the time of the poll was between 0.669 and 0.731.\n:::\n\n::: {.important data-latex=\"\"}\n**Constructing a confidence interval for a single proportion.**\n\nThere are three steps to constructing a confidence interval for $p.$\n\n1. Check if it seems reasonable to assume the observations are independent and check the success-failure condition using $\\hat{p}.$ If the conditions are met, the sampling distribution of $\\hat{p}$ may be well-approximated by the normal model.\n2. Construct the standard error using $\\hat{p}$ in place of $p$ in the standard error formula.\n3. Apply the general confidence interval formula.\n:::\n\nFor additional one-proportion confidence interval examples, see Section \\@ref(ConfidenceIntervals).\n\n### Changing the confidence level\n\n\\index{confidence level}\n\nSuppose we want to consider confidence intervals where the confidence level is somewhat higher than 95%: perhaps we would like a confidence level of 99%.\nThink back to the analogy about trying to catch a fish: if we want to be more sure that we will catch the fish, we should use a wider net.\nTo create a 99% confidence level, we must also widen our 95% interval.\nOn the other hand, if we want an interval with lower confidence, such as 90%, we could make our original 95% interval slightly slimmer.\n\nThe 95% confidence interval structure provides guidance in how to make intervals with new confidence levels.\nBelow is a general 95% confidence interval for a point estimate that comes from a nearly normal distribution:\n\n$$\\text{point estimate} \\ \\pm \\ 1.96 \\ \\times \\ SE$$\n\nThere are three components to this interval: the point estimate, \"1.96\", and the standard error.\nThe choice of $1.96 \\times SE$ was based on capturing 95% of the data since the estimate is within 1.96 standard errors of the true value about 95% of the time.\nThe choice of 1.96 corresponds to a 95% confidence level.\n\n::: {.guidedpractice data-latex=\"\"}\nIf $X$ is a normally distributed random variable, how often will $X$ be within 2.58 standard deviations of the mean?[^16-inference-one-prop-5]\n:::\n\n[^16-inference-one-prop-5]: This is equivalent to asking how often the $Z$ score will be larger than -2.58 but less than 2.58.\n (For a picture, see Figure \\@ref(fig:choosingZForCI).) To determine this probability, look up -2.58 and 2.58 in the normal probability table (0.0049 and 0.9951).\n Thus, there is a $0.9951-0.0049 \\approx 0.99$ probability that the unobserved random variable $X$ will be within 2.58 standard deviations of the mean.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![(ref:choosingZForCI-cap)](16-inference-one-prop_files/figure-html/choosingZForCI-1.png){width=90%}\n:::\n:::\n\n\n(ref:choosingZForCI-cap) The area between -$z^{\\star}$ and $z^{\\star}$ increases as $|z^{\\star}|$ becomes larger. 
If the confidence level is 99%, we choose $z^{\star}$ such that 99% of the normal curve is between -$z^{\star}$ and $z^{\star},$ which corresponds to 0.5% in the lower tail and 0.5% in the upper tail: $z^{\star}=2.58.$\n\n\index{confidence interval}\n\nTo create a 99% confidence interval, change 1.96 in the 95% confidence interval formula to $2.58.$ The previous Guided Practice highlights that 99% of the time a normal random variable will be within 2.58 standard deviations of its mean.\nThis approach -- using the Z scores in the normal model to compute confidence intervals -- is appropriate when the point estimate is associated with a normal distribution and we can properly compute the standard error.\nThus, the formula for a 99% confidence interval is:\n\n$$\text{point estimate} \ \pm \ 2.58 \ \times \ SE$$\n\nThe normal approximation is crucial to the precision of the $z^\star$ confidence intervals (in contrast to the bootstrap percentile confidence intervals).\nWhen the normal model is not a good fit, we will use alternative distributions that better characterize the sampling distribution or we will use bootstrapping procedures.\n\n::: {.guidedpractice data-latex=\"\"}\nCreate a 99% confidence interval for the impact of the stent on the risk of stroke using the data from Section \@ref(case-study-stents-strokes).\nThe point estimate is 0.090, and the standard error is $SE = 0.028.$ It has been verified for you that the point estimate can reasonably be modeled by a normal distribution.[^16-inference-one-prop-6]\n:::\n\n[^16-inference-one-prop-6]: Since the necessary conditions for applying the normal model have already been checked for us, we can go straight to the construction of the confidence interval: $\text{point estimate} \pm 2.58 \times SE,$ which gives an interval of (0.018, 0.162). We are 99% confident that implanting a stent in the brain of a patient who is at risk of stroke increases the risk of stroke within 30 days by a rate of 0.018 to 0.162 (assuming the patients are representative of the population).\n\n::: {.important data-latex=\"\"}\n**Mathematical model confidence interval for any confidence level.**\n\nIf the point estimate follows the normal model with standard error $SE,$ then a confidence interval for the population parameter is\n\n$$\text{point estimate} \ \pm \ z^{\star} \ \times \ SE$$\n\nwhere $z^{\star}$ corresponds to the confidence level selected.\n:::\n\nFigure \@ref(fig:choosingZForCI) provides a picture of how to identify $z^{\star}$ based on a confidence level.\nWe select $z^{\star}$ so that the area between -$z^{\star}$ and $z^{\star}$ in the normal model corresponds to the confidence level.\n
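\nIn practice, $z^{\star}$ is usually computed directly rather than read from a table.\nAs a small illustration (the variable names are ours), the R sketch below finds $z^{\star}$ for a chosen confidence level with `qnorm()` and rebuilds the 99% interval for the stent example above, where the point estimate is 0.090 and $SE = 0.028.$\n\n```r\n# Find z-star for a chosen confidence level, then build the interval.\nconf_level <- 0.99\nz_star     <- qnorm(1 - (1 - conf_level) / 2)   # about 2.58 for 99% confidence\n\npt_est <- 0.090   # point estimate from the stent example\nse     <- 0.028   # its standard error\npt_est + c(-1, 1) * z_star * se                 # roughly (0.018, 0.162)\n```\n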
\n::: {.guidedpractice data-latex=\"\"}\nPreviously, we found that implanting a stent in the brain of a patient at risk for a stroke *increased* the risk of a stroke.\nThe study estimated a 9% increase in the number of patients who had a stroke, and the standard error of this estimate was about $SE = 2.8\%.$ Compute a 90% confidence interval for the effect.[^16-inference-one-prop-7]\n:::\n\n[^16-inference-one-prop-7]: We must find $z^{\star}$ such that 90% of the distribution falls between -$z^{\star}$ and $z^{\star}$ in the standard normal model, $N(\mu=0, \sigma=1).$ We can look up -$z^{\star}$ in the normal probability table by looking for a lower tail of 5% (the other 5% is in the upper tail), thus $z^{\star} = 1.65.$ The 90% confidence interval can then be computed as $\text{point estimate} \pm 1.65 \times SE \to (4.4\%, 13.6\%).$ (Note: the conditions for normality had earlier been confirmed for us.) That is, we are 90% confident that implanting a stent in a stroke patient's brain increased the risk of stroke within 30 days by 4.4% to 13.6%.\\n    Note, the problem was set up as 90% to indicate that there was not a need for a high level of confidence (such as 95% or 99%).\n    A lower degree of confidence increases the potential for error, but it also produces a narrower interval.\n\n### Hypothesis test for a proportion\n\n::: {.important data-latex=\"\"}\n**The test statistic for assessing a single proportion is a Z.**\n\nThe **Z score** is a ratio of how the sample proportion differs from the hypothesized proportion as compared to the expected variability of the $\hat{p}$ values.\n\n$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$\n\nWhen the null hypothesis is true and the conditions are met, Z has a standard normal distribution.\n\nConditions:\n\n- independent observations\\n- large samples $(n p_0 \geq 10$ and $n (1-p_0) \geq 10)$\\n:::\n\n\n\n\n\nOne possible regulation for payday lenders is that they would be required to do a credit check and evaluate debt payments against the borrower's finances.\nWe would like to know: would borrowers support this form of regulation?\n\n::: {.guidedpractice data-latex=\"\"}\nSet up hypotheses to evaluate whether a majority of borrowers support this type of regulation.[^16-inference-one-prop-8]\n:::\n\n[^16-inference-one-prop-8]: $H_0:$ there is not majority support for the regulation; $H_0:$ $p \leq 0.50.$ $H_A:$ the majority of borrowers support the regulation; $H_A:$ $p > 0.50.$\n\nTo apply the normal distribution framework in the context of a hypothesis test for a proportion, the independence and success-failure conditions must be satisfied.\nIn a hypothesis test, the success-failure condition is checked using the null proportion: we verify $np_0$ and $n(1-p_0)$ are at least 10, where $p_0$ is the null value.\n\n::: {.guidedpractice data-latex=\"\"}\nDo payday loan borrowers support a regulation that would require lenders to pull their credit report and evaluate their debt payments?\nFrom a random sample of 826 borrowers, 51% said they would support such a regulation.\nIs it reasonable to use a normal distribution to model $\hat{p}$ for a hypothesis test here?[^16-inference-one-prop-9]\n:::\n\n[^16-inference-one-prop-9]: Independence holds since the poll is based on a random sample.\n    The success-failure condition also holds, which is checked using the null value $(p_0 = 0.5)$ from $H_0:$ $np_0 = 826 \times 0.5 = 413,$ $n(1 - p_0) = 826 \times 0.5 = 413.$ Recall that here, the best guess for $p$ is $p_0$ which comes from the null hypothesis (because we assume the null hypothesis is true when performing the testing procedure steps).\n    $H_0:$ there is not majority support for the regulation; $H_0:$ $p \leq 0.50.$ $H_A:$ the majority of borrowers support the regulation; $H_A:$ $p > 0.50.$\n\n::: {.workedexample data-latex=\"\"}\nUsing the hypotheses and data from the previous Guided Practices, evaluate whether the poll on lending regulations provides convincing evidence that a majority of payday loan borrowers support a new regulation that would require lenders to pull credit reports and evaluate debt payments.\n\n------------------------------------------------------------------------\n\nWith hypotheses already set up and conditions checked, we can move on to calculations.\nThe standard error in the context of a one-proportion hypothesis test is computed 
using the null value, $p_0:$\n\n$$SE = \sqrt{\frac{p_0 (1 - p_0)}{n}} = \sqrt{\frac{0.5 (1 - 0.5)}{826}} = 0.017$$\n\nA picture of the normal model is shown with the p-value represented by the shaded region.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](16-inference-one-prop_files/figure-html/unnamed-chunk-8-1.png){width=90%}\n:::\n:::\n\n\nBased on the normal model, the test statistic can be computed as the Z score of the point estimate:\n\n$$\n\begin{aligned}\nZ &= \frac{\text{point estimate} - \text{null value}}{SE} \\\n &= \frac{0.51 - 0.50}{0.017} \\\n &= 0.59\n\end{aligned} \n$$\n\nThe single tail area, which represents the p-value, is 0.2776.\nBecause the p-value is larger than 0.05, we do not reject $H_0.$ The poll does not provide convincing evidence that a majority of payday loan borrowers support regulations around credit checks and evaluation of debt payments.\n\nIn Section \@ref(two-prop-errors) we discuss two-sided hypothesis tests, which may have been a better structure for the payday example.\nThat is, we might have wanted to ask whether the borrowers **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark).\nIn that case, the p-value would have been doubled to 0.5552 (again, we would not reject $H_0).$ In the two-sided hypothesis setting, the appropriate conclusion would be to claim that the poll does not provide convincing evidence that a majority of payday loan borrowers support or oppose regulations around credit checks and evaluation of debt payments.\n\nIn both the one-sided and two-sided settings, the conclusion is somewhat unsatisfactory because there is no resolution one way or the other about public opinion.\nWe cannot claim that exactly 50% of people support the regulation, but we cannot claim a majority in either direction.\n:::\n
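\nThe calculations in the worked example are straightforward to reproduce.\nThe R sketch below (the variable names are ours) mirrors them; note that the text rounds the standard error to 0.017 before computing $Z,$ so an unrounded calculation gives values that differ slightly from 0.59 and 0.2776.\n\n```r\n# One-proportion Z-test calculations for the payday lending poll:\n# p-hat = 0.51 from n = 826 borrowers, tested against the null value p0 = 0.50.\np_hat <- 0.51\np0    <- 0.50\nn     <- 826\n\nse <- sqrt(p0 * (1 - p0) / n)   # about 0.017\nz  <- (p_hat - p0) / se         # about 0.57 (0.59 when SE is rounded to 0.017 first)\n1 - pnorm(z)                    # one-sided p-value, about 0.28\n2 * (1 - pnorm(z))              # two-sided p-value, about 0.57\n```\n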
\n::: {.important data-latex=\"\"}\n**Mathematical model hypothesis test for a proportion.**\n\nSet up hypotheses and verify the conditions using the null value, $p_0,$ to ensure $\hat{p}$ is nearly normal under $H_0.$ If the conditions hold, construct the standard error, again using $p_0,$ and show the p-value in a drawing.\nLastly, compute the p-value and evaluate the hypotheses.\n:::\n\nFor additional one-proportion hypothesis test examples, see Section \@ref(HypothesisTesting).\n\n### Violating conditions\n\nWe've spent a lot of time discussing conditions for when $\hat{p}$ can be reasonably modeled by a normal distribution.\nWhat happens when the success-failure condition fails?\nWhat about when the independence condition fails?\nIn either case, the general ideas of confidence intervals and hypothesis tests remain the same, but the strategy or technique used to generate the interval or p-value changes.\n\nWhen the success-failure condition isn't met for a hypothesis test, we can simulate the null distribution of $\hat{p}$ using the null value, $p_0,$ as seen in Section \@ref(one-prop-null-boot).\nUnfortunately, methods for dealing with observations which are not independent (e.g., repeated measurements on the same subject, such as in studies where measurements from the same subjects are taken pre and post study) are outside the scope of this book.\n\n\vspace{10mm}\n\n## Chapter review {#chp16-review}\n\n### Summary\n\nBuilding on the foundational ideas from the previous few chapters, this chapter focused exclusively on the single population proportion as the parameter of interest.\nNote that it is not possible to do a randomization test with only one variable, so to do computational hypothesis testing, we applied a bootstrapping framework.\nThe bootstrap confidence interval and the mathematical framework for both hypothesis testing and confidence intervals are similar to those applied to other data structures and parameters.\nWhen using the mathematical model, keep in mind the success-failure condition.\nAdditionally, know that bootstrapping is more accurate with larger samples.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n<table>\n<tbody>\n  <tr>\n   <td>parametric bootstrap</td>\n   <td>success-failure condition</td>\n  </tr>\n  <tr>\n   <td>SE single proportion</td>\n   <td>Z score</td>\n  </tr>\n</tbody>\n</table>
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#chp16-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-16].\n\n::: {.exercises data-latex=\"\"}\n1. **Do aliens exist?**\nIn May 2021, YouGov asked 4,839 adult Great Britain residents whether they think aliens exist, and if so, if they have or have not visited Earth.\nYou want to evaluate if more than a quarter (25\\%) of Great Britain adults think aliens do not exist.\nIn the survey 22\\% responded \"I think they exist, and have visited Earth\", 28\\% responded \"I think they exist, but have not visited Earth\", 29% responded \"I do not think they exist\", and 22\\% responded \"Don't know\".\nA friend of yours offers to help you with setting up the hypothesis test and comes up with the following hypotheses.\nIndicate any errors you see.\n\n $H_0: \\hat{p} = 0.29 \\quad \\quad H_A: \\hat{p} > 0.29$\n \n \\vspace{5mm}\n\n1. **Married at 25.**\nA study suggests that the 25% of 25 year olds have gotten married.\nYou believe that this is incorrect and decide to collect your own sample for a hypothesis test.\nFrom a random sample of 25 year olds in census data with size 776, you find that 24% of them are married.\nA friend of yours offers to help you with setting up the hypothesis test and comes up with the following hypotheses. Indicate any errors you see.\n\n $H_0: \\hat{p} = 0.24 \\quad \\quad H_A: \\hat{p} \\neq 0.24$\n \n \\vspace{5mm}\n\n1. **Defund the police.**\nA Survey USA poll conducted in Seattle, WA in May 2021 reports that of the 650 respondents (adults living in this area), 159 support proposals to defund police departments. [@data:defundpolice]\n\n a. A journalist writing a news story on the poll results wants to use the headline \"More than 1 in 5 adults living in Seattle support proposals to defund police departments.\" You caution the journalist that they should first conduct a hypothesis test to see if the poll data provide convincing evidence for this claim. Write the hypotheses for this test.\n \n b. Calculate the proportion of Seattle adults in this sample who support proposals to defund police departments.\n \n c. Describe a setup for a simulation that would be appropriate in this situation and how the p-value can be calculated using the simulation results.\n \n d. Below is a histogram showing the distribution of $\\hat{p}_{sim}$ in 1,000 simulations under the null hypothesis. Estimate the p-value using the plot and use it to evaluate the hypotheses.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-10-1.png){width=90%}\n :::\n :::\n \n \\clearpage\n\n1. **Assisted reproduction.**\nAssisted Reproductive Technology (ART) is a collection of techniques that help facilitate pregnancy (e.g., in vitro fertilization). The 2018 ART Fertility Clinic Success Rates Report published by the Centers for Disease Control and Prevention reports that ART has been successful in leading to a live birth in 48.8% of cases where the patient is under 35 years old. [@web:art2018] A new fertility clinic claims that their success rate is higher than average for this age group. A random sample of 30 of their patients yielded a success rate of 60%. A consumer watchdog group would like to determine if this provides strong evidence to support the company's claim.\n\n a. Write the hypotheses to test if the success rate for ART at this clinic is discernibly higher than the success rate reported by the CDC.\n\n b. 
Describe a setup for a simulation that would be appropriate in this situation and how the p-value can be calculated using the simulation results.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-11-1.png){width=90%}\n :::\n :::\n\n c. Below is a histogram showing the distribution of $\\hat{p}_{sim}$ in 1,000 simulations under the null hypothesis. Estimate the p-value using the plot and use it to evaluate the hypotheses.\n\n d. After performing this analysis, the consumer group releases the following news headline: \"Infertility clinic falsely advertises better success rates\". Comment on the appropriateness of this statement.\n \n \\clearpage\n\n1. **If I fits, I sits, bootstrap test.**\nA citizen science project on which type of enclosed spaces cats are most likely to sit in compared (among other options) two different spaces taped to the ground. The first was a square, and the second was a shape known as [Kanizsa square illusion](https://en.wikipedia.org/wiki/Illusory_contours#Kanizsa_figures). When comparing the two options given to 7 cats, 5 chose the square, and 2 chose the Kanizsa square illusion. We are interested to know whether these data provide convincing evidence that cats prefer one of the shapes over the other. [@Smith:2021]\n \n a. What are the null and alternative hypotheses for evaluating whether these data provide convincing evidence that cats have preference for one of the shapes\n \n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below.Find the p-value using this distribution and conclude the hypothesis test in the context of the problem.\n \n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-12-1.png){width=90%}\n :::\n :::\n\n1. **Legalization of marijuana, bootstrap test.**\nThe 2018 General Social Survey asked a random sample of 1,563 US adults: \"Do you think the use of marijuana should be made legal, or not?\" 60% of the respondents said it should be made legal. [@data:gssgrass] Consider a scenario where, in order to become legal, 55% (or more) of voters must approve.\n \n a. What are the null and alternative hypotheses for evaluating whether these data provide convincing evidence that, if voted on, marijuana would be legalized in the US.\n \n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. Find the p-value using this distribution and conclude the hypothesis test in the context of the problem.\n \n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-13-1.png){width=90%}\n :::\n :::\n \n \\clearpage\n\n1. **If I fits, I sits, standard errors.**\nThe results of a study on the type of enclosed spaces cats are most likely to sit in show that 5 out of 7 cats chose a square taped to the ground over a shape known as [Kanizsa square illusion](https://en.wikipedia.org/wiki/Illusory_contours#Kanizsa_figures), which was preferred by the remaining 2 cats. To evaluate whether these data provide convincing evidence that cats prefer one of the shapes over the other, we set $H_0: p = 0.5$, where $p$ is the population proportion of cats who prefer square over the Kanizsa square illusion and $H_A: p \\neq 0.5$, which suggests some preference, without specifying which shape is more preferred. [@Smith:2021]\n\n a. 
Using the mathematical model, calculate the standard error of the sample proportion in repeated samples of size 7.\n \n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. This distribution shows the variability of the sample proportion in samples of size 7 when 50% of cats prefer the square shape over the Kanizsa square illusion. What is the approximate standard error of the sample proportion based on this distribution?\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-14-1.png){width=90%}\n :::\n :::\n \n c. Do the mathematical model and parametric bootstrap give similar standard errors?\n \n d. In order to approach the problem using the mathematical model, is the success-failure condition met for this study?Explain.\n \n e. What about the null distribution shown above (generated using the parametric bootstrap) tells us that the mathematical model should probably not be used?\n \n \\clearpage\n\n1. **Legalization of marijuana, standard errors.**\nAccording to the 2018 General Social Survey, in a random sample of 1,563 US adults, 60% think marijuana should be made legal. [@data:gssgrass] Consider a scenario where, in order to become legal, 55% (or more) of voters must approve.\n\n a. Calculate the standard error of the sample proportion using the mathematical model.\n\n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. This distribution shows the variability of the sample proportion in samples of size 1,563 when 55% of voters approve legalizing marijuana. What is the approximate standard error of the sample proportion based on this distribution?\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-15-1.png){width=90%}\n :::\n :::\n \n c. Do the mathematical model and parametric bootstrap give similar standard errors?\n \n d. In this setting (to test whether the true underlying population proportion is greater than 0.55), would there be a strong reason to choose the mathematical model over the parametric bootstrap (or vice versa)?\n \n \\clearpage\n\n1. **Statistics and employment, describe the bootstrap.**\nA large university knows that about 70% of the full-time students are employed at least 5 hours per week. The members of the Statistics Department wonder if the same proportion of their students work at least 5 hours per week. They randomly sample 25 majors and find that 15 of the students work 5 or more hours each week.\n\n Two bootstrap sampling distributions are created to describe the variability in the proportion of statistics majors who work at least 5 hours per week. The parametric bootstrap imposes a true population proportion of $p = 0.7$ while the data bootstrap resamples from the actual data (which has 60% of the observations who work at least 5 hours per week).\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-16-1.png){width=100%}\n :::\n :::\n \n a. The bootstrap sampling was done under two different settings to generate each of the distributions shown above. Describe the two different settings.\n\n b. Where are each of the two distributions centered? Are they centered at roughly the same place?\n \n c. Estimate the standard error of the simulated proportions based on each distribution. 
Are the two standard errors you estimate roughly equal?\n \n d. Describe the shapes of the two distributions. Are they roughly the same?\n \n \\clearpage\n\n1. **National Health Plan, parametric bootstrap.**\nA Kaiser Family Foundation poll for a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic \"National Health Plan\". \nThere were 347 Democrats, 298 Republicans, and 617 Independents surveyed. [@data:KFF2019nathealthplan]\n\n A political pundit on TV claims that a majority of Independents support a National Health Plan. Do these data provide strong evidence to support this type of statement? One approach to assessing the question of whether a majority of Independents support a National Health Plan is to simulate 1,000 parametric bootstrap samples with $p = 0.5$ as the proportion of Independents in support.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-17-1.png){width=90%}\n :::\n :::\n \n a. The histogram above displays 1000 values of what? \n\n b. Is the observed proportion of Independents consistent with the parametric bootstrap proportions under the setting where $p=0.5?$\n \n c. In order to test the claim that \"a majority of Independents support a National Health Plan\" what are the null and alternative hypotheses?\n \n d. Using the parametric bootstrap distribution, find the p-value and conclude the hypothesis test in the context of the problem.\n \n \\clearpage\n\n1. **Statistics and employment, use the bootstrap.**\nIn a large university where 70% of the full-time students are employed at least 5 hours per week, the members of the Statistics Department wonder if the same proportion of their students work at least 5 hours per week. They randomly sample 25 majors and find that 15 of the students work 5 or more hours each week.\n\n Two bootstrap sampling distributions are created to describe the variability in the proportion of statistics majors who work at least 5 hours per week. The parametric bootstrap imposes a true population proportion of $p=0.7$ while the data bootstrap resamples from the actual data (which has 60% of the observations who work at least 5 hours per week).\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-18-1.png){width=100%}\n :::\n :::\n \n a. Which bootstrap distribution should be used to test whether the proportion of all statistics majors who work at least 5 hours per week is 70%? And which bootstrap distribution should be used to find a confidence interval for the true poportion of statistics majors who work at least 5 hours per week?\n \n b. Using the appropriate histogram, test the claim that 70% of statistics majors, like their peers, work at least 5 hours per week. State the null and alternative hypotheses, find the p-value, and conclude the test in the context of the problem.\n \n c. Using the appropriate histogram, find a 98% bootstrap percentile confidence interval for the true proportion of statistics majors who work at least 5 hours per week. Interpret the confidence interval in the context of the problem.\n \n d. Using the appropriate historgram, find a 98% bootstrap SE confidence interval for the true proportion of statistics majors who work at least 5 hours per week. Interpret the confidence interval in the context of the problem.\n \n \\vspace{5mm}\n\n1. 
**CLT for proportions.**\nDefine the term \"sampling distribution\" of the sample proportion, and describe how the shape, center, and spread of the sampling distribution change as the sample size increases when $p = 0.1$.\n\n \\clearpage\n\n1. **Vegetarian college students.**\nSuppose that 8% of college students are vegetarians. Determine if the following statements are true or false, and explain your reasoning.\n\n a. The distribution of the sample proportions of vegetarians in random samples of size 60 is approximately normal since $n \\ge 30$.\n\n b. The distribution of the sample proportions of vegetarian college students in random samples of size 50 is right skewed.\n\n c. A random sample of 125 college students where 12% are vegetarians would be considered unusual.\n\n d. A random sample of 250 college students where 12% are vegetarians would be considered unusual.\n\n e. The standard error would be reduced by one-half if we increased the sample size from 125 to 250.\n\n1. **Young Americans, American dream.** \nAbout 77% of young adults think they can achieve the American dream. \nDetermine if the following statements are true or false, and explain your reasoning. [@news:youngAmericans1]\n\n a. The distribution of sample proportions of young Americans who think they can achieve the American dream in random samples of size 20 is left skewed.\n\n b. The distribution of sample proportions of young Americans who think they can achieve the American dream in random samples of size 40 is approximately normal since $n \\ge 30$.\n\n c. A random sample of 60 young Americans where 85% think they can achieve the American dream would be considered unusual.\n\n d. A random sample of 120 young Americans where 85% think they can achieve the American dream would be considered unusual.\n\n1. **Orange tabbies.** \nSuppose that 90% of orange tabby cats are male.\nDetermine if the following statements are true or false, and explain your reasoning.\n\n a. The distribution of sample proportions of random samples of size 30 is left skewed.\n\n b. Using a sample size that is 4 times as large will reduce the standard error of the sample proportion by one-half.\n\n c. The distribution of sample proportions of random samples of size 140 is approximately normal.\n\n d. The distribution of sample proportions of random samples of size 280 is approximately normal.\n\n1. **Young Americans, starting a family.**\nAbout 25% of young Americans have delayed starting a family due to the continued economic slump.\nDetermine if the following statements are true or false, and explain your reasoning. [@news:youngAmericans2]\n\n a. The distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump in random samples of size 12 is right skewed.\n\n b. In order for the distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump to be approximately normal, we need random samples where the sample size is at least 40.\n\n c. A random sample of 50 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.\n\n d. A random sample of 150 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.\n\n e. Tripling the sample size will reduce the standard error of the sample proportion by one-third.\n \n \\clearpage\n\n1. 
**Sex equality.**\nThe General Social Survey asked a random sample of 1,390 Americans the following question: \"On the whole, do you think it should or should not be the government's responsibility to promote equality between men and women?\" 82% of the respondents said it \"should be\". At a 95% confidence level, this sample has 2% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning. [@data:gsssexeq]\n\n a. We are 95% confident that between 80% and 84% of Americans in this sample think it's the government's responsibility to promote equality between men and women.\n\n b. We are 95% confident that between 80% and 84% of all Americans think it's the government's responsibility to promote equality between men and women.\n\n c. If we considered many random samples of 1,390 Americans, and we calculated 95% confidence intervals for each, 95% of these intervals would include the true population proportion of Americans who think it's the government's responsibility to promote equality between men and women.\n\n d. In order to decrease the margin of error to 1%, we would need to quadruple (multiply by 4) the sample size.\n\n e. Based on this confidence interval, there is sufficient evidence to conclude that a majority of Americans think it's the government's responsibility to promote equality between men and women.\n \n \\vspace{3mm}\n\n1. **Elderly drivers.** \nThe Marist Poll published a report stating that 66% of adults nationally think licensed drivers should be required to retake their road test once they reach 65 years of age. It was also reported that interviews were conducted on a random sample of 1,018 American adults, and that the margin of error was 3% using a 95% confidence level. [@data:elderlyDriving]\n\n a. Verify the margin of error reported by The Marist Poll using a mathematical model.\n\n b. Based on a 95% confidence interval, does the poll provide convincing evidence that *more than* two thirds of the population think that licensed drivers should be required to retake their road test once they turn 65?\n \n \\vspace{3mm}\n\n1. **Fireworks on July 4$^{\\text{th}}$.** \nA local news outlet reported that 56% of 600 randomly sampled Kansas residents planned to set off fireworks on July $4^{th}$. \nDetermine the margin of error for the 56% point estimate using a 95% confidence level using a mathematical model. [@data:july4]\n\n \\vspace{3mm}\n\n1. **Proof of COVID-19 vaccination.**\nA Gallup poll surveyed 3,731 randomly sampled US in April 2021, asking how they felt about requiring proof of COVID-19 vaccination for travel by airplane. \nThe poll found that 57% said they would favor it. [@data:gallupcovidvaccine]\n\n a. Describe the population parameter of interest. What is the value of the point estimate of this parameter?\n\n b. Check if the conditions required for constructing a confidence interval using a mathematical model based on these data are met.\n\n c. Construct a 95% confidence interval for the proportion of US adults who favor requiring proof of COVID-19 vaccination for travel by airplane.\n\n d. Without doing any calculations, describe what would happen to the confidence interval if we decided to use a higher confidence level.\n\n e. Without doing any calculations, describe what would happen to the confidence interval if we used a larger sample.\n \n \\clearpage\n\n1. 
**Study abroad.** \nA survey on 1,509 high school seniors who took the SAT and who completed an optional web survey shows that 55% of high school seniors are fairly certain that they will participate in a study abroad program in college. [@data:studyAbroad]\n\n a. Is this sample a representative sample from the population of all high school seniors in the US? Explain your reasoning.\n\n b. Let's suppose the conditions for inference are met. Even if your answer to part (a) indicated that this approach would not be reliable, this analysis may still be interesting to carry out (though not report). Using a mathematical model, construct a 90% confidence interval for the proportion of high school seniors (of those who took the SAT) who are fairly certain they will participate in a study abroad program in college, and interpret this interval in context.\n\n c. What does \"90% confidence\" mean?\n\n d. Based on this interval, would it be appropriate to claim that the majority of high school seniors are fairly certain that they will participate in a study abroad program in college?\n\n1. **Legalization of marijuana, mathematical interval.**\nThe General Social Survey asked a random sample of 1,563 US adults: \"Do you think the use of marijuana should be made legal, or not?\" 60% of the respondents said it should be made legal. [@data:gssgrass]\n\n a. Is 60% a sample statistic or a population parameter? Explain.\n\n b. Using a mathematical model, construct a 95% confidence interval for the proportion of US adults who think marijuana should be made legal, and interpret it.\n\n c. A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.\n\n d. A news piece on this survey's findings states, \"Majority of US adults think marijuana should be legalized.\" Based on your confidence interval, is this statement justified?\n\n1. **National Health Plan, mathematical inference.**\nA Kaiser Family Foundation poll for a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic \"National Health Plan\". \nThere were 347 Democrats, 298 Republicans, and 617 Independents surveyed. [@data:KFF2019nathealthplan]\n\n a. A political pundit on TV claims that a majority of Independents support a National Health Plan. Do these data provide strong evidence to support this type of statement? Your response should use a mathematical model.\n\n b. Would you expect a confidence interval for the proportion of Independents who oppose the public option plan to include 0.5? Explain.\n\n1. **Is college worth it?**\nAmong a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school. [@data:collegeWorthIt]\n\n a. A newspaper article states that only a minority of the Americans who decide not to go to college do so because they cannot afford it and uses the point estimate from this survey as evidence. Conduct a hypothesis test to determine if these data provide strong evidence supporting this statement.\n\n b. Would you expect a confidence interval for the proportion of American adults who decide not to go to college because they cannot afford it to include 0.5? Explain.\n \n \\clearpage\n\n1. 
**Taste test.**\nSome people claim that they can tell the difference between a diet soda and a regular soda in the first sip.\nA researcher wanting to test this claim randomly sampled 80 such people.\nHe then filled 80 plain white cups with soda, half diet and half regular through random assignment, and asked each person to take one sip from their cup and identify the soda as diet or regular.\n53 participants correctly identified the soda.\n\n a. Do these data provide strong evidence that these people are able to detect the difference between diet and regular soda, in other words, are the results discernibly better than just random guessing? Your response should use a mathematical model.\n\n b. Interpret the p-value in this context.\n \n \\vspace{5mm}\n\n1. **Will the coronavirus bring the world closer together?**\nAn April 2021 YouGov poll asked 4,265 UK adults whether they think the coronavirus bring the world closer together or leave us further apart. \n12% of the respondents said it will bring the world closer together. 37% said it would leave us further apart, 39% said it won't make a difference and the remainder didn't have an opinion on the matter. [@data:yougovcovid]\n\n a. Calculate, using a mathematical model, a 90% confidence interval for the proportion of UK adults who think the coronavirus will bring the world closer together, and interpret the interval in context.\n\n b. Suppose we wanted the margin of error for the 90% confidence level to be about 0.5%. How large of a sample size would you recommend for the poll?\n \n \\vspace{5mm}\n\n1. **Quality control.**\nAs part of a quality control process for computer chips, an engineer at a factory randomly samples 212 chips during a week of production to test the current rate of chips with severe defects. \nShe finds that 27 of the chips are defective.\n\n a. What population is under consideration in the data set?\n\n b. What parameter is being estimated?\n\n c. What is the point estimate for the parameter?\n\n d. What is the name of the statistic that can be used to measure the uncertainty of the point estimate?\n\n e. Compute the value of the statistic from part (d) using a mathematical model.\n\n f. The historical rate of defects is 10%. Should the engineer be surprised by the observed rate of defects during the current week?\n\n g. Suppose the true population value was found to be 10%. If we use this proportion to recompute the value in part (d) using $p = 0.1$ instead of $\\hat{p}$, how much does the resulting value of the statistic change?\n \n \\vspace{5mm}\n\n1. **Nearsighted children.**\nNearsightedness (myopia) is a common vision condition in which you can see objects near to you clearly, but objects farther away are blurry. \nIt is believed that nearsightedness affects about 8% of all children. \nIn a random sample of 194 children, 21 are nearsighted. \nUsing a mathematical model, conduct a hypothesis test for the following question: do these data provide evidence that the 8% value is inaccurate?\n\n \\clearpage\n\n1. **Website registration.**\nA website is trying to increase registration for first-time visitors, exposing 1% of these visitors to a new site design. \nOf 752 randomly sampled visitors over a month who saw the new design, 64 registered.\n\n a. Check the conditions for constructing a confidence interval using a mathematical model.\n\n b. Compute the standard error which would describe the variability associated with repeated samples of size 752.\n\n c. 
Construct and interpret a 90% confidence interval for the fraction of first-time visitors of the site who would register under the new design (assuming stable behaviors by new visitors over time).\n \n \\vspace{5mm}\n\n1. **Coupons driving visits.**\nA store randomly samples 603 shoppers over the course of a year and finds that 142 of them made their visit because of a coupon they'd received in the mail.\nUsing a mathematical model, construct a 95% confidence interval for the fraction of all shoppers during the year whose visit was because of a coupon they'd received in the mail.\n\n\n:::\n", + "engine": "knitr", + "markdown": "\n\n\n# Inference for a single proportion {#sec-inference-one-prop}\n\n::: {.chapterintro data-latex=\"\"}\nFocusing now on statistical inference for categorical data, we will revisit many of the foundational aspects of hypothesis testing from Chapter \\@ref(foundations-randomization).\n\nThe three data structures we detail are one binary variable, summarized using a single proportion; two binary variables, summarized using a difference of two proportions; and two categorical variables, summarized using a two-way table.\nWhen appropriate, each of the data structures will be analyzed using the three methods from Chapters \\@ref(foundations-randomization), \\@ref(foundations-bootstrapping), and \\@ref(foundations-mathematical): randomization test, bootstrapping, and mathematical models, respectively.\n\nAs we build on the inferential ideas, we will visit new foundational concepts in statistical inference.\nFor example, we will cover the conditions for when a normal model is appropriate; the two different error rates in hypothesis testing; and choosing the confidence level for a confidence interval.\n:::\n\nWe encountered inference methods for a single proportion in Chapter \\@ref(foundations-bootstrapping), exploring point estimates and confidence intervals.\nIn this section, we'll do a review of these topics and how to choose an appropriate sample size when collecting data for single proportion contexts.\n\nNote that there is only one variable being measured in a study which focuses on one proportion.\nFor each observational unit, the single variable is measured as either a success or failure (e.g., \"surgical complication\" vs. 
\"no surgical complication\").\nBecause the nature of the research question at hand focuses on only a single variable, there is not a way to randomize the variable across a different (explanatory) variable.\nFor this reason, we will not use randomization as an analysis tool when focusing on a single proportion.\nInstead, we will apply bootstrapping techniques to test a given hypothesis, and we will also revisit the associated mathematical models.\n\n\\vspace{-4mm}\n\n## Bootstrap test for a proportion {#one-prop-null-boot}\n\nThe bootstrap simulation concept when $H_0$ is true is similar to the ideas used in the case studies presented in Chapter \\@ref(foundations-bootstrapping) where we bootstrapped without an assumption about $H_0.$ Because we will be testing a hypothesized value of $p$ (referred to as $p_0),$ the bootstrap simulation for hypothesis testing has a fantastic advantage that it can be used for any sample size (a huge benefit for small samples, a nice alternative for large samples).\n\nWe expand on the medical consultant example, see Section \\@ref(case-study-med-consult), where instead of finding an interval estimate for the true complication rate, we work to test a specific research claim.\n\n\\clearpage\n\n### Observed data\n\nRecall the set-up for the example:\n\nPeople providing an organ for donation sometimes seek the help of a special \"medical consultant\".\nThese consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery.\nPatients might choose a consultant based in part on the historical complication rate of the consultant's clients.\nOne consultant tried to attract patients by noting the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated.\nShe claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).\n\n::: {.workedexample data-latex=\"\"}\nUsing the data, is it possible to assess the consultant's claim that her complication rate is less than 10%?\n\n------------------------------------------------------------------------\n\nNo.\nThe claim is that there is a causal connection, but the data are observational.\nPatients who hire this medical consultant may have lower complication rates for other reasons.\n\nWhile it is not possible to assess this causal claim, it is still possible to test for an association using these data.\nFor this question we ask, could the low complication rate of $\\hat{p} = 0.048$ have simply occurred by chance, if her complication rate does not differ from the US standard rate?\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nWrite out hypotheses in both plain and statistical language to test for the association between the consultant's work and the true complication rate, $p,$ for the consultant's clients.[^16-inference-one-prop-1]\n:::\n\n[^16-inference-one-prop-1]: $H_0:$ There is no association between the consultant's contributions and the clients' complication rate.\n In statistical language, $p = 0.10.$ $H_A:$ Patients who work with the consultant tend to have a complication rate lower than 10%, i.e., $p < 0.10.$\n\nBecause, as it turns out, the conditions of working with the normal distribution are not met (see Section \\@ref(one-prop-norm)), the uncertainty associated with the sample proportion should not be modeled using the normal 
distribution, as doing so would underestimate the uncertainty associated with the sample statistic.\nHowever, we would still like to assess the hypotheses from the previous Guided Practice in absence of the normal framework.\nTo do so, we need to evaluate the possibility of a sample value $(\\hat{p})$ as far below the null value, $p_0 = 0.10$ as what was observed.\nThe deviation of the sample value from the hypothesized parameter is usually quantified with a p-value.\n\nThe p-value is computed based on the null distribution, which is the distribution of the test statistic if the null hypothesis is true.\nSupposing the null hypothesis is true, we can compute the p-value by identifying the probability of observing a test statistic that favors the alternative hypothesis at least as strongly as the observed test statistic.\nHere we will use a bootstrap simulation to calculate the p-value.\n\n\\clearpage\n\n### Variability of the statistic\n\nWe want to identify the sampling distribution of the test statistic $(\\hat{p})$ if the null hypothesis was true.\nIn other words, we want to see the variability we can expect from sample proportions if the null hypothesis was true.\nThen we plan to use this information to decide whether there is enough evidence to reject the null hypothesis.\n\nUnder the null hypothesis, 10% of liver donors have complications during or after surgery.\nSuppose this rate was really no different for the consultant's clients (for *all* the consultant's clients, not just the 62 previously measured).\nIf this was the case, we could *simulate* 62 clients to get a sample proportion for the complication rate from the null distribution.\nSimulating observations using a hypothesized null parameter value is often called a **parametric bootstrap simulation**\\index{parametric bootstrap}.\n\n\n\n\n\nSimilar to the process described in Chapter \\@ref(foundations-bootstrapping), each client can be simulated using a bag of marbles with 10% red marbles and 90% white marbles.\nSampling a marble from the bag (with 10% red marbles) is one way of simulating whether a patient has a complication *if the true complication rate is 10%*.\nIf we select 62 marbles and then compute the proportion of patients with complications in the simulation, $\\hat{p}_{sim1},$ then the resulting sample proportion is a sample from the null distribution.\n\nThere were 5 simulated cases with a complication and 57 simulated cases without a complication, i.e., $\\hat{p}_{sim1} = 5/62 = 0.081.$\n\n::: {.workedexample data-latex=\"\"}\nIs this one simulation enough to determine whether we should reject the null hypothesis?\n\n------------------------------------------------------------------------\n\nNo.\nTo assess the hypotheses, we need to see a distribution of many values of $\\hat{p}_{sim},$ not just a *single* draw from this sampling distribution.\n:::\n\n### Observed statistic vs. 
null statistics\n\nOne simulation isn't enough to get a sense of the null distribution; many simulation studies are needed.\nRoughly 10,000 seems sufficient.\nHowever, paying someone to simulate 10,000 studies by hand is a waste of time and money.\nInstead, simulations are typically programmed into a computer, which is much more efficient.\n\n\n\n\n\nFigure \\@ref(fig:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful) shows the results of 10,000 simulated studies.\nThe proportions that are equal to or less than $\\hat{p} = 0.048$ are shaded.\nThe shaded areas represent sample proportions under the null distribution that provide at least as much evidence as $\\hat{p}$ favoring the alternative hypothesis.\nThere were 420 simulated sample proportions with $\\hat{p}_{sim} \\leq 0.048.$ We use these to construct the null distribution's left-tail area and find the p-value:\n\n$$\\text{left tail area} = \\frac{\\text{Number of observed simulations with }\\hat{p}_{sim} \\leq \\text{ 0.048}}{10000}$$\n\nOf the 10,000 simulated $\\hat{p}_{sim},$ 420 were equal to or smaller than $\\hat{p}.$ Since the hypothesis test is one-sided, the estimated p-value is equal to this tail area: 0.042.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![(ref:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-cap)](16-inference-one-prop_files/figure-html/nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-1.png){width=90%}\n:::\n:::\n\n\n(ref:nullDistForPHatIfLiverTransplantConsultantIsNotHelpful-cap) The null distribution for $\\hat{p},$ created from 10,000 simulated studies. The left tail, representing the p-value for the hypothesis test, contains 4.2% of the simulations.\n\n::: {.guidedpractice data-latex=\"\"}\nBecause the estimated p-value is 0.042, which is smaller than the discernibility level 0.05, we reject the null hypothesis.\nExplain what this means in plain language in the context of the problem.[^16-inference-one-prop-2]\n:::\n\n[^16-inference-one-prop-2]: There is sufficiently strong evidence to reject the null hypothesis in favor of the alternative hypothesis.\n We would conclude that there is evidence that the consultant's surgery complication rate is lower than the US standard rate of 10%.\n\n::: {.guidedpractice data-latex=\"\"}\nDoes the conclusion in the previous Guided Practice imply the consultant is good at their job?\nExplain.[^16-inference-one-prop-3]\n:::\n\n[^16-inference-one-prop-3]: No.\n Not necessarily.\n The evidence supports the alternative hypothesis that the consultant's complication rate is lower, but it's not a measurement of their performance.\n\n::: {.important data-latex=\"\"}\n**Null distribution of** $\\hat{p}$ **with bootstrap simulation.**\n\nRegardless of the statistical method chosen, the p-value is always derived by analyzing the null distribution of the test statistic.\nThe normal model poorly approximates the null distribution for $\\hat{p}$ when the success-failure condition is not satisfied.\nAs a substitute, we can generate the null distribution using simulated sample proportions and use this distribution to compute the tail area, i.e., the p-value.\n:::\n\nIn the previous Guided Practice, the p-value is *estimated*.\nIt is not exact because the simulated null distribution itself is only a close approximation of the sampling distribution of the sample statistic.\nAn exact p-value can be generated using the binomial distribution, but that method will not be covered in this text.\n\n\\clearpage\n\n## Mathematical model for a proportion {#one-prop-norm}\n\n### 
Conditions\n\nIn Section \\@ref(normalDist), we introduced the normal distribution and showed how it can be used as a mathematical model to describe the variability of a statistic.\nThere are conditions under which a sample proportion $\\hat{p}$ is well modeled using a normal distribution.\nWhen the sample observations are independent and the sample size is sufficiently large, the normal model will describe the sampling distribution of the sample proportion quite well; when the observations violate the conditions, the normal model can be inaccurate.\nParticularly, it can underestimate the variability of the sample proportion.\n\n::: {.important data-latex=\"\"}\n**Sampling distribution of** $\\hat{p}.$\n\nThe sampling distribution for $\\hat{p}$ based on a sample of size $n$ from a population with a true proportion $p$ is nearly normal when:\n\n1. The sample's observations are independent, e.g., are from a simple random sample.\n2. We expected to see at least 10 successes and 10 failures in the sample, i.e., $np\\geq10$ and $n(1-p)\\geq10.$ This is called the **success-failure condition**.\n\nWhen these conditions are met, then the sampling distribution of $\\hat{p}$ is nearly normal with mean $p$ and standard error of $\\hat{p}$ as $SE = \\sqrt{\\frac{\\ \\hat{p}(1-\\hat{p})\\ }{n}}.$\n:::\n\nRecall that the margin of error is defined by the standard error.\nThe margin of error for $\\hat{p}$ can be directly obtained from $SE(\\hat{p}).$\n\n::: {.important data-latex=\"\"}\n**Margin of error for** $\\hat{p}.$\n\nThe margin of error is $z^\\star \\times \\sqrt{\\frac{\\ \\hat{p}(1-\\hat{p})\\ }{n}}$ where $z^\\star$ is calculated from a specified percentile on the normal distribution.\n:::\n\n\\index{success-failure condition} \\index{standard error (SE)!single proportion}\n\n\n\n\n\nTypically we do not know the true proportion $p,$ so we substitute some value to check conditions and estimate the standard error.\nFor confidence intervals, the sample proportion $\\hat{p}$ is used to check the success-failure condition and compute the standard error.\nFor hypothesis tests, typically the null value -- that is, the proportion claimed in the null hypothesis -- is used in place of $p.$\n\nThe independence condition is a more nuanced requirement.\nWhen it isn't met, it is important to understand how and why it is violated.\nFor example, there exist no statistical methods available to truly correct the inherent biases of data from a convenience sample.\nOn the other hand, if we took a cluster sample (see Section \\@ref(samp-methods)), the observations wouldn't be independent, but suitable statistical methods are available for analyzing the data (but they are beyond the scope of even most second or third courses in statistics).\n\n::: {.workedexample data-latex=\"\"}\nIn the examples based on large sample theory, we modeled $\\hat{p}$ using the normal distribution.\nWhy is this not appropriate for the case study on the medical consultant?\n\n------------------------------------------------------------------------\n\nThe independence assumption may be reasonable if each of the surgeries is from a different surgical team.\nHowever, the success-failure condition is not satisfied.\nUnder the null hypothesis, we would anticipate seeing $62 \\times 0.10 = 6.2$ complications, not the 10 required for the normal approximation.\n:::\n\nWhile this book is scoped to well-constrained statistical problems, do remember that this is just the first book in what is a large library of statistical methods that are 
suitable for a very wide range of data and contexts.\n\n### Confidence interval for a proportion\n\n\\index{point estimate!single proportion}\n\nA confidence interval provides a range of plausible values for the parameter $p,$ and when $\\hat{p}$ can be modeled using a normal distribution, the confidence interval for $p$ takes the form $\\hat{p} \\pm z^{\\star} \\times SE.$ We have seen $\\hat{p}$ to be the sample proportion.\nThe value $z^{\\star}$ determines the confidence level (previously set to be 1.96) and will be discussed in detail in the examples following.\nThe value of the standard error, $SE,$ depends heavily on the sample size.\n\n::: {.important data-latex=\"\"}\n**Standard error of one proportion,** $\\hat{p}.$\n\nWhen the conditions are met so that the distribution of $\\hat{p}$ is nearly normal, the **variability** of a single proportion, $\\hat{p}$ is well described by:\n\n$$SE(\\hat{p}) = \\sqrt{\\frac{p(1-p)}{n}}$$\n\nNote that we almost never know the true value of $p.$ A more helpful formula to use is:\n\n$$SE(\\hat{p}) \\approx \\sqrt{\\frac{(\\mbox{best guess of }p)(1 - \\mbox{best guess of }p)}{n}}$$\n\nFor hypothesis testing, we often use $p_0$ as the best guess of $p.$ For confidence intervals, we typically use $\\hat{p}$ as the best guess of $p.$\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nConsider taking many polls of registered voters (i.e., random samples) of size 300 asking them if they support legalized marijuana.\nIt is suspected that about 2/3 of all voters support legalized marijuana.\nTo understand how the sample proportion $(\\hat{p})$ would vary across the samples, calculate the standard error of $\\hat{p}.$[^16-inference-one-prop-4]\n:::\n\n[^16-inference-one-prop-4]: Because the $p$ is unknown but expected to be around 2/3, we will use 2/3 in place of $p$ in the formula for the standard error.\\\n $SE = \\sqrt{\\frac{p(1-p)}{n}} \\approx \\sqrt{\\frac{2/3 (1 - 2/3)} {300}} = 0.027.$\n\n\\clearpage\n\n### Variability of the sample proportion\n\n::: {.workedexample data-latex=\"\"}\nA simple random sample of 826 payday loan borrowers was surveyed to better understand their interests around regulation and costs.\n70% of the responses supported new regulations on payday lenders.\n\n1. Is it reasonable to model the variability of $\\hat{p}$ from sample to sample using a normal distribution?\n\n2. Estimate the standard error of $\\hat{p}.$\n\n3. Construct a 95% confidence interval for $p,$ the proportion of payday borrowers who support increased regulation for payday lenders.\n\n------------------------------------------------------------------------\n\n1. The data are a random sample, so it is reasonable to assume that the observations are independent and representative of the population of interest.\n\nWe also must check the success-failure condition, which we do using $\\hat{p}$ in place of $p$ when computing a confidence interval:\n\n$$\n\\begin{aligned}\n \\text{Support: }\n n p &\n \\approx 826 \\times 0.70\n = 578\\\\\n \\text{Not: }\n n (1 - p) &\n \\approx 826 \\times (1 - 0.70)\n = 248\n\\end{aligned}\n$$\n\nSince both values are at least 10, we can use the normal distribution to model $\\hat{p}.$\n\n2. Because $p$ is unknown and the standard error is for a confidence interval, use $\\hat{p}$ in place of $p$ in the formula.\n\n$$SE = \\sqrt{\\frac{p(1-p)}{n}} \\approx \\sqrt{\\frac{0.70 (1 - 0.70)} {826}} = 0.016.$$\n\n3. 
Using the point estimate 0.70, $z^{\\star} = 1.96$ for a 95% confidence interval, and the standard error $SE = 0.016$ from the previous Guided Practice, the confidence interval is\n\n$$ \n\\begin{aligned}\n\\text{point estimate} \\ &\\pm \\ z^{\\star} \\times \\ SE \\\\\n0.70 \\ &\\pm \\ 1.96 \\ \\times \\ 0.016 \\\\ \n(0.669 \\ &, \\ 0.731)\n\\end{aligned}\n$$\n\nWe are 95% confident that the true proportion of payday borrowers who supported regulation at the time of the poll was between 0.669 and 0.731.\n:::\n\n::: {.important data-latex=\"\"}\n**Constructing a confidence interval for a single proportion.**\n\nThere are three steps to constructing a confidence interval for $p.$\n\n1. Check if it seems reasonable to assume the observations are independent and check the success-failure condition using $\\hat{p}.$ If the conditions are met, the sampling distribution of $\\hat{p}$ may be well-approximated by the normal model.\n2. Construct the standard error using $\\hat{p}$ in place of $p$ in the standard error formula.\n3. Apply the general confidence interval formula.\n:::\n\nFor additional one-proportion confidence interval examples, see Section \\@ref(ConfidenceIntervals).\n\n### Changing the confidence level\n\n\\index{confidence level}\n\nSuppose we want to consider confidence intervals where the confidence level is somewhat higher than 95%: perhaps we would like a confidence level of 99%.\nThink back to the analogy about trying to catch a fish: if we want to be more sure that we will catch the fish, we should use a wider net.\nTo create a 99% confidence level, we must also widen our 95% interval.\nOn the other hand, if we want an interval with lower confidence, such as 90%, we could make our original 95% interval slightly slimmer.\n\nThe 95% confidence interval structure provides guidance in how to make intervals with new confidence levels.\nBelow is a general 95% confidence interval for a point estimate that comes from a nearly normal distribution:\n\n$$\\text{point estimate} \\ \\pm \\ 1.96 \\ \\times \\ SE$$\n\nThere are three components to this interval: the point estimate, \"1.96\", and the standard error.\nThe choice of $1.96 \\times SE$ was based on capturing 95% of the data since the estimate is within 1.96 standard errors of the true value about 95% of the time.\nThe choice of 1.96 corresponds to a 95% confidence level.\n\n::: {.guidedpractice data-latex=\"\"}\nIf $X$ is a normally distributed random variable, how often will $X$ be within 2.58 standard deviations of the mean?[^16-inference-one-prop-5]\n:::\n\n[^16-inference-one-prop-5]: This is equivalent to asking how often the $Z$ score will be larger than -2.58 but less than 2.58.\n (For a picture, see Figure \\@ref(fig:choosingZForCI).) To determine this probability, look up -2.58 and 2.58 in the normal probability table (0.0049 and 0.9951).\n Thus, there is a $0.9951-0.0049 \\approx 0.99$ probability that the unobserved random variable $X$ will be within 2.58 standard deviations of the mean.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![(ref:choosingZForCI-cap)](16-inference-one-prop_files/figure-html/choosingZForCI-1.png){width=90%}\n:::\n:::\n\n\n(ref:choosingZForCI-cap) The area between -$z^{\\star}$ and $z^{\\star}$ increases as $|z^{\\star}|$ becomes larger. 
If the confidence level is 99%, we choose $z^{\\star}$ such that 99% of the normal curve is between -$z^{\\star}$ and $z^{\\star},$ which corresponds to 0.5% in the lower tail and 0.5% in the upper tail: $z^{\\star}=2.58.$\n\n\\index{confidence interval}\n\nTo create a 99% confidence interval, change 1.96 in the 95% confidence interval formula to be $2.58.$ The previous Guided Practice highlights that 99% of the time a normal random variable will be within 2.58 standard deviations of its mean.\nThis approach -- using the Z scores in the normal model to compute confidence levels -- is appropriate when the point estimate is associated with a normal distribution and we can properly compute the standard error.\nThus, the formula for a 99% confidence interval is:\n\n$$\\text{point estimate} \\ \\pm \\ 2.58 \\ \\times \\ SE$$\n\nThe normal approximation is crucial to the precision of the $z^\\star$ confidence intervals (in contrast to the bootstrap percentile confidence intervals).\nWhen the normal model is not a good fit, we will use alternative distributions that better characterize the sampling distribution or we will use bootstrapping procedures.\n\n::: {.guidedpractice data-latex=\"\"}\nCreate a 99% confidence interval for the impact of the stent on the risk of stroke using the data from Section \\@ref(case-study-stents-strokes).\nThe point estimate is 0.090, and the standard error is $SE = 0.028.$ It has been verified for you that the point estimate can reasonably be modeled by a normal distribution.[^16-inference-one-prop-6]\n:::\n\n[^16-inference-one-prop-6]: Since the necessary conditions for applying the normal model have already been checked for us, we can go straight to the construction of the confidence interval: $\\text{point estimate} \\pm 2.58 \\times SE,$ which gives an interval of (0.018, 0.162). We are 99% confident that implanting a stent in the brain of a patient who is at risk of stroke increases the risk of stroke within 30 days by a rate of 0.018 to 0.162 (assuming the patients are representative of the population).\n\n::: {.important data-latex=\"\"}\n**Mathematical model confidence interval for any confidence level.**\n\nIf the point estimate follows the normal model with standard error $SE,$ then a confidence interval for the population parameter is\n\n$$\\text{point estimate} \\ \\pm \\ z^{\\star} \\ \\times \\ SE$$\n\nwhere $z^{\\star}$ corresponds to the confidence level selected.\n:::\n\nFigure \\@ref(fig:choosingZForCI) provides a picture of how to identify $z^{\\star}$ based on a confidence level.\nWe select $z^{\\star}$ so that the area between -$z^{\\star}$ and $z^{\\star}$ in the normal model corresponds to the confidence level.\n\n::: {.guidedpractice data-latex=\"\"}\nPreviously, we found that implanting a stent in the brain of a patient at risk for a stroke *increased* the risk of a stroke.\nThe study estimated a 9% increase in the number of patients who had a stroke, and the standard error of this estimate was about $SE = 2.8\\%.$ Compute a 90% confidence interval for the effect.[^16-inference-one-prop-7]\n:::\n\n[^16-inference-one-prop-7]: We must find $z^{\\star}$ such that 90% of the distribution falls between -$z^{\\star}$ and $z^{\\star}$ in the standard normal model, $N(\\mu=0, \\sigma=1).$ We can look up -$z^{\\star}$ in the normal probability table by looking for a lower tail of 5% (the other 5% is in the upper tail), thus $z^{\\star} = 1.65.$ The 90% confidence interval can then be computed as $\\text{point estimate} \\pm 1.65 \\times SE \\to (4.4\\%, 13.6\\%).$ (Note: the conditions for normality had earlier been confirmed for us.) That is, we are 90% confident that implanting a stent in a stroke patient's brain increased the risk of stroke within 30 days by 4.4% to 13.6%.\\\n Note, the problem was set up as 90% to indicate that there was not a need for a high level of confidence (such as 95% or 99%).\n A lower degree of confidence increases potential for error, but it also produces a narrower interval.\n
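\nRather than reading $z^{\\star}$ from a table, it can be computed directly with software.\nThe following base R sketch is not part of the original analysis; it assumes the stent numbers from the Guided Practice above (point estimate 0.090, $SE = 0.028$), and changing `conf_level` reproduces the 1.65, 1.96, and 2.58 multipliers discussed in this section.\n\n```r\n# Find z* for a chosen confidence level, then build the interval.\n# Numbers below are taken from the stent Guided Practice above.\nconf_level <- 0.90\nz_star <- qnorm(1 - (1 - conf_level) / 2)  # about 1.65; 1.96 for 95%, 2.58 for 99%\npt_est <- 0.090\nse     <- 0.028\nc(lower = pt_est - z_star * se, upper = pt_est + z_star * se)\n# roughly (0.044, 0.136), matching the interval in the footnote\n```\n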
\n### Hypothesis test for a proportion\n\n::: {.important data-latex=\"\"}\n**The test statistic for assessing a single proportion is a Z.**\n\nThe **Z score** is a ratio of how the sample proportion differs from the hypothesized proportion as compared to the expected variability of the $\\hat{p}$ values.\n\n$$Z = \\frac{\\hat{p} - p_0}{\\sqrt{p_0(1 - p_0)/n}}$$\n\nWhen the null hypothesis is true and the conditions are met, Z has a standard normal distribution.\n\nConditions:\n\n- independent observations\\\n- large samples $(n p_0 \\geq 10$ and $n (1-p_0) \\geq 10)$\\\n:::\n\n\n\n\n\nOne possible regulation for payday lenders is that they would be required to do a credit check and evaluate debt payments against the borrower's finances.\nWe would like to know: would borrowers support this form of regulation?\n\n::: {.guidedpractice data-latex=\"\"}\nSet up hypotheses to evaluate whether a majority of borrowers support this type of regulation.[^16-inference-one-prop-8]\n:::\n\n[^16-inference-one-prop-8]: $H_0:$ there is not support for the regulation; $H_0:$ $p \\leq 0.50.$ $H_A:$ the majority of borrowers support the regulation; $H_A:$ $p > 0.50.$\n\nTo apply the normal distribution framework in the context of a hypothesis test for a proportion, the independence and success-failure conditions must be satisfied.\nIn a hypothesis test, the success-failure condition is checked using the null proportion: we verify $np_0$ and $n(1-p_0)$ are at least 10, where $p_0$ is the null value.\n\n::: {.guidedpractice data-latex=\"\"}\nDo payday loan borrowers support a regulation that would require lenders to pull their credit report and evaluate their debt payments?\nFrom a random sample of 826 borrowers, 51% said they would support such a regulation.\nIs it reasonable to use a normal distribution to model $\\hat{p}$ for a hypothesis test here?[^16-inference-one-prop-9]\n:::\n\n[^16-inference-one-prop-9]: Independence holds since the poll is based on a random sample.\n The success-failure condition also holds, which is checked using the null value $(p_0 = 0.5)$ from $H_0:$ $np_0 = 826 \\times 0.5 = 413,$ $n(1 - p_0) = 826 \\times 0.5 = 413.$ Recall that here, the best guess for $p$ is $p_0$ which comes from the null hypothesis (because we assume the null hypothesis is true when performing the testing procedure steps).\n $H_0:$ there is not support for the regulation; $H_0:$ $p \\leq 0.50.$ $H_A:$ the majority of borrowers support the regulation; $H_A:$ $p > 0.50.$\n
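\nThe calculation in the next worked example can also be checked with a few lines of code.\nThis is only a base R sketch (not the book's own analysis code), using the numbers from the Guided Practices above: $\\hat{p} = 0.51,$ $p_0 = 0.50,$ and $n = 826.$\n\n```r\n# One-sided Z test for a single proportion (payday regulation poll).\np_hat <- 0.51\np0    <- 0.50\nn     <- 826\nse    <- sqrt(p0 * (1 - p0) / n)      # SE uses the null value p0\nz     <- (p_hat - p0) / se            # about 0.57 (0.59 if SE is first rounded to 0.017)\npnorm(z, lower.tail = FALSE)          # one-sided p-value, about 0.28\n```\n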
\n::: {.workedexample data-latex=\"\"}\nUsing the hypotheses and data from the previous Guided Practices, evaluate whether the poll on lending regulations provides convincing evidence that a majority of payday loan borrowers support a new regulation that would require lenders to pull credit reports and evaluate debt payments.\n\n------------------------------------------------------------------------\n\nWith hypotheses already set up and conditions checked, we can move on to the calculations.\nThe standard error in the context of a one-proportion hypothesis test is computed using the null value, $p_0:$\n\n$$SE = \\sqrt{\\frac{p_0 (1 - p_0)}{n}} = \\sqrt{\\frac{0.5 (1 - 0.5)}{826}} = 0.017$$\n\nA picture of the normal model is shown with the p-value represented by the shaded region.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](16-inference-one-prop_files/figure-html/unnamed-chunk-8-1.png){width=90%}\n:::\n:::\n\n\nBased on the normal model, the test statistic can be computed as the Z score of the point estimate:\n\n$$\n\\begin{aligned}\nZ &= \\frac{\\text{point estimate} - \\text{null value}}{SE} \\\\\n &= \\frac{0.51 - 0.50}{0.017} \\\\\n &= 0.59\n\\end{aligned} \n$$\n\nThe single tail area which represents the p-value is 0.2776.\nBecause the p-value is larger than 0.05, we do not reject $H_0.$ The poll does not provide convincing evidence that a majority of payday loan borrowers support regulations around credit checks and evaluation of debt payments.\n\nIn Section \\@ref(two-prop-errors) we discuss two-sided hypothesis tests, which may have been a better structure for the payday example.\nThat is, we might have wanted to ask whether the borrowers **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark).\nIn that case, the p-value would have been doubled to 0.5552 (again, we would not reject $H_0).$ In the two-sided hypothesis setting, the appropriate conclusion would be to claim that the poll does not provide convincing evidence that a majority of payday loan borrowers support or oppose regulations around credit checks and evaluation of debt payments.\n\nIn both the one-sided and two-sided settings, the conclusion is somewhat unsatisfactory because there is no conclusion.\nThat is, there is no resolution one way or the other about public opinion.\nWe cannot claim that exactly 50% of people support the regulation, but we cannot claim a majority in either direction.\n:::\n\n::: {.important data-latex=\"\"}\n**Mathematical model hypothesis test for a proportion.**\n\nSet up hypotheses and verify the conditions using the null value, $p_0,$ to ensure $\\hat{p}$ is nearly normal under $H_0.$ If the conditions hold, construct the standard error, again using $p_0,$ and show the p-value in a drawing.\nLastly, compute the p-value and evaluate the hypotheses.\n:::\n\nFor additional one-proportion hypothesis test examples, see Section \\@ref(HypothesisTesting).\n\n### Violating conditions\n\nWe've spent a lot of time discussing conditions for when $\\hat{p}$ can be reasonably modeled by a normal distribution.\nWhat happens when the success-failure condition fails?\nWhat about when the independence condition fails?\nIn either case, the general ideas of confidence intervals and hypothesis tests remain the same, but the strategy or technique used to generate the interval or p-value changes.\n\nWhen the success-failure condition isn't met for a hypothesis test, we can simulate the null distribution of $\\hat{p}$ using the null value, $p_0,$ as seen in Section \\@ref(one-prop-null-boot) and sketched below.\nUnfortunately, methods for dealing with observations which are not independent (e.g., repeated measurements on the same subjects, such as in studies where measurements are taken pre and post study) are outside the scope of this book.\n
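\nAs a rough illustration of that simulation strategy, the following base R sketch works through a hypothetical small-sample setting (the numbers are made up for illustration only: 11 successes in $n = 60$ trials, testing $H_0: p = 0.10$ against $H_A: p > 0.10,$ so $np_0 = 6 < 10$ and the normal model is questionable).\n\n```r\n# Parametric bootstrap null distribution for one proportion.\n# Hypothetical numbers: 11 successes out of 60, H0: p = 0.10 vs HA: p > 0.10.\nset.seed(1)                    # arbitrary seed so the sketch is reproducible\nn     <- 60\np0    <- 0.10\np_hat <- 11 / 60\np_sim <- rbinom(10000, size = n, prob = p0) / n  # 10,000 simulated proportions under H0\nmean(p_sim >= p_hat)           # one-sided p-value: share of simulations at least as extreme\n```\n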
\n\\vspace{10mm}\n\n## Chapter review {#chp16-review}\n\n### Summary\n\nBuilding on the foundational ideas from the previous few chapters, this chapter focused exclusively on the single population proportion as the parameter of interest.\nNote that it is not possible to do a randomization test with only one variable, so to do computational hypothesis testing, we applied a bootstrapping framework.\nThe bootstrap confidence interval and the mathematical framework for both hypothesis testing and confidence intervals are similar to those applied to other data structures and parameters.\nWhen using the mathematical model, keep in mind the success-failure condition.\nAdditionally, know that bootstrapping is generally more accurate with larger samples.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n<table>\n<tbody>\n  <tr><td>parametric bootstrap</td><td>success-failure condition</td></tr>\n  <tr><td>SE single proportion</td><td>Z score</td></tr>\n</tbody>\n</table>
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#chp16-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-16].\n\n::: {.exercises data-latex=\"\"}\n1. **Do aliens exist?**\nIn May 2021, YouGov asked 4,839 adult Great Britain residents whether they think aliens exist, and if so, if they have or have not visited Earth.\nYou want to evaluate if more than a quarter (25\\%) of Great Britain adults think aliens do not exist.\nIn the survey 22\\% responded \"I think they exist, and have visited Earth\", 28\\% responded \"I think they exist, but have not visited Earth\", 29% responded \"I do not think they exist\", and 22\\% responded \"Don't know\".\nA friend of yours offers to help you with setting up the hypothesis test and comes up with the following hypotheses.\nIndicate any errors you see.\n\n $H_0: \\hat{p} = 0.29 \\quad \\quad H_A: \\hat{p} > 0.29$\n \n \\vspace{5mm}\n\n1. **Married at 25.**\nA study suggests that the 25% of 25 year olds have gotten married.\nYou believe that this is incorrect and decide to collect your own sample for a hypothesis test.\nFrom a random sample of 25 year olds in census data with size 776, you find that 24% of them are married.\nA friend of yours offers to help you with setting up the hypothesis test and comes up with the following hypotheses. Indicate any errors you see.\n\n $H_0: \\hat{p} = 0.24 \\quad \\quad H_A: \\hat{p} \\neq 0.24$\n \n \\vspace{5mm}\n\n1. **Defund the police.**\nA Survey USA poll conducted in Seattle, WA in May 2021 reports that of the 650 respondents (adults living in this area), 159 support proposals to defund police departments. [@data:defundpolice]\n\n a. A journalist writing a news story on the poll results wants to use the headline \"More than 1 in 5 adults living in Seattle support proposals to defund police departments.\" You caution the journalist that they should first conduct a hypothesis test to see if the poll data provide convincing evidence for this claim. Write the hypotheses for this test.\n \n b. Calculate the proportion of Seattle adults in this sample who support proposals to defund police departments.\n \n c. Describe a setup for a simulation that would be appropriate in this situation and how the p-value can be calculated using the simulation results.\n \n d. Below is a histogram showing the distribution of $\\hat{p}_{sim}$ in 1,000 simulations under the null hypothesis. Estimate the p-value using the plot and use it to evaluate the hypotheses.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-10-1.png){width=90%}\n :::\n :::\n \n \\clearpage\n\n1. **Assisted reproduction.**\nAssisted Reproductive Technology (ART) is a collection of techniques that help facilitate pregnancy (e.g., in vitro fertilization). The 2018 ART Fertility Clinic Success Rates Report published by the Centers for Disease Control and Prevention reports that ART has been successful in leading to a live birth in 48.8% of cases where the patient is under 35 years old. [@web:art2018] A new fertility clinic claims that their success rate is higher than average for this age group. A random sample of 30 of their patients yielded a success rate of 60%. A consumer watchdog group would like to determine if this provides strong evidence to support the company's claim.\n\n a. Write the hypotheses to test if the success rate for ART at this clinic is discernibly higher than the success rate reported by the CDC.\n\n b. 
Describe a setup for a simulation that would be appropriate in this situation and how the p-value can be calculated using the simulation results.\n\n    c. Below is a histogram showing the distribution of $\\hat{p}_{sim}$ in 1,000 simulations under the null hypothesis. Estimate the p-value using the plot and use it to evaluate the hypotheses.\n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](16-inference-one-prop_files/figure-html/unnamed-chunk-11-1.png){width=90%}\n    :::\n    :::\n\n    d. After performing this analysis, the consumer group releases the following news headline: \"Infertility clinic falsely advertises better success rates\". Comment on the appropriateness of this statement.\n    \n    \\clearpage\n\n1. **If I fits, I sits, bootstrap test.**\nA citizen science project on which type of enclosed spaces cats are most likely to sit in compared (among other options) two different spaces taped to the ground. The first was a square, and the second was a shape known as [Kanizsa square illusion](https://en.wikipedia.org/wiki/Illusory_contours#Kanizsa_figures). When comparing the two options given to 7 cats, 5 chose the square, and 2 chose the Kanizsa square illusion. We are interested to know whether these data provide convincing evidence that cats prefer one of the shapes over the other. [@Smith:2021]\n    \n    a. What are the null and alternative hypotheses for evaluating whether these data provide convincing evidence that cats have a preference for one of the shapes?\n    \n    b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. Find the p-value using this distribution and conclude the hypothesis test in the context of the problem.\n    \n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](16-inference-one-prop_files/figure-html/unnamed-chunk-12-1.png){width=90%}\n    :::\n    :::\n\n1. **Legalization of marijuana, bootstrap test.**\nThe 2018 General Social Survey asked a random sample of 1,563 US adults: \"Do you think the use of marijuana should be made legal, or not?\" 60% of the respondents said it should be made legal. [@data:gssgrass] Consider a scenario where, in order to become legal, 55% (or more) of voters must approve.\n    \n    a. What are the null and alternative hypotheses for evaluating whether these data provide convincing evidence that, if voted on, marijuana would be legalized in the US?\n    \n    b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. Find the p-value using this distribution and conclude the hypothesis test in the context of the problem.\n    \n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](16-inference-one-prop_files/figure-html/unnamed-chunk-13-1.png){width=90%}\n    :::\n    :::\n    \n    \\clearpage\n\n1. **If I fits, I sits, standard errors.**\nThe results of a study on the type of enclosed spaces cats are most likely to sit in show that 5 out of 7 cats chose a square taped to the ground over a shape known as [Kanizsa square illusion](https://en.wikipedia.org/wiki/Illusory_contours#Kanizsa_figures), which was preferred by the remaining 2 cats. To evaluate whether these data provide convincing evidence that cats prefer one of the shapes over the other, we set $H_0: p = 0.5$, where $p$ is the population proportion of cats who prefer square over the Kanizsa square illusion and $H_A: p \\neq 0.5$, which suggests some preference, without specifying which shape is more preferred. [@Smith:2021]\n\n    a. 
Using the mathematical model, calculate the standard error of the sample proportion in repeated samples of size 7.\n \n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. This distribution shows the variability of the sample proportion in samples of size 7 when 50% of cats prefer the square shape over the Kanizsa square illusion. What is the approximate standard error of the sample proportion based on this distribution?\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-14-1.png){width=90%}\n :::\n :::\n \n c. Do the mathematical model and parametric bootstrap give similar standard errors?\n \n d. In order to approach the problem using the mathematical model, is the success-failure condition met for this study?Explain.\n \n e. What about the null distribution shown above (generated using the parametric bootstrap) tells us that the mathematical model should probably not be used?\n \n \\clearpage\n\n1. **Legalization of marijuana, standard errors.**\nAccording to the 2018 General Social Survey, in a random sample of 1,563 US adults, 60% think marijuana should be made legal. [@data:gssgrass] Consider a scenario where, in order to become legal, 55% (or more) of voters must approve.\n\n a. Calculate the standard error of the sample proportion using the mathematical model.\n\n b. A parametric bootstrap simulation (with 1,000 bootstrap samples) was run and the resulting null distribution is displayed in the histogram below. This distribution shows the variability of the sample proportion in samples of size 1,563 when 55% of voters approve legalizing marijuana. What is the approximate standard error of the sample proportion based on this distribution?\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-15-1.png){width=90%}\n :::\n :::\n \n c. Do the mathematical model and parametric bootstrap give similar standard errors?\n \n d. In this setting (to test whether the true underlying population proportion is greater than 0.55), would there be a strong reason to choose the mathematical model over the parametric bootstrap (or vice versa)?\n \n \\clearpage\n\n1. **Statistics and employment, describe the bootstrap.**\nA large university knows that about 70% of the full-time students are employed at least 5 hours per week. The members of the Statistics Department wonder if the same proportion of their students work at least 5 hours per week. They randomly sample 25 majors and find that 15 of the students work 5 or more hours each week.\n\n Two bootstrap sampling distributions are created to describe the variability in the proportion of statistics majors who work at least 5 hours per week. The parametric bootstrap imposes a true population proportion of $p = 0.7$ while the data bootstrap resamples from the actual data (which has 60% of the observations who work at least 5 hours per week).\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](16-inference-one-prop_files/figure-html/unnamed-chunk-16-1.png){width=100%}\n :::\n :::\n \n a. The bootstrap sampling was done under two different settings to generate each of the distributions shown above. Describe the two different settings.\n\n b. Where are each of the two distributions centered? Are they centered at roughly the same place?\n \n c. Estimate the standard error of the simulated proportions based on each distribution. 
Are the two standard errors you estimate roughly equal?\n    \n    d. Describe the shapes of the two distributions. Are they roughly the same?\n    \n    \\clearpage\n\n1. **National Health Plan, parametric bootstrap.**\nA Kaiser Family Foundation poll for a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic \"National Health Plan\". \nThere were 347 Democrats, 298 Republicans, and 617 Independents surveyed. [@data:KFF2019nathealthplan]\n\n    A political pundit on TV claims that a majority of Independents support a National Health Plan. Do these data provide strong evidence to support this type of statement? One approach to assessing the question of whether a majority of Independents support a National Health Plan is to simulate 1,000 parametric bootstrap samples with $p = 0.5$ as the proportion of Independents in support.\n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](16-inference-one-prop_files/figure-html/unnamed-chunk-17-1.png){width=90%}\n    :::\n    :::\n    \n    a. The histogram above displays 1,000 values of what?\n\n    b. Is the observed proportion of Independents consistent with the parametric bootstrap proportions under the setting where $p=0.5?$\n    \n    c. In order to test the claim that \"a majority of Independents support a National Health Plan\", what are the null and alternative hypotheses?\n    \n    d. Using the parametric bootstrap distribution, find the p-value and conclude the hypothesis test in the context of the problem.\n    \n    \\clearpage\n\n1. **Statistics and employment, use the bootstrap.**\nIn a large university where 70% of the full-time students are employed at least 5 hours per week, the members of the Statistics Department wonder if the same proportion of their students work at least 5 hours per week. They randomly sample 25 majors and find that 15 of the students work 5 or more hours each week.\n\n    Two bootstrap sampling distributions are created to describe the variability in the proportion of statistics majors who work at least 5 hours per week. The parametric bootstrap imposes a true population proportion of $p=0.7$ while the data bootstrap resamples from the actual data (which has 60% of the observations who work at least 5 hours per week).\n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](16-inference-one-prop_files/figure-html/unnamed-chunk-18-1.png){width=100%}\n    :::\n    :::\n    \n    a. Which bootstrap distribution should be used to test whether the proportion of all statistics majors who work at least 5 hours per week is 70%? And which bootstrap distribution should be used to find a confidence interval for the true proportion of statistics majors who work at least 5 hours per week?\n    \n    b. Using the appropriate histogram, test the claim that 70% of statistics majors, like their peers, work at least 5 hours per week. State the null and alternative hypotheses, find the p-value, and conclude the test in the context of the problem.\n    \n    c. Using the appropriate histogram, find a 98% bootstrap percentile confidence interval for the true proportion of statistics majors who work at least 5 hours per week. Interpret the confidence interval in the context of the problem.\n    \n    d. Using the appropriate histogram, find a 98% bootstrap SE confidence interval for the true proportion of statistics majors who work at least 5 hours per week. Interpret the confidence interval in the context of the problem.\n    \n    \\vspace{5mm}\n\n1. 
**CLT for proportions.**\nDefine the term \"sampling distribution\" of the sample proportion, and describe how the shape, center, and spread of the sampling distribution change as the sample size increases when $p = 0.1$.\n\n \\clearpage\n\n1. **Vegetarian college students.**\nSuppose that 8% of college students are vegetarians. Determine if the following statements are true or false, and explain your reasoning.\n\n a. The distribution of the sample proportions of vegetarians in random samples of size 60 is approximately normal since $n \\ge 30$.\n\n b. The distribution of the sample proportions of vegetarian college students in random samples of size 50 is right skewed.\n\n c. A random sample of 125 college students where 12% are vegetarians would be considered unusual.\n\n d. A random sample of 250 college students where 12% are vegetarians would be considered unusual.\n\n e. The standard error would be reduced by one-half if we increased the sample size from 125 to 250.\n\n1. **Young Americans, American dream.** \nAbout 77% of young adults think they can achieve the American dream. \nDetermine if the following statements are true or false, and explain your reasoning. [@news:youngAmericans1]\n\n a. The distribution of sample proportions of young Americans who think they can achieve the American dream in random samples of size 20 is left skewed.\n\n b. The distribution of sample proportions of young Americans who think they can achieve the American dream in random samples of size 40 is approximately normal since $n \\ge 30$.\n\n c. A random sample of 60 young Americans where 85% think they can achieve the American dream would be considered unusual.\n\n d. A random sample of 120 young Americans where 85% think they can achieve the American dream would be considered unusual.\n\n1. **Orange tabbies.** \nSuppose that 90% of orange tabby cats are male.\nDetermine if the following statements are true or false, and explain your reasoning.\n\n a. The distribution of sample proportions of random samples of size 30 is left skewed.\n\n b. Using a sample size that is 4 times as large will reduce the standard error of the sample proportion by one-half.\n\n c. The distribution of sample proportions of random samples of size 140 is approximately normal.\n\n d. The distribution of sample proportions of random samples of size 280 is approximately normal.\n\n1. **Young Americans, starting a family.**\nAbout 25% of young Americans have delayed starting a family due to the continued economic slump.\nDetermine if the following statements are true or false, and explain your reasoning. [@news:youngAmericans2]\n\n a. The distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump in random samples of size 12 is right skewed.\n\n b. In order for the distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump to be approximately normal, we need random samples where the sample size is at least 40.\n\n c. A random sample of 50 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.\n\n d. A random sample of 150 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.\n\n e. Tripling the sample size will reduce the standard error of the sample proportion by one-third.\n \n \\clearpage\n\n1. 
**Sex equality.**\nThe General Social Survey asked a random sample of 1,390 Americans the following question: \"On the whole, do you think it should or should not be the government's responsibility to promote equality between men and women?\" 82% of the respondents said it \"should be\". At a 95% confidence level, this sample has a 2% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning. [@data:gsssexeq]\n\n    a. We are 95% confident that between 80% and 84% of Americans in this sample think it's the government's responsibility to promote equality between men and women.\n\n    b. We are 95% confident that between 80% and 84% of all Americans think it's the government's responsibility to promote equality between men and women.\n\n    c. If we considered many random samples of 1,390 Americans, and we calculated 95% confidence intervals for each, 95% of these intervals would include the true population proportion of Americans who think it's the government's responsibility to promote equality between men and women.\n\n    d. In order to decrease the margin of error to 1%, we would need to quadruple (multiply by 4) the sample size.\n\n    e. Based on this confidence interval, there is sufficient evidence to conclude that a majority of Americans think it's the government's responsibility to promote equality between men and women.\n    \n    \\vspace{3mm}\n\n1. **Elderly drivers.** \nThe Marist Poll published a report stating that 66% of adults nationally think licensed drivers should be required to retake their road test once they reach 65 years of age. It was also reported that interviews were conducted on a random sample of 1,018 American adults, and that the margin of error was 3% using a 95% confidence level. [@data:elderlyDriving]\n\n    a. Verify the margin of error reported by The Marist Poll using a mathematical model.\n\n    b. Based on a 95% confidence interval, does the poll provide convincing evidence that *more than* two thirds of the population think that licensed drivers should be required to retake their road test once they turn 65?\n    \n    \\vspace{3mm}\n\n1. **Fireworks on July 4$^{\\text{th}}$.** \nA local news outlet reported that 56% of 600 randomly sampled Kansas residents planned to set off fireworks on July $4^{th}$. \nDetermine the margin of error for the 56% point estimate at a 95% confidence level using a mathematical model. [@data:july4]\n\n    \\vspace{3mm}\n\n1. **Proof of COVID-19 vaccination.**\nA Gallup poll surveyed 3,731 randomly sampled US adults in April 2021, asking how they felt about requiring proof of COVID-19 vaccination for travel by airplane. \nThe poll found that 57% said they would favor it. [@data:gallupcovidvaccine]\n\n    a. Describe the population parameter of interest. What is the value of the point estimate of this parameter?\n\n    b. Check if the conditions required for constructing a confidence interval using a mathematical model based on these data are met.\n\n    c. Construct a 95% confidence interval for the proportion of US adults who favor requiring proof of COVID-19 vaccination for travel by airplane.\n\n    d. Without doing any calculations, describe what would happen to the confidence interval if we decided to use a higher confidence level.\n\n    e. Without doing any calculations, describe what would happen to the confidence interval if we used a larger sample.\n    \n    \\clearpage\n\n1. 
**Study abroad.** \nA survey on 1,509 high school seniors who took the SAT and who completed an optional web survey shows that 55% of high school seniors are fairly certain that they will participate in a study abroad program in college. [@data:studyAbroad]\n\n a. Is this sample a representative sample from the population of all high school seniors in the US? Explain your reasoning.\n\n b. Let's suppose the conditions for inference are met. Even if your answer to part (a) indicated that this approach would not be reliable, this analysis may still be interesting to carry out (though not report). Using a mathematical model, construct a 90% confidence interval for the proportion of high school seniors (of those who took the SAT) who are fairly certain they will participate in a study abroad program in college, and interpret this interval in context.\n\n c. What does \"90% confidence\" mean?\n\n d. Based on this interval, would it be appropriate to claim that the majority of high school seniors are fairly certain that they will participate in a study abroad program in college?\n\n1. **Legalization of marijuana, mathematical interval.**\nThe General Social Survey asked a random sample of 1,563 US adults: \"Do you think the use of marijuana should be made legal, or not?\" 60% of the respondents said it should be made legal. [@data:gssgrass]\n\n a. Is 60% a sample statistic or a population parameter? Explain.\n\n b. Using a mathematical model, construct a 95% confidence interval for the proportion of US adults who think marijuana should be made legal, and interpret it.\n\n c. A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.\n\n d. A news piece on this survey's findings states, \"Majority of US adults think marijuana should be legalized.\" Based on your confidence interval, is this statement justified?\n\n1. **National Health Plan, mathematical inference.**\nA Kaiser Family Foundation poll for a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic \"National Health Plan\". \nThere were 347 Democrats, 298 Republicans, and 617 Independents surveyed. [@data:KFF2019nathealthplan]\n\n a. A political pundit on TV claims that a majority of Independents support a National Health Plan. Do these data provide strong evidence to support this type of statement? Your response should use a mathematical model.\n\n b. Would you expect a confidence interval for the proportion of Independents who oppose the public option plan to include 0.5? Explain.\n\n1. **Is college worth it?**\nAmong a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school. [@data:collegeWorthIt]\n\n a. A newspaper article states that only a minority of the Americans who decide not to go to college do so because they cannot afford it and uses the point estimate from this survey as evidence. Conduct a hypothesis test to determine if these data provide strong evidence supporting this statement.\n\n b. Would you expect a confidence interval for the proportion of American adults who decide not to go to college because they cannot afford it to include 0.5? Explain.\n \n \\clearpage\n\n1. 
**Taste test.**\nSome people claim that they can tell the difference between a diet soda and a regular soda in the first sip.\nA researcher wanting to test this claim randomly sampled 80 such people.\nHe then filled 80 plain white cups with soda, half diet and half regular through random assignment, and asked each person to take one sip from their cup and identify the soda as diet or regular.\n53 participants correctly identified the soda.\n\n    a. Do these data provide strong evidence that these people are able to detect the difference between diet and regular soda, in other words, are the results discernibly better than just random guessing? Your response should use a mathematical model.\n\n    b. Interpret the p-value in this context.\n    \n    \\vspace{5mm}\n\n1. **Will the coronavirus bring the world closer together?**\nAn April 2021 YouGov poll asked 4,265 UK adults whether they think the coronavirus will bring the world closer together or leave us further apart. \n12% of the respondents said it will bring the world closer together. 37% said it would leave us further apart, 39% said it won't make a difference, and the remainder didn't have an opinion on the matter. [@data:yougovcovid]\n\n    a. Calculate, using a mathematical model, a 90% confidence interval for the proportion of UK adults who think the coronavirus will bring the world closer together, and interpret the interval in context.\n\n    b. Suppose we wanted the margin of error for the 90% confidence level to be about 0.5%. How large of a sample size would you recommend for the poll?\n    \n    \\vspace{5mm}\n\n1. **Quality control.**\nAs part of a quality control process for computer chips, an engineer at a factory randomly samples 212 chips during a week of production to test the current rate of chips with severe defects. \nShe finds that 27 of the chips are defective.\n\n    a. What population is under consideration in the data set?\n\n    b. What parameter is being estimated?\n\n    c. What is the point estimate for the parameter?\n\n    d. What is the name of the statistic that can be used to measure the uncertainty of the point estimate?\n\n    e. Compute the value of the statistic from part (d) using a mathematical model.\n\n    f. The historical rate of defects is 10%. Should the engineer be surprised by the observed rate of defects during the current week?\n\n    g. Suppose the true population value was found to be 10%. If we use this proportion to recompute the value in part (d) using $p = 0.1$ instead of $\\hat{p}$, how much does the resulting value of the statistic change?\n    \n    \\vspace{5mm}\n\n1. **Nearsighted children.**\nNearsightedness (myopia) is a common vision condition in which you can see objects near to you clearly, but objects farther away are blurry. \nIt is believed that nearsightedness affects about 8% of all children. \nIn a random sample of 194 children, 21 are nearsighted. \nUsing a mathematical model, conduct a hypothesis test for the following question: do these data provide evidence that the 8% value is inaccurate?\n\n    \\clearpage\n\n1. **Website registration.**\nA website is trying to increase registration for first-time visitors, exposing 1% of these visitors to a new site design. \nOf 752 randomly sampled visitors over a month who saw the new design, 64 registered.\n\n    a. Check the conditions for constructing a confidence interval using a mathematical model.\n\n    b. Compute the standard error which would describe the variability associated with repeated samples of size 752.\n\n    c. 
Construct and interpret a 90% confidence interval for the fraction of first-time visitors of the site who would register under the new design (assuming stable behaviors by new visitors over time).\n \n \\vspace{5mm}\n\n1. **Coupons driving visits.**\nA store randomly samples 603 shoppers over the course of a year and finds that 142 of them made their visit because of a coupon they'd received in the mail.\nUsing a mathematical model, construct a 95% confidence interval for the fraction of all shoppers during the year whose visit was because of a coupon they'd received in the mail.\n\n\n:::\n", "supporting": [ "16-inference-one-prop_files" ],