# Basic Statistics {#sec-basicStats}
## Getting Started {#sec-basicStatsGettingStarted}
### Load Packages {#sec-basicStatsLoadPackages}
```{r}
library("petersenlab")
library("DescTools")
library("pwr")
library("pwrss")
library("WebPower")
library("grid")
library("tidyverse")
```
### Load Data {#sec-basicStatsLoadData}
```{r}
#| eval: false
#| include: false
load(file = file.path(path, "/OneDrive - University of Iowa/Teaching/Courses/Fantasy Football/Data/player_stats_seasonal.RData", fsep = ""))
```
```{r}
load(file = "./data/player_stats_seasonal.RData")
```
We created the `player_stats_seasonal.RData` object in @sec-calculatePlayerAge.
## Descriptive Statistics {#sec-descriptiveStatistics}
Descriptive statistics are used to describe data.
For instance, they may be used to describe the center, spread, or shape of the data.
There are various indices of each.
### Center {#sec-descriptiveStatisticsCenter}
Indices to describe the *center* (central tendency) of a variable's data include:
- mean (aka "average")
- median
- Hodges-Lehmann statistic (aka pseudomedian)
- mode
- weighted mean
- weighted median
The mean of $X$ (written as: $\bar{X}$) is calculated as in @eq-mean:
$$
\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + ... + X_n}{n}
$$ {#eq-mean}
```{r}
#| code-fold: true
exampleValues <- c(0, 0, 10, 15, 20, 30, 1000)
exampleValues_mean <- apa(mean(exampleValues), 2)
```
That is, to compute the mean, sum all of the values and divide by the number of values ($n$).
One issue with the mean is that it is sensitive to extreme (outlying) values.
For instance, the mean of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_mean`.
```{r}
#| code-fold: true
exampleValues_median <- median(exampleValues)
```
The median is determined as the value at the 50th percentile (i.e., the value that is higher than 50% of the values and is lower than the other 50% of values).
Compared to the mean, the median is less influenced by outliers.
The median of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_median`.
```{r}
#| code-fold: true
exampleValues_pseudomedian <- DescTools::HodgesLehmann(exampleValues)
```
The Hodges-Lehmann statistic (aka pseudomedian) is computed as the median of all pairwise means, and it is also robust to outliers.
The pseudomedian of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_pseudomedian`.
```{r}
#| code-fold: true
exampleValues_mode <- petersenlab::Mode(exampleValues)
```
The mode is the most common/frequent value.
The mode of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_mode`.
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains the `Mode()` function for computing the mode of a set of data.
If you want to give some values more weight than others, you can calculate a weighted mean and a weighted median (or other quantile), while assigning a weight to each value.
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains various functions for computing the weighted median (i.e., a weighted quantile at the 0.5 quantile, which is equivalent to the 50th percentile) based on @Akinshin2023.
Because some projections are outliers, we use a trimmed version of the weighted Harrell-Davis quantile estimator for greater robustness.
Below is R code to estimate each:
```{r}
mean(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
median(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
DescTools::HodgesLehmann(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
petersenlab::Mode(player_stats_seasonal$fantasyPoints)
weighted.mean(
player_stats_seasonal$fantasyPoints,
weights = sample( # randomly generate weights (could specify them manually)
x = 1:3,
size = length(player_stats_seasonal$fantasyPoints),
replace = TRUE),
na.rm = TRUE)
petersenlab::wthdquantile(
player_stats_seasonal$fantasyPoints,
w = sample( # randomly generate weights (could specify them manually)
x = 1:3,
size = length(player_stats_seasonal$fantasyPoints),
replace = TRUE),
probs = 0.5)
```
### Spread {#sec-descriptiveStatisticsSpread}
Indices to describe the *spread* (variability) of a variable's data include:
- standard deviation
- variance
- range
- minimum and maximum
- interquartile range (IQR)
- median absolute deviation
The (sample) variance of $X$ (written as: $s^2$) is calculated as in @eq-variance:
$$
s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}
$$ {#eq-variance}
where $X_i$ is each data point, $\bar{X}$ is the mean of $X$, and $n$ is the number of data points.
The (sample) standard deviation of $X$ (written as: $s$) is calculated as in @eq-sd:
$$
s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}
$$ {#eq-sd}
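As a quick sanity check, the standard deviation formula can be computed by hand and compared against R's built-in `sd()` function (using the example values from earlier):

```{r}
x <- c(0, 0, 10, 15, 20, 30, 1000) # example values from above
sqrt(sum((x - mean(x))^2) / (length(x) - 1)) # by hand, per the formula
sd(x) # built-in; the two values match
```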
The range of $X$ is calculated as in @eq-range:
$$
\text{range} = \text{maximum} - \text{minimum}
$$ {#eq-range}
The interquartile range (IQR) is calculated as in @eq-IQR:
$$
\text{IQR} = Q_3 - Q_1
$$ {#eq-IQR}
where $Q_3$ is the score at the third quartile (i.e., 75th percentile), and $Q_1$ is the score at the first quartile (i.e., 25th percentile).
The median absolute deviation (MAD) is the median of all deviations from the median, and is calculated as in @eq-medianAbsoluteDeviation:
$$
\text{MAD} = \text{median}(|X_i - \tilde{X}|)
$$ {#eq-medianAbsoluteDeviation}
where $\tilde{X}$ is the median of $X$.
Compared to the standard deviation, the median absolute deviation is more robust to outliers.
Below is R code to estimate each:
```{r}
sd(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
var(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
range(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
min(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
max(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
IQR(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
mad(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
```
### Shape {#sec-descriptiveStatisticsShape}
Indices to describe the *shape* of a variable's data include:
- skewness
- kurtosis
Positive skewness (right-skewed) reflects a longer or heavier right-tailed distribution, whereas negative skewness (left-skewed) reflects a longer or heavier left-tailed distribution.
Fantasy points tend to be positively skewed.
The kurtosis reflects the extent of extreme (outlying) values in a distribution relative to a normal distribution (or bell curve).
A mesokurtic distribution (with a kurtosis value near zero) reflects a normal amount of tailedness.
Positive kurtosis values reflect a leptokurtic distribution, where there are heavier tails and a sharper peak than a normal distribution.
Negative kurtosis values reflect a platykurtic distribution, where there are lighter tails and a flatter peak than a normal distribution.
Fantasy points tend to have a leptokurtic distribution.
Below is R code to estimate each:
```{r}
psych::skew(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
psych::kurtosi(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
```
### Combination {#sec-descriptiveStatisticsCombination}
To estimate multiple indices of center, spread, and shape of the data, you can use the following code:
```{r}
psych::describe(player_stats_seasonal["fantasyPoints"])
player_stats_seasonal %>%
select(age, years_of_experience, fantasyPoints) %>%
summarise(across(
everything(),
.fns = list(
n = ~ length(na.omit(.)),
missingness = ~ mean(is.na(.)) * 100,
M = ~ mean(., na.rm = TRUE),
SD = ~ sd(., na.rm = TRUE),
min = ~ min(., na.rm = TRUE),
max = ~ max(., na.rm = TRUE),
range = ~ max(., na.rm = TRUE) - min(., na.rm = TRUE),
IQR = ~ IQR(., na.rm = TRUE),
MAD = ~ mad(., na.rm = TRUE),
median = ~ median(., na.rm = TRUE),
pseudomedian = ~ DescTools::HodgesLehmann(., na.rm = TRUE),
mode = ~ petersenlab::Mode(., multipleModes = "mean"),
skewness = ~ psych::skew(., na.rm = TRUE),
kurtosis = ~ psych::kurtosi(., na.rm = TRUE)),
.names = "{.col}.{.fn}")) %>%
pivot_longer(
cols = everything(),
names_to = c("variable","index"),
names_sep = "\\.") %>%
pivot_wider(
names_from = index,
values_from = value)
```
## Scores and Scales {#sec-scoresAndScales}
There are many different types of scores and scales.
This book focuses on [raw scores](#sec-rawScores) and [*z*-scores](#sec-zScores).
For information on other scores and scales, including percentile ranks, *T*-scores, standard scores, scaled scores, and stanine scores, see here: <https://isaactpetersen.github.io/Principles-Psychological-Assessment/scoresScales.html#scoreTransformation> [@PetersenPrinciplesPsychAssessment].
### Raw Scores {#sec-rawScores}
*Raw scores* are the original data on the original metric.
Thus, raw scores are considered *unstandardized*.
For example, raw scores that represent the players' age may range from 20 to 40.
Raw scores depend on the construct and unit; thus raw scores may not be comparable across variables.
### *z* Scores {#sec-zScores}
*z* scores have a mean of zero and a standard deviation of one.
*z* scores are frequently used to render scores across variables more comparable.
Thus, *z* scores are considered a form of a *standardized* score.
*z* scores are calculated using @eq-zScore:
$$
z = \frac{X - \bar{X}}{\sigma}
$$ {#eq-zScore}
where $X$ is the observed score, $\bar{X}$ is the mean observed score, and $\sigma$ is the standard deviation of the observed scores.
You can easily convert a variable to a *z* score using the `scale()` function:
```{r}
#| eval: false
scale(variable)
```
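As a quick check (using the example values from earlier, and noting that `scale()` standardizes using the sample standard deviation), the manual computation from @eq-zScore matches `scale()`:

```{r}
x <- c(0, 0, 10, 15, 20, 30, 1000) # example values from above
(x - mean(x)) / sd(x) # by hand, using the sample standard deviation
as.vector(scale(x)) # equivalent
```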
With a standard normal curve, 68% of scores fall within one standard deviation of the mean.
95% of scores fall within two standard deviations of the mean.
99.7% of scores fall within three standard deviations of the mean.
The area under a normal curve within one standard deviation of the mean is calculated below using the `pnorm()` function, which calculates the cumulative density function for a normal curve.
```{r}
stdDeviations <- 1
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within one standard deviation of the mean is depicted in @fig-zScoreDensity1SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within one standard deviation of the mean."
#| fig.scap: "Density of Standard Normal Distribution: One Standard Deviation of the Mean."
#| label: fig-zScoreDensity1SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within two standard deviations of the mean is calculated below:
```{r}
stdDeviations <- 2
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within two standard deviations of the mean is depicted in @fig-zScoreDensity2SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within two standard deviations of the mean."
#| fig.scap: "Density of Standard Normal Distribution: Two Standard Deviations of the Mean."
#| label: fig-zScoreDensity2SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within three standard deviations of the mean is calculated below:
```{r}
stdDeviations <- 3
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within three standard deviations of the mean is depicted in @fig-zScoreDensity3SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within three standard deviations of the mean."
#| fig.scap: "Density of Standard Normal Distribution: Three Standard Deviations of the Mean."
#| label: fig-zScoreDensity3SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
If you want to determine the *z* score associated with a particular percentile in a normal distribution, you can use the `qnorm()` function.
For instance, the *z* score associated with the 37th percentile is:
```{r}
qnorm(.37)
```
## Inferential Statistics {#sec-inferentialStatistics}
Inferential statistics are used to draw inferences regarding whether there is (a) a difference in level on a variable across groups or (b) an association between variables.
For instance, inferential statistics may be used to evaluate whether Quarterbacks tend to have longer careers compared to Running Backs.
Or, they could be used to evaluate whether number of carries is associated with injury likelihood.
To apply inferential statistics, we make use of the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
### Null Hypothesis Significance Testing {#sec-nhst}
To draw statistical inferences, the frequentist statistics paradigm leverages null hypothesis significance testing.
Frequentist statistics is the most widely used statistical paradigm.
However, frequentist statistics is not the only statistical paradigm.
Other statistical paradigms exist, including [Bayesian statistics](#sec-bayesTheorem), which is based on [Bayes' theorem](#sec-bayesTheorem).
This chapter focuses on the frequentist approach to hypothesis testing, known as null hypothesis significance testing.
We discuss Bayesian statistics in @sec-baseRates.
#### Null Hypothesis ($H_0$) {#sec-nullHypothesis}
When testing whether there are differences in level across groups on a variable of interest, the null hypothesis ($H_0$) is that there is <u>no difference</u> in level across groups.
For instance, when testing whether Quarterbacks tend to have longer careers compared to Running Backs, the null hypothesis ($H_0$) is that Quarterbacks do not systematically differ from Running Backs in the length of their career.
When testing whether there is an association between variables, the null hypothesis ($H_0$) is that there is <u>no association</u> between the variables.
For instance, when testing whether number of carries is associated with injury likelihood, the null hypothesis ($H_0$) is that there is no association between number of carries and injury likelihood.
#### Alternative Hypothesis ($H_1$) {#sec-alternativeHypothesis}
The alternative hypothesis ($H_1$) is the researcher's hypothesis that they want to evaluate.
An alternative hypothesis ($H_1$) might be directional (i.e., one-sided) or non-directional (i.e., two-sided).
Directional hypotheses specify a particular direction, such as which group will have larger scores or in which direction (positive or negative) two variables will be associated.
Examples of directional hypotheses include:
- Quarterbacks have <u>longer</u> careers compared to Running Backs
- Number of carries is <u>positively</u> associated with injury likelihood
Non-directional hypotheses do not specify a particular direction.
For instance, non-directional hypotheses may state that two groups differ but do not specify which group will have larger scores.
Or, non-directional hypotheses may state that two variables are associated but do not state the sign of the association (i.e., positive or negative).
Examples of non-directional hypotheses include:
- Quarterbacks <u>differ</u> in the length of their careers compared to Running Backs
- Number of carries is <u>associated</u> with injury likelihood
#### Statistical Significance {#sec-statisticalSignificance}
In science, statistical significance is evaluated with the *p*-value.
The *p*-value does not represent the probability that you observed the result by chance.
The *p*-value represents a conditional probability—it examines the probability of one event given another event.
In particular, the *p*-value evaluates the likelihood that you would detect a result at least as extreme as the one observed (in terms of the magnitude of the difference or of the association) given that the null hypothesis ($H_0$) is true.
This can be expressed in conditional probability notation, $P(A | B)$, which is the probability (likelihood) of event A occurring given that event B occurred (or given condition B).
The conditional probability notation for a left-tailed directional test (i.e., Quarterbacks have <u>shorter</u> careers than Running Backs; or number of carries is <u>negatively</u> associated with injury likelihood) is in @eq-pvalueLeftTailed.
$$
p\text{-value} = P(T \le t | H_0)
$$ {#eq-pvalueLeftTailed}
where $T$ is the test statistic of interest (e.g., the distribution of $t$-, $r$-, or $F$-values, depending on the test) and $t$ is the observed test statistic (e.g., the $t$-, $r$-, or $F$-value, depending on the test).
The conditional probability notation for a right-tailed directional test (i.e., Quarterbacks have <u>longer</u> careers than Running Backs; or number of carries is <u>positively</u> associated with injury likelihood) is in @eq-pvalueRightTailed.
$$
p\text{-value} = P(T \ge t | H_0)
$$ {#eq-pvalueRightTailed}
The conditional probability notation for a two-tailed non-directional test (i.e., Quarterbacks <u>differ</u> in the length of their careers compared to Running Backs; or number of carries is <u>associated</u> with injury likelihood) is in @eq-pvalueTwoTailed.
$$
p\text{-value} = 2 \times \text{min}(P(T \le t | H_0), P(T \ge t | H_0))
$$ {#eq-pvalueTwoTailed}
where `min(a, b)` is the smaller number of `a` and `b`.
If the distribution of the test statistic is symmetric around zero, the *p*-value for the two-tailed non-directional test simplifies to @eq-pvalueTwoTailedSimple.
$$
p\text{-value} = 2 \times P(T \ge |t| | H_0)
$$ {#eq-pvalueTwoTailedSimple}
Nevertheless, to be conservative (i.e., to avoid false positive/Type I errors), many researchers use two-tailed *p*-values regardless of whether their hypothesis is directional or non-directional.
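For instance, the two-tailed *p*-value for an observed *t*-statistic can be computed directly from the cumulative distribution function; here with a hypothetical $t = 2.1$ on 30 degrees of freedom:

```{r}
tObserved <- 2.1 # hypothetical observed t statistic
degreesOfFreedom <- 30 # hypothetical degrees of freedom
2 * pt(abs(tObserved), df = degreesOfFreedom, lower.tail = FALSE) # two-tailed p-value
2 * (1 - pt(abs(tObserved), df = degreesOfFreedom)) # equivalent
```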
For a test of group differences, the *p*-value evaluates the likelihood that you would observe a difference as large or larger than the one you observed between the groups if there were no systematic difference between the groups in the population, as depicted in @fig-pValuesDifference.
For instance, when evaluating whether Quarterbacks have <u>longer</u> careers than Running Backs, and you observed a mean difference of 0.03 years, the *p*-value evaluates the likelihood that you would observe a difference as large or larger than 0.03 years between the groups if, in truth among all Quarterbacks and Running Backs in the NFL, Quarterbacks do not differ from Running Backs in terms of the length of their career.
```{r}
#| label: fig-pValuesDifference
#| layout-ncol: 2
#| fig-cap: "Interpretation of *p*-Values When Examining The Differences Between Groups. The vertical black lines reflect the group means."
#| fig-alt: "Interpretation of *p*-Values When Examining The Differences Between Groups. The vertical black lines reflect the group means."
#| fig-subcap:
#| - "What is the probability my data would look like this..."
#| - "...if in the population, the groups were really this?"
#| code-fold: true
set.seed(52242)
nObserved <- 1000
nPopulation <- 1000000
observedGroups <- data.frame(
score = c(rnorm(nObserved, mean = 47, sd = 3), rnorm(nObserved, mean = 52, sd = 3)),
group = as.factor(c(rep("Group 1", nObserved), rep("Group 2", nObserved)))
)
populationGroups <- data.frame(
score = c(rnorm(nPopulation, mean = 50, sd = 3.03), rnorm(nPopulation, mean = 50, sd = 3)),
group = as.factor(c(rep("Group 1", nPopulation), rep("Group 2", nPopulation)))
)
ggplot2::ggplot(
data = observedGroups,
mapping = aes(
x = score,
fill = group,
color = group
)
) +
geom_density(alpha = 0.5) +
scale_color_manual(values = c("red", "blue")) +
scale_fill_manual(values = c("red","blue")) +
geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 1")])) +
geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 2")])) +
ggplot2::labs(
x = "Score",
y = "Frequency",
title = "What is the probability my data would look like this..."
) +
ggplot2::theme_classic(
base_size = 16) +
ggplot2::theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
#plot.title.position = "plot"
legend.position = "inside",
legend.margin = margin(0, 0, 0, 0),
legend.justification.top = "left",
legend.justification.left = "top",
legend.justification.bottom = "right",
legend.justification.inside = c(1, 1),
legend.location = "plot")
ggplot2::ggplot(
data = populationGroups,
mapping = aes(
x = score,
fill = group,
color = group
)
) +
geom_density(alpha = 0.5) +
scale_color_manual(values = c("red", "blue")) +
scale_fill_manual(values = c("red","blue")) +
geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 1")])) +
geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 2")])) +
ggplot2::labs(
x = "Score",
y = "Frequency",
title = "...if in the population, the groups were really this:"
) +
ggplot2::theme_classic(
base_size = 16) +
ggplot2::theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
#plot.title.position = "plot",
legend.position = "inside",
legend.margin = margin(0, 0, 0, 0),
legend.justification.top = "left",
legend.justification.left = "top",
legend.justification.bottom = "right",
legend.justification.inside = c(1, 1),
legend.location = "plot")
```
For a test of whether two variables are associated, the *p*-value evaluates the likelihood that you would observe an association as strong or stronger than the one you observed if there were no actual association between the variables in the population, as depicted in @fig-pValuesAssociation.
For instance, when evaluating whether number of carries is <u>positively</u> associated with injury likelihood, and you observed a correlation coefficient of $r = .25$ between number of carries and injury likelihood, the *p*-value evaluates the likelihood that you would observe a correlation as strong or stronger than $r = .25$ between the variables if, in truth among all NFL Running Backs, number of carries is not associated with injury likelihood.
```{r}
#| label: fig-pValuesAssociation
#| layout-ncol: 2
#| fig-cap: "Interpretation of *p*-Values When Examining The Association Between Variables."
#| fig-alt: "Interpretation of *p*-Values When Examining The Association Between Variables."
#| fig-subcap:
#| - "What is the probability my data would look like this..."
#| - "...if in the population, the association was really this?"
#| code-fold: true
set.seed(52242)
observedCorrelation <- 0.9
correlations <- data.frame(criterion = rnorm(2000))
correlations$sample <- NA
correlations$sample[1:100] <- complement(correlations$criterion[1:100], observedCorrelation)
correlations$population <- complement(correlations$criterion, 0)
ggplot2::ggplot(
data = correlations,
mapping = aes(
x = sample,
y = criterion
)
) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_continuous(
limits = c(-3.5,3)
) +
annotate(
x = 0,
y = 4,
label = paste("italic(r) != ", 0, sep = ""),
parse = TRUE,
geom = "text",
size = 7) +
labs(
x = "Predictor Variable",
y = "Outcome Variable",
title = "What is the probability my data would look like this..."
) +
theme_classic(
base_size = 16) +
theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
ggplot2::ggplot(
data = correlations,
mapping = aes(
x = population,
y = criterion
)
) +
geom_point() +
geom_smooth(
method = "lm",
se = FALSE) +
scale_x_continuous(
limits = c(-2.5,2.5)
) +
annotate(
x = 0,
y = 4,
label = paste("italic(r) == '", "0.00", "'", sep = ""),
parse = TRUE,
geom = "text",
size = 7) +
labs(
x = "Predictor Variable",
y = "Outcome Variable",
title = "...if in the population, the association was really this:"
) +
theme_classic(
base_size = 16) +
theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
```
Using null hypothesis significance testing (NHST), we consider an effect to be *statistically significant* if the *p*-value is less than some threshold, called the *alpha level*.
In science, we typically want to be conservative because a false positive (i.e., Type I error) is considered more problematic than a false negative (i.e., Type II error).
That is, we would rather say an effect does not exist when it really does than say an effect does exist when it really does not.
Thus, we typically set the alpha level to a low value, commonly .05.
Then, we would consider an effect to be *statistically significant* if the *p*-value is less than .05.
That is, there is a small chance (5%; or 1 in 20 times) that we would observe an effect at least as extreme as the effect observed, if the null hypothesis were true.
So, you might expect around 5% of tests where the null hypothesis is true to be statistically significant just by chance.
We could lower the rate of Type II (i.e., false negative) errors—i.e., we could detect more effects—if we set the alpha level to a higher value (e.g., .10); however, raising the alpha level would raise the possibility of Type I (false positive) errors.
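The roughly 5% false positive rate under the null hypothesis can be verified with a small simulation (a sketch, not from the original text): repeatedly test for a difference between two groups drawn from the same population, and count how often $p < .05$.

```{r}
set.seed(52242)
pValues <- replicate(10000, {
  t.test(rnorm(30), rnorm(30))$p.value # both groups drawn from the same population
})
mean(pValues < .05) # proportion of false positives; should be near .05
```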
If the *p*-value is less than .05, we reject the null hypothesis ($H_0$) that there was no difference or association.
Thus, we conclude that there was a statistically significant (non-zero) difference or association.
If the *p*-value is greater than .05, we fail to reject the null hypothesis; the difference/association was not statistically significant.
Thus, we do not have confidence that there was a difference or association.
However, we do not accept the null hypothesis; it could be that we did not observe an effect because we did not have adequate power to detect the effect—e.g., if the [effect size](#sec-practicalSignificance) was small, the data were noisy, and the [sample size](#sec-sampleVsPopulation) was small and/or unrepresentative.
There are four possible decision-making outcomes when performing null hypothesis significance testing:
1. We (correctly) reject the null hypothesis when it is in fact false ($1 - \beta$).
This is a true positive.
For instance, we may correctly determine that Quarterbacks have longer careers than Running Backs.
1. We (correctly) fail to reject the null hypothesis when it is in fact true ($1 - \alpha$).
This is a true negative.
For instance, we may correctly determine that Quarterbacks do not have longer careers than Running Backs.
1. We (incorrectly) reject the null hypothesis when it is in fact true ($\alpha$).
This is a false positive.
When performing null hypothesis testing, a false positive is known as a Type I error.
For instance, we may incorrectly determine that Quarterbacks have longer careers than Running Backs when, in fact, Quarterbacks and Running Backs do not differ in their career length.
1. We (incorrectly) fail to reject the null hypothesis when it is in fact false ($\beta$).
This is a false negative.
When performing null hypothesis testing, a false negative is known as a Type II error.
For instance, we may incorrectly determine that Quarterbacks and Running Backs do not differ in their career length when, in fact, Quarterbacks have longer careers than Running Backs.
A two-by-two confusion matrix for null-hypothesis significance testing is in @fig-nhstConfusionMatrix.
::: {#fig-nhstConfusionMatrix}
![](images/nhstConfusionMatrix.png){fig-alt="A Two-by-Two Confusion Matrix for Null-Hypothesis Significance Testing."}
A Two-by-Two Confusion Matrix for Null-Hypothesis Significance Testing.
:::
In statistics, *power* is the probability of detecting an effect, if, in fact, the effect exists.
Otherwise said, power is the probability of rejecting the null hypothesis, if, in fact, the null hypothesis is false.
Power is influenced by several variables:
- the [sample size](#sec-sampleVsPopulation) (*N*): the larger the *N*, the greater the power
- for group comparisons, the power depends on the [sample size](#sec-sampleVsPopulation) of each group
- the [effect size](#sec-practicalSignificance): the larger the effect, the greater the power
- for group comparisons, larger effect sizes reflect:
- larger between-group variance, and
- smaller within-group variance (i.e., strong measurement precision, i.e., [reliability](#sec-reliability))
- the alpha level: the researcher specifies the alpha level (though it is typically set at .05); the higher the alpha level, the greater the power; however, the higher we set the alpha level, the higher the likelihood of Type I errors (false positives)
- one- versus two-tailed tests: one-tailed tests have higher power than two-tailed tests
- [within-subject](#sec-withinSubject) versus [between-subject](#sec-betweenSubject) comparisons: [within-subject designs](#sec-withinSubject) tend to have greater power than [between-subject designs](#sec-betweenSubject)
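The influence of several of these variables can be explored with the `power.t.test()` function in base R (the values below are arbitrary and for illustration only):

```{r}
# Power of a two-sample t-test to detect a medium effect (d = 0.5)
# with 50 participants per group and alpha = .05
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = .05,
             type = "two.sample", alternative = "two.sided")

# Larger samples yield greater power
power.t.test(n = 200, delta = 0.5, sd = 1, sig.level = .05)

# One-tailed tests have greater power than two-tailed tests
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = .05,
             alternative = "one.sided")
```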
A plot of statistical power is in @fig-nhst.
```{r}
#| label: fig-nhst
#| fig-cap: "Statistical Power (Adapted from Kristoffer Magnusson: <https://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics>; archived at <https://perma.cc/FG3J-85L6>). The dashed line represents the critical value or threshold."
#| fig-alt: "Statistical Power (Adapted from Kristoffer Magnusson: <https://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics>; archived at <https://perma.cc/FG3J-85L6>). The dashed line represents the critical value or threshold."
#| code-fold: true
m1 <- 0 # mu H0
sd1 <- 1.5 # sigma H0
m2 <- 3.5 # mu HA
sd2 <- 1.5 # sigma HA
z_crit <- qnorm(1-(0.05/2), m1, sd1)
# set length of tails
min1 <- m1-sd1*4
max1 <- m1+sd1*4
min2 <- m2-sd2*4
max2 <- m2+sd2*4
# create x sequence
x <- seq(min(min1,min2), max(max1, max2), .01)
# generate normal dist #1
y1 <- dnorm(x, m1, sd1)
# put in data frame
df1 <- data.frame("x" = x, "y" = y1)
# generate normal dist #2
y2 <- dnorm(x, m2, sd2)
# put in data frame
df2 <- data.frame("x" = x, "y" = y2)
# Alpha polygon
y.poly <- pmin(y1,y2)
poly1 <- data.frame(x=x, y=y.poly)
poly1 <- poly1[poly1$x >= z_crit, ]
poly1<-rbind(poly1, c(z_crit, 0)) # add lower-left corner
# Beta polygon
poly2 <- df2
poly2 <- poly2[poly2$x <= z_crit,]
poly2 <- rbind(poly2, c(z_crit, 0)) # add lower-right corner
# power polygon; 1-beta
poly3 <- df2
poly3 <- poly3[poly3$x >= z_crit,]
poly3 <-rbind(poly3, c(z_crit, 0)) # add lower-left corner
# combine polygons.
poly1$id <- 3 # alpha, give it the highest number to make it the top layer
poly2$id <- 2 # beta
poly3$id <- 1 # power; 1 - beta
poly <- rbind(poly1, poly2, poly3)
poly$id <- factor(poly$id, labels=c("power","beta","alpha"))
# plot with ggplot2
ggplot(poly, aes(x,y, fill=id, group=id)) +
geom_polygon(show.legend=F, alpha=I(8/10)) +
  # add line for null distribution (H0)
  geom_line(data=df1, aes(x,y, color="H0", group=NULL, fill=NULL), linewidth=1.5, show.legend=F) +
  # add line for alternative distribution (HA); these lines could be combined into one dataframe
  geom_line(data=df2, aes(color="HA", group=NULL, fill=NULL), linewidth=1.5, show.legend=F) +
# add vlines for z_crit
geom_vline(xintercept = z_crit, linewidth=1, linetype="dashed") +
# change colors
scale_color_manual("Group",
values= c("HA" = "#981e0b","H0" = "black")) +
scale_fill_manual("test", values= c("alpha" = "#0d6374","beta" = "#be805e","power"="#7cecee")) +
# beta arrow
annotate("segment", x=0.1, y=0.045, xend=1.3, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label="beta", x=0, y=0.05, parse=T, size=8) +
# alpha arrow
annotate("segment", x=4, y=0.043, xend=3.4, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label="frac(alpha,2)", x=4.2, y=0.05, parse=T, size=8) +
# power arrow
annotate("segment", x=6, y=0.2, xend=4.5, yend=0.15, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label=expression(paste(1-beta, " (\"power\")")), x=6.1, y=0.21, parse=T, size=8) +
# H_0 title
annotate("text", label="H[0]", x=m1, y=0.28, parse=T, size=8) +
# H_a title
annotate("text", label="H[1]", x=m2, y=0.28, parse=T, size=8) +
ggtitle("Statistical Power") +
# remove some elements
theme(
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
panel.background = element_blank(),
plot.background = element_rect(fill="white"),
panel.border = element_blank(),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size=22))
```
Interactive visualizations by Kristoffer Magnusson on *p*-values and null-hypothesis significance testing are below:
- <https://rpsychologist.com/pvalue/> (archived at <https://perma.cc/JP9F-9ZVY>)
- <https://rpsychologist.com/d3/pdist/> (archived at <https://perma.cc/BE96-8LSJ>)
- <https://rpsychologist.com/d3/nhst/> (archived at <https://perma.cc/ZU9A-37F3>)
Twelve misconceptions about *p*-values [@Goodman2008] are in @tbl-pValueMisconceptions.
| Number | Misconception |
|:-------|:-------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | If $p = .05$, the null hypothesis has only a 5% chance of being true. |
| 2 | A nonsignificant difference (e.g., $p > .05$) means there is no difference between groups. |
| 3 | A statistically significant finding is clinically important. |
| 4 | Studies with $p$-values on opposite sides of .05 are conflicting. |
| 5 | Studies with the same $p$-value provide the same evidence against the null hypothesis. |
| 6 | $p = .05$ means that we have observed data that would occur only 5% of the time under the null hypothesis. |
| 7 | $p = .05$ and $p < .05$ mean the same thing. |
| 8 | $p$-values are properly written as inequalities (e.g., "$p \le .05$" when $p = .015$). |
| 9 | $p = .05$ means that if you reject the null hypothesis, the probability of a Type I error is only 5%. |
| 10 | With a $p = .05$ threshold for significance, the chance of a Type I error will be 5%. |
| 11 | You should use a one-sided $p$-value when you don't care about a result in one direction, or a difference in that direction is impossible. |
| 12 | A scientific conclusion or treatment policy should be based on whether or not the $p$-value is significant. |
: Twelve Misconceptions About *p*-Values from @Goodman2008. Goodman also provides a discussion about why each statement is false. {#tbl-pValueMisconceptions}
That is, the *p*-value is <u>not</u>:
- the probability that the effect was due to chance
- the probability that the null hypothesis is true
- the size of the effect
- the importance of the effect
- whether the effect is true, real, or causal
Statistical significance involves the *consistency* of an effect/association/difference; it suggests that the association/difference is reliably non-zero.
However, just because something is statistically significant does not mean that it is important.
For instance, consider that we discover that players who consume a sports drink before a game tend to perform better than players who do not ($p < .05$).
However, what if consumption of sports drinks is associated with an average improvement of only 0.002 points per game?
A small effect such as this might be detectable with a large [sample size](#sec-sampleVsPopulation).
The effect would be considered reliable/consistent because it is statistically significant.
However, it is so small that it results in differences that are not [practically important](#sec-practicalSignificance).
Thus, in addition to statistical significance, it is also important to consider [practical significance](#sec-practicalSignificance).
### Practical Significance {#sec-practicalSignificance}
*Practical significance* deals with how large or important the effect/association/difference is.
It is based on the magnitude of the effect, called the *effect size*.
Effect size can be quantified in various ways including:
- Cohen's $d$
- Standardized regression coefficient (beta; $\beta$)
- Correlation coefficient ($r$)
- Cohen's $\omega$ (omega)
- Cohen's $f$
- Cohen's $f^2$
- Coefficient of determination ($R^2$)
- Eta squared ($\eta^2$)
- Partial eta squared ($\eta_p^2$)
#### Cohen's $d$ {#sec-cohensD}
Cohen's $d$ is calculated as in @eq-cohensD:
$$
\begin{aligned}
d &= \frac{\text{mean difference}}{\text{pooled standard deviation}} \\
&= \frac{\bar{X_1} - \bar{X_2}}{s} \\
\end{aligned}
$$ {#eq-cohensD}
where:
$$
s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$ {#eq-pooledStandardDeviation}
where $n_1$ and $n_2$ are the sample sizes of group 1 and group 2, respectively, and $s_1$ and $s_2$ are the standard deviations of group 1 and group 2, respectively.
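For example, Cohen's $d$ can be computed by hand in R (the career-length data below are hypothetical):

```{r}
# Hypothetical career lengths (in seasons) for two position groups
quarterbacks <- c(9, 11, 14, 8, 12, 10)
runningbacks <- c(6, 7, 9, 5, 8, 7)

n1 <- length(quarterbacks)
n2 <- length(runningbacks)

# pooled standard deviation
s <- sqrt(((n1 - 1)*var(quarterbacks) + (n2 - 1)*var(runningbacks)) /
            (n1 + n2 - 2))

# Cohen's d: mean difference divided by the pooled standard deviation
(mean(quarterbacks) - mean(runningbacks)) / s
```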
#### Standardized Regression Coefficient (Beta; $\beta$) {#sec-beta}
The standardized regression coefficient (beta; $\beta$) is used in multiple regression, and is calculated as in @eq-beta:
$$
\beta_x = B_x \times \frac{s_x}{s_y}
$$ {#eq-beta}
where $B_x$ is the unstandardized regression coefficient of the [predictor variable](#sec-correlationalStudy) $x$ in predicting the [outcome variable](#sec-correlationalStudy) $y$, $s_x$ is the standard deviation of $x$, and $s_y$ is the standard deviation of $y$.
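For example, the standardized coefficient can be obtained in R either by rescaling the unstandardized coefficient or by fitting the model to standardized (*z*-scored) variables (simulated data for illustration):

```{r}
# Simulated data for illustration
set.seed(52242)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

B <- coef(lm(y ~ x))["x"] # unstandardized coefficient

# standardized coefficient: B multiplied by sd(x)/sd(y)
B * sd(x) / sd(y)

# equivalently, fit the model to z-scored variables
coef(lm(scale(y) ~ scale(x)))[2]
```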
#### Correlation Coefficient ($r$)
The formula for the correlation coefficient is in @sec-correlation.
#### Cohen's $\omega$ {#sec-cohensOmega}
Cohen's $\omega$ is used in chi-square tests, and is calculated as in @eq-cohensOmega:
$$
\omega = \sqrt{\frac{\chi^2}{N} - \frac{df}{N}}
$$ {#eq-cohensOmega}
where $\chi^2$ is the chi-square statistic from the test, $N$ is the sample size, and $df$ is the degrees of freedom.
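For example, Cohen's $\omega$ can be computed from the output of `chisq.test()`, following the formula above (the contingency table below is hypothetical):

```{r}
# Hypothetical 2 x 2 contingency table
contingencyTable <- matrix(c(30, 20, 10, 40), nrow = 2)
chisqTest <- chisq.test(contingencyTable, correct = FALSE)

chi2 <- unname(chisqTest$statistic)
N <- sum(contingencyTable)
df <- unname(chisqTest$parameter)

# Cohen's omega, as in the formula above
sqrt(chi2/N - df/N)
```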
#### Cohen's $f$ {#sec-cohensF}
Cohen's $f$ is commonly used in ANOVA, and is calculated as in @eq-cohensF:
$$
\begin{aligned}
f &= \sqrt{\frac{R^2}{1 - R^2}} \\
&= \sqrt{\frac{\eta^2}{1 - \eta^2}}
\end{aligned}
$$ {#eq-cohensF}
#### Cohen's $f^2$ {#sec-cohensFsquared}
Cohen's $f^2$ is commonly used in regression, and is calculated as in @eq-cohensFsquared:
$$
\begin{aligned}
f^2 &= \frac{R^2}{1 - R^2} \\
&= \frac{\eta^2}{1 - \eta^2}
\end{aligned}
$$ {#eq-cohensFsquared}
To calculate the effect size of a particular predictor, you can calculate $\Delta f^2$ as in @eq-deltaCohensFsquared:
$$
\begin{aligned}
\Delta f^2 &= \frac{R^2_{\text{model}} - R^2_{\text{reduced}}}{1 - R^2_{\text{model}}} \\
&= \frac{\eta^2_{\text{model}} - \eta^2_{\text{reduced}}}{1 - \eta^2_{\text{model}}}
\end{aligned}
$$ {#eq-deltaCohensFsquared}
where $R^2_{\text{model}}$ is the $R^2$ of the model with the [predictor variable](#sec-correlationalStudy) of interest and $R^2_{\text{reduced}}$ is the $R^2$ of the model without the [predictor variable](#sec-correlationalStudy) of interest.
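For example, $f^2$ and $\Delta f^2$ can be computed from model $R^2$ values in R (simulated data for illustration):

```{r}
# Simulated data for illustration
set.seed(52242)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 0.4*x1 + 0.3*x2 + rnorm(100)

r2_model <- summary(lm(y ~ x1 + x2))$r.squared
r2_reduced <- summary(lm(y ~ x1))$r.squared # model without x2

# Cohen's f^2 for the full model
r2_model / (1 - r2_model)

# delta f^2 for the focal predictor x2
(r2_model - r2_reduced) / (1 - r2_model)
```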
#### Coefficient of Determination ($R^2$) {#sec-rSquared}
The coefficient of determination ($R^2$) reflects the proportion of variance in the [outcome variable](#sec-correlationalStudy) that is explained by the [predictor variable(s)](#sec-correlationalStudy).
$R^2$ is commonly used in regression, and is calculated as in @eq-rSquared:
$$
\begin{aligned}
R^2 &= 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} \\
&= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\
&= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\
&= \frac{f^2}{1 + f^2} \\
&= \eta^2 \\
&= \frac{\text{variance explained in }Y}{\text{total variance in }Y}
\end{aligned}
$$ {#eq-rSquared}
where $Y_i$ is the observed value of the [outcome variable](#sec-correlationalStudy) for the $i$th observation, $\hat{Y}_i$ is the model-predicted value for the $i$th observation, and $\bar{Y}$ is the mean of the observed values of the [outcome variable](#sec-correlationalStudy).
The total sum of squares is an index of the total variation in the [outcome variable](#sec-correlationalStudy).
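For example, $R^2$ can be computed from the sums of squares of a fitted regression model (simulated data for illustration); the result matches `summary(fit)$r.squared`:

```{r}
# Simulated data for illustration
set.seed(52242)
x <- rnorm(100)
y <- 0.5*x + rnorm(100)
fit <- lm(y ~ x)

ss_residual <- sum(residuals(fit)^2) # sum of squared residuals
ss_total <- sum((y - mean(y))^2)     # total sum of squares

# R^2: proportion of variance in y explained by x
1 - ss_residual/ss_total
```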
#### Eta Squared ($\eta^2$) and Partial Eta Squared ($\eta_p^2$) {#sec-etaSquared}
Like $R^2$, eta squared ($\eta^2$) reflects the proportion of variance in the [dependent variable](#sec-experiment) that is explained by the [independent variable(s)](#sec-experiment).
$\eta^2$ is commonly used in ANOVA, and is calculated as in @eq-etaSquared:
$$
\begin{aligned}
\eta^2 &= \frac{SS_{\text{effect}}}{SS_{\text{total}}} \\
&= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\
&= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\
&= \frac{f^2}{1 + f^2} \\
&= R^2
\end{aligned}
$$ {#eq-etaSquared}
where $SS_{\text{effect}}$ is the sum of squares for the effect of interest and $SS_{\text{total}}$ is the total sum of squares.
Partial eta squared ($\eta_p^2$) reflects the proportion of variance in the [dependent variable](#sec-experiment) that is explained by the [independent variable](#sec-experiment) while controlling for the other [independent variables](#sec-experiment).
$\eta_p^2$ is commonly used in ANOVA, and is calculated as in @eq-partialEtaSquared:
$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
$$ {#eq-partialEtaSquared}
where $SS_{\text{effect}}$ is the sum of squares for the effect of interest and $SS_{\text{error}}$ is the sum of squares for the residual error term.
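For example, $\eta^2$ can be computed from the sums of squares in an ANOVA table (simulated data for illustration); with a single [independent variable](#sec-experiment), $\eta^2 = \eta_p^2$:

```{r}
# Simulated data for illustration: career length by position group
set.seed(52242)
position <- factor(rep(c("QB", "RB", "WR"), each = 20))
careerLength <- rnorm(60, mean = c(9, 6, 7)[as.integer(position)], sd = 2)

fit <- aov(careerLength ~ position)
ss <- summary(fit)[[1]][["Sum Sq"]] # effect SS, then residual SS

# eta-squared; with one factor, this also equals partial eta-squared
ss[1] / sum(ss)
```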
#### Effect Size Thresholds {#sec-effectSizeThresholds}
Effect size thresholds [@Cohen1988; @McGrath2006] for small, medium, and large effect sizes are in @tbl-effectSizeThresholds.
| Effect Size Index | Small | Medium | Large |
|:----------------------------------------------------|:------------|:------------|:------------|
| Cohen's $d$ | $\ge |.20|$ | $\ge |.50|$ | $\ge |.80|$ |
| Standardized regression coefficient (beta; $\beta$) | $\ge |.10|$ | $\ge |.24|$ | $\ge |.37|$ |
| Correlation coefficient ($r$) | $\ge |.10|$ | $\ge |.24|$ | $\ge |.37|$ |
| Cohen's $\omega$ | $\ge .10$ | $\ge .30$ | $\ge .50$ |
| Cohen's $f$ | $\ge .10$ | $\ge .25$ | $\ge .40$ |
| Cohen's $f^2$ | $\ge .01$ | $\ge .06$ | $\ge .16$ |
| Coefficient of determination ($R^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
| Eta squared ($\eta^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
| Partial eta squared ($\eta_p^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
: Effect Size Thresholds for Small, Medium, and Large Effect Sizes. {#tbl-effectSizeThresholds}
## Statistical Decision Tree {#sec-statisticalDecisionTree}
A statistical decision tree is a flowchart that depicts which statistical test to use given the purpose of the analysis, the type of data, and so on.
An example statistical decision tree is depicted in @fig-statisticalDecisionTree.
::: {#fig-statisticalDecisionTree}
![](images/statisticalDecisionTree.png){fig-alt="A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure. Adapted from: <https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf>. The original source is: Corston, R. & Colman, A. M. (2000). *A crash course in SPSS for Windows*. Wiley-Blackwell. Changes were made to the original, including the addition of several statistical tests."}
A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure. Adapted from: <https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf>. The original source is: Corston, R. & Colman, A. M. (2000). *A crash course in SPSS for Windows*. Wiley-Blackwell. Changes were made to the original, including the addition of several statistical tests. *Note*: "Interval" as a level of measurement includes data with an "[interval](#sec-interval)" or higher level of measurement; thus, it also includes data with a "[ratio](#sec-ratio)" level of measurement.
:::
This statistical decision tree can be generally summarized such that associations are examined with the correlation/regression family, and differences are examined with the *t*-test/ANOVA family, as depicted in @fig-statisticalDecisionTreeSummary.