# Basic Statistics {#sec-basicStats}
## Getting Started {#sec-basicStatsGettingStarted}
### Load Packages {#sec-basicStatsLoadPackages}
```{r}
library("petersenlab")
library("DescTools")
library("pwr")
library("pwrss")
library("WebPower")
library("grid")
library("tidyverse")
```
### Load Data {#sec-basicStatsLoadData}
```{r}
#| eval: false
#| include: false
load(file = file.path(path, "/OneDrive - University of Iowa/Teaching/Courses/Fantasy Football/Data/player_stats_seasonal.RData", fsep = ""))
```
```{r}
load(file = "./data/player_stats_seasonal.RData")
```
We created the `player_stats_seasonal.RData` object in @sec-calculatePlayerAge.
## Descriptive Statistics {#sec-descriptiveStatistics}
Descriptive statistics are used to describe data.
For instance, they may be used to describe the center, spread, or shape of the data.
There are various indices of each.
### Center {#sec-descriptiveStatisticsCenter}
Indices to describe the *center* (central tendency) of a variable's data include:
- mean (aka "average")
- median
- Hodges-Lehmann statistic (aka pseudomedian)
- mode
- weighted mean
- weighted median
The mean of $X$ (written as: $\bar{X}$) is calculated as in @eq-mean:
$$
\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + ... + X_n}{n}
$$ {#eq-mean}
```{r}
#| code-fold: true
exampleValues <- c(0, 0, 10, 15, 20, 30, 1000)
exampleValues_mean <- apa(mean(exampleValues), 2)
```
That is, to compute the mean, sum all of the values and divide by the number of values ($n$).
One issue with the mean is that it is sensitive to extreme (outlying) values.
For instance, the mean of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_mean`.
```{r}
#| code-fold: true
exampleValues_median <- median(exampleValues)
```
The median is determined as the value at the 50th percentile (i.e., the value that is higher than 50% of the values and is lower than the other 50% of values).
Compared to the mean, the median is less influenced by outliers.
The median of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_median`.
```{r}
#| code-fold: true
exampleValues_pseudomedian <- DescTools::HodgesLehmann(exampleValues)
```
The Hodges-Lehmann statistic (aka pseudomedian) is computed as the median of all pairwise means, and it is also robust to outliers.
The pseudomedian of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_pseudomedian`.
```{r}
#| code-fold: true
exampleValues_mode <- petersenlab::Mode(exampleValues)
```
The mode is the most common/frequent value.
The mode of the values of 0, 0, 10, 15, 20, 30, and 1000 is `{r} exampleValues_mode`.
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains the `Mode()` function for computing the mode of a set of data.
If you want to give some values more weight than others, you can calculate a weighted mean and a weighted median (or other quantile), while assigning a weight to each value.
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains various functions for computing the weighted median (i.e., a weighted quantile at the 0.5 quantile, which is equivalent to the 50th percentile) based on @Akinshin2023.
Because some projections are outliers, we use a trimmed version of the weighted Harrell-Davis quantile estimator for greater robustness.
Below is R code to estimate each:
```{r}
mean(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
median(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
DescTools::HodgesLehmann(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
petersenlab::Mode(player_stats_seasonal$fantasyPoints)
weighted.mean(
player_stats_seasonal$fantasyPoints,
weights = sample( # randomly generate weights (could specify them manually)
x = 1:3,
size = length(player_stats_seasonal$fantasyPoints),
replace = TRUE),
na.rm = TRUE)
petersenlab::wthdquantile(
player_stats_seasonal$fantasyPoints,
w = sample( # randomly generate weights (could specify them manually)
x = 1:3,
size = length(player_stats_seasonal$fantasyPoints),
replace = TRUE),
probs = 0.5)
```
### Spread {#sec-descriptiveStatisticsSpread}
Indices to describe the *spread* (variability) of a variable's data include:
- standard deviation
- variance
- range
- minimum and maximum
- interquartile range (IQR)
- median absolute deviation
The (sample) variance of $X$ (written as: $s^2$) is calculated as in @eq-variance:
$$
s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}
$$ {#eq-variance}
where $X_i$ is each data point, $\bar{X}$ is the mean of $X$, and $n$ is the number of data points.
The (sample) standard deviation of $X$ (written as: $s$) is calculated as in @eq-sd:
$$
s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}
$$ {#eq-sd}
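As a quick sanity check, the standard deviation formula can be computed by hand and compared against R's built-in `sd()` function (using the example values from earlier):

```{r}
x <- c(0, 0, 10, 15, 20, 30, 1000) # example values from above
sqrt(sum((x - mean(x))^2) / (length(x) - 1)) # by hand, per the formula
sd(x) # built-in; the two values match
```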
The range of $X$ is calculated as in @eq-range:
$$
\text{range} = \text{maximum} - \text{minimum}
$$ {#eq-range}
The interquartile range (IQR) is calculated as in @eq-IQR:
$$
\text{IQR} = Q_3 - Q_1
$$ {#eq-IQR}
where $Q_3$ is the score at the third quartile (i.e., 75th percentile), and $Q_1$ is the score at the first quartile (i.e., 25th percentile).
The median absolute deviation (MAD) is the median of all deviations from the median, and is calculated as in @eq-medianAbsoluteDeviation:
$$
\text{MAD} = \text{median}(|X_i - \tilde{X}|)
$$ {#eq-medianAbsoluteDeviation}
where $\tilde{X}$ is the median of $X$.
Compared to the standard deviation, the median absolute deviation is more robust to outliers.
Below is R code to estimate each:
```{r}
sd(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
var(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
range(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
min(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
max(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
IQR(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
mad(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
```
### Shape {#sec-descriptiveStatisticsShape}
Indices to describe the *shape* of a variable's data include:
- skewness
- kurtosis
Positive skewness (right-skewed) reflects a longer or heavier right-tailed distribution, whereas negative skewness (left-skewed) reflects a longer or heavier left-tailed distribution.
Fantasy points tend to be positively skewed.
The kurtosis reflects the extent of extreme (outlying) values in a distribution relative to a normal distribution (or bell curve).
A mesokurtic distribution (with a kurtosis value near zero) reflects a normal amount of tailedness.
Positive kurtosis values reflect a leptokurtic distribution, where there are heavier tails and a sharper peak than a normal distribution.
Negative kurtosis values reflect a platykurtic distribution, where there are lighter tails and a flatter peak than a normal distribution.
Fantasy points tend to have a leptokurtic distribution.
Below is R code to estimate each:
```{r}
psych::skew(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
psych::kurtosi(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
```
### Combination {#sec-descriptiveStatisticsCombination}
To estimate multiple indices of center, spread, and shape of the data, you can use the following code:
```{r}
psych::describe(player_stats_seasonal["fantasyPoints"])
player_stats_seasonal %>%
select(age, years_of_experience, fantasyPoints) %>%
summarise(across(
everything(),
.fns = list(
n = ~ length(na.omit(.)),
missingness = ~ mean(is.na(.)) * 100,
M = ~ mean(., na.rm = TRUE),
SD = ~ sd(., na.rm = TRUE),
min = ~ min(., na.rm = TRUE),
max = ~ max(., na.rm = TRUE),
range = ~ max(., na.rm = TRUE) - min(., na.rm = TRUE),
IQR = ~ IQR(., na.rm = TRUE),
MAD = ~ mad(., na.rm = TRUE),
median = ~ median(., na.rm = TRUE),
pseudomedian = ~ DescTools::HodgesLehmann(., na.rm = TRUE),
mode = ~ petersenlab::Mode(., multipleModes = "mean"),
skewness = ~ psych::skew(., na.rm = TRUE),
kurtosis = ~ psych::kurtosi(., na.rm = TRUE)),
.names = "{.col}.{.fn}")) %>%
pivot_longer(
cols = everything(),
names_to = c("variable","index"),
names_sep = "\\.") %>%
pivot_wider(
names_from = index,
values_from = value)
```
## Scores and Scales {#sec-scoresAndScales}
There are many different types of scores and scales.
This book focuses on [raw scores](#sec-rawScores) and [*z*-scores](#sec-zScores).
For information on other scores and scales, including percentile ranks, *T*-scores, standard scores, scaled scores, and stanine scores, see here: <https://isaactpetersen.github.io/Principles-Psychological-Assessment/scoresScales.html#scoreTransformation> [@PetersenPrinciplesPsychAssessment].
### Raw Scores {#sec-rawScores}
*Raw scores* are the original data on the original metric.
Thus, raw scores are considered *unstandardized*.
For example, raw scores that represent the players' age may range from 20 to 40.
Raw scores depend on the construct and unit; thus raw scores may not be comparable across variables.
### *z* Scores {#sec-zScores}
*z* scores have a mean of zero and a standard deviation of one.
*z* scores are frequently used to render scores across variables more comparable.
Thus, *z* scores are considered a form of a *standardized* score.
*z* scores are calculated using @eq-zScore:
$$
z = \frac{X - \bar{X}}{\sigma}
$$ {#eq-zScore}
where $X$ is the observed score, $\bar{X}$ is the mean observed score, and $\sigma$ is the standard deviation of the observed scores.
You can easily convert a variable to a *z* score using the `scale()` function:
```{r}
#| eval: false
scale(variable)
```
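As a quick check (using the example values from earlier, and noting that `scale()` standardizes using the sample standard deviation), the manual computation from @eq-zScore matches `scale()`:

```{r}
x <- c(0, 0, 10, 15, 20, 30, 1000) # example values from above
(x - mean(x)) / sd(x) # by hand, using the sample standard deviation
as.vector(scale(x)) # equivalent
```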
With a standard normal curve, 68% of scores fall within one standard deviation of the mean.
95% of scores fall within two standard deviations of the mean.
99.7% of scores fall within three standard deviations of the mean.
The area under a normal curve within one standard deviation of the mean is calculated below using the `pnorm()` function, which calculates the cumulative density function for a normal curve.
```{r}
stdDeviations <- 1
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within one standard deviation of the mean is depicted in @fig-zScoreDensity1SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within one standard deviation of the mean."
#| fig.scap: "Density of Standard Normal Distribution: One Standard Deviation of the Mean."
#| label: fig-zScoreDensity1SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within two standard deviations of the mean is calculated below:
```{r}
stdDeviations <- 2
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within two standard deviations of the mean is depicted in @fig-zScoreDensity2SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within two standard deviations of the mean."
#| fig.scap: "Density of Standard Normal Distribution: Two Standard Deviations of the Mean."
#| label: fig-zScoreDensity2SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within three standard deviations of the mean is calculated below:
```{r}
stdDeviations <- 3
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within three standard deviations of the mean is depicted in @fig-zScoreDensity3SD.
```{r}
#| fig.cap: "Density of Standard Normal Distribution. The blue region represents the area within three standard deviations of the mean."
#| fig.scap: "Density of Standard Normal Distribution: Three Standard Deviations of the Mean."
#| label: fig-zScoreDensity3SD
#| code-fold: true
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
If you want to determine the *z* score associated with a particular percentile in a normal distribution, you can use the `qnorm()` function.
For instance, the *z* score associated with the 37th percentile is:
```{r}
qnorm(.37)
```
## Inferential Statistics {#sec-inferentialStatistics}
Inferential statistics are used to draw inferences regarding whether there is (a) a difference in level on a variable across groups or (b) an association between variables.
For instance, inferential statistics may be used to evaluate whether Quarterbacks tend to have longer careers compared to Running Backs.
Or, they could be used to evaluate whether number of carries is associated with injury likelihood.
To apply inferential statistics, we make use of the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
### Null Hypothesis Significance Testing {#sec-nhst}
To draw statistical inferences, the frequentist statistics paradigm leverages null hypothesis significance testing.
Frequentist statistics is the most widely used statistical paradigm.
However, frequentist statistics is not the only statistical paradigm.
Other statistical paradigms exist, including [Bayesian statistics](#sec-bayesTheorem), which is based on [Bayes' theorem](#sec-bayesTheorem).
This chapter focuses on the frequentist approach to hypothesis testing, known as null hypothesis significance testing.
We discuss Bayesian statistics in @sec-baseRates.
#### Null Hypothesis ($H_0$) {#sec-nullHypothesis}
When testing whether there are differences in level across groups on a variable of interest, the null hypothesis ($H_0$) is that there is <u>no difference</u> in level across groups.
For instance, when testing whether Quarterbacks tend to have longer careers compared to Running Backs, the null hypothesis ($H_0$) is that Quarterbacks do not systematically differ from Running Backs in the length of their career.
When testing whether there is an association between variables, the null hypothesis ($H_0$) is that there is <u>no association</u> between the variables.
For instance, when testing whether number of carries is associated with injury likelihood, the null hypothesis ($H_0$) is that there is no association between number of carries and injury likelihood.
#### Alternative Hypothesis ($H_1$) {#sec-alternativeHypothesis}
The alternative hypothesis ($H_1$) is the researcher's hypothesis that they want to evaluate.
An alternative hypothesis ($H_1$) might be directional (i.e., one-sided) or non-directional (i.e., two-sided).
Directional hypotheses specify a particular direction, such as which group will have larger scores or in which direction (positive or negative) two variables will be associated.
Examples of directional hypotheses include:
- Quarterbacks have <u>longer</u> careers compared to Running Backs
- Number of carries is <u>positively</u> associated with injury likelihood
Non-directional hypotheses do not specify a particular direction.
For instance, non-directional hypotheses may state that two groups differ but do not specify which group will have larger scores.
Or, non-directional hypotheses may state that two variables are associated but do not state the sign of the association (i.e., positive or negative).
Examples of non-directional hypotheses include:
- Quarterbacks <u>differ</u> in the length of their careers compared to Running Backs
- Number of carries is <u>associated</u> with injury likelihood
#### Statistical Significance {#sec-statisticalSignificance}
In science, statistical significance is evaluated with the *p*-value.
The *p*-value does not represent the probability that you observed the result by chance.
The *p*-value represents a conditional probability—it examines the probability of one event given another event.
In particular, the *p*-value evaluates the likelihood that you would detect a result at least as extreme as the one observed (in terms of the magnitude of the difference or of the association) given that the null hypothesis ($H_0$) is true.
This can be expressed in conditional probability notation, $P(A | B)$, which is the probability (likelihood) of event A occurring given that event B occurred (or given condition B).
The conditional probability notation for a left-tailed directional test (i.e., Quarterbacks have <u>shorter</u> careers than Running Backs; or number of carries is <u>negatively</u> associated with injury likelihood) is in @eq-pvalueLeftTailed.
$$
p\text{-value} = P(T \le t | H_0)
$$ {#eq-pvalueLeftTailed}
where $T$ is the test statistic of interest (e.g., the distribution of $t$-, $r$-, or $F$-values, depending on the test) and $t$ is the observed test statistic (e.g., the $t$-, $r$-, or $F$-value, depending on the test).
The conditional probability notation for a right-tailed directional test (i.e., Quarterbacks have <u>longer</u> careers than Running Backs; or number of carries is <u>positively</u> associated with injury likelihood) is in @eq-pvalueRightTailed.
$$
p\text{-value} = P(T \ge t | H_0)
$$ {#eq-pvalueRightTailed}
The conditional probability notation for a two-tailed non-directional test (i.e., Quarterbacks <u>differ</u> in the length of their careers compared to Running Backs; or number of carries is <u>associated</u> with injury likelihood) is in @eq-pvalueTwoTailed.
$$
p\text{-value} = 2 \times \text{min}(P(T \le t | H_0), P(T \ge t | H_0))
$$ {#eq-pvalueTwoTailed}
where `min(a, b)` is the smaller number of `a` and `b`.
If the distribution of the test statistic is symmetric around zero, the *p*-value for the two-tailed non-directional test simplifies to @eq-pvalueTwoTailedSimple.
$$
p\text{-value} = 2 \times P(T \ge |t| | H_0)
$$ {#eq-pvalueTwoTailedSimple}
Nevertheless, to be conservative (i.e., to avoid false positive/Type I errors), many researchers use two-tailed *p*-values regardless of whether their hypothesis is directional or non-directional.
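For instance, the two-tailed *p*-value for an observed *t*-statistic can be computed directly from the cumulative distribution function; here with a hypothetical $t = 2.1$ on 30 degrees of freedom:

```{r}
tObserved <- 2.1 # hypothetical observed t statistic
degreesOfFreedom <- 30 # hypothetical degrees of freedom
2 * pt(abs(tObserved), df = degreesOfFreedom, lower.tail = FALSE) # two-tailed p-value
2 * (1 - pt(abs(tObserved), df = degreesOfFreedom)) # equivalent
```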
For a test of group differences, the *p*-value evaluates the likelihood that you would observe a difference as large or larger than the one you observed between the groups if there were no systematic difference between the groups in the population, as depicted in @fig-pValuesDifference.
For instance, when evaluating whether Quarterbacks have <u>longer</u> careers than Running Backs, and you observed a mean difference of 0.03 years, the *p*-value evaluates the likelihood that you would observe a difference as large or larger than 0.03 years between the groups if, in truth among all Quarterbacks and Running Backs in the NFL, Quarterbacks do not differ from Running Backs in terms of the length of their career.
```{r}
#| label: fig-pValuesDifference
#| layout-ncol: 2
#| fig-cap: "Interpretation of *p*-Values When Examining The Differences Between Groups. The vertical black lines reflect the group means."
#| fig-alt: "Interpretation of *p*-Values When Examining The Differences Between Groups. The vertical black lines reflect the group means."
#| fig-subcap:
#| - "What is the probability my data would look like this..."
#| - "...if in the population, the groups were really this?"
#| code-fold: true
set.seed(52242)
nObserved <- 1000
nPopulation <- 1000000
observedGroups <- data.frame(
score = c(rnorm(nObserved, mean = 47, sd = 3), rnorm(nObserved, mean = 52, sd = 3)),
group = as.factor(c(rep("Group 1", nObserved), rep("Group 2", nObserved)))
)
populationGroups <- data.frame(
score = c(rnorm(nPopulation, mean = 50, sd = 3.03), rnorm(nPopulation, mean = 50, sd = 3)),
group = as.factor(c(rep("Group 1", nPopulation), rep("Group 2", nPopulation)))
)
ggplot2::ggplot(
data = observedGroups,
mapping = aes(
x = score,
fill = group,
color = group
)
) +
geom_density(alpha = 0.5) +
scale_color_manual(values = c("red", "blue")) +
scale_fill_manual(values = c("red","blue")) +
geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 1")])) +
geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 2")])) +
ggplot2::labs(
x = "Score",
y = "Frequency",
title = "What is the probability my data would look like this..."
) +
ggplot2::theme_classic(
base_size = 16) +
ggplot2::theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
#plot.title.position = "plot"
legend.position = "inside",
legend.margin = margin(0, 0, 0, 0),
legend.justification.top = "left",
legend.justification.left = "top",
legend.justification.bottom = "right",
legend.justification.inside = c(1, 1),
legend.location = "plot")
ggplot2::ggplot(
data = populationGroups,
mapping = aes(
x = score,
fill = group,
color = group
)
) +
geom_density(alpha = 0.5) +
scale_color_manual(values = c("red", "blue")) +
scale_fill_manual(values = c("red","blue")) +
geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 1")])) +
geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 2")])) +
ggplot2::labs(
x = "Score",
y = "Frequency",
title = "...if in the population, the groups were really this:"
) +
ggplot2::theme_classic(
base_size = 16) +
ggplot2::theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
#plot.title.position = "plot",
legend.position = "inside",
legend.margin = margin(0, 0, 0, 0),
legend.justification.top = "left",
legend.justification.left = "top",
legend.justification.bottom = "right",
legend.justification.inside = c(1, 1),
legend.location = "plot")
```
For a test of whether two variables are associated, the *p*-value evaluates the likelihood that you would observe an association as strong or stronger than the one you observed if there were no actual association between the variables in the population, as depicted in @fig-pValuesAssociation.
For instance, when evaluating whether number of carries is <u>positively</u> associated with injury likelihood, and you observed a correlation coefficient of $r = .25$ between number of carries and injury likelihood, the *p*-value evaluates the likelihood that you would observe a correlation as strong or stronger than $r = .25$ between the variables if, in truth among all NFL Running Backs, number of carries is not associated with injury likelihood.
```{r}
#| label: fig-pValuesAssociation
#| layout-ncol: 2
#| fig-cap: "Interpretation of *p*-Values When Examining The Association Between Variables."
#| fig-alt: "Interpretation of *p*-Values When Examining The Association Between Variables."
#| fig-subcap:
#| - "What is the probability my data would look like this..."
#| - "...if in the population, the association was really this?"
#| code-fold: true
set.seed(52242)
observedCorrelation <- 0.9
correlations <- data.frame(criterion = rnorm(2000))
correlations$sample <- NA
correlations$sample[1:100] <- complement(correlations$criterion[1:100], observedCorrelation)
correlations$population <- complement(correlations$criterion, 0)
ggplot2::ggplot(
data = correlations,
mapping = aes(
x = sample,
y = criterion
)
) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_continuous(
limits = c(-3.5,3)
) +
annotate(
x = 0,
y = 4,
label = paste("italic(r) != ", 0, sep = ""),
parse = TRUE,
geom = "text",
size = 7) +
labs(
x = "Predictor Variable",
y = "Outcome Variable",
title = "What is the probability my data would look like this..."
) +
theme_classic(
base_size = 16) +
theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
ggplot2::ggplot(
data = correlations,
mapping = aes(
x = population,
y = criterion
)
) +
geom_point() +
geom_smooth(
method = "lm",
se = FALSE) +
scale_x_continuous(
limits = c(-2.5,2.5)
) +
annotate(
x = 0,
y = 4,
label = paste("italic(r) == '", "0.00", "'", sep = ""),
parse = TRUE,
geom = "text",
size = 7) +
labs(
x = "Predictor Variable",
y = "Outcome Variable",
title = "...if in the population, the association was really this:"
) +
theme_classic(
base_size = 16) +
theme(
legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
```
Using null hypothesis significance testing (NHST), we consider an effect to be *statistically significant* if the *p*-value is less than some threshold, called the *alpha level*.
In science, we typically want to be conservative because a false positive (i.e., Type I error) is considered more problematic than a false negative (i.e., Type II error).
That is, we would rather say an effect does not exist when it really does than say an effect does exist when it really does not.
Thus, we typically set the alpha level to a low value, commonly .05.
Then, we would consider an effect to be *statistically significant* if the *p*-value is less than .05.
That is, there is a small chance (5%; or 1 in 20 times) that we would observe an effect at least as extreme as the effect observed, if the null hypothesis were true.
So, you might expect around 5% of tests where the null hypothesis is true to be statistically significant just by chance.
We could lower the rate of Type II (i.e., false negative) errors—i.e., we could detect more effects—if we set the alpha level to a higher value (e.g., .10); however, raising the alpha level would raise the possibility of Type I (false positive) errors.
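The roughly 5% false positive rate under the null hypothesis can be verified with a small simulation (a sketch, not from the original text): repeatedly test for a difference between two groups drawn from the same population, and count how often $p < .05$.

```{r}
set.seed(52242)
pValues <- replicate(10000, {
  t.test(rnorm(30), rnorm(30))$p.value # both groups drawn from the same population
})
mean(pValues < .05) # proportion of false positives; should be near .05
```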
If the *p*-value is less than .05, we reject the null hypothesis ($H_0$) that there was no difference or association.
Thus, we conclude that there was a statistically significant (non-zero) difference or association.
If the *p*-value is greater than .05, we fail to reject the null hypothesis; the difference/association was not statistically significant.
Thus, we do not have confidence that there was a difference or association.
However, we do not accept the null hypothesis; it could be that we did not observe an effect because we did not have adequate power to detect the effect—e.g., if the [effect size](#sec-practicalSignificance) was small, the data were noisy, and the [sample size](#sec-sampleVsPopulation) was small and/or unrepresentative.
There are four possible decision-making outcomes when performing null hypothesis significance testing:
1. We (correctly) reject the null hypothesis when it is in fact false ($1 - \beta$).
This is a true positive.
For instance, we may correctly determine that Quarterbacks have longer careers than Running Backs.
1. We (correctly) fail to reject the null hypothesis when it is in fact true ($1 - \alpha$).
This is a true negative.
For instance, we may correctly determine that Quarterbacks do not have longer careers than Running Backs.
1. We (incorrectly) reject the null hypothesis when it is in fact true ($\alpha$).
This is a false positive.
When performing null hypothesis testing, a false positive is known as a Type I error.
For instance, we may incorrectly determine that Quarterbacks have longer careers than Running Backs when, in fact, Quarterbacks and Running Backs do not differ in their career length.
1. We (incorrectly) fail to reject the null hypothesis when it is in fact false ($\beta$).
This is a false negative.
When performing null hypothesis testing, a false negative is known as a Type II error.
For instance, we may incorrectly determine that Quarterbacks and Running Backs do not differ in their career length when, in fact, Quarterbacks have longer careers than Running Backs.
A two-by-two confusion matrix for null-hypothesis significance testing is in @fig-nhstConfusionMatrix.
::: {#fig-nhstConfusionMatrix}
![](images/nhstConfusionMatrix.png){fig-alt="A Two-by-Two Confusion Matrix for Null-Hypothesis Significance Testing."}
A Two-by-Two Confusion Matrix for Null-Hypothesis Significance Testing.
:::
In statistics, *power* is the probability of detecting an effect, if, in fact, the effect exists.
Otherwise said, power is the probability of rejecting the null hypothesis, if, in fact, the null hypothesis is false.
Power is influenced by several variables:
- the [sample size](#sec-sampleVsPopulation) (*N*): the larger the *N*, the greater the power
- for group comparisons, the power depends on the [sample size](#sec-sampleVsPopulation) of each group
- the [effect size](#sec-practicalSignificance): the larger the effect, the greater the power
- for group comparisons, larger effect sizes reflect:
- larger between-group variance, and
- smaller within-group variance (i.e., strong measurement precision, i.e., [reliability](#sec-reliability))
- the alpha level: the researcher specifies the alpha level (though it is typically set at .05); the higher the alpha level, the greater the power; however, the higher we set the alpha level, the higher the likelihood of Type I errors (false positives)
- one- versus two-tailed tests: one-tailed tests have higher power than two-tailed tests
- [within-subject](#sec-withinSubject) versus [between-subject](#sec-betweenSubject) comparisons: [within-subject designs](#sec-withinSubject) tend to have greater power than [between-subject designs](#sec-betweenSubject)
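The influence of several of these variables can be explored with the `power.t.test()` function in base R (the values below are arbitrary and for illustration only):

```{r}
# Power of a two-sample t-test to detect a medium effect (d = 0.5)
# with 50 participants per group and alpha = .05
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = .05,
             type = "two.sample", alternative = "two.sided")

# Larger samples yield greater power
power.t.test(n = 200, delta = 0.5, sd = 1, sig.level = .05)

# One-tailed tests have greater power than two-tailed tests
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = .05,
             alternative = "one.sided")
```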
A plot of statistical power is in @fig-nhst.
```{r}
#| label: fig-nhst
#| fig-cap: "Statistical Power (Adapted from Kristoffer Magnusson: <https://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics>; archived at <https://perma.cc/FG3J-85L6>). The dashed line represents the critical value or threshold."
#| fig-alt: "Statistical Power (Adapted from Kristoffer Magnusson: <https://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics>; archived at <https://perma.cc/FG3J-85L6>). The dashed line represents the critical value or threshold."
#| code-fold: true
m1 <- 0 # mu H0
sd1 <- 1.5 # sigma H0
m2 <- 3.5 # mu HA
sd2 <- 1.5 # sigma HA
z_crit <- qnorm(1-(0.05/2), m1, sd1)
# set length of tails
min1 <- m1-sd1*4
max1 <- m1+sd1*4
min2 <- m2-sd2*4
max2 <- m2+sd2*4
# create x sequence
x <- seq(min(min1,min2), max(max1, max2), .01)
# generate normal dist #1
y1 <- dnorm(x, m1, sd1)
# put in data frame
df1 <- data.frame("x" = x, "y" = y1)
# generate normal dist #2
y2 <- dnorm(x, m2, sd2)
# put in data frame
df2 <- data.frame("x" = x, "y" = y2)
# Alpha polygon
y.poly <- pmin(y1,y2)
poly1 <- data.frame(x=x, y=y.poly)
poly1 <- poly1[poly1$x >= z_crit, ]
poly1<-rbind(poly1, c(z_crit, 0)) # add lower-left corner
# Beta polygon
poly2 <- df2
poly2 <- poly2[poly2$x <= z_crit,]
poly2 <- rbind(poly2, c(z_crit, 0)) # add lower-right corner
# power polygon; 1-beta
poly3 <- df2
poly3 <- poly3[poly3$x >= z_crit,]
poly3 <-rbind(poly3, c(z_crit, 0)) # add lower-left corner
# combine polygons.
poly1$id <- 3 # alpha, give it the highest number to make it the top layer
poly2$id <- 2 # beta
poly3$id <- 1 # power; 1 - beta
poly <- rbind(poly1, poly2, poly3)
poly$id <- factor(poly$id, labels=c("power","beta","alpha"))
# plot with ggplot2
ggplot(poly, aes(x,y, fill=id, group=id)) +
geom_polygon(show.legend=F, alpha=I(8/10)) +
  # add line for null distribution (H0)
  geom_line(data=df1, aes(x,y, color="H0", group=NULL, fill=NULL), linewidth=1.5, show.legend=F) +
  # add line for alternative distribution (HA); these lines could be combined into one dataframe
  geom_line(data=df2, aes(color="HA", group=NULL, fill=NULL), linewidth=1.5, show.legend=F) +
# add vlines for z_crit
geom_vline(xintercept = z_crit, linewidth=1, linetype="dashed") +
# change colors
scale_color_manual("Group",
values= c("HA" = "#981e0b","H0" = "black")) +
scale_fill_manual("test", values= c("alpha" = "#0d6374","beta" = "#be805e","power"="#7cecee")) +
# beta arrow
annotate("segment", x=0.1, y=0.045, xend=1.3, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label="beta", x=0, y=0.05, parse=T, size=8) +
# alpha arrow
annotate("segment", x=4, y=0.043, xend=3.4, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label="frac(alpha,2)", x=4.2, y=0.05, parse=T, size=8) +
# power arrow
annotate("segment", x=6, y=0.2, xend=4.5, yend=0.15, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
annotate("text", label=expression(paste(1-beta, " (\"power\")")), x=6.1, y=0.21, parse=T, size=8) +
# H_0 title
annotate("text", label="H[0]", x=m1, y=0.28, parse=T, size=8) +
# H_a title
annotate("text", label="H[1]", x=m2, y=0.28, parse=T, size=8) +
ggtitle("Statistical Power") +
# remove some elements
theme(
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
panel.background = element_blank(),
plot.background = element_rect(fill="white"),
panel.border = element_blank(),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size=22))
```
Interactive visualizations by Kristoffer Magnusson on *p*-values and null-hypothesis significance testing are below:
- <https://rpsychologist.com/pvalue/> (archived at <https://perma.cc/JP9F-9ZVY>)
- <https://rpsychologist.com/d3/pdist/> (archived at <https://perma.cc/BE96-8LSJ>)
- <https://rpsychologist.com/d3/nhst/> (archived at <https://perma.cc/ZU9A-37F3>)
Twelve misconceptions about *p*-values [@Goodman2008] are in @tbl-pValueMisconceptions.
| Number | Misconception |
|:-------|:-------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | If $p = .05$, the null hypothesis has only a 5% chance of being true. |
| 2 | A nonsignificant difference (e.g., $p > .05$) means there is no difference between groups. |
| 3 | A statistically significant finding is clinically important. |
| 4 | Studies with $p$-values on opposite sides of .05 are conflicting. |
| 5 | Studies with the same $p$-value provide the same evidence against the null hypothesis. |
| 6 | $p = .05$ means that we have observed data that would occur only 5% of the time under the null hypothesis. |
| 7 | $p = .05$ and $p < .05$ mean the same thing. |
| 8 | $p$-values are properly written as inequalities (e.g., "$p \le .05$" when $p = .015$). |
| 9 | $p = .05$ means that if you reject the null hypothesis, the probability of a Type I error is only 5%. |
| 10 | With a $p = .05$ threshold for significance, the chance of a Type I error will be 5%. |
| 11 | You should use a one-sided $p$-value when you don't care about a result in one direction, or a difference in that direction is impossible. |
| 12 | A scientific conclusion or treatment policy should be based on whether or not the $p$-value is significant. |
: Twelve Misconceptions About *p*-Values from @Goodman2008. Goodman also provides a discussion about why each statement is false. {#tbl-pValueMisconceptions}
That is, the *p*-value is <u>not</u>:
- the probability that the effect was due to chance
- the probability that the null hypothesis is true
- the size of the effect
- the importance of the effect
- whether the effect is true, real, or causal
Statistical significance involves the *consistency* of an effect/association/difference; it suggests that the association/difference is reliably non-zero.
However, just because something is statistically significant does not mean that it is important.
For instance, consider that we discover that players who consume a sports drink before a game tend to perform better than players who do not ($p < .05$).
However, what if consumption of sports drinks is associated with an average improvement of only 0.002 points per game?
A small effect such as this might be detectable with a large [sample size](#sec-sampleVsPopulation).
The effect would be considered reliable/consistent because it is statistically significant.
However, it is so small that it results in differences that are not [practically important](#sec-practicalSignificance).
Thus, in addition to statistical significance, it is also important to consider [practical significance](#sec-practicalSignificance).
### Practical Significance {#sec-practicalSignificance}
*Practical significance* deals with how large or important the effect/association/difference is.
It is based on the magnitude of the effect, called the *effect size*.
Effect size can be quantified in various ways including:
- Cohen's $d$
- Standardized regression coefficient (beta; $\beta$)
- Correlation coefficient ($r$)
- Cohen's $\omega$ (omega)
- Cohen's $f$
- Cohen's $f^2$
- Coefficient of determination ($R^2$)
- Eta squared ($\eta^2$)
- Partial eta squared ($\eta_p^2$)
#### Cohen's $d$ {#sec-cohensD}
Cohen's $d$ is calculated as in @eq-cohensD:
$$
\begin{aligned}
d &= \frac{\text{mean difference}}{\text{pooled standard deviation}} \\
&= \frac{\bar{X_1} - \bar{X_2}}{s} \\
\end{aligned}
$$ {#eq-cohensD}
where:
$$
s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$ {#eq-pooledStandardDeviation}
where $n_1$ and $n_2$ are the sample sizes of group 1 and group 2, respectively, and $s_1$ and $s_2$ are the standard deviations of group 1 and group 2, respectively.
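For example, Cohen's $d$ can be computed by hand in R (the career-length data below are hypothetical):

```{r}
# Hypothetical career lengths (in seasons) for two position groups
quarterbacks <- c(9, 11, 14, 8, 12, 10)
runningbacks <- c(6, 7, 9, 5, 8, 7)

n1 <- length(quarterbacks)
n2 <- length(runningbacks)

# pooled standard deviation
s <- sqrt(((n1 - 1)*var(quarterbacks) + (n2 - 1)*var(runningbacks)) /
            (n1 + n2 - 2))

# Cohen's d: mean difference divided by the pooled standard deviation
(mean(quarterbacks) - mean(runningbacks)) / s
```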
#### Standardized Regression Coefficient (Beta; $\beta$) {#sec-beta}
The standardized regression coefficient (beta; $\beta$) is used in multiple regression, and is calculated as in @eq-beta:
$$
\beta_x = B_x \times \frac{s_x}{s_y}
$$ {#eq-beta}
where $B_x$ is the unstandardized regression coefficient of the [predictor variable](#sec-correlationalStudy) $x$ in predicting the [outcome variable](#sec-correlationalStudy) $y$, $s_x$ is the standard deviation of $x$, and $s_y$ is the standard deviation of $y$.
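For example, the standardized coefficient can be obtained in R either by rescaling the unstandardized coefficient or by fitting the model to standardized (*z*-scored) variables (simulated data for illustration):

```{r}
# Simulated data for illustration
set.seed(52242)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

B <- coef(lm(y ~ x))["x"] # unstandardized coefficient

# standardized coefficient: B multiplied by sd(x)/sd(y)
B * sd(x) / sd(y)

# equivalently, fit the model to z-scored variables
coef(lm(scale(y) ~ scale(x)))[2]
```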
#### Correlation Coefficient ($r$)
The formula for the correlation coefficient is in @sec-correlation.
#### Cohen's $\omega$ {#sec-cohensOmega}
Cohen's $\omega$ is used in chi-square tests, and is calculated as in @eq-cohensOmega:
$$
\omega = \sqrt{\frac{\chi^2}{N} - \frac{df}{N}}
$$ {#eq-cohensOmega}
where $\chi^2$ is the chi-square statistic from the test, $N$ is the sample size, and $df$ is the degrees of freedom.
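For example, Cohen's $\omega$ can be computed from the output of `chisq.test()`, following the formula above (the contingency table below is hypothetical):

```{r}
# Hypothetical 2 x 2 contingency table
contingencyTable <- matrix(c(30, 20, 10, 40), nrow = 2)
chisqTest <- chisq.test(contingencyTable, correct = FALSE)

chi2 <- unname(chisqTest$statistic)
N <- sum(contingencyTable)
df <- unname(chisqTest$parameter)

# Cohen's omega, as in the formula above
sqrt(chi2/N - df/N)
```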
#### Cohen's $f$ {#sec-cohensF}
Cohen's $f$ is commonly used in ANOVA, and is calculated as in @eq-cohensF:
$$
\begin{aligned}
f &= \sqrt{\frac{R^2}{1 - R^2}} \\
&= \sqrt{\frac{\eta^2}{1 - \eta^2}}
\end{aligned}
$$ {#eq-cohensF}
#### Cohen's $f^2$ {#sec-cohensFsquared}
Cohen's $f^2$ is commonly used in regression, and is calculated as in @eq-cohensFsquared:
$$
\begin{aligned}
f^2 &= \frac{R^2}{1 - R^2} \\
&= \frac{\eta^2}{1 - \eta^2}
\end{aligned}
$$ {#eq-cohensFsquared}
To calculate the effect size of a particular predictor, you can calculate $\Delta f^2$ as in @eq-deltaCohensFsquared:
$$
\begin{aligned}
\Delta f^2 &= \frac{R^2_{\text{model}} - R^2_{\text{reduced}}}{1 - R^2_{\text{model}}} \\
&= \frac{\eta^2_{\text{model}} - \eta^2_{\text{reduced}}}{1 - \eta^2_{\text{model}}}
\end{aligned}
$$ {#eq-deltaCohensFsquared}
where $R^2_{\text{model}}$ is the $R^2$ of the model with the [predictor variable](#sec-correlationalStudy) of interest and $R^2_{\text{reduced}}$ is the $R^2$ of the model without the [predictor variable](#sec-correlationalStudy) of interest.
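For example, $f^2$ and $\Delta f^2$ can be computed from model $R^2$ values in R (simulated data for illustration):

```{r}
# Simulated data for illustration
set.seed(52242)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 0.4*x1 + 0.3*x2 + rnorm(100)

r2_model <- summary(lm(y ~ x1 + x2))$r.squared
r2_reduced <- summary(lm(y ~ x1))$r.squared # model without x2

# Cohen's f^2 for the full model
r2_model / (1 - r2_model)

# delta f^2 for the focal predictor x2
(r2_model - r2_reduced) / (1 - r2_model)
```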
#### Coefficient of Determination ($R^2$) {#sec-rSquared}
The coefficient of determination ($R^2$) reflects the proportion of variance in the [outcome variable](#sec-correlationalStudy) that is explained by the [predictor variable(s)](#sec-correlationalStudy).
$R^2$ is commonly used in regression, and is calculated as in @eq-rSquared:
$$
\begin{aligned}
R^2 &= 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} \\
&= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\
&= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\
&= \frac{f^2}{1 + f^2} \\
&= \eta^2 \\
&= \frac{\text{variance explained in }Y}{\text{total variance in }Y}
\end{aligned}
$$ {#eq-rSquared}
where $Y_i$ is the observed value of the [outcome variable](#sec-correlationalStudy) for the $i$th observation, $\hat{Y}_i$ is the model-predicted value for the $i$th observation, and $\bar{Y}$ is the mean of the observed values of the [outcome variable](#sec-correlationalStudy).
The total sum of squares is an index of the total variation in the [outcome variable](#sec-correlationalStudy).
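For example, $R^2$ can be computed from the sums of squares of a fitted regression model (simulated data for illustration); the result matches `summary(fit)$r.squared`:

```{r}
# Simulated data for illustration
set.seed(52242)
x <- rnorm(100)
y <- 0.5*x + rnorm(100)
fit <- lm(y ~ x)

ss_residual <- sum(residuals(fit)^2) # sum of squared residuals
ss_total <- sum((y - mean(y))^2)     # total sum of squares

# R^2: proportion of variance in y explained by x
1 - ss_residual/ss_total
```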
#### Eta Squared ($\eta^2$) and Partial Eta Squared ($\eta_p^2$) {#sec-etaSquared}
Like $R^2$, eta squared ($\eta^2$) reflects the proportion of variance in the [dependent variable](#sec-experiment) that is explained by the [independent variable(s)](#sec-experiment).
$\eta^2$ is commonly used in ANOVA, and is calculated as in @eq-etaSquared:
$$
\begin{aligned}
\eta^2 &= \frac{SS_{\text{effect}}}{SS_{\text{total}}} \\
&= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\
&= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\
&= \frac{f^2}{1 + f^2} \\
&= R^2
\end{aligned}
$$ {#eq-etaSquared}
where $SS_{\text{effect}}$ is the sum of squares for the effect of interest and $SS_{\text{total}}$ is the total sum of squares.
Partial eta squared ($\eta_p^2$) reflects the proportion of variance in the [dependent variable](#sec-experiment) that is explained by the [independent variable](#sec-experiment) while controlling for the other [independent variables](#sec-experiment).
$\eta_p^2$ is commonly used in ANOVA, and is calculated as in @eq-partialEtaSquared:
$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
$$ {#eq-partialEtaSquared}
where $SS_{\text{effect}}$ is the sum of squares for the effect of interest and $SS_{\text{error}}$ is the sum of squares for the residual error term.
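For example, $\eta^2$ can be computed from the sums of squares in an ANOVA table (simulated data for illustration); with a single [independent variable](#sec-experiment), $\eta^2 = \eta_p^2$:

```{r}
# Simulated data for illustration: career length by position group
set.seed(52242)
position <- factor(rep(c("QB", "RB", "WR"), each = 20))
careerLength <- rnorm(60, mean = c(9, 6, 7)[as.integer(position)], sd = 2)

fit <- aov(careerLength ~ position)
ss <- summary(fit)[[1]][["Sum Sq"]] # effect SS, then residual SS

# eta-squared; with one factor, this also equals partial eta-squared
ss[1] / sum(ss)
```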
#### Effect Size Thresholds {#sec-effectSizeThresholds}
Effect size thresholds [@Cohen1988; @McGrath2006] for small, medium, and large effect sizes are in @tbl-effectSizeThresholds.
| Effect Size Index | Small | Medium | Large |
|:----------------------------------------------------|:------------|:------------|:------------|
| Cohen's $d$ | $\ge |.20|$ | $\ge |.50|$ | $\ge |.80|$ |
| Standardized regression coefficient (beta; $\beta$) | $\ge |.10|$ | $\ge |.24|$ | $\ge |.37|$ |
| Correlation coefficient ($r$) | $\ge |.10|$ | $\ge |.24|$ | $\ge |.37|$ |
| Cohen's $\omega$ | $\ge .10$ | $\ge .30$ | $\ge .50$ |
| Cohen's $f$ | $\ge .10$ | $\ge .25$ | $\ge .40$ |
| Cohen's $f^2$ | $\ge .01$ | $\ge .06$ | $\ge .16$ |
| Coefficient of determination ($R^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
| Eta squared ($\eta^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
| Partial eta squared ($\eta_p^2$) | $\ge .01$ | $\ge .06$ | $\ge .14$ |
: Effect Size Thresholds for Small, Medium, and Large Effect Sizes. {#tbl-effectSizeThresholds}
## Statistical Decision Tree {#sec-statisticalDecisionTree}
A statistical decision tree is a flowchart that depicts which statistical test to use given the purpose of the analysis, the type of data, and so on.
An example statistical decision tree is depicted in @fig-statisticalDecisionTree.
::: {#fig-statisticalDecisionTree}
![](images/statisticalDecisionTree.png){fig-alt="A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure. Adapted from: <https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf>. The original source is: Corston, R. & Colman, A. M. (2000). *A crash course in SPSS for Windows*. Wiley-Blackwell. Changes were made to the original, including the addition of several statistical tests."}
A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure. Adapted from: <https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf>. The original source is: Corston, R. & Colman, A. M. (2000). *A crash course in SPSS for Windows*. Wiley-Blackwell. Changes were made to the original, including the addition of several statistical tests. *Note*: "Interval" as a level of measurement includes data with an "[interval](#sec-interval)" or higher level of measurement; thus, it also includes data with a "[ratio](#sec-ratio)" level of measurement.
:::
This statistical decision tree can be generally summarized such that associations are examined with the correlation/regression family, and differences are examined with the *t*-test/ANOVA family, as depicted in @fig-statisticalDecisionTreeSummary.