# Base Rates {#sec-baseRates}
## Getting Started {#sec-baseRatesGettingStarted}
### Load Packages {#sec-baseRatesLoadPackages}
```{r}
library("petersenlab")
```
## Overview {#sec-baseRatesOverview}
Predicting player performance is a complex prediction task.
Performance is probabilistically influenced by many processes, including processes internal to the player in addition to external processes.
Moreover, people's performance occurs in the context of a dynamic system with nonlinear, probabilistic, and cascading influences that change across time.
The ever-changing system makes behavior challenging to predict.
And, similar to chaos theory, one small change in the system can lead to large differences later on.
One factor that is especially important to keep in mind when making predictions is the base rate.
Let's consider a prediction example, assuming the following probabilities:
- The probability of contracting HIV is .3%
- The probability of a positive test for HIV is 1%
- The probability of a positive test if you have HIV is 95%
What is the probability of HIV if you have a positive test?
As we will see, the probability is: $\frac{95\% \times .3\%}{1\%} = 28.5\%$.
So based on the above probabilities, if you have a positive test, the probability that you have HIV is 28.5%.
Most people tend to vastly overestimate the likelihood that the person has HIV in this example.
Why?
Because they do not pay enough attention to the base rate (in this example, the base rate of HIV is .3%).
In general, people tend to overestimate the likelihood of low base-rate events.
That is, if the base rate of an event or condition—such as schizophrenia—is low (e.g., ~0.5%), people overestimate the likelihood that a person has schizophrenia when given specific information about the person such as their symptoms and history.
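One way to see why the base rate matters is to restate the HIV example in natural frequencies. The sketch below uses base R and a hypothetical population of 100,000 people; the population size and the variable names are assumptions chosen only for illustration.

```{r}
# Hypothetical population size, chosen only to make the counts easy to read
nPeople <- 100000

# Probabilities assumed in the example above
baseRateHIV <- .003      # P(HIV)
positivityRate <- .01    # P(positive test)
sensitivity <- .95       # P(positive test | HIV)

# Expected counts in the hypothetical population
nHIV <- nPeople * baseRateHIV           # people with HIV
nPositive <- nPeople * positivityRate   # people who test positive
nTruePositive <- nHIV * sensitivity     # people with HIV who test positive

# Proportion of positive tests that come from people with HIV
pHIVgivenPositive <- nTruePositive / nPositive
pHIVgivenPositive
```

In this hypothetical population, about 1,000 people test positive but only about 285 of them have HIV, which is where the 28.5% comes from.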
## Issues Around Probability {#sec-probability}
### Types of Probabilities {#sec-probabilityTypes}
It is important to distinguish between different types of probabilities: marginal probabilities, joint probabilities, and conditional probabilities.
#### Base Rate (Marginal Probability) {#sec-baseRate}
The *base rate* is a marginal probability, which is the general probability of an event irrespective of other things.
For instance, the base rate of HIV is the probability of developing HIV.
In the U.S., [the prevalence rate of HIV is ~0.4% of the adult population](https://map.aidsvu.org/profiles/nation/usa/overview) (archived at <https://perma.cc/8GE6-GAPC>).
For instance, we can consider the following marginal probabilities:
$P(C_i)$ is the probability (i.e., base rate) of a classification, $C$, independent of other things.
A base rate is often used as the "*prior probability*" in a Bayesian model.
In our example above, $P(C_i)$ is the base rate (i.e., prevalence) of HIV in the population: $P(\text{HIV}) = .3\%$.
$P(R_i)$ is the probability (base rate) of a response, $R$, independent of other things.
In the example above, $P(R_i)$ is the base rate of a positive test for HIV: $P(\text{positive test}) = 1\%$.
The base rate of a positive test is known as the *positivity rate* or *selection ratio*.
#### Joint Probability {#sec-jointProbability}
A *joint probability* is the probability of two (or more) events occurring simultaneously.
For instance, the probability of events $A$ and $B$ both occurring together is $P(A, B)$.
If the two events are independent, a joint probability can be calculated using the [marginal probability](#sec-baseRate) of each event, as in @eq-jointProbability:
$$
P(A, B) = P(A) \cdot P(B)
$$ {#eq-jointProbability}
More generally, whether or not the events are independent (and rearranging the terms for the calculation of [conditional probability](#sec-conditionalProbability)), a [joint probability](#sec-jointProbability) can also be calculated using the [conditional probability](#sec-conditionalProbability) and [marginal probability](#sec-baseRate), as in @eq-jointProbability2:
$$
P(A, B) = P(A | B) \cdot P(B)
$$ {#eq-jointProbability2}
#### Conditional Probability {#sec-conditionalProbability}
A *conditional probability* is the probability of one event occurring given the occurrence of another event.
Conditional probabilities are written as: $P(A | B)$.
This is read as the probability that event $A$ occurs given that event $B$ occurred.
For instance, we can consider the following conditional probabilities:
$P(C | R)$ is the probability of a classification, $C$, given a response, $R$.
In other words, $P(C | R)$ is the probability of having HIV given a positive test: $P(\text{HIV} | \text{positive test})$.
$P(R | C)$ is the probability of a response, $R$, given a classification, $C$.
In the example above, $P(R | C)$ is the probability of having a positive test given that a person has HIV: $P(\text{positive test} | \text{HIV}) = 95\%$.
A conditional probability can be calculated using the [joint probability](#sec-jointProbability) and [marginal probability](#sec-baseRate) (base rate), as in @eq-conditionalProbability:
$$
P(A | B) = \frac{P(A, B)}{P(B)}
$$ {#eq-conditionalProbability}
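As a rough illustration of how the three types of probability relate, the base R sketch below applies them to the probabilities assumed in the HIV example. The variable names are hypothetical and chosen only for this sketch.

```{r}
# Probabilities assumed in the HIV example
pC <- .003        # marginal probability (base rate) of HIV: P(C)
pR <- .01         # marginal probability of a positive test: P(R)
pRgivenC <- .95   # conditional probability of a positive test given HIV: P(R | C)

# Joint probability of having HIV and testing positive: P(C, R) = P(R | C) * P(C)
pCandR <- pRgivenC * pC

# Conditional probability of HIV given a positive test: P(C | R) = P(C, R) / P(R)
pCgivenR <- pCandR / pR

# What the joint probability would be if HIV status and test result were independent
pCandRifIndependent <- pC * pR

c(
  joint = pCandR,
  conditional = pCgivenR,
  jointIfIndependent = pCandRifIndependent)
```

The joint probability computed from the conditional probability differs from the product of the two marginal probabilities, which indicates that HIV status and the test result are not independent in this example.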
### Confusion of the Inverse {#sec-inverseFallacy}
A [conditional probability](#sec-conditionalProbability) is not the same thing as its reverse (or inverse) [conditional probability](#sec-conditionalProbability).
Unless the [base rates](#sec-baseRate) of the two events ($C$ and $R$) are the same, $P(C | R) \neq P(R | C)$.
However, people frequently make the mistake of thinking that two inverse [conditional probabilities](#sec-conditionalProbability) are the same.
This mistake is known as the "confusion of the inverse", or the "inverse fallacy", or the "conditional probability fallacy".
The confusion of inverse probabilities is the logical error of representative thinking that leads people to assume that the probability of $C$ given $R$ is the same as the probability of $R$ given $C$, even though this is not true.
As a few examples to demonstrate the logical fallacy, if 93% of breast cancers occur in high-risk women, this does not mean that 93% of high-risk women will eventually get breast cancer.
As another example, if 77% of car accidents take place within 15 miles of a driver's home, this does not mean that you will get in an accident 77% of the times you drive within 15 miles of your home.
Which car is the most frequently stolen?
It is often the Honda Accord or Honda Civic—probably because they are among the most popular/commonly available cars.
The probability that the car is a Honda Accord given that a car was stolen ($P(\text{Honda Accord } | \text{ Stolen})$) is what the media reports and what the police care about.
However, that is not what buyers and car insurance companies should care about.
Instead, they care about the probability that the car will be stolen given that it is a Honda Accord ($P(\text{Stolen } | \text{ Honda Accord})$).
Applied to fantasy football, the probability that a given player will be injured given that he is a Running Back ($P(\text{Injured } | \text{ RB})$) is not the same as the probability that a given player is a Running Back given that he is injured ($P(\text{RB } | \text{ Injured})$).
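To make the fantasy football case concrete, here is a minimal base R sketch. The player counts are invented purely for illustration and are not real injury data.

```{r}
# Hypothetical counts for illustration only (not real injury data)
nRB <- 100            # running backs
nOther <- 900         # players at all other positions
nInjuredRB <- 30      # injured running backs
nInjuredOther <- 90   # injured players at other positions

# P(Injured | RB): among running backs, the proportion who are injured
pInjuredGivenRB <- nInjuredRB / nRB

# P(RB | Injured): among injured players, the proportion who are running backs
pRBgivenInjured <- nInjuredRB / (nInjuredRB + nInjuredOther)

# Base rates of each event
pRB <- nRB / (nRB + nOther)
pInjured <- (nInjuredRB + nInjuredOther) / (nRB + nOther)

c(
  pInjuredGivenRB = pInjuredGivenRB,
  pRBgivenInjured = pRBgivenInjured,
  pRB = pRB,
  pInjured = pInjured)
```

With these made-up counts, the two conditional probabilities differ (.30 versus .25) because the base rate of being a Running Back (.10) differs from the base rate of being injured (.12).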
### Bayes' Theorem {#sec-bayesTheorem}
#### Standard Formulation {#sec-bayesTheoremStandard}
An alternative way of calculating a [conditional probability](#sec-conditionalProbability) is using the inverse [conditional probability](#sec-conditionalProbability) (instead of the [joint probability](#sec-jointProbability)).
This is known as Bayes' theorem.
Bayes' theorem can help us calculate a [conditional probability](#sec-conditionalProbability) of some classification, $C$, given some response, $R$, if we know the inverse [conditional probability](#sec-conditionalProbability) and the [base rate](#sec-baseRate) (marginal probability) of each.
Bayes' theorem is in @eq-bayes1:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)}
\end{aligned}
$$ {#eq-bayes1}
Or, equivalently (rearranging the terms):
$$
\begin{aligned}
\frac{P(C | R)}{P(R | C)} = \frac{P(C_i)}{P(R_i)}
\end{aligned}
$$ {#eq-bayes2}
Or, equivalently (rearranging the terms):
$$
\begin{aligned}
\frac{P(C | R)}{P(C_i)} = \frac{P(R | C)}{P(R_i)}
\end{aligned}
$$ {#eq-bayes3}
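As a quick numerical check, the rearranged forms in @eq-bayes2 and @eq-bayes3 can be verified with the probabilities assumed in the HIV example. This is a base R sketch with hypothetical variable names.

```{r}
# Probabilities assumed in the HIV example
pC <- .003        # P(C): base rate of HIV
pR <- .01         # P(R): base rate of a positive test
pRgivenC <- .95   # P(R | C): probability of a positive test given HIV

# Bayes' theorem (@eq-bayes1)
pCgivenR <- (pRgivenC * pC) / pR

# @eq-bayes2: ratio of the inverse conditional probabilities vs. ratio of the base rates
ratiosBayes2 <- c(pCgivenR / pRgivenC, pC / pR)

# @eq-bayes3: ratio of posterior to prior vs. ratio of likelihood to evidence
ratiosBayes3 <- c(pCgivenR / pC, pRgivenC / pR)

ratiosBayes2
ratiosBayes3
```

Within each pair, the two values agree (up to floating-point rounding), as the rearranged formulas imply.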
More generally, Bayes' theorem has been described as:
$$
\begin{aligned}
P(H | E) &= \frac{P(E | H) \cdot P(H)}{P(E)} \\
\text{posterior probability} &= \frac{\text{likelihood} \times \text{prior probability}}{\text{model evidence}}
\end{aligned}
$$ {#eq-bayes6}
where $H$ is the hypothesis, and $E$ is the evidence—the new information that was not used in computing the prior probability.
In Bayesian terms, the *posterior probability* is the conditional probability of one event occurring given another event—it is the updated probability after the evidence is considered.
In this case, the posterior probability is the probability of the classification occurring ($C$) given the response ($R$).
The *likelihood* is the inverse conditional probability—the probability of the response ($R$) occurring given the classification ($C$).
The *prior probability* is the marginal probability of the event (i.e., the classification) occurring, before we take into account any new information.
The *model evidence* is the marginal probability of the other event occurring—i.e., the marginal probability of seeing the evidence.
Bayes' theorem provides the foundation for a paradigm of statistics called Bayesian statistics, which (unlike frequentist statistics) does not use *p*-values.
In the HIV example above, we can calculate the [conditional probability](#sec-conditionalProbability) of HIV given a positive test using three terms: the [conditional probability](#sec-conditionalProbability) of a positive test given HIV (i.e., the sensitivity of the test), the [base rate](#sec-baseRate) of HIV, and the [base rate](#sec-baseRate) of a positive test for HIV.
The [conditional probability](#sec-conditionalProbability) of HIV given a positive test is in @eq-hivExample1:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)} \\
P(\text{HIV} | \text{positive test}) &= \frac{P(\text{positive test} | \text{HIV}) \cdot P(\text{HIV})}{P(\text{positive test})} \\
&= \frac{\text{sensitivity of test} \times \text{base rate of HIV}}{\text{base rate of positive test}} \\
&= \frac{95\% \times .3\%}{1\%} = \frac{.95 \times .003}{.01}\\
&= 28.5\%
\end{aligned}
$$ {#eq-hivExample1}
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pAgivenB()` function that estimates the probability of one event, $A$, given another event, $B$.
```{r}
petersenlab::pAgivenB(
pBgivenA = .95,
pA = .003,
pB = .01)
```
Thus, assuming the probabilities in the example above, the [conditional probability](#sec-conditionalProbability) of having HIV if a person has a positive test is 28.5%.
Given a positive test, chances are higher than not that the person does not have HIV.
Now let's see what happens if the person tests positive a second time.
We would revise our "[prior probability](#sec-baseRate)" for HIV from the general prevalence in the population (0.3%) to be the "posterior probability" of HIV given a first positive test (28.5%).
This is known as *Bayesian updating*.
We would also update the "evidence" to be the [marginal probability](#sec-baseRate) of getting a second positive test.
If we do not know a [marginal probability](#sec-baseRate) (i.e., base rate) of an event (e.g., getting a second positive test), we can calculate a [marginal probability](#sec-baseRate) with the *law of total probability* using [conditional probabilities](#sec-conditionalProbability) and the [marginal probability](#sec-baseRate) of another event (e.g., having HIV).
According to the law of total probability, the probability of getting a positive test is the probability that a person with HIV gets a positive test (i.e., sensitivity) times the base rate of HIV plus the probability that a person without HIV gets a positive test (i.e., false positive rate) times the [base rate](#sec-baseRate) of not having HIV, as in @eq-lawOfTotalProbability:
$$
\begin{aligned}
P(\text{not } C_i) &= 1 - P(C_i) \\
P(R_i) &= P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot P(\text{not } C_i) \\
1\% &= 95\% \times .3\% + P(R | \text{not } C) \times 99.7\% \\
\end{aligned}
$$ {#eq-lawOfTotalProbability}
In this case, we know the [marginal probability](#sec-baseRate) ($P(R_i)$), and we can use that to solve for the unknown [conditional probability](#sec-conditionalProbability) that reflects the false positive rate ($P(R | \text{not } C)$), as in @eq-conditionalProbabilityRevised:
$$
\scriptsize
\begin{aligned}
P(R_i) &= P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot P(\text{not } C_i) && \\
P(R_i) - [P(R | \text{not } C) \cdot P(\text{not } C_i)] &= P(R | C) \cdot P(C_i) && \text{Move } P(R | \text{not } C) \text{ to the left side} \\
- [P(R | \text{not } C) \cdot P(\text{not } C_i)] &= P(R | C) \cdot P(C_i) - P(R_i) && \text{Move } P(R_i) \text{ to the right side} \\
P(R | \text{not } C) \cdot P(\text{not } C_i) &= P(R_i) - [P(R | C) \cdot P(C_i)] && \text{Multiply by } -1 \\
P(R | \text{not } C) &= \frac{P(R_i) - [P(R | C) \cdot P(C_i)]}{P(\text{not } C_i)} && \text{Divide by } P(\text{not } C_i) \\
&= \frac{1\% - [95\% \times .3\%]}{99.7\%} = \frac{.01 - [.95 \times .003]}{.997}\\
&= .7171515\% \\
\end{aligned}
$$ {#eq-conditionalProbabilityRevised}
We can then estimate the marginal probability of the event, substituting in $P(R | \text{not } C)$, using the law of total probability.
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pA()` function that estimates the marginal probability of one event, $A$.
```{r}
petersenlab::pA(
pAgivenB = .95,
pB = .003,
pAgivenNotB = .007171515)
```
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pBgivenNotA()` function that estimates the probability of one event, $B$, given that another event, $A$, did not occur.
```{r}
petersenlab::pBgivenNotA(
pBgivenA = .95,
pA = .003,
pB = .01)
```
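For readers who want to see the arithmetic without the package, the same false positive rate can be sketched in base R by solving the law of total probability for $P(R | \text{not } C)$. The variable names here are hypothetical.

```{r}
# Probabilities assumed in the HIV example
pR <- .01         # P(R): base rate of a positive test
pC <- .003        # P(C): base rate of HIV
pRgivenC <- .95   # P(R | C): sensitivity

# Solve the law of total probability for the false positive rate, P(R | not C)
pRgivenNotC <- (pR - pRgivenC * pC) / (1 - pC)
pRgivenNotC

# Check: plugging the result back into the law of total probability recovers P(R)
pRcheck <- pRgivenC * pC + pRgivenNotC * (1 - pC)
pRcheck
```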
With this [conditional probability](#sec-conditionalProbability) ($P(R | \text{not } C)$), the updated [marginal probability](#sec-baseRate) of having HIV ($P(C_i)$), and the updated marginal probability of not having HIV ($P(\text{not } C_i)$), we can now calculate an updated estimate of the [marginal probability](#sec-baseRate) of getting a second positive test.
The probability of getting a second positive test is the probability that a person with HIV gets a second positive test (i.e., sensitivity) times the updated probability of HIV plus the probability that a person without HIV gets a second positive test (i.e., false positive rate) times the updated probability of not having HIV, as in @eq-baseRateUpdated:
$$
\begin{aligned}
P(R_{i}) &= P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot P(\text{not } C_i) \\
&= 95\% \times 28.5\% + .7171515\% \times 71.5\% = .95 \times .285 + .007171515 \times .715 \\
&= 27.58776\%
\end{aligned}
$$ {#eq-baseRateUpdated}
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pB()` function that estimates the marginal probability of one event, $B$.
```{r}
petersenlab::pB(
pBgivenA = .95,
pA = .285,
pBgivenNotA = .007171515)
```
We then substitute the updated [marginal probability](#sec-baseRate) of HIV ($P(C_i)$) and the updated [marginal probability](#sec-baseRate) of getting a second positive test ($P(R_i)$) into Bayes' theorem to get the probability that the person has HIV if they have a second positive test (assuming the errors of each test are independent, i.e., uncorrelated), as in @eq-baseRateUpdated2:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)} \\
P(\text{HIV} | \text{a second positive test}) &= \frac{P(\text{a second positive test} | \text{HIV}) \cdot P(\text{HIV})}{P(\text{a second positive test})} \\
&= \frac{\text{sensitivity of test} \times \text{updated base rate of HIV}}{\text{updated base rate of positive test}} \\
&= \frac{95\% \times 28.5\%}{27.58776\%} \\
&= 98.14\%
\end{aligned}
$$ {#eq-baseRateUpdated2}
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pAgivenB()` function that estimates the probability of one event, $A$, given another event, $B$.
```{r}
petersenlab::pAgivenB(
pBgivenA = .95,
pA = .285,
pB = .2758776)
```
Thus, a second positive test greatly increases the posterior probability that the person has HIV from 28.5% to over 98%.
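The two rounds of updating can also be sketched as a short loop in base R. The helper `updatePosterior()` is hypothetical (it is not part of the `petersenlab` package). It combines the law of total probability and Bayes' theorem into a single step and, like the calculation above, assumes that the errors of each test are independent.

```{r}
# Hypothetical helper: posterior probability after one positive test
updatePosterior <- function(prior, sensitivity, falsePositiveRate) {
  # Numerator: P(R | C) * P(C)
  # Denominator: P(R), from the law of total probability
  (sensitivity * prior) /
    (sensitivity * prior + falsePositiveRate * (1 - prior))
}

sensitivity <- .95                # P(R | C)
falsePositiveRate <- .007171515   # P(R | not C), as calculated above
prior <- .003                     # base rate of HIV before any testing

nTests <- 2
posteriors <- numeric(nTests)

for (i in seq_len(nTests)) {
  posteriors[i] <- updatePosterior(prior, sensitivity, falsePositiveRate)
  prior <- posteriors[i]   # the posterior becomes the prior for the next test
}

posteriors
```

The first element should be close to 28.5% and the second close to 98.14%, matching the step-by-step calculations.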
As seen in the rearranged formula in @eq-bayes2, the ratio of the [conditional probabilities](#sec-conditionalProbability) is equal to the ratio of the [base rates](#sec-baseRate).
Thus, it is important to consider [base rates](#sec-baseRate).
People have a strong tendency to ignore (or give insufficient weight to) [base rates](#sec-baseRate) when making predictions.
The failure to consider the [base rate](#sec-baseRate) when making predictions when given specific information about a case is known as the [base rate fallacy](#sec-fallaciesBaseRate) or as [base rate neglect](#sec-fallaciesBaseRate).
For example, when given specific information about a case, people tend to judge a rare event to be more likely than it actually is.
As seen in the rearranged formula in @eq-bayes3, the inverse [conditional probabilities](#sec-conditionalProbability) ($P(C | R)$ and $P(R | C)$) are not equal unless the [base rates](#sec-baseRate) of $C$ and $R$ are the same.
If the [base rates](#sec-baseRate) are not equal, we are making at least some prediction errors.
If $P(C_i) > P(R_i)$, our predictions must include some false negatives.
If $P(R_i) > P(C_i)$, our predictions must include some false positives.
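To illustrate with the HIV example, where $P(R_i) = 1\%$ exceeds $P(C_i) = .3\%$, the base R sketch below lays out the expected counts of each kind of outcome in a hypothetical population of 100,000 people. The population size and variable names are assumptions for illustration.

```{r}
# Hypothetical population size, chosen only for illustration
nPeople <- 100000

# Probabilities assumed in the HIV example
pC <- .003        # P(C): base rate of HIV
pR <- .01         # P(R): base rate of a positive test
pRgivenC <- .95   # P(R | C): sensitivity

nCases <- nPeople * pC      # people with HIV
nPositive <- nPeople * pR   # people who test positive

truePositives <- nCases * pRgivenC            # have HIV and test positive
falsePositives <- nPositive - truePositives   # test positive without HIV
falseNegatives <- nCases - truePositives      # have HIV but test negative
trueNegatives <- nPeople - truePositives - falsePositives - falseNegatives

counts <- c(
  truePositives = truePositives,
  falsePositives = falsePositives,
  falseNegatives = falseNegatives,
  trueNegatives = trueNegatives)
counts
```

Because there are more positive tests (1,000) than people with HIV (300), at least 700 of the positive tests must be false positives no matter how accurate the test is. With a sensitivity of 95%, the expected number is 715.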
#### Alternative Formulation {#sec-bayesTheoremAlternative}
Using the law of total probability, we can substitute the calculation of the [marginal probability](#sec-baseRate) ($P(R_i)$) into Bayes' theorem to get an alternative formulation of Bayes' theorem, as in @eq-baseRateUpdated3:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)} \\
&= \frac{P(R | C) \cdot P(C_i)}{P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot P(\text{not } C_i)} \\
&= \frac{P(R | C) \cdot P(C_i)}{P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot [1 - P(C_i)]}
\end{aligned}
$$ {#eq-baseRateUpdated3}
Instead of using the [marginal probability](#sec-baseRate) ([base rate](#sec-baseRate)) of $R$, as in the original formulation of Bayes' theorem, it uses the [conditional probability](#sec-conditionalProbability), $P(R|\text{not } C)$.
Thus, it uses three terms: two [conditional probabilities](#sec-conditionalProbability)—$P(R|C)$ and $P(R|\text{not } C)$—and one [marginal probability](#sec-baseRate), $P(C_i)$.
Let us see how the alternative formulation of Bayes' theorem applies to the HIV example above.
We can calculate the probability of HIV given a positive test using three terms: the [conditional probability](#sec-conditionalProbability) that a person with HIV gets a positive test (i.e., [sensitivity](#sec-sensitivity)), the [conditional probability](#sec-conditionalProbability) that a person without HIV gets a positive test (i.e., [false positive rate](#sec-falsePositiveRate)), and the [base rate](#sec-baseRate) of HIV.
Using the $P(R|\text{not } C)$ calculated in @eq-conditionalProbabilityRevised, the [conditional probability](#sec-conditionalProbability) of HIV given a single positive test is in @eq-bayes4:
$$
\small
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot [1 - P(C_i)]} \\
&= \frac{\text{sensitivity of test} \times \text{base rate of HIV}}{\text{sensitivity of test} \times \text{base rate of HIV} + \text{false positive rate of test} \times (1 - \text{base rate of HIV})} \\
&= \frac{95\% \times .3\%}{95\% \times .3\% + .7171515\% \times (1 - .3\%)} = \frac{.95 \times .003}{.95 \times .003 + .007171515 \times (1 - .003)}\\
&= 28.5\%
\end{aligned}
$$ {#eq-bayes4}
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pAgivenB()` function that estimates the probability of one event, $A$, given another event, $B$.
```{r}
pAgivenB(
pBgivenA = .95,
pA = .003,
pBgivenNotA = .007171515)
pAgivenB(
pBgivenA = .95,
pA = .003,
pBgivenNotA = pBgivenNotA(
pBgivenA = .95,
pA = .003,
pB = .01))
```
To calculate the [conditional probability](#sec-conditionalProbability) of HIV given a second positive test, we update our priors because the person has now tested positive for HIV.
We update the [prior probability](#sec-baseRate) of HIV ($P(C_i)$) based on the posterior probability of HIV after a positive test ($P(C | R)$) that we calculated above.
We can calculate the [conditional probability](#sec-conditionalProbability) of HIV given a second positive test using three terms: the [conditional probability](#sec-conditionalProbability) that a person with HIV gets a positive test (i.e., [sensitivity](#sec-sensitivity); which stays the same), the [conditional probability](#sec-conditionalProbability) that a person without HIV gets a positive test (i.e., [false positive rate](#sec-falsePositiveRate); which stays the same), and the updated [marginal probability](#sec-baseRate) of HIV.
The [conditional probability](#sec-conditionalProbability) of HIV given a second positive test is in @eq-baseRateUpdated4:
$$
\scriptsize
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot [1 - P(C_i)]} \\
&= \frac{\text{sensitivity of test} \times \text{updated base rate of HIV}}{\text{sensitivity of test} \times \text{updated base rate of HIV} + \text{false positive rate of test} \times (1 - \text{updated base rate of HIV})} \\
&= \frac{95\% \times 28.5\%}{95\% \times 28.5\% + .7171515\% \times (1 - 28.5\%)} = \frac{.95 \times .285}{.95 \times .285 + .007171515 \times (1 - .285)}\\
&= 98.14\%
\end{aligned}
$$ {#eq-baseRateUpdated4}
The [`petersenlab`](https://cran.r-project.org/web/packages/petersenlab/index.html) package [@R-petersenlab] contains the `pAgivenB()` function that estimates the probability of one event, $A$, given another event, $B$.
```{r}
pAgivenB(
pBgivenA = .95,
pA = .285,
pBgivenNotA = .007171515)
pAgivenB(
pBgivenA = .95,
pA = .285,
pBgivenNotA = pBgivenNotA(
pBgivenA = .95,
pA = .003,
pB = .01))
```
#### Interim Summary
In sum, the [marginal probability](#sec-baseRate), including the [prior probability](#sec-baseRate) or [base rate](#sec-baseRate), should be weighed heavily in predictions unless there are sufficient data to indicate otherwise, i.e., to update the posterior probability based on new evidence.
Bayes' theorem specifies how prior beliefs (i.e., [base rate](#sec-baseRate) information) should be integrated with the [predictive accuracy](#sec-predictiveValidity) of the evidence to make predictions.
It thus provides a powerful tool to [anchor](#sec-heuristicsAnchoringAdjustment) predictions to the [base rate](#sec-baseRate) unless sufficient evidence changes the posterior probability (by updating the evidence and [prior probability](#sec-baseRate)).
In general, you should [anchor](#sec-heuristicsAnchoringAdjustment) your predictions to the [base rate](#sec-baseRate) and [adjust](#sec-heuristicsAnchoringAdjustment) from there.
As noted by @Kahneman2011, if you have doubts about the quality of the evidence for a particular prediction question, keep your predictions close to the [base rate](#sec-baseRate), and modify them only modestly based on the new information.
## Cab Example {#sec-cabExample}
Below is an example:
> A cab was involved in a hit-and-run accident at night.
> Two cab companies, the Green and the Blue, operate in the city.
> You are given the following data:
>
> - 85% of the cabs in the city are Green and 15% are Blue.
> - A witness identified the cab as Blue.
> The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
>
> What is the probability that the cab involved in the accident was Blue rather than Green?
>
> --- Kahneman [-@Kahneman2011, p. 166]
Thus, we know the following:
$$
\begin{aligned}
P(\text{Blue}) &= .15 && \text{prior probability of a Blue cab}\\
P(\text{Green}) &= .85 && \text{prior probability of a Green cab}\\
P(\text{Correct}|\text{Blue}) &= .80 && \text{probability the witness correctly identifies a Blue cab}\\
P(\text{Correct}|\text{Green}) &= .80 && \text{probability the witness correctly identifies a Green cab}\\
P(\text{Incorrect}|\text{Blue}) &= .20 && \text{probability the witness incorrectly identifies a Blue cab}\\
P(\text{Incorrect}|\text{Green}) &= .20 && \text{probability the witness incorrectly identifies a Green cab}\\
\end{aligned}
$$ {#eq-cabExampleGivenInfo}
We want to know the probability that the cab involved in the accident was Blue, given that the witness identified it as Blue ($P(\text{Blue}|\text{Identified as Blue})$).
To estimate this probability, we can apply [Bayes' theorem](#sec-bayesTheorem) to estimate the posterior probability:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)}\\
P(\text{Blue}|\text{Identified as Blue}) &= \frac{P(\text{Identified as Blue}|\text{Blue}) \cdot P(\text{Blue})}{P(\text{Identified as Blue})}
\end{aligned}
$$ {#eq-cabExampleBayesTheorem}
We can compute the term in the denominator ($P(\text{Identified as Blue})$) using the law of total probability (described in @sec-bayesTheorem).
$$
\begin{aligned}
P(R_i) &= P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot P(\text{not } C_i)\\
P(R_i) &= P(\text{Identified as Blue}|\text{Blue}) \cdot P(\text{Blue}) + P(\text{Identified as Blue}|\text{Green}) \cdot P(\text{Green})\\
0.29 &= (.80 \times .15) + (.20 \times .85) \\
\end{aligned}
$$ {#eq-cabExampleLawOfTotalProbability}
```{r}
petersenlab::pA(
pAgivenB = .80,
pB = .15,
pAgivenNotB = .20)
```
We can now substitute this value into the denominator of Bayes' theorem to estimate the posterior probability:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)}\\
P(\text{Blue}|\text{Identified as Blue}) &= \frac{P(\text{Identified as Blue}|\text{Blue}) \cdot P(\text{Blue})}{P(\text{Identified as Blue})}\\
0.414 &= \frac{0.80 \times 0.15}{0.29}
\end{aligned}
$$ {#eq-cabExamplePosterior}
```{r}
petersenlab::pAgivenB(
pBgivenA = .80,
pA = .15,
pB = .29)
```
Thus, there was a 41.4% probability that the cab involved in the accident was Blue rather than Green.
However, when faced with this problem, people tend to [ignore the base rate](#sec-fallaciesBaseRate) and go with the witness [@Kahneman2011].
According to @Kahneman2011, the most frequent response to this question is that there is an 80% probability that the cab was Blue.
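If the analytic answer feels counterintuitive, a simulation may help. The base R sketch below simulates many hypothetical accidents under the stated assumptions (15% Blue cabs, a witness who is correct 80% of the time for either color). The seed, the number of simulated accidents, and the variable names are arbitrary choices for illustration.

```{r}
set.seed(52242)         # arbitrary seed for reproducibility
nAccidents <- 1000000   # hypothetical number of simulated accidents

# True cab color, drawn according to the base rates
trueColor <- sample(
  c("Blue", "Green"),
  size = nAccidents,
  replace = TRUE,
  prob = c(.15, .85))

# The witness is correct 80% of the time, regardless of the true color
witnessCorrect <- runif(nAccidents) < .80

# Color reported by the witness: the true color if correct, the other color if not
reportedColor <- ifelse(
  witnessCorrect,
  trueColor,
  ifelse(trueColor == "Blue", "Green", "Blue"))

# Among accidents in which the witness said "Blue", the proportion that truly were Blue
pBlueGivenIdentifiedBlue <- mean(trueColor[reportedColor == "Blue"] == "Blue")
pBlueGivenIdentifiedBlue
```

The simulated proportion should land close to the analytic value of $.12 / .29 \approx .414$, well below the 80% that many people intuitively report.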
## Nate Silver Examples {#sec-nateSilverExamples}
@Silver2012 provides several examples that leverage the [alternative formulation of Bayes' theorem](#sec-bayesTheoremAlternative) provided in @eq-baseRateUpdated3 and summarized below:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot [1 - P(C_i)]}
\end{aligned}
$$ {#eq-baseRateAlternative}
In each example, the formula uses three elements to calculate the probability that the hypothesis is true:
1. the [conditional probability](#sec-conditionalProbability) (likelihood) of observing the evidence, $R$, given that the hypothesis, $C$, is true (i.e., $P(R|C)$; [true positive rate](#sec-sensitivity))
1. the [conditional probability](#sec-conditionalProbability) (likelihood) of observing the evidence, $R$, given that the hypothesis, $C$, is false (i.e., $P(R | \text{not } C)$; [false positive rate](#sec-falsePositiveRate))
1. the [marginal probability](#sec-baseRate) (base rate) of the event occurring (i.e., the prior probability of the hypothesis, $C$, being true; $P(C_i)$)
Thus, the formula uses the [base rate](#sec-baseRate), the [true positive rate](#sec-sensitivity) (sensitivity), and the [false positive rate](#sec-falsePositiveRate).
The ratio of the [true positive rate](#sec-sensitivity) to the [false positive rate](#sec-falsePositiveRate) is called the [positive likelihood ratio](#sec-positiveLikelihoodRatio), and is used in [Bayesian updating](#sec-bayesianUpdating).
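One common way the positive likelihood ratio enters Bayesian updating is through the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the likelihood ratio. The base R sketch below applies this to the HIV numbers from earlier in the chapter. The variable names are hypothetical, and the odds form is algebraically equivalent to @eq-baseRateAlternative rather than a different method.

```{r}
sensitivity <- .95                # true positive rate, P(R | C)
falsePositiveRate <- .007171515   # P(R | not C)
prior <- .003                     # base rate, P(C)

# Positive likelihood ratio: true positive rate divided by false positive rate
positiveLikelihoodRatio <- sensitivity / falsePositiveRate

# Convert the prior probability to odds, update, and convert back to a probability
priorOdds <- prior / (1 - prior)
posteriorOdds <- priorOdds * positiveLikelihoodRatio
posterior <- posteriorOdds / (1 + posteriorOdds)

posterior
```

The result should match the 28.5% obtained from the probability form of the theorem.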
### Example 1: Is Your Partner Cheating on You? {#sec-nateSilverExample1}
Example 1: You came home and found a strange pair of underwear in your underwear drawer.
What is the probability that your partner is cheating on you?
- the [prior probability](#sec-baseRate) that your partner is cheating on you: 4%
- the [conditional probability](#sec-conditionalProbability) of underwear appearing given that your partner is cheating on you: 50%
- the [conditional probability](#sec-conditionalProbability) of underwear appearing given that your partner is *not* cheating on you: 5%
```{r}
pAgivenB(
pBgivenA = .50,
pA = .04,
pBgivenNotA = .05)
```
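Written out by hand with @eq-baseRateAlternative, the calculation for this example is:

$$
\begin{aligned}
P(C | R) &= \frac{.50 \times .04}{.50 \times .04 + .05 \times (1 - .04)} \\
&= \frac{.020}{.020 + .048} \\
&\approx .29
\end{aligned}
$$

That is, under these inputs, the evidence raises the probability from 4% to roughly 29%.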
### Example 2: Does a Person Have Breast Cancer? {#sec-nateSilverExample2}
Example 2: What is the probability that a woman in her 40s has breast cancer if she tested positive on a mammogram?
- the [prior probability](#sec-baseRate) that she has breast cancer: 1.4%
- the [conditional probability](#sec-conditionalProbability) that she has a positive test given that she has breast cancer: 75%
- the [conditional probability](#sec-conditionalProbability) that she has a positive test given that she does *not* have breast cancer: 10%
```{r}
pAgivenB(
pBgivenA = .75,
pA = .014,
pBgivenNotA = .10)
```
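The same inputs can also be expressed as counts in a hypothetical cohort, which some readers find easier to follow than probabilities.
The sketch below assumes a made-up cohort of 100,000 women (the cohort size is arbitrary, and the variable names are illustrative); only the three percentages come from the example.

```{r}
# Hypothetical cohort size (arbitrary; chosen only to give round counts)
cohortSize <- 100000

baseRate <- .014          # prior probability of breast cancer
truePositiveRate <- .75   # P(positive test | cancer)
falsePositiveRate <- .10  # P(positive test | no cancer)

withCancer <- cohortSize * baseRate       # about 1,400 women
withoutCancer <- cohortSize - withCancer  # about 98,600 women

truePositives <- withCancer * truePositiveRate       # about 1,050 positive tests
falsePositives <- withoutCancer * falsePositiveRate  # about 9,860 positive tests

# Proportion of positive tests that come from women with cancer
truePositives / (truePositives + falsePositives)
```

Because false positives from the large group without cancer outnumber true positives from the small group with cancer, most positive tests in this cohort come from women without cancer.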
### Example 3: Was it a Terrorist Attack? {#sec-nateSilverExample3}
#### Example 3A: The First Plane Hit the World Trade Center {#sec-nateSilverExample3A}
Example 3A: Consider the information we had on 9/11 when the first plane hit the World Trade Center.
What is the probability that a terror attack occurred given that the first plane hit the World Trade Center?
- the [prior probability](#sec-baseRate) that terrorists crash a plane into a Manhattan skyscraper: 0.005%
- the [conditional probability](#sec-conditionalProbability) that a plane crashes into a Manhattan skyscraper if terrorists are attacking Manhattan skyscrapers: 100%
- the [conditional probability](#sec-conditionalProbability) that a plane crashes into a Manhattan skyscraper if terrorists are *not* attacking Manhattan skyscrapers (i.e., it is an accident): 0.008%
```{r}
pAgivenB(
pBgivenA = 1,
pA = .00005,
pBgivenNotA = .00008)
```
#### Example 3B: The Second Plane Hit the World Trade Center {#sec-nateSilverExample3B}
Example 3B: Now, consider that a second plane just hit the World Trade Center.
What is the probability that a terror attack occurred given that a second plane hit the World Trade Center?
- the revised [prior probability](#sec-baseRate) that terrorists crash a plane into a Manhattan skyscraper (from [Example 3A](#sec-nateSilverExample3A)): 38.46272%
- the [conditional probability](#sec-conditionalProbability) that a plane crashes into a Manhattan skyscraper if terrorists are attacking Manhattan skyscrapers: 100%
- the [conditional probability](#sec-conditionalProbability) that a plane crashes into a Manhattan skyscraper if terrorists are *not* attacking Manhattan skyscrapers (i.e., it is an accident): 0.008%
```{r}
pAgivenB(
pBgivenA = 1,
pA = .3846272,
pBgivenNotA = .00008)
```
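The two steps can be chained so that the posterior after the first plane becomes the prior for the second.
The sketch below uses a hypothetical helper, `updateProbability()`, written in base R for illustration; it is not the function used above.

```{r}
# Hypothetical helper (base R, for illustration): one step of Bayesian updating
updateProbability <- function(prior, truePositiveRate, falsePositiveRate) {
  (truePositiveRate * prior) /
    (truePositiveRate * prior + falsePositiveRate * (1 - prior))
}

# After the first plane: start from the original prior
afterFirstPlane <- updateProbability(
  prior = .00005,
  truePositiveRate = 1,
  falsePositiveRate = .00008)

# After the second plane: the previous posterior is the new prior
afterSecondPlane <- updateProbability(
  prior = afterFirstPlane,
  truePositiveRate = 1,
  falsePositiveRate = .00008)

c(afterFirstPlane = afterFirstPlane, afterSecondPlane = afterSecondPlane)
```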
## Base Rates Applied to Fantasy Football {#sec-baseRateFantasyFootball}
Base rates are also relevant to fantasy football.
Compared with yardage (e.g., passing yards, rushing yards, receiving yards), touchdowns occur *relatively* infrequently.
Whereas a solid Wide Receiver may log 100+ receptions and 1,200+ yards in a season, they may have "only" 8–14 receiving touchdowns.
As noted in @sec-differentErrorsDifferentCosts, lower base-rate events—including touchdowns—are harder to predict accurately.
As noted by [Harris (2012)](https://www.espn.com/fantasy/football/ffl/story?page=nfldk2k12_vbdwork): "NFL statistical projections are basically impossible to get right. (Take it from someone who helps create them for a living.) Yes, we can do a passable job with yardage totals for players who don't suffer unexpected injuries or depth-chart pratfalls. But so much of fantasy football hinges on touchdowns, and touchdowns are impossibly difficult to predict from season to season (let alone week to week)." (archived at <https://perma.cc/4QNH-J2LD>).
Thus, it is important not to lend too much credence to predictions of touchdowns.
Focus on other things that may be more predictable (and that may be indirectly prognostic of touchdowns) such as yards, carries/targets, receptions, depth of targets, red zone carries/targets, short distance carries/targets, etc.
[PROVIDE ACCURACY OF PROJECTIONS OF TOUCHDOWNS VS YARDAGE]
When dealing with numeric predictions (rather than categorical outcomes), the equivalent of the base rate is the average value.
For instance, the "base rate" of fantasy points for a given position is the average number of fantasy points for that position.
We could subdivide even further to identify, for instance, the "base rate" of fantasy points for the Wide Receiver at the top of the depth chart on a team.
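As a minimal sketch, a numeric "base rate" of this kind is simply a group mean.
The data below are made-up values used only to show the computation (they are not real player data), and the column names are hypothetical.

```{r}
# Made-up fantasy points for illustration only (not real player data)
exampleData <- data.frame(
  position = c("QB", "QB", "RB", "RB", "WR", "WR", "WR", "WR"),
  depthChartRank = c(1, 2, 1, 2, 1, 1, 2, 2),
  fantasyPoints = c(300, 40, 220, 90, 250, 210, 120, 100)
)

# "Base rate" (average) of fantasy points for each position
aggregate(fantasyPoints ~ position, data = exampleData, FUN = mean)

# Subdivide further: average by position and depth-chart rank
aggregate(fantasyPoints ~ position + depthChartRank, data = exampleData, FUN = mean)
```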
## Base Rate of Rookie Performance {#sec-baseRateRookiePerformance}
### Quarterbacks {#sec-baseRateRookiePerformanceQBs}
### Running Backs {#sec-baseRateRookiePerformanceRBs}
## How to Account for Base Rates {#sec-accountForBaseRates}
There are various ways to account for [base rates](#sec-baseRate), including the use of [actuarial formulas](#sec-accountForBaseRatesActuarial) and the use of [Bayesian updating](#sec-bayesianUpdating).
### Actuarial Formula {#sec-accountForBaseRatesActuarial}
One approach to account for [base rates](#sec-baseRate) is to use [actuarial formulas](#sec-actuarialPrediction) (rather than [human judgment](#sec-humanJudgment)) to make the predictions.
[Actuarial formulas](#sec-actuarialPrediction) based on [multiple regression](#sec-multipleRegression) or [machine learning](#sec-machineLearning) can account for the [base rate](#sec-baseRate) of the event.
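One way to see this: in a logistic regression with no predictors, the intercept encodes the base rate of the outcome in the data.
The sketch below uses simulated data (the 10% event rate, sample size, and seed are arbitrary assumptions) and base R's `glm()`.

```{r}
# Simulated binary outcome (arbitrary assumptions: 1,000 cases, 10% event rate)
set.seed(52242)
outcome <- rbinom(n = 1000, size = 1, prob = .10)

# Intercept-only logistic regression
interceptOnlyModel <- glm(outcome ~ 1, family = binomial)

# Converting the intercept from log odds to a probability recovers
# the observed base rate in the data
plogis(coef(interceptOnlyModel)[["(Intercept)"]])
mean(outcome)
```

Predictors added to such a model shift the predicted probability for a given case away from this baseline.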
### Bayesian Updating {#sec-bayesianUpdating}
Another approach to account for [base rates](#sec-baseRate) is to leverage Bayes' theorem, using Bayesian updating and the [probability nomogram](#sec-probabilityNomogram).
Bayesian updating is a form of [anchoring and adjustment](#sec-heuristicsAnchoringAdjustment); however, unlike the [anchoring and adjustment heuristic](#sec-heuristicsAnchoringAdjustment), it is systematic: it anchors one's predictions to the base rate and then adjusts according to new information.
That is, we start with a [pretest probability](#sec-baseRate) (i.e., [base rate](#sec-baseRate)) and update our predictions based on the extent of new information (i.e., the [likelihood ratio](#sec-diagnosticLikelihoodRatio)).
Bayesian updating involves comparing the relative probability of two outcomes, $P(C | R)$ versus $P(\text{not } C | R)$.
If we want to compare the relative probability of two outcomes, we can use the odds form of Bayes' theorem, as in @eq-bayes5:
$$
\begin{aligned}
P(C | R) &= \frac{P(R | C) \cdot P(C_i)}{P(R_i)} \\
P(\text{not } C | R) &= \frac{P(R | \text{not } C) \cdot P(\text{not } C_i)}{P(R_i)} \\
\frac{P(C | R)}{P(\text{not } C | R)} &= \frac{\frac{P(R | C) \cdot P(C_i)}{P(R_i)}}{\frac{P(R | \text{not } C) \cdot P(\text{not } C_i)}{P(R_i)}} \\
&= \frac{P(R | C) \cdot P(C_i)}{P(R | \text{not } C) \cdot P(\text{not } C_i)} \\
&= \frac{P(C_i)}{P(\text{not } C_i)} \times \frac{P(R | C)}{P(R | \text{not } C)} \\
\text{posterior odds} &= \text{prior odds} \times \text{likelihood ratio}
\end{aligned}
$$ {#eq-bayes5}
As presented in @eq-bayes5, the posttest (or posterior) odds are equal to the pretest odds multiplied by the [likelihood ratio](#sec-diagnosticLikelihoodRatio).
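A minimal sketch of the odds form in base R follows.
The helper names `probabilityToOdds()` and `oddsToProbability()` are hypothetical, and the inputs are taken from [Example 1](#sec-nateSilverExample1).

```{r}
# Hypothetical helpers for converting between probabilities and odds
probabilityToOdds <- function(probability) probability / (1 - probability)
oddsToProbability <- function(odds) odds / (1 + odds)

priorOdds <- probabilityToOdds(.04)  # prior probability of 4%
likelihoodRatio <- .50 / .05         # P(R | C) / P(R | not C)

# posterior odds = prior odds * likelihood ratio
posteriorOdds <- priorOdds * likelihoodRatio

# Convert the posterior odds back to a probability
oddsToProbability(posteriorOdds)
```

The result matches the probability obtained from @eq-baseRateAlternative with the same inputs, because the two forms are algebraically equivalent.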
Below, we describe the [likelihood ratio](#sec-diagnosticLikelihoodRatio).
#### Diagnostic Likelihood Ratio {#sec-diagnosticLikelihoodRatio}
A likelihood ratio is the ratio of two probabilities.
It can be used to compare the likelihood of two possibilities.
The diagnostic likelihood ratio is an index of the predictive validity of an instrument: it is the ratio of the probability that a test result is correct to the probability that the test result is incorrect.
The diagnostic likelihood ratio is also called the risk ratio.
There are two types of diagnostic likelihood ratios: the [positive likelihood ratio](#sec-positiveLikelihoodRatio) and the [negative likelihood ratio](#sec-negativeLikelihoodRatio).
##### Positive Likelihood Ratio (LR+) {#sec-positiveLikelihoodRatio}
The positive likelihood ratio (LR+) compares the [true positive rate](#sec-sensitivity) to the [false positive rate](#sec-falsePositiveRate).
For a test that performs at or above chance, positive likelihood ratio values range from 1 to infinity.\index{positive likelihood ratio}
Higher values reflect greater accuracy, because they indicate the degree to which a [true positive](#sec-truePositive) is more likely than a [false positive](#sec-falsePositive).
The formula for calculating the positive likelihood ratio is in @eq-positiveLikelihoodRatio.
$$
\begin{aligned}
\text{positive likelihood ratio (LR+)} &= \frac{\text{TPR}}{\text{FPR}} \\
&= \frac{P(R|C)}{P(R|\text{not } C)} \\
&= \frac{P(R|C)}{1 - P(\text{not } R|\text{not } C)} \\
&= \frac{\text{sensitivity}}{1 - \text{specificity}}
\end{aligned}
$$ {#eq-positiveLikelihoodRatio}
##### Negative Likelihood Ratio (LR−) {#sec-negativeLikelihoodRatio}
The negative likelihood ratio (LR−) compares the [false negative rate](#sec-falseNegativeRate) to the [true negative rate](#sec-specificity).
For a test that performs at or above chance, negative likelihood ratio values range from 0 to 1.
Smaller values reflect greater accuracy, because they indicate that a [false negative](#sec-falseNegative) is less likely than a [true negative](#sec-trueNegative).
The formula for calculating the negative likelihood ratio is in @eq-negativeLikelihoodRatio.
$$
\begin{aligned}
\text{negative likelihood ratio } (\text{LR}-) &= \frac{\text{FNR}}{\text{TNR}} \\
&= \frac{P(\text{not } R|C)}{P(\text{not } R|\text{not } C)} \\
&= \frac{1 - P(R|C)}{P(\text{not } R|\text{not } C)} \\
&= \frac{1 - \text{sensitivity}}{\text{specificity}}
\end{aligned}
$$ {#eq-negativeLikelihoodRatio}
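As a minimal sketch, both likelihood ratios can be computed from sensitivity and specificity in base R.
The values below (sensitivity = .90, specificity = .91) are illustrative, and the variable names are hypothetical.

```{r}
sensitivity <- .90
specificity <- .91

# Positive likelihood ratio: true positive rate / false positive rate
positiveLikelihoodRatio <- sensitivity / (1 - specificity)

# Negative likelihood ratio: false negative rate / true negative rate
negativeLikelihoodRatio <- (1 - sensitivity) / specificity

c(LRplus = positiveLikelihoodRatio, LRminus = negativeLikelihoodRatio)
```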
#### Probability Nomogram {#sec-probabilityNomogram}
Using [Bayes' theorem](#sec-bayesTheorem) (described in @sec-bayesTheorem), solving for posttest odds (based on pretest odds and the [likelihood ratio](#sec-diagnosticLikelihoodRatio), as in @eq-bayes5), and converting odds to probabilities, we can use a Fagan probability nomogram to determine the posttest probability following a test result.
The calculation of posttest probability is described in INSERT.
A *probability nomogram* (aka Fagan nomogram) is a way of visually applying [Bayes' theorem](#sec-bayesTheorem) to determine the posttest probability of having a condition based on the [pretest (or prior) probability](#sec-baseRate) and [likelihood ratio](#sec-diagnosticLikelihoodRatio), as depicted in @fig-probabilityNomogram.
To use a probability nomogram, draw a straight line from the starting probability (left line) through the [likelihood ratio](#sec-diagnosticLikelihoodRatio) (middle line) to see the updated probability.
The updated (posttest) probability is where the connecting line crosses the third, right line.
::: {#fig-probabilityNomogram}
![](images/probabilityNomogram.png){width=50% fig-alt="Probability Nomogram. (Figure retrieved from https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png)"}
Probability Nomogram. (Figure retrieved from [https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png](https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png)).
:::
For instance, if the starting probability is 0.5% and the [likelihood ratio](#sec-diagnosticLikelihoodRatio) is 10 (e.g., [sensitivity](#sec-sensitivity) = .90, [specificity](#sec-specificity) = .91: $\text{likelihood ratio} = \frac{\text{sensitivity}}{1 - \text{specificity}} = \frac{.9}{1-.91} = 10$) from a positive test (i.e., [positive likelihood ratio](#sec-positiveLikelihoodRatio)), the updated probability is less than 5%, as depicted in @fig-probabilityNomogramLine.
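For reference, the arithmetic behind this example (converting the pretest probability to odds, multiplying by the likelihood ratio as in @eq-bayes5, and converting back to a probability) works out as follows:

$$
\begin{aligned}
\text{pretest odds} &= \frac{.005}{1 - .005} \approx .00503 \\
\text{posttest odds} &= \text{pretest odds} \times \text{likelihood ratio} \approx .00503 \times 10 = .0503 \\
\text{posttest probability} &= \frac{\text{posttest odds}}{1 + \text{posttest odds}} \approx \frac{.0503}{1.0503} \approx .048
\end{aligned}
$$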
An interactive probability nomogram is available at the following link: <https://jamaevidence.mhmedical.com/data/calculators/LR_nomogram.html> (archived at <https://perma.cc/Z3SW-QMJ3>).
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains the `posttestProbability()` function that estimates the posttest probability of an event, given the [pretest probability](#sec-baseRate) and the [likelihood ratio](#sec-diagnosticLikelihoodRatio), or given the [pretest probability](#sec-baseRate) and the [sensitivity](#sec-sensitivity) (SN) and [specificity](#sec-specificity) (SP) of the test.
```{r}
petersenlab::posttestProbability(
pretestProb = .005,
likelihoodRatio = 10)
petersenlab::posttestProbability(
pretestProb = .005,
SN = .90,
SP = .91)
```
The function can also estimate the posttest probability of an event given the number of [true positives](#sec-truePositive) (TP), [true negatives](#sec-trueNegative) (TN), [false positives](#sec-falsePositive) (FP), and [false negatives](#sec-falseNegative) (FN):
```{r}
petersenlab::posttestProbability(
TP = 450,
TN = 90545,
FP = 8955,
FN = 50)
```
We discuss [true positives](#sec-truePositive) (TP), [true negatives](#sec-trueNegative) (TN), [false positives](#sec-falsePositive) (FP), [false negatives](#sec-falseNegative) (FN), [sensitivity](#sec-sensitivity) (SN), and [specificity](#sec-specificity) (SP) in @sec-thresholdDependentAccuracy (@sec-decisionOutcomes and @sec-sensitivitySpecificityPPVnpv).
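With counts like these, the posttest probability following a positive test can also be checked by hand as the proportion of positive results that are true positives.
The sketch below uses base R with the same counts as above; the variable names are illustrative.

```{r}
TP <- 450
TN <- 90545
FP <- 8955
FN <- 50

# Base rate: proportion of all cases that have the condition
baseRate <- (TP + FN) / (TP + TN + FP + FN)

# Sensitivity and specificity implied by the counts
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

# Probability of the condition given a positive test
positivePredictiveValue <- TP / (TP + FP)

c(baseRate = baseRate,
  sensitivity = sensitivity,
  specificity = specificity,
  positivePredictiveValue = positivePredictiveValue)
```

These counts imply the same base rate (.005), sensitivity (.90), and specificity (.91) used in the earlier calls.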
If the starting probability is 0.5% and the [likelihood ratio](#sec-diagnosticLikelihoodRatio) is 0.11 from a negative test (i.e., [negative likelihood ratio](#sec-negativeLikelihoodRatio)), the updated probability is nearly indistinguishable from zero (approximately 0.06%).
```{r}
petersenlab::posttestProbability(
pretestProb = .005,
likelihoodRatio = 0.11)
```
::: {#fig-probabilityNomogramLine}
![](images/probabilityNomogramLine.png){width=50% fig-alt="Probability Nomogram. (Figure retrieved from https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png). Also provided in: @Petersen2024a and @PetersenPrinciplesPsychAssessment."}
Probability Nomogram Example. (Figure adapted from [https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png](https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Fagan_nomogram.svg/945px-Fagan_nomogram.svg.png). Also provided in: @Petersen2024a and @PetersenPrinciplesPsychAssessment.)
:::
A probability nomogram calculator can be found at the following link: [http://araw.mede.uic.edu/cgi-bin/testcalc.pl](http://araw.mede.uic.edu/cgi-bin/testcalc.pl) (archived at <https://perma.cc/X8TF-7YBX>).
The [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] contains the `nomogrammer()` function that creates a nomogram plot using the [positive](#sec-positiveLikelihoodRatio) and [negative](#sec-negativeLikelihoodRatio) [likelihood ratio](#sec-diagnosticLikelihoodRatio) or using the [sensitivity](#sec-sensitivity) (SN) and [specificity](#sec-specificity) (SP) of the test, as adapted from Adam Chekroud (<https://github.com/achekroud/nomogrammer>):
```{r}
petersenlab::nomogrammer(
pretestProb = .005,
SN = 0.90,
SP = 0.91)
```
The blue line indicates the [posterior probability](#posttestProbability) of the condition given a positive test.
The pink line indicates the [posterior probability](#posttestProbability) of the condition given a negative test.
```{r}
petersenlab::nomogrammer(
pretestProb = .005,
PLR = 10,
NLR = 0.11)
```
The function can also create a nomogram plot using the [true positives](#sec-truePositive) (TP), [true negatives](#sec-trueNegative) (TN), [false positives](#sec-falsePositive) (FP), and [false negatives](#sec-falseNegative) (FN):
```{r}
petersenlab::nomogrammer(
TP = 450,
TN = 90545,
FP = 8955,
FN = 50)
```
The function can also create a nomogram plot using the [sensitivity](#sec-sensitivity) (SN) and [selection rate](#sec-selectionRatio) of the test.
Here is a nomogram plot from the [HIV example](#sec-baseRatesOverview):
```{r}
petersenlab::nomogrammer(
pretestProb = .003,
SN = .95,
selectionRate = .01
)
```
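One way to see why sensitivity and selection rate suffice, offered as a sketch of the underlying logic rather than a description of the function's internals: the selection rate is the overall probability of a positive test, $P(R_i)$, which by the law of total probability combines true positives and false positives.
Solving for the false positive rate with the values above gives:

$$
\begin{aligned}
P(R_i) &= P(R | C) \cdot P(C_i) + P(R | \text{not } C) \cdot [1 - P(C_i)] \\
P(R | \text{not } C) &= \frac{P(R_i) - P(R | C) \cdot P(C_i)}{1 - P(C_i)} \\
&= \frac{.01 - .95 \times .003}{1 - .003} \\
&\approx .0072
\end{aligned}
$$

Under this reasoning, the implied specificity is approximately $1 - .0072 = .9928$.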
Here is a nomogram plot from the [cab example](#sec-cabExample) [@Kahneman2011]:
```{r}
petersenlab::nomogrammer(
pretestProb = .15,
SN = .80,
SP = .80
)
```
The function can also create a nomogram plot using the [sensitivity](#sec-sensitivity) (SN) and [false positive rate](#sec-falsePositiveRate) of the test.
Here is a nomogram plot from [Example 1](#sec-nateSilverExample1) from @Silver2012:
```{r}
petersenlab::nomogrammer(
pretestProb = .04,
SN = .50,
FPR = .05
)
```
Here is a nomogram plot from [Example 2](#sec-nateSilverExample2) from @Silver2012:
```{r}
petersenlab::nomogrammer(
pretestProb = .014,
SN = .75,
FPR = .10
)
```
Here is a nomogram plot from [Example 3A](#sec-nateSilverExample3A) from @Silver2012:
```{r}
petersenlab::nomogrammer(
pretestProb = .00005,
SN = 1,
FPR = .00008
)
```
Here is a nomogram plot from [Example 3B](#sec-nateSilverExample3B) from @Silver2012:
```{r}
petersenlab::nomogrammer(
pretestProb = .3846272,
SN = 1,
FPR = .00008
)
```
#### Informal Updating {#sec-informalUpdating}
@Kahneman2011 provides the following guidance for an informal approach to updating that anchors predictions to the base rate:
1. Start with the "baseline prediction" (i.e., base rate or average outcome).
1. Generate or identify your "intuitive prediction"—the number that matches your impression of the evidence.
1. Your posterior prediction should fall somewhere between the baseline prediction and the intuitive prediction.
"In the default case of no useful evidence, you stay with the baseline [prediction].
At the other extreme, you also stay with your initial [i.e., intuitive] prediction.
This will happen, of course, only if you remain completely confident in your initial prediction after a critical review of the evidence that supports it.
In most cases you will find some reasons to doubt that the correlation between your intuitive judgment and the truth is perfect, and you will end up somewhere between the two poles." (pp. 191–192).
Base the extent of adjustment (from the baseline prediction) on the magnitude of the correlation between your prediction/evidence and the truth, which acts similarly to the [likelihood ratio](#sec-diagnosticLikelihoodRatio).
For instance, if the correlation between your prediction/evidence and the truth is .5, move 50% of the distance from the baseline prediction toward the intuitive prediction.
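This rule can be sketched as a one-line calculation.
The function name `adjustPrediction()` and the example numbers (a baseline of 10 points and an intuitive prediction of 18 points) are hypothetical.

```{r}
# Hypothetical helper: move from the baseline toward the intuitive prediction
# in proportion to the correlation between the evidence and the truth
adjustPrediction <- function(baseline, intuitive, correlation) {
  baseline + correlation * (intuitive - baseline)
}

# Made-up example: baseline = 10, intuitive = 18, correlation = .5
adjustPrediction(baseline = 10, intuitive = 18, correlation = .5)
```

With a correlation of 0, the result stays at the baseline prediction; with a correlation of 1, it equals the intuitive prediction.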
## Conclusion {#sec-baseRatesConclusion}
Fantasy performance—and behavior more generally—is challenging to predict.
People commonly demonstrate [biases](#sec-cognitiveBiases) and [fallacies](#sec-fallacies) when making predictions.
People tend to ignore base rates ([base rate fallacy](#sec-fallaciesBaseRate)) when making predictions.
They also tend to confuse inverse conditional probabilities ([conditional probability fallacy](#sec-inverseFallacy)).
[Bayes' theorem](#sec-bayesTheorem) provides a way to convert from one conditional probability to its inverse conditional probability using the [base rate](#sec-baseRate) of each event.
There are various ways to account for [base rates](#sec-baseRate) for more accurate predictions, including through the use of [actuarial formulas](#sec-accountForBaseRatesActuarial), [Bayesian updating](#sec-bayesianUpdating), and more [informal approaches](#sec-informalUpdating).
[Bayesian updating](#sec-bayesianUpdating) uses [Bayes' theorem](#sec-bayesTheorem) to calculate a posttest probability from a [pretest probability](#sec-baseRate) and a test result ([likelihood ratio](#sec-diagnosticLikelihoodRatio)).
The [probability nomogram](#sec-probabilityNomogram) is a visual approach to [Bayesian updating](#sec-bayesianUpdating).
::: {.content-visible when-format="html"}
## Session Info {#sec-baseRatesSessionInfo}
```{r}
sessionInfo()
```
:::