---
engine: knitr
---
# Creating a Binomial Probability Distribution {#sec-prob-dist}
## Structure of a Binomial Distribution
A binomial distribution is used to calculate the probability of a
certain number of successful outcomes, given a number of trials and the
probability of the successful outcome. The "bi" in the term binomial
refers to the two possible outcomes that we're concerned with: an event
happening and an event not happening. (If there are more than two
outcomes, the distribution is called multinomial.)
Examples of binomial distribution problems are:
- Flipping two heads in three coin tosses
- Buying 1 million lottery tickets and winning at least once
- Rolling fewer than three 20s in 10 rolls of a 20-sided die
------------------------------------------------------------------------
::: {#def-binom-dist-param}
#### Parameters of the binomial distribution
All binomial distributions involve three parameters:
- `k` The number of outcomes we care about
- `n` The total number of trials
- `p` The probability of the event happening
:::
------------------------------------------------------------------------
Calculating the probability of flipping two heads in three coin tosses:
- $k = 2$, the number of events we care about, in this case flipping
  heads
- $n = 3$, the number of times the coin is flipped
- $p = 1/2$, the probability of flipping heads in a coin toss
------------------------------------------------------------------------
::: {#thm-binomial-distribution-shorthand-notation}
#### Shorthand notation of a binomial distribution
$$B(k; n, p)$$ {#eq-short-notation}
:::
------------------------------------------------------------------------
For the example of two heads in three coin tosses, we would write
$B(2; 3, 1/2)$.
- `B` stands for *binomial* distribution
- `k` is separated from the other parameters by a semicolon. This is
because when we are talking about a distribution of values, we
usually care about all values of $k$ for a fixed $n$ and $p$.
- `B(k; n, p)` denotes each value in the distribution
- `B(n, p)` denotes the whole distribution
## Understanding and Abstracting Out the Details of Our Problem
We'll continue with the example of calculating the probability of
flipping two heads in three coin tosses. Since the number of possible
outcomes is small, we can quickly figure out the results we care about
with just pencil and paper.
$$HHT, HTH, THH$$ To start generalizing, we'll break this problem down
into smaller pieces we can solve right now, and reduce those pieces into
manageable equations. As we build up the equations, we'll put them
together to create a generalized function for the binomial distribution.
***
::: {#thm-permuation-example}
#### Permutation Example for the Binomial Distribution
- Each outcome we care about will have the *same* probability.
- Each outcome is just a `r glossary("permutation")`, or reordering,
of the others
$$
\begin{align*}
P(\{heads, heads, tails\}) = P(\{heads, tails, heads\}) = P(\{tails, heads, heads\}) = \\
P(\text{Desired Outcome})
\end{align*}
$$ {#eq-permutation-example}
See how to do this calculation with R in @sec-example-4-1.
:::
***
There are three outcomes we care about, but only one of them can
actually happen in a given set of tosses, and we don't care which one.
Because only one of them can occur, we know these outcomes are mutually
exclusive, denoted as:
$$P(\{heads, heads, tails\},\{heads, tails, heads\},\{tails, heads, heads\}) = 0$$
This makes using the sum rule of probability easy.
$$
\begin{align*}
P(\{heads, heads, tails\} \operatorname{OR} \{heads, tails, heads\} \operatorname{OR} \{tails, heads, heads\}) = \\
P(\text{Desired Outcome}) + P(\text{Desired Outcome}) + P(\text{Desired Outcome}) = \\
3 \times P(\text{Desired Outcome})
\end{align*}
$$ The value "3" is specific to this problem and therefore not
generalizable. We can fix this by simply replacing "3" with a variable
called $N_{outcomes}$.
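As a quick numeric check (a minimal sketch, assuming a fair coin so that each permutation has probability $(1/2)^3$), the sum rule gives the familiar pencil-and-paper result of $3/8$:

```{r}
#| label: sum-rule-check
# three mutually exclusive permutations, each with probability (1/2)^3
3 * (1 / 2)^3
```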
------------------------------------------------------------------------
::: {#thm-pmf-placeholder}
#### Solution with placeholders
$$B(k;n,p) = N_{outcomes} \times P(\text{Desired Outcome})$$ {#eq-pmf-placeholder}
:::
------------------------------------------------------------------------
Now we have to figure out two subproblems:
1. How do we count the number of outcomes we care about?
2. How do we determine the probability of a single outcome?
## Counting Our Outcomes with the Binomial Coefficient
First we need to figure out how many outcomes there are for a given k
(the outcomes we care about) and n (the number of trials). For small
numbers we can simply count. But it doesn't take much for this to become
too difficult to do by hand. The solution is
`r glossary("combinatorics")`.
### Combinatorics: Advanced Counting with the Binomial Coefficient
There is a special operation in combinatorics, called the *binomial
coefficient*, that represents counting the number of ways we can select
`k` from `n` --- that is, selecting the outcomes we care about from the
total number of trials.
------------------------------------------------------------------------
::: {#thm-binomial-coefficient}
#### Notation for the binomial coefficient
$$\binom{n}{k}$$ {#eq-binomial-coefficient}
:::
------------------------------------------------------------------------
We read this as "n choose k". In our example we would say "in three
tosses choose two heads":
$$\binom{3}{2}$$
------------------------------------------------------------------------
::: {#thm-def-bin-coeff}
#### Definition of the binomial coefficient operation
$$\binom{n}{k} = \frac{n!}{k! \times (n-k)!}$$ {#eq-def-bin-coeff}
:::
------------------------------------------------------------------------
The `!` means *factorial*, which is the product of all the numbers up to
and including the number before the `!` symbol, so
$5! = 5 \times 4 \times 3 \times 2 \times 1$.
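In R, `base::factorial()` computes this directly; a quick check of the example above:

```{r}
#| label: factorial-check
factorial(5)  # 5 * 4 * 3 * 2 * 1 = 120
```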
In R we compute the binomial coefficient for the case of flipping two
heads in three tosses with the following function call:
------------------------------------------------------------------------
```{r}
#| label: binomial-coeff
#| attr-source: '#lst-binomial-coeff lst-cap="**Compute the binomial coefficient for flipping two heads in three tosses**"'
choose(3,2)
```
------------------------------------------------------------------------
See how to calculate the binomial coefficient with Base R in
@sec-example-4-2.
We can now replace $N_{outcomes}$ in @eq-pmf-placeholder with the
binomial coefficient:
$$B(k;n,p) = \binom{n}{k} \times P(\text{Desired Outcome})$$
### Calculating the Probability of the Desired Outcome
All we have left to figure out is the $P(\text{Desired Outcome})$, which
is the probability of any of the possible events we care about. So far
we've been using $P(\text{Desired Outcome})$ as a variable to help
organize our solution to this problem, but now we need to figure out
exactly how to calculate this value.
Let's focus on a single case of our example of two heads in three
tosses: $HHT$. Using the product rule and negation from the previous
chapter, we can describe this problem as:
$$P(heads, heads, no heads) = P(heads, heads, 1-heads)$$ Now we can use
the product rule from @eq-product-rule:
$$
\begin{align*}
P(heads, heads, 1-heads) = \\
P(heads) \times P(heads) \times (1-P(heads)) = \\
P(heads)^2 \times (1-P(heads))^1
\end{align*}
$$ You can see that the exponents of $P(heads)^2$ and $(1 - P(heads))^1$
are just the number of heads and the number of non-heads in our
scenario. These equate to `k`, the number of outcomes we care about, and
`n - k`, the number of trials minus the outcomes we care about. Putting
it all together:
$$
\binom{n}{k} \times P(heads)^{k} \times (1- P(heads))^{n-k}
$$ Generalizing for any probability, not just heads, we replace
$P(heads)$ with just `p`. This gives us a *general solution*. Compare
the following list with @def-binom-dist-param.
- `k`, the number of outcomes we care about;
- `n`, the number of trials; and
- `p`, the probability of the individual outcome.
------------------------------------------------------------------------
::: {#thm-pmf}
#### Probability Mass Function (PMF) for the Binomial Distribution
$$
\binom{n}{k} \times p^{k} \times (1- p)^{n-k}
$$ {#eq-pmf}
:::
------------------------------------------------------------------------
@eq-pmf is the basis of the binomial distribution. It is called a
`r glossary("Probability Mass Function")` (PMF). The *mass* part of the
name comes from the fact that we can use it to calculate the amount of
probability for any given `k` using a fixed `n` and `p`, so this is the
mass of our probability.
Now that we have this equation, we can solve any problem related to
outcomes of a coin toss. For example, we could calculate the probability
of flipping exactly 12 heads in 24 coin tosses:
$$
B(12; 24, \tfrac{1}{2}) = \binom{24}{12} \times \left(\frac{1}{2}\right)^{12} \times \left(1-\frac{1}{2}\right)^{24-12} = 0.1611803
$$
------------------------------------------------------------------------
```{r}
#| label: calc-example
#| attr-source: '#lst-calc-example lst-cap="**Calculate the probability of flipping exactly 12 heads in 24 coin tosses**"'
choose(24,12) * (1 / 2)^(12) * (1 - 1/2)^(24 - 12)
```
------------------------------------------------------------------------
The calculation in @lst-calc-example is only valid for our concrete
example. The PMF, however, generalizes: we can plug all possible values
of `k` for 10 coin tosses into it and visualize what the binomial
distribution looks like across all possible values.
{#fig-10-coin-flips
fig-alt="Bar graph showing the probability of getting k in 10 coin flips"
fig-align="center" width=70%}
See my @fig-repl-10-coin-flips as a replication of @fig-10-coin-flips.
We can also look at the same distribution for the probability of getting a 6 when rolling a six-sided die 10 times, as shown in @fig-10-dice-rolls.
{#fig-10-dice-rolls
fig-alt="Bar graph showing the probability of getting a 6 when rolling a six-sided die 10 times"
fig-align="center" width=70%}
Again I replicated @fig-10-dice-rolls with my @fig-repl-10-dice-rolls.
The bottom line of this section: a probability distribution is a way of generalizing an entire class of problems.
## Example: Gacha Games {#sec-gacha-games}
There is only one new concept in this section: instead of computing the probability of a single event, we are going to calculate the probability of drawing *at least* one specific card from an effectively infinite pile of cards, where we know the probability of drawing that card. The question is whether the probability of pulling the featured card is at least 50% given 100 draws and a per-draw probability of 0.720%.
First, let's compute the probability of getting *exactly one* of the cards we are interested in with 100 draws from the pile. We know the probability of drawing the featured card is 0.720%.
$$\binom{100}{1} \times 0.00720^{1} \times (1- 0.00720)^{100-1}$$
```{r}
#| label: draw-exact-one-card
#| attr-source: '#lst-draw-exact-one-card lst-cap="Draw exactly one card that has p = 0.720%"'
dbinom(1, 100, 0.00720)
```
Now let's compute the probability of getting *at least one* of the cards we are interested in.
In R, we can use the binomial cumulative distribution function `pbinom()` to automatically sum up all the relevant values of our PMF.
{#fig-pbinom-function
fig-alt="Explaining the structure of the pbinom() function"
fig-align="center" width=60%}
The `pbinom()` function takes three required arguments and an optional fourth called `lower.tail` (which defaults to `TRUE`). When `lower.tail` is `TRUE`, `pbinom()` sums up the probabilities of all values less than or equal to the first argument. When `lower.tail` is set to `FALSE`, it sums up the probabilities of all values strictly greater than the first argument. By setting the first argument to $0$ and `lower.tail` to `FALSE`, we get the probability of pulling one or more of the cards we are interested in (with the default `lower.tail = TRUE`, we would instead get the probability of pulling zero cards).
```{r}
#| label: use-pbinom-example
#| attr-source: '#lst-use-pbinom-example lst-cap="Example Calculation with the pbinom() Function"'
pbinom(0, 100, 0.00720, lower.tail = FALSE)
```
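As a cross-check (my own sketch, not part of the book's example), the same value can be obtained by summing the PMF over $k = 1, \dots, 100$ or by taking the complement of drawing the card zero times:

```{r}
#| label: pbinom-crosscheck
sum(dbinom(1:100, 100, 0.00720))   # sum the PMF over all k >= 1
1 - dbinom(0, 100, 0.00720)        # complement of drawing no featured card at all
```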
Voilà! This is the same result as shown in @fig-pbinom-function.
## Wrapping Up
In this chapter Will Kurt demonstrated how we can intuitively deduce the formula for the `r glossary("binomial coefficient")`.
::: {.callout-note}
I have seen this monstrosity of an expression for the binomial coefficient many times in different books and was always overwhelmed by its complexity. That has changed now: Will Kurt succeeded in demystifying the formula for me!
:::
## Exercises
Try answering the following questions to make sure you’ve grasped binomial distributions fully. The solutions can be found at https://nostarch.com/learnbayes/.
### Exercise 4.1
::: {#exr-04-1}
What are the parameters of the binomial distribution for the probability of rolling either a 1 or a 20 on a 20-sided die, if we roll the die 12 times?
- k = the number of successes we are interested in; it is not fixed by the problem (the code in @lst-exercise-04-1 uses k = 2 as an example)
- n = number of trials = 12
- p = probability of a success on each trial = $\frac{2}{20} = \frac{1}{10}$, since two of the 20 faces (the 1 and the 20) count as a success
```{r}
#| label: exercise-04-1
#| attr-source: '#lst-exercise-04-1 lst-cap="**Binomial distribution for the probability of rolling either a 1 or a 20 on a 20-sided die, if we roll the die 12 times**"'
dbinom(2, 12, 2/20)
```
:::
***
### Exercise 4.2
::: {#exr-04-2}
There are four aces in a deck of 52 cards. If you pull a card, return the card, then reshuffle and pull a card again, how many ways can you pull just one ace in five pulls?
```{r}
#| label: exercise-04-2
#| attr-source: '#lst-exercise-04-2 lst-cap="**If you pull a card, return the card, then reshuffle and pull a card again, how many ways can you pull just one ace in five pulls?**"'
combinat::combn(x = 1:5, m = 1, fun = tabulate, simplify = TRUE, nbins = 5)
choose(5,1)
```
::: {.callout-note}
In contrast to the solution manual, I programmed the exercise with `combinat::combn()`. This correct result might suggest that I could use the same approach for every possible arrangement, but that is not true: for instance, I could not manage to display the many ways of rolling two 6s in three rolls of a six-sided die.
:::
:::
***
### Exercise 4.3
:::: {#exr-04-3}
For the example in @exr-04-2, what is the probability of pulling five aces in 10 pulls (remember the card is shuffled back in the deck when it is pulled)?
```{r}
#| label: exercise-04-3
#| attr-source: '#lst-exercise-04-3 lst-cap="**Probability of pulling five aces in 10 pulls with replacing**"'
dbinom(5, 10, 4 / 52) * 100
```
::: {.callout-warning}
Only about 0.0455%. But this is different from the result in the solution manual, 1/32000 = `r 1/32000 * 100`%. I don't know why there is this difference.
:::
::::
***
### Exercise 4.4
:::: {#exr-04-4}
When you’re searching for a new job, it’s always helpful to have more than one offer on the table so you can use it in negotiations. If you have a 1/5 probability of receiving a job offer when you interview, and you interview with seven companies in a month, what is the probability you’ll have at least two competing offers by the end of that month?
```{r}
#| label: exercise-04-4
#| attr-source: '#lst-exercise-04-4 lst-cap="**Probability of at least 2 job offers from 7 interviews, each with a 1/5 chance of an offer**"'
pbinom(1, 7, 1/5, lower.tail = FALSE)
```
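As a cross-check (my own sketch, not part of the exercise), the same value follows from the complement of the lower tail, or from summing the PMF over $k = 2, \dots, 7$:

```{r}
#| label: exercise-04-4-check
1 - pbinom(1, 7, 1/5)     # P[X > 1] as the complement of P[X <= 1]
sum(dbinom(2:7, 7, 1/5))  # or sum the PMF over k = 2..7
```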
::: {.callout-note}
There is a chance of about 42% that you receive at least two job offers.
In my first attempt I wrongly calculated the probability of *exactly 2* competing offers with `dbinom(2, 7, 1/2)` instead of *at least 2* competing offers. Additionally, I forgot to add `lower.tail = FALSE`. To calculate *more than x* offers, with x = 1 (i.e., 2 or more), the first parameter has to be 1 (and not 2, as one might wrongly think). Without `lower.tail = FALSE`, the default `lower.tail = TRUE` computes $P[X \le x]$ instead of $P[X > x]$.
:::
::::
***
### Exercise 4.5
::: {#exr-04-5}
You get a bunch of recruiter emails and find out you have 25 interviews lined up in the next month. Unfortunately, you know this will leave you exhausted, and the probability of getting an offer will drop to 1/10 if you’re tired. You really don’t want to go on this many interviews unless you are at least twice as likely to get at least two competing offers. Are you more likely to get at least two offers if you go for 25 interviews, or stick to just 7?
```{r}
#| label: exercise-04-5
#| attr-source: '#lst-exercise-04-5 lst-cap="**Probability of at least 2 job offers from 25 interviews, each with a 1/10 chance of an offer**"'
pbinom(1, 25, 1/10, lower.tail = FALSE)
```
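A quick check of the "at least twice as likely" criterion (my own sketch, reusing the 7-interview result from @lst-exercise-04-4):

```{r}
#| label: exercise-04-5-check
p_7  <- pbinom(1, 7, 1/5, lower.tail = FALSE)    # at least 2 offers from 7 interviews
p_25 <- pbinom(1, 25, 1/10, lower.tail = FALSE)  # at least 2 offers from 25 tiring interviews
p_25 >= 2 * p_7                                  # FALSE: 25 interviews are not twice as likely
```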
Even with the reduced probability per interview, 25 interviews raise your chances of at least two offers from 42.3% to 72.9%. But that is not twice as likely (2 × 42.3% = 84.6%), so you stick with 7 interviews.
:::
***
## Experiments
### Permutation with R {#sec-example-4-1}
I want to get the same result as in @eq-permutation-example, but this
time using R. What are the permutations of possible events by flipping
two heads in three coin tosses?
Let's code heads as 1 and tails as 0; then we can use
`combinat::combn()`. The package {**combinat**} is not part of Base R,
so you have to install it first.
```{r}
#| label: permutate-3-2
#| attr-source: '#lst-permutate-3-2 lst-cap="Permutations of possible events by flipping two heads in three coin tosses"'
combinat::combn(x = c(1,2,3), m = 2, fun = tabulate, simplify = TRUE, nbins = 3)
```
The syntax is: `combn(x, m, fun = NULL, simplify = TRUE, ...)`.
- `x`: source vector for the combinations, with one entry per event
  (here, per coin toss).
- `m`: number of elements we are interested in.
- `fun`: function to be applied to each combination (may be NULL). I
  am using `base::tabulate()` to take the integer-valued vector of
  chosen positions and count how often each integer occurs in it (see
  the raw combinations shown after this list).
- `simplify`: logical; if FALSE, returns a list, otherwise returns a
  vector or array.
- `...`: arguments passed to `fun`. In our case `nbins` refers
  to the number of bins used by the `tabulate()` function.
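To see what `tabulate()` is applied to, here are the raw combinations without `fun` (a minimal sketch): each column lists the two toss positions that come up heads.

```{r}
#| label: permutate-3-2-raw
combinat::combn(x = c(1, 2, 3), m = 2)  # all ways to choose 2 of the 3 toss positions
```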
Let's try another example to better understand the pattern of the
`combn()` function: What are the permutations of possible events when
flipping a coin five times and getting three heads?
```{r}
#| label: permutate-5-3
#| attr-source: '#lst-permutate-5-3 lst-cap="Permutations of possible events by flipping three heads in five coin tosses"'
combinat::combn(x = 1:5, m = 3, fun = tabulate, simplify = TRUE, nbins = 5)
```
::: callout-tip
There is also a special package {**dice**} for the calculation of
various dice-rolling events. We could for instance compute the
probability "What is the probability of rolling two 6s in three rolls of
a six-sided die?" directly with:
```{r}
#| label: package-dice-example
#| attr-source: '#lst-package-dice-example lst-cap="Compute the probability of rolling two 6s in three rolls of a six-sided die"'
dice::getEventProb(nrolls = 3,
ndicePerRoll = 2,
nsidesPerDie = 6,
eventList = list(6, 6))
```
:::
::: callout-warning
Honestly, I have not yet understood all the implications of computing
combinations and/or permutations with the different functions and
packages:
- `utils::combn()`
- `combinat::combn()`
- `gtools::combinations()` and `gtools::permutations()`
- `permute::permute()`, `permute::shuffle()`
- `base::expand.grid()`
- `tidyr::expand()`, `tidyr::crossing()`, `tidyr::nesting()`,
`tidyr::expand_grid()`
As far as I understand from my study of the Statistical Rethinking book,
these functions are an important topic for understanding Bayesian
statistics. These types of functions are used for grid approximation and
for extracting or drawing samples from fitted models (e.g.,
`rethinking::extract.samples()`, `rethinking::extract.prior()`).
I am sure I will need to come back to this issue and study the available
material in more detail! But at the moment I am stuck and will skip this
subject.
:::
### Compute binomial coefficient manually {#sec-example-4-2}
I want to replicate the Base R function `choose()` with
@eq-def-bin-coeff. This involves calculating factorials with the
`base::factorial()` function.
$$\binom{n}{k} = \frac{n!}{k! \times (n-k)!}$$
```{r}
#| label: binomial-coeff-example
#| attr-source: '#lst-binomial-coeff-example lst-cap="**Replicate the Base R choose() function and compare the results**"'
my_choose <- function(n, k) {
factorial(n) / (factorial(k) * factorial(n - k))
}
choose(24, 12) == my_choose(24, 12)
```
Calculation of $\binom{24}{12}$:
- With Base R function `choose(24, 12)` = `r choose(24, 12)`.
- With my own function `my_choose(24, 12)` = `r my_choose(24, 12)`.
### Compute density of the binomial distribution {#sec-example-4-3}
To generalize, I write my own function for the density of the binomial
distribution. I will use the same argument names as in
@def-binom-dist-param.
```{r}
#| label: my-dbinom-func
#| attr-source: '#lst-my-dbinom-func lst-cap="**Function for the density of the binomial distribution**"'
my_dbinom <- function(k, n, p) {
choose(n, k) * p^k * (1 - p)^(n - k)
}
my_dbinom(12, 24, 0.5)
```
Voilà: it gives the same result as the manual calculation in
@lst-calc-example.
I wondered if there is a Base R function which does the same as
`my_dbinom()`. I tried `stats::dbinom()` and it worked!
```{r}
#| label: dbinom-func
#| attr-source: '#lst-dbinom-func lst-cap="**Base R dbinom() function calculates the density of the binomial distribution**"'
dbinom(12, 24, 0.5)
```
Again the same result as in @lst-calc-example and @lst-my-dbinom-func!
### Replication of the Binomial Distribution of 10 Coin Flips {#sec-10-coin-flips}
Here I am going to try to replicate @fig-10-coin-flips.
```{r}
#| label: fig-repl-10-coin-flips
#| fig-cap: "Replication of Figure 4.1: Binomial Distribution of 10 Coin Flips with {ggplot2}"
#| attr-source: '#lst-repl-10-coin-flips lst-cap="**Replicate Figure 4.1: Binomial Distribution of 10 Coin Flips**"'
k_values <- seq.int(from = 0, to = 10 , by = 1)
data.frame(x = k_values,
y = dbinom(k_values, 10, 0.5)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_col()
```
Writing @lst-repl-10-coin-flips I had trouble applying the correct geom. At first I used `geom_bar()`, but this did not work until I learned that I have to add the option `stat = "identity"` or use `geom_col()`. The difference is:
- `geom_bar()` makes the height of the bar proportional to the number of cases in each group. It uses `stat_count()` by default, i.e., it counts the number of cases at each x position. If there are no cases to count but only values, one has to add `stat = "identity"` to declare that ggplot2 should take the data as-is (see the sketch after this list).
- `geom_col()` instead takes the heights of the bars to represent values in the data. It uses `stat_identity()` and leaves the data as-is by default.
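For completeness, here is a minimal sketch of the equivalent `geom_bar()` call (assuming the same `k_values` data frame as in @lst-repl-10-coin-flips; not evaluated here to avoid a duplicate figure):

```{r}
#| label: geom-bar-identity-sketch
#| eval: false
data.frame(x = k_values,
           y = dbinom(k_values, 10, 0.5)) |>
  ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
  ggplot2::geom_bar(stat = "identity")  # produces the same plot as geom_col()
```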
::: {.callout-tip}
During my research for writing @lst-repl-10-coin-flips I learned of the {**tidydice**} package. It simulates dice rolls and coin flips and can be used for teaching basic experiments in introductory statistics courses.
With {**tidydice**} we can replicate Figure 4.1 with just two lines, using `binom_coin()` inside the `plot_binom()` function. In addition to the graphical distribution, it also prints the exact values on top of the bars.
```{r}
#| label: fig-repl-10-coin-flips-2
#| fig-cap: "Replication of Figure 4.1: Binomial Distribution of 10 Coin Flips with {tidydice}"
#| attr-source: '#lst-repl-10-coin-flips-2 lst-cap="**Replicate Figure 4.1: Binomial Distribution of 10 Coin Flips {tidydice}**"'
tidydice::plot_binom(
tidydice::binom_coin(times = 10, sides = 2, success = 2),
title = "Binomial distribution of 10 coin flips"
)
```
{**tidydice**} has many other functions related to coin and dice experiments.
:::
### Replication of the Binomial Distribution of 10 Dice Rolls {#sec-example-4-4}
I will replicate @fig-10-dice-rolls with {**ggplot2**} and with {**tidydice**}:
```{r}
#| label: fig-repl-10-dice-rolls
#| fig-cap: "Replication of Figure 4.2: Binomial Distribution of 10 Dice Rolls with {ggplot2}"
#| attr-source: '#lst-repl-10-dice-rolls lst-cap="**Replicate Figure 4.2: Binomial Distribution of 10 Dice Rolls**"'
k_values <- seq.int(from = 0, to = 10 , by = 1)
data.frame(x = k_values,
y = dbinom(k_values, 10, 1/6)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_col()
```
```{r}
#| label: fig-repl-10-dice-rolls-2
#| fig-cap: "Replication of Figure 4.2: Binomial Distribution of 10 Dice Rolls with {tidydice}"
#| attr-source: '#lst-repl-10-dice-rolls-2 lst-cap="**Replicate Figure 4.2: Binomial Distribution of 10 Dice Rolls {tidydice}**"'
tidydice::plot_binom(
tidydice::binom_dice(times = 10, sides = 6, success = 6),
title = "Binomial distribution of 10 dice rolls"
)
```