-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
471 lines (352 loc) · 23.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,comment = "#>",collapse = TRUE, fig.retina=2, fig.path = "man/figures/",
out.width = "100%")
library(badger)
```
# fitODBOD <img src="man/figures/logo.png" align="right" alt="" width="150" />
<!-- badges: start -->
`r badge_cran_release("fitODBOD")`
`r badge_cran_checks("fitODBOD")``
`r badge_runiverse()`
`r badge_cran_download("fitODBOD", "grand-total", "green")`
`r badge_cran_download("fitODBOD", "last-month", "green")`
`r badge_cran_download("fitODBOD", "last-week", "green")`
`r badge_repostatus("Active")`
`r badge_lifecycle("stable")`
[](https://github.com/Amalan-ConStat/fitODBOD/issues)
[](https://app.codecov.io/gh/Amalan-ConStat/fitODBOD?branch=main)
`r badge_codefactor("Amalan-ConStat/fitODBOD")`
`r badge_code_size("Amalan-ConStat/fitODBOD")`
[](https://lbesson.mit-license.org/)
[](http://joss.theoj.org/papers/388fc2f4d7c1e0ae83cf0de13ac038a4)
[](https://doi.org/10.5281/zenodo.3265356)
<!-- badges: end -->
## How to engage with "fitODBOD" the first time ?
```{r fitODBOD from GitHub or CRAN,eval=FALSE}
## Installing the package from GitHub
devtools::install_github("Amalan-ConStat/fitODBOD")
## Installing the package from CRAN
install.packages("fitODBOD")
```
The previous version of "fitODBOD", version 1.4.1 is available in the github repository as [R-fitODBOD](https://github.com/Amalan-ConStat/R-fitODBOD).
## Key Phrases
* BOD (Binomial Outcome Data)
* Over Dispersion
* Under Dispersion
* FBMD (Family of Binomial Mixture Distributions)
* ABD (Alternate Binomial Distributions)
* PMF (Probability Mass Function)
* CPMF (Cumulative Probability Mass Function)
## What does "fitODBOD" ?
You can understand BMD & ABD with PMF & CPMF. Further, BOD can be modeled using these Distributions
## Distributions
| Alternate Binomial Distributions | Binomial Mixture Distributions |
|:--------------------------------|:-------------------------------|
|1.Additive Binomial Distribution|1.Uniform Binomial Distribution |
|2.Beta-Correlated Binomial Distribution|2.Triangular Binomial Distribution|
|3.COM Poisson Binomial Distribution|3.Beta-Binomial Distribution|
|4.Correlated Binomial Distribution|4.Kumaraswamy Binomial Distribution|
|5.Multiplicative Binomial Distribution|5.Gaussian Hypergeometric Generalized Beta-Binomial Distribution|
|6.Lovinson Multiplicative Binomial Distribution|6.McDonald Generalized Beta-Binomial Distribution|
||7.Gamma Binomial Distribution|
||8.Grassia II Binomial Distribution|
## Modelling
To demonstrate the process the Alcohol Consumption Data, which is the most commonly used data-set by the researchers to explain Over-dispersion will be taken {lemmens1988}.
In this data-set, the number of alcohol consumption days in two reference weeks is separately self-reported by a randomly selected sample of $399$ respondents from the Netherlands in $1983$.
Here, the number of days a given individual consumes alcohol out of seven days a week can be treated as a Binomial variable.
The collection of all such variables from all respondents would be defined as "Binomial Outcome Data".
## fitODBODRshiny
An Rshiny application and package named [*fitODBODRshiny*](https://amalan-con-stat.shinyapps.io/fitODBODRshiny/) is available to fit a few selected Binomial Outcome data through Binomial Mixture and Alternate Binomial distributions.
### Step 1
The Alcohol consumption data is already in the necessary format to apply steps $2$ to $5$ and hence, step $1$ can be avoided. The steps $2$ to $5$ can be applied only if the data-set is in the form of a frequency table as follows.
```{r Step_1_0}
library("fitODBOD"); library("flextable",quietly = TRUE) ## Loading packages
print(Alcohol_data) ## print the alcohol consumption data set
sum(Alcohol_data$week1) ## No of respondents or N
Alcohol_data$Days ## Binomial random variables or x
```
Suppose your data-set is not a frequency table as shown in the following data-set called `datapoints`. Then the function `BODextract` can be used to prepare the appropriate format as follows.
```{r Step_1_1}
datapoints <- sample(0:7, 340, replace = TRUE) ## creating a set of raw BOD
head(datapoints) ## first few observations of datapoints dataset
## extracting and printing BOD in a usable way for the package
new_data <- BODextract(datapoints)
matrix(c(new_data$RV, new_data$Freq), ncol=2, byrow = FALSE)
```
### Step 2
As in the second step we test whether the Alcohol Consumption data follows the Binomial distribution based on the hypothesis given below:
Null Hypothesis : The data follows Binomial Distribution.
Alternate Hypothesis : The data does not follow Binomial Distribution.
Alcohol Consumption data consists of frequency information for two weeks but only the first week is considered for computation.
By doing so the researcher can verify if the results acquired from the functions are similar to the results acquired from previous researchers work.
```{r Step_2_1}
BinFreq <- fitBin(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1)
print(BinFreq)
```
Looking at the p-value it is clear that null hypothesis is rejected at $5%$ significance level.
This indicates that data does not fit the Binomial distribution.
The reason for a warning message is that one of the expected frequencies in the results is less than five.
Now we compare the actual and the fitting Binomial variances.
```{r Step_2_2}
## Actual variance of observed frequencies
var(rep(Alcohol_data$Days, times = Alcohol_data$week1))
## Calculated variance for frequencies of fitted Binomial distribution
var(rep(BinFreq$bin.ran.var, times = fitted(BinFreq)))
```
The variance of observed frequencies and the variance of fitting frequencies are `r var(rep(Alcohol_data$Days, times = Alcohol_data$week1))` and `r var(rep(BinFreq$bin.ran.var, times = fitted(BinFreq)))` respectively, which indicates Over-dispersion.
### Step 3 and 4
Since the Over-dispersion exists in the data now it is necessary to fit the Binomial Mixture distributions Triangular Binomial, Beta-Binomial, Kumaraswamy Binomial (omitted because this is time consuming), Gamma Binomial, Grassia II Binomial, GHGBB and McGBB using the package, and select the best-fitting distribution using Negative Log likelihood value, p-value and by comparing observed and expected frequencies. Modelling these distributions are given in the next sub-sections.
#### a) Triangular Binomial distribution.
Maximizing the log likelihood value or in our case minimizing the negative log likelihood is used in the `EstMLExxx` functions.
The estimation of the `mode` parameter can be done by using the `EstMLETriBin` function, and then the estimated value has to be applied to `fitTriBin` function to check whether the data fit the Triangular Binomial distribution.
```{r Step_3_a}
## estimating the mode
modeTB <- EstMLETriBin(x=Alcohol_data$Days,freq=Alcohol_data$week1)
coef(modeTB) ## printing the estimated mode
## printing the Negative log likelihood value which is minimized
NegLLTriBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,mode=modeTB$mode)
```
To fit the Triangular Binomial distribution for estimated mode parameter the following hypothesis is used
Null Hypothesis : The data follows Triangular Binomial Distribution.
Alternate Hypothesis : The data does not follow Triangular Binomial Distribution.
```{r Step_4_a}
## fitting the Triangular Binomial Distribution for the estimated mode value
fTB <- fitTriBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,mode=modeTB$mode)
print(fTB)
AIC(fTB)
var(rep(fTB$bin.ran.var, times = fitted(fTB)))
```
Since the `p-value` is `r fTB$p.value` which is less than $0.05$ it is clear that the null hypothesis is rejected, and the estimated `Over-dispersion` is `r Overdispersion(fTB)`.
Therefore, it is necessary to fit a better flexible distribution than the Triangular Binomial distribution.
#### b) Beta-Binomial distribution.
To estimate the two shape parameters of the Beta-Binomial distribution Methods of Moments or Maximum Likelihood estimation can be used.
Using the function `EstMLEBetaBin`(wrapper function of `mle2` from package `bbmle`) the Negative Log likelihood value will be minimized.
In order to estimate the shape parameters `a` and `b`, initial shape parameter values have to be given by the user to this function.
These initial values have to be in the domain of the shape parameters.
Below given is the pair of estimates for initial values where `a=0.1` and `b=0.1`.
```{r Step_3_b}
## estimating the shape parameters a, b
estimate <- EstMLEBetaBin(x=Alcohol_data$Days,freq = Alcohol_data$week1,a=0.1,b=0.1)
estimate@min ## extracting the minimized Negative log likelihood value
## extracting the estimated shape parameter a, b
a1 <- bbmle::coef(estimate)[1] ; b1 <- bbmle::coef(estimate)[2]
print(c(a1,b1)) ## printing the estimated shape parameters
```
To fit the Beta-Binomial distribution for estimated (Maximum Likelihood Estimation method) shape parameters the following hypothesis is used
Null Hypothesis : The data follows Beta-Binomial Distribution by the Maximum Likelihood Estimates.
Alternate Hypothesis: The data does not follow Beta-Binomial Distribution by the Maximum Likelihood Estimates.
```{r Step_4_b}
## fitting Beta Binomial Distribution for estimated shape parameters
fBB1 <- fitBetaBin(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1,a=a1,b=b1)
print(fBB1)
AIC(fBB1)
var(rep(fBB1$bin.ran.var, times = fitted(fBB1)))
```
The `p-value` of `r fBB1$p.value` $> 0.05$ indicates that the null hypothesis is not rejected.
Current estimated shape parameters fit the Beta-Binomial distribution.
Note that the estimated `Over-dispersion` parameter is `r Overdispersion(fBB1)`.
Function `EstMGFBetaBin` is used as below to estimate shape parameters `a` and `b` using Methods of Moments.
```{r Step_3_aa}
## estimating the shape parameter a, b
estimate <- EstMGFBetaBin(Alcohol_data$Days, Alcohol_data$week1)
print(c(estimate$a, estimate$b)) ## printing the estimated parameters a, b
## finding the minimized negative log likelihood value
NegLLBetaBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=estimate$a,b=estimate$b)
```
To fit the Beta-Binomial distribution for estimated (Method of Moments) shape parameters the following hypothesis is used
Null Hypothesis : The data follows Beta-Binomial Distribution by the Method of Moments.
Alternate Hypothesis: The data does not follow Beta-Binomial Distribution by the Method of Moments.
```{r Step_3_bb}
## fitting Beta-Binomial Distribution to estimated shape parameters
fBB2 <- fitBetaBin(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1,a=estimate$a,b=estimate$b)
print(fBB2)
AIC(fBB2)
var(rep(fBB2$bin.ran.var, times = fitted(fBB2)))
```
Results from Method of Moments to estimate the parameters have led to a `p-value` of `r fBB2$p.value` which is greater than $0.05$ indicates that the null hypothesis is not rejected.
The parameters estimated through Method of Moments fit the Beta-Binomial distribution for an estimated `Over-dispersion` of `r Overdispersion(fBB2)`.
#### c) Gamma Binomial distribution.
The shape parameters `c` and `l` are estimated and fitted below. Suppose the selected input parameters are `c=10.1` and `l=5.1`.
```{r Step_3_c}
## estimating the shape parameters
estimate <- EstMLEGammaBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,c=10.1,l=5.1)
estimate@min ## extracting the minimized negative log likelihood value
## extracting the shape parameter c and l
c1 <- bbmle::coef(estimate)[1] ; l1 <- bbmle::coef(estimate)[2]
print(c(c1, l1)) ## print shape parameters
```
To fit the Gamma Binomial distribution for estimated shape parameters the following hypothesis is used
Null Hypothesis : The data follows Gamma Binomial Distribution.
Alternate Hypothesis : The data does not follow Gamma Binomial Distribution.
```{r Step_4_c}
## fitting Gamma Binomial Distribution to estimated shape parameters
fGB <- fitGammaBin(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1,c=c1,l=l1)
print(fGB)
AIC(fGB)
var(rep(fGB$bin.ran.var, times = fitted(fGB)))
```
The null hypothesis is not rejected at $5%$ significance level (`p-value=` `r fGB$p.value`) for the estimated parameters $c=$ `r fGB$c`, `r fGB$l` and the estimated `Over-dispersion` of `r Overdispersion(fGB)`.
#### d) Grassia II Binomial distribution.
The shape parameters `a` and `b` are estimated and fitted below using the `EstMLEGammaBin` function.
Suppose the selected input parameters are `a=1.1` and `b=5.1`.
```{r Step_3_d}
## estimating the shape parameters
estimate <- EstMLEGrassiaIIBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=1.1,b=5.1)
estimate@min ## extracting the minimized negative log likelihood value
# extracting the shape parameter a and b
a1 <- bbmle::coef(estimate)[1] ; b1 <- bbmle::coef(estimate)[2]
print(c(a1, b1)) #print shape parameters
```
To fit the Grassia II Binomial distribution for estimated shape parameters the following hypothesis is used
Null Hypothesis : The data follows Grassia II Binomial Distribution.
Alternate Hypothesis : The data does not follow Grassia II Binomial Distribution.
```{r Step_4_d}
#fitting Grassia II Binomial Distribution to estimated shape parameters
fGB2 <- fitGrassiaIIBin(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1,a=a1,b=b1)
print(fGB2)
AIC(fGB2)
var(rep(fGB2$bin.ran.var, times = fitted(fGB2)))
```
The null hypothesis is not rejected at $5%$ significance level (`p-value=` `r fGB2$p.value`) for the estimated parameters $a=$ `r fGB2$a`, $b=$ `r fGB2$b` and the estimated `Over-dispersion` `r Overdispersion(fGB2)`.
#### e) GHGBB distribution.
Now we estimate the shape parameters and fit the GHGBB distribution for the first set of randomly selected initial input shape parameters of `a=10.1`, `b=1.1` and `c=5`.
```{r Step_3_e}
#estimating the shape parameters
estimate <- EstMLEGHGBB(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=10.1,b=1.1,c=5)
estimate@min #extracting the minimized negative log likelihood value
#extracting the shape parameter a, b and c
a1 <- bbmle::coef(estimate)[1] ; b1 <- bbmle::coef(estimate)[2] ; c1 <- bbmle::coef(estimate)[3]
print(c(a1, b1, c1)) #printing the shape parameters
```
To fit the GHGBB distribution for estimated shape parameters the following hypothesis is used.
Null Hypothesis : The data follows Gaussian Hypergeometric Generalized Beta-Binomial Distribution.
Alternate Hypothesis : The data does not follow Gaussian Hypergeometric Generalized Beta-Binomial Distribution.
```{r Step_4_e}
#fitting GHGBB distribution for estimated shape parameters
fGG <- fitGHGBB(Alcohol_data$Days, Alcohol_data$week1, a1, b1, c1)
print(fGG)
AIC(fGG)
var(rep(fGG$bin.ran.var, times = fitted(fGG)))
```
The null hypothesis is not rejected at $5%$ significance level (`p-value=` `r fGG$p.value`).
The estimated shape parameters are $a=$ `r fGG$a`, $b=$ `r fGG$b` and $c=$ `r fGG$c`, where the estimated `Over-dispersion` of `r Overdispersion(fGG)`.
#### f) McGBB distribution.
Given below is the results generated for the randomly selected initial input parameters where `a=1.1`, `b=5` and `c=10`.
```{r Step_3_f}
#estimating the shape parameters
estimate <- EstMLEMcGBB(x = Alcohol_data$Days,freq = Alcohol_data$week1,a = 1.1, b = 5, c = 10)
estimate@min #extracting the negative log likelihood value which is minimized
#extracting the shape parameter a, b and c
a1 <- bbmle::coef(estimate)[1] ; b1 <- bbmle::coef(estimate)[2] ; c1 <- bbmle::coef(estimate)[3]
print(c(a1, b1, c1)) #printing the shape parameters
```
To fit the McGBB distribution for estimated shape parameters the following hypothesis is used
Null Hypothesis : The data follows McDonald Generalized Beta-Binomial Distribution.
Alternate Hypothesis : The data does not follow McDonald Generalized Beta-Binomial Distribution.
```{r Step_4_f}
#fitting the MCGBB distribution for estimated shape parameters
fMB <- fitMcGBB(x=Alcohol_data$Days,obs.fre=Alcohol_data$week1,a=a1,b=b1,c=c1)
print(fMB)
AIC(fMB)
var(rep(fMB$bin.ran.var, times = fitted(fMB)))
```
The null hypothesis is not rejected at $5%$ significance level (`p-value=` `r fMB$p.value` $> 0.05$).
The estimated shape parameters are $a=$ `r fMB$a`, $b=$ `r fMB$b` and $c=$ `r fMB$c`, and the estimated `Over-dispersion` of `r Overdispersion(fMB)`.
### Step 5
Below table presents the expected frequencies, p-values, Negative Log Likelihood values, AIC values, Variance and Over-dispersion of the Binomial Mixture distributions obtained above for the Alcohol Consumption data.
```{r Table,warning=FALSE,fig.width=16,fig.height=9}
BMD_Data <- tibble::tibble(w=BinFreq$bin.ran.var,x=BinFreq$obs.freq,y=fitted(BinFreq),z=fitted(fTB),
a=fitted(fBB1),a1=fitted(fBB2),c=fitted(fGG),
d=fitted(fMB),e=fitted(fGB),f=fitted(fGB2))
names(BMD_Data) <- c("Bin_RV","Actual_Freq","EstFreq_BinD","EstFreq_TriBinD","EstFreq_BetaBinD(MLE)","EstFreq_BetaBinD(MGF)",
"EstFreq_GHGBBD","EstFreq_McGBBD","EstFreq_GammaBinD","EstFreq_GrassiaIIBinD")
BMD_Total<-colSums(BMD_Data[,-1])
BMD_Variance<-c(var(rep(BinFreq$bin.ran.var,times=BinFreq$obs.freq)),
var(rep(BinFreq$bin.ran.var,times=BinFreq$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fTB$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fBB1$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fBB2$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fGG$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fMB$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fGB$exp.freq)),
var(rep(BinFreq$bin.ran.var,times=fGB2$exp.freq))
)
BMD_Variance<-round(BMD_Variance,4)
BMD_p_value<-c(BinFreq$p.value,fTB$p.value,fBB1$p.value,fBB2$p.value,
fGG$p.value,fMB$p.value,fGB$p.value,fGB2$p.value)
BMD_NegLL<-c(fTB$NegLL,fBB1$NegLL,fBB2$NegLL,fGG$NegLL,
fMB$NegLL,fGB$NegLL,fGB2$NegLL)
BMD_NegLL<-round(BMD_NegLL,4)
BMD_AIC<-c(AIC(fTB),AIC(fBB1),AIC(fBB2),AIC(fGG),
AIC(fMB),AIC(fGB),AIC(fGB2))
BMD_AIC<-round(BMD_AIC,4)
C_of_diff_Values<-c(length(BMD_Data$Actual_Freq),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_BinD) <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_TriBinD) <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$'EstFreq_BetaBinD(MLE)') <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$'EstFreq_BetaBinD(MGF)') <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GHGBBD) <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_McGBBD) <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GammaBinD) <= 5),
sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GrassiaIIBinD) <= 5))
Overdispersion_BMD<-c(Overdispersion(fTB),Overdispersion(fBB1),Overdispersion(fBB2),
Overdispersion(fGG),Overdispersion(fMB),
Overdispersion(fGB),Overdispersion(fGB2))
Variance_difference<-c(abs(BMD_Variance[1]-BMD_Variance))
Variance_difference<-round(Variance_difference,4)
rbind(BMD_Data,
c("Total",BMD_Total),
c("Variance",BMD_Variance),
c("p-value","-",BMD_p_value),
c("Negative Log Likelihood Value","-","-",BMD_NegLL),
c("AIC","-","-",BMD_AIC),
c("Count of difference Values",C_of_diff_Values),
c("Variance difference",Variance_difference))->BMD_flexed_Data
flextable(data=BMD_flexed_Data,
col_keys = c("Bin_RV","Actual_Freq","EstFreq_BinD",
"EstFreq_TriBinD","EstFreq_BetaBinD(MLE)",
"EstFreq_BetaBinD(MGF)","EstFreq_GHGBBD",
"EstFreq_McGBBD","EstFreq_GammaBinD","EstFreq_GrassiaIIBinD")) |>
theme_box() |> autofit()|>
fontsize(i=c(1:15),j=c(1:10),size = 15,part = "body") |>
fontsize(i=1,j=c(1:10),size = 16,part = "header") |>
bold(i=1,part = "header") |>
bold(i=c(9:15),j=1,part = "body") |>
align(i=c(1:15),j=c(1:10),align = "center") |>
set_header_labels(values = c(Bin_RV="Binomial Random Variable",
Actual_Freq="Observed Frequencies",
EstFreq_BinD="Binomial Distribution",
EstFreq_TriBinD="Triangular Binomial Distribution",
'EstFreq_BetaBinD(MLE)'="Beta Binomial Distribution(MLE)",
'EstFreq_BetaBinD(MGF)'="Beta Binomial Distribution(MGF)",
EstFreq_GHGBBD="Gaussian Hypergeometric Generalized Beta Binomial Distribution",
EstFreq_McGBBD="McDonald Generalized Beta Binomial Distribution",
EstFreq_GammaBinD="Gamma Binomial Distribution",
EstFreq_GrassiaIIBinD="Grassia II Binomial Distribution")) |>
align(i=1,part = "header",align = "center") |>
gen_grob(scaling = "fixed", fit = "width", just = "center")->Final_plot
plot(Final_plot)
```
## Conclusion
The best-fitting distribution is chosen by comparing five main measurements shown in the above table which are p-value, Negative Log Likelihood value, the count of difference between expected and observed frequencies in the range of $+/-5$, absolute variance difference and AIC values.
Then the following five criteria will be considered for the selection procedure
1. The `p-value` $> 0.05$ from the hypothesis test.
2. The Negative Log Likelihood value.
3. The AIC value.
4. The number of difference values within the range of $+/-5$.
5. The absolute variance difference between expected and observed frequency.
Triangular Binomial and Binomial distributions cannot be fitted since its p-value $< 0.05$.
The Negative Log Likelihood values of GHGBB and McGBB distributions are the lowest and are quite similar.
Similarly AIC values are lowest for GHGBB and McGBB, also highest AIC value is for Triangular Binomial distribution.
Based on the count of difference values for the Beta-Binomial distribution it is four out of eight and similar for distributions Gamma Binomial and Grassia II Binomial.
But for the McGBB distribution it is seven out of eight counts.
Further, Over-dispersion parameters of all three fitted distributions are same for the second decimal point (Over-dispersion = $0.43$) except Triangular Binomial and Grassia II Binomial distributions where they are similar for the first decimal point (Over-dispersion = $0.2$).
Clearly variance difference is highest for Binomial distribution and lowest for GHGBB distribution, while others are significant only from the second decimal point.
The best-fitting distribution GHGBB has the highest p-value of $0.8642$, the lowest Negative Log Likelihood value of $809.2767$ and AIC value of $1624.5534$, the count of difference values is eight out of eight and indicates an estimated Over-dispersion of $0.4324875$. The variance difference between observed and expected frequencies of GHGBB leads to the smallest value of $0.0045$.
#### Thank You
[](https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FAmalan-ConStat%2FfitODBOD)
[  ]( https://www.linkedin.com/in/amalan-mahendran-72b86b37/)
[  ]( https://www.researchgate.net/profile/Amalan_Mahendran )