diff --git a/inst/WORDLIST b/inst/WORDLIST
index 240456a0..9c83082f 100644
--- a/inst/WORDLIST
+++ b/inst/WORDLIST
@@ -64,7 +64,6 @@ behaviour
 bimodality
 burrIII
 cdf
-cdf's
 cdfs
 cdot
 checkr
@@ -92,7 +91,6 @@ et
 fitburrlioz
 fitdistrplus
 fitdists
-fit’
 forall
 frac
 funder
@@ -166,7 +164,6 @@ se
 selectable
 shinyssdtools
 simeq
-solution’
 specfying
 ssd
 ssddata
@@ -185,4 +182,3 @@ weibull
 widehat
 wqg
 xb
-’
diff --git a/vignettes/distributions.Rmd b/vignettes/distributions.Rmd
index f4c51bce..eb0b8d24 100644
--- a/vignettes/distributions.Rmd
+++ b/vignettes/distributions.Rmd
@@ -148,7 +148,7 @@ The TMB version of `ssdtools` now includes the option of fitting two mixture dis
 These can be fitted using `ssdtools` by supplying the strings "llogis_llogis" and/or "lnorm_lnorm" to the *dists* argument in the *ssd_fit_dists* call.
 The underlying code for these mixtures has three components: the likelihood function required for TMB; exported R functions to allow the usual methods for a distribution to be called (p, q and r); and a set of supporting R functions (see @fox_methodologies_2022 Appendix D for more details).
-Both mixtures have five parameters - two parameters for each of the component distributions and a mixing parameter (pmix) that defines the weighting of the two distributions in the ‘mixture.’
+Both mixtures have five parameters - two parameters for each of the component distributions and a mixing parameter (pmix) that defines the weighting of the two distributions in the 'mixture.'
 ```{r echo=FALSE,fig.align='center',fig.width=9,fig.height=5, fig.cap="Sample lognormal lognormal mixture probability density (A) and cumulative probability (B) functions.", fig.alt="A two panel plot showing several realisations of the lognormal lognormal mixture probability density function on the left panel and the cumulative density function of distribution on the right panel."}
 par(mfrow = c(1, 2))
@@ -203,7 +203,7 @@ While there is a variety of distributions available in `ssdtools`, the inclusion
 By default, `ssdtools` uses the (corrected) Akaike Information Criterion for small sample size (AICc) as a measure of relative quality of fit for different distributions and as the basis for calculating the model-averaged weights.
 However, the choice of distributions used to fit a model-averaged SSD can have a profound effect on the estimated *HCx* values.
-Deciding on a final default set of distributions to adopt using the model averaging approach is non-trivial, and we acknowledge that there is probably no definitive ‘solution’ to this issue.
+Deciding on a final default set of distributions to adopt using the model averaging approach is non-trivial, and we acknowledge that there is probably no definitive 'solution' to this issue.
 However, the default set should be underpinned by a guiding principle of parsimony, i.e., the set should be as large as is necessary to cover a wide variety of distributional shapes and contingencies but no bigger.
 Further, the default set should result in model-averaged estimates of *HCx* values that: 1) minimise bias; 2) have actual coverages of confidence intervals that are close to the nominal level of confidence; 3) estimated *HCx* and confidence intervals of *HCx* are robust to small changes in the data; and 4) represent a positively continuous distribution that has both right and left tails.
diff --git a/vignettes/model-averaging.Rmd b/vignettes/model-averaging.Rmd
index ccd61d86..9088be06 100644
--- a/vignettes/model-averaging.Rmd
+++ b/vignettes/model-averaging.Rmd
@@ -71,7 +71,7 @@ require(ggplot2)
 > Many authors have noted that there is no guiding theory in ecotoxicology to justify any particular distributional form for the SSD other than that its domain be restricted to the positive real line [@newman_2000], [@Zajdlik_2005], [@chapman_2007], [@fox_2016]. Indeed, [@chapman_2007] described the identification of a suitable probability model as one of the most important and difficult choices in the use of SSDs. Compounding this lack of clarity about the functional form of the SSD is the omnipresent, and equally vexatious issue of small sample size, meaning that any plausible candidate model is unlikely to be rejected [@fox_recent_2021].
-The ssdtools R package uses a model averaging procedure to avoid the need to a-priori select a candidate distribution and instead uses a measure of ‘fit’ for each model to compute weights to be applied to an initial set of candidate distributions.
+The ssdtools R package uses a model averaging procedure to avoid the need to a-priori select a candidate distribution and instead uses a measure of 'fit' for each model to compute weights to be applied to an initial set of candidate distributions.
 The method, as applied in the SSD context is described in detail in [@fox_recent_2021], and potentially provides a level of flexibility and parsimony that is difficult to achieve with a single SSD distribution. [@fox_methodologies_2022]
@@ -201,7 +201,7 @@ The reason for this can be explained mathematically as follows (*if your not int
 The correct expression for a model-averaged SSD is: $$G\left( x \right) = \sum\limits_{i = 1}^k {{w_i}} {F_i}\left( x \right)$$ where ${F_i}\left( \cdot \right)$ is the *i^th^* component SSD (i.e. *cdf*) and *w~i~* is the weight assigned to ${F_i}\left( \cdot \right)$.
 Notice that the function $G\left( x \right)$ is a proper *cumulative distribution function* (*cdf*) which means for a given quantile, *x*, $G\left( x \right)$ returns the *cumulative probability*: $$P\left[ {X \leqslant x} \right]$$
-Now, the *incorrect* approach takes a weighted sum of the component *inverse cdf's*, that is:
+Now, the *incorrect* approach takes a weighted sum of the component *inverse cdfs*, that is:
 $$H\left( p \right) = \sum\limits_{i = 1}^k {{w_i}} {F_i}^{ - 1}\left( p \right)$$ where ${F_i}^{ - 1}\left( \cdot \right)$ is the *i^th^* *inverse cdf*.