diff --git a/docs/methods.md b/docs/methods.md
index e225849a..ce2d3158 100644
--- a/docs/methods.md
+++ b/docs/methods.md
@@ -57,11 +57,11 @@ The available discrete-time algorithms are the `inside_outside` and `maximizatio
 They have the following advantages and disadvantages:

 Pros
-: allows any shape for the distributions of times
+: Methods allow any shape for the distributions of times
 : Currently require just a single upwards and downward pass through the edges

 Cons
-: Choice of grid timpoints is somewhat arbitrary (but reasonable defaults picked
+: Choice of grid timepoints is somewhat arbitrary (but reasonable defaults are picked
 based on the conditional coalescent)
 : Inferred times are imprecise due to discretization: a denser timegrid can increase
 precision, but also increases computational cost (quadratic with number of timepoints)
@@ -71,10 +71,14 @@ Cons
 ### Inside Outside vs Maximization

 The `inside_outside` approach has been shown to perform better empirically, but
-suffers from the theoretical problem of "loopy belief propagation". Occasionally
-it also has issues with numerical stability, although this is commonly indicative
+in theory the approach used does not properly account for cycles in the underlying
+genealogical network when updating posterior probabilities (a potential solution
+would be to implement a "loopy belief propagation" algorithm as in the continuous-time
+[`variational_gamma`](sec_methods_continuous_time_vgamma) method, below).
+Occasionally the `inside_outside` method also
+has issues with numerical stability, although this is commonly indicative
 of pathological combinations of tree sequence topology and mutation patterns.
-Issues like this are often caused by long regions of the genome that
+Problems like this are often caused by long regions of the genome that
 have no mapped mutations (e.g. in the centromere), which can be removed by
 {ref}`preprocessing`.

@@ -95,7 +99,7 @@ Pros
 with number of timepoints
 : Old nodes do not suffer from time-discretisation issues caused by forcing
 bounds on the oldest times
-: Iterative updating theoretically solves the "loopy belief propagation" problem
+: Iterative updating properly accounts for cycles in the genealogy

 Cons
 : Assumes posterior times can be reasonably modelled by gamma distributions
@@ -105,6 +109,8 @@ Cons
 : Numerical stability issues are more common (but often indicate pathological
 of otherwise problematic tree sequences)

+(sec_methods_continuous_time_vgamma)=
+
 ### The variational gamma method

 The `variational_gamma` method approximates times by fitting separate gamma
@@ -112,7 +118,15 @@ distributions for each node, in a similar spirit to {cite:t}`schweiger2023ultra`
 The directed graph that represents the genealogy can (in its undirected form)
 contain cycles, so a technique called "expectation propagation" is used, in
 which local estimates to each gamma distribution are iteratively refined until
-they converge to a stable solution.
+they converge to a stable solution. This comes under a class of approaches
+sometimes known as "loopy belief propagation".
+
+:::{todo}
+Add details about [numerical instability](sec_usage_real_data_stability),
+describing expected errors (e.g. about non-convergence of a hypergeometric series),
+and detailing potential workarounds using the `max_shape` option to constrain the
+gamma variance.
+:::

 :::{note}
 As a result of testing, the default priors used for this method are
@@ -121,6 +135,7 @@ of all the conditional coalescent priors for all nodes.
 See {ref}`sec_priors_conditional_coalescent` for details.
 :::

+
 #### Expectation propagation

 We are in the process of writing a formal description of the algorithm, but in
diff --git a/docs/priors.md b/docs/priors.md
index 9c20d11b..1f68f2f8 100644
--- a/docs/priors.md
+++ b/docs/priors.md
@@ -20,7 +20,7 @@ kernelspec:

 ## Basic usage

-The {func}`build_prior_grid` and `build_parameter_grid` functions allow you to create a bespoke prior
+The {func}`build_prior_grid` and {func}`build_parameter_grid` functions allow you to create a bespoke prior
 for the {ref}`sec_methods_discrete_time` and {ref}`sec_methods_continuous_time`,
 respectively. This can be passed in to {func}`date` using the `priors` argument. It provides
 a tuneable alternative to passing the {ref}`population size`
diff --git a/docs/python-api.md b/docs/python-api.md
index ae170ef8..61c74b61 100644
--- a/docs/python-api.md
+++ b/docs/python-api.md
@@ -31,6 +31,7 @@ This page provides formal documentation for the `tsdate` Python API.

 ```{eval-rst}
 .. autofunction:: tsdate.build_prior_grid
+.. autofunction:: tsdate.build_parameter_grid
 .. autoclass:: tsdate.base.NodeGridValues
 ```

diff --git a/docs/usage.md b/docs/usage.md
index 6e7db23f..95e90303 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -297,9 +297,9 @@ The {func}`tsdate.preprocess_ts()` function can help remove topology from these
 regions. See the documentation for that function for details on how to increase
 or decrease its stringency.

-The `variational_gamma` method is more prone to instability, and switching to
-another method may help. Note, however, that this is usually a sign that
-you should re-inspect the original tree sequence, which is likely to
+The [`variational_gamma`](sec_methods_continuous_time_vgamma) method is more prone to
+instability, and switching to another method may help. Note, however, that this is usually
+a sign that you should re-inspect the original tree sequence, which is likely to
 have poorly inferred topologies.

 (sec_usage_real_data_simplify)=