Commit

Merge pull request #342 from hyanwong/extra-doc
Properly describe "loopy belief propagation"
hyanwong authored Nov 17, 2023
2 parents e24af53 + fb5bdea commit 20b5475
Showing 4 changed files with 27 additions and 11 deletions.
29 changes: 22 additions & 7 deletions docs/methods.md
@@ -57,11 +57,11 @@ The available discrete-time algorithms are the `inside_outside` and `maximizatio
They have the following advantages and disadvantages:

Pros
-: allows any shape for the distributions of times
+: Methods allow any shape for the distributions of times
: Currently require just a single upwards and downward pass through the edges

Cons
-: Choice of grid timpoints is somewhat arbitrary (but reasonable defaults picked
+: Choice of grid timepoints is somewhat arbitrary (but reasonable defaults are picked
based on the conditional coalescent)
: Inferred times are imprecise due to discretization: a denser timegrid can increase
precision, but also increases computational cost (quadratic with number of timepoints)
@@ -71,10 +71,14 @@ Cons
### Inside Outside vs Maximization

The `inside_outside` approach has been shown to perform better empirically, but
-suffers from the theoretical problem of "loopy belief propagation". Occasionally
-it also has issues with numerical stability, although this is commonly indicative
+in theory the approach used does not properly account for cycles in the underlying
+genealogical network when updating posterior probabilities (a potential solution
+would be to implement a "loopy belief propagation" algorithm as in the continuous-time
+[`variational_gamma`](sec_methods_continuous_time_vgamma) method, below).
+Occasionally the `inside_outside` method also
+has issues with numerical stability, although this is commonly indicative
of pathological combinations of tree sequence topology and mutation patterns.
-Issues like this are often caused by long regions of the genome that
+Problems like this are often caused by long regions of the genome that
have no mapped mutations (e.g. in the centromere), which can be removed by
{ref}`preprocessing<sec_usage_real_data_stability>`.

@@ -95,7 +99,7 @@ Pros
with number of timepoints
: Old nodes do not suffer from time-discretisation issues caused by forcing
bounds on the oldest times
-: Iterative updating theoretically solves the "loopy belief propagation" problem
+: Iterative updating properly accounts for cycles in the genealogy

Cons
: Assumes posterior times can be reasonably modelled by gamma distributions
@@ -105,14 +109,24 @@ Cons
: Numerical stability issues are more common (but often indicate pathological
or otherwise problematic tree sequences)

+(sec_methods_continuous_time_vgamma)=

### The variational gamma method

The `variational_gamma` method approximates times by fitting separate gamma
distributions for each node, in a similar spirit to {cite:t}`schweiger2023ultra`.
The directed graph that represents the genealogy can (in its undirected form) contain
cycles, so a technique called "expectation propagation" is used, in which
local estimates to each gamma distribution are iteratively refined until
-they converge to a stable solution.
+they converge to a stable solution. This comes under a class of approaches
+sometimes known as "loopy belief propagation".
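
The point about cycles in the undirected form of the genealogy can be illustrated with a minimal sketch. It is not part of `tsdate`: the function name `has_undirected_cycle` and the toy `(parent, child)` edge lists are hypothetical, chosen only for illustration. It uses a union-find structure to check whether ignoring edge direction creates a cycle.

```python
def has_undirected_cycle(edges):
    """Return True if the (parent, child) edges contain a cycle once direction is ignored.

    Hypothetical helper for illustration only; not a tsdate function.
    A repeated (parent, child) pair is treated as a cycle of length two.
    """
    root = {}

    def find(node):
        # Find the representative of the node's component, with path halving.
        root.setdefault(node, node)
        while root[node] != node:
            root[node] = root[root[node]]
            node = root[node]
        return node

    for parent, child in edges:
        root_parent, root_child = find(parent), find(child)
        if root_parent == root_child:
            # Both ends are already connected, so this edge closes a cycle.
            return True
        root[root_parent] = root_child
    return False


# A single tree: every node has one parent, so there is no cycle.
tree_edges = [(4, 0), (4, 1), (5, 4), (5, 2)]

# A genealogy in which node 0 inherits from node 4 in one genomic region and
# directly from node 5 in another, while 5 is also the parent of 4.
# The directed graph is acyclic, but its undirected form contains 0-4-5-0.
graph_edges = [(4, 0), (5, 0), (5, 4)]

print(has_undirected_cycle(tree_edges))
print(has_undirected_cycle(graph_edges))
```

In the second case, information about the time of node 0 can reach node 5 along two different routes, which is why a single upward and downward pass may not suffice and iterative refinement is used.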

+:::{todo}
+Add details about [numerical instability](sec_usage_real_data_stability),
+describing expected errors (e.g. about non-convergence of a hypergeometric series),
+and detailing potential workarounds using the `max_shape` option to constrain the
+gamma variance.
+:::

:::{note}
As a result of testing, the default priors used for this method are
@@ -121,6 +135,7 @@ of all the conditional coalescent priors for all nodes.
See {ref}`sec_priors_conditional_coalescent` for details.
:::


#### Expectation propagation

We are in the process of writing a formal description of the algorithm, but in
2 changes: 1 addition & 1 deletion docs/priors.md
@@ -20,7 +20,7 @@ kernelspec:

## Basic usage

-The {func}`build_prior_grid` and `build_parameter_grid` functions allow you to create a bespoke prior
+The {func}`build_prior_grid` and {func}`build_parameter_grid` functions allow you to create a bespoke prior
for the {ref}`sec_methods_discrete_time` and {ref}`sec_methods_continuous_time`, respectively.
This can be passed in to {func}`date` using the `priors` argument. It provides
a tuneable alternative to passing the {ref}`population size<sec_usage_popsize>`
1 change: 1 addition & 0 deletions docs/python-api.md
@@ -31,6 +31,7 @@ This page provides formal documentation for the `tsdate` Python API.

```{eval-rst}
.. autofunction:: tsdate.build_prior_grid
+.. autofunction:: tsdate.build_parameter_grid
.. autoclass:: tsdate.base.NodeGridValues
```
6 changes: 3 additions & 3 deletions docs/usage.md
@@ -297,9 +297,9 @@ The {func}`tsdate.preprocess_ts()` function can help remove topology from these
regions. See the documentation for that function for details on how to increase
or decrease its stringency.

-The `variational_gamma` method is more prone to instability, and switching to
-another method may help. Note, however, that this is usually a sign that
-you should re-inspect the original tree sequence, which is likely to
+The [`variational_gamma`](sec_methods_continuous_time_vgamma) method is more prone to
+instability, and switching to another method may help. Note, however, that this is usually
+a sign that you should re-inspect the original tree sequence, which is likely to
have poorly inferred topologies.

(sec_usage_real_data_simplify)=