Timescale normalisation and other fixes
nspope committed May 2, 2024
1 parent 9490955 commit 47a4048
Showing 21 changed files with 2,055 additions and 1,734 deletions.
22 changes: 7 additions & 15 deletions docs/methods.md
@@ -29,11 +29,11 @@ each timepoint).
Continuous-time approaches approximate the posterior by a continuous
univariate distribution (e.g. a gamma distribution).

-In tests, we find that the continuous-time `variational_gamma` approach is
-the most accurate (but can suffer from {ref}`numerical instability<sec_usage_real_data_stability>`).
-The discrete-time `inside_outside` approach is slightly less accurate, especially for older times,
-but is more numerically robust, and the discrete-time `maximization` approach is
-always stable but is the least accurate.
+In tests, we find that the continuous-time `variational_gamma` approach is the
+most accurate. The discrete-time `inside_outside` approach is slightly less
+accurate, especially for older times, but is more numerically robust, and the
+discrete-time `maximization` approach is always stable but is the least
+accurate.

Changing the method is very simple:

@@ -43,13 +43,13 @@ import tskit
import tsdate
input_ts = tskit.load("data/basic_example.trees")
-ts = tsdate.date(input_ts, method="variational_gamma", population_size=100, mutation_rate=1e-8)
+ts = tsdate.date(input_ts, method="variational_gamma", mutation_rate=1e-8)
```

Alternatively each method can be called directly as a separate function:

```{code-cell} ipython3
-ts = tsdate.variational_gamma(input_ts, population_size=100, mutation_rate=1e-8)
+ts = tsdate.variational_gamma(input_ts, mutation_rate=1e-8)
```

Currently the default is `inside_outside`, but this may change in future releases.
@@ -127,13 +127,6 @@ local estimates to each gamma distribution are iteratively refined until
they converge to a stable solution. This comes under a class of approaches
sometimes known as "loopy belief propagation".

-:::{todo}
-Add details about [numerical instability](sec_usage_real_data_stability),
-describing expected errors (e.g. about non-convergence of a hypergeometric series),
-and detailing potential workarounds using the `max_shape` option to constrain the
-gamma variance.
-:::
-
:::{note}
As a result of testing, the default priors used for this method are
identical for all nodes (i.e. a "global" prior is used), based on a composite
@@ -166,6 +159,5 @@ ts = tsdate.date(
input_ts,
method="variational_gamma",
progress=True,
-population_size=100,
mutation_rate=1e-8)
```
5 changes: 0 additions & 5 deletions docs/usage.md
@@ -308,11 +308,6 @@ The {func}`tsdate.preprocess_ts()` function can help remove topology from these
regions. See the documentation for that function for details on how to increase
or decrease its stringency.

-The [`variational_gamma`](sec_methods_continuous_time_vgamma) method is more prone to
-instability, and switching to another method may help. Note, however, that this is usually
-a sign that you should re-inspect the original tree sequence, which is likely to
-have poorly inferred topologies.
-
(sec_usage_real_data_simplify)=

### Simplification and unary nodes