Merge pull request #385 from hyanwong/doc-vg-params
Document (and undocument) standard params & tidy calling conventions
hyanwong authored Jun 4, 2024
2 parents 446f4ff + a33fd2b commit 6eacd25
Showing 10 changed files with 132 additions and 67 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -24,6 +24,9 @@
- Json metadata for mean time and variance in the mutation and node tables is now saved
with a suitable schema. This means `json.loads()` is no longer needed to read it.

- The `mutation_rate` and `population_size` parameters are now keyword-only, and
therefore these parameter names need to be explicitly typed out.

## [0.1.6] - 2024-01-07

**Breaking changes**
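The CHANGELOG entry above describes a switch to keyword-only parameters. As a minimal sketch of what that means in Python, the hypothetical `date_stub` below (not tsdate's actual signature, which this diff does not show) places its parameters after a bare `*`. A positional call in the style of the old tests, such as `date(inferred, 10000, 1e-8)`, then fails with a `TypeError` instead of silently depending on argument order.

```python
# Hypothetical stand-in; tsdate's real `date` signature is not reproduced here.
def date_stub(ts, *, mutation_rate=None, population_size=None, priors=None):
    """Parameters after the bare `*` can only be passed by name."""
    return {
        "ts": ts,
        "mutation_rate": mutation_rate,
        "population_size": population_size,
        "priors": priors,
    }


# Named arguments are accepted in any order
result = date_stub("inferred_ts", population_size=10_000, mutation_rate=1e-8)

# Passing the same values positionally is rejected by the interpreter
try:
    date_stub("inferred_ts", 10_000, 1e-8)
    rejected = False
except TypeError:
    rejected = True
```

Because the names are mandatory at the call site, swapping the order of `mutation_rate` and `population_size` can no longer change the meaning of a call.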
2 changes: 1 addition & 1 deletion docs/historical_samples.md
@@ -24,7 +24,7 @@ data which includes *historical samples*,
whose time is older that the current generation (i.e. sample nodes with
times > 0).

The output of [`tsinfer`](tsinfer:sec_introduction) is valid regardless
The output of [`tsinfer`](https://tskit.dev/tsinfer/) is valid regardless
of the inclusion of historical samples, but *dating* such a tree sequence
is more complicated. This is because the time scale of a tsinferred
tree sequence is uncalibrated, so it is unclear where in time to
13 changes: 9 additions & 4 deletions docs/priors.md
@@ -41,9 +41,14 @@ prior1 = tsdate.build_prior_grid(ts, population_size=N)
prior2 = tsdate.build_prior_grid(ts, population_size=N, timepoints=40)
prior3 = tsdate.build_prior_grid(ts, population_size=N, prior_distribution="gamma")
ts1 = tsdate.date(ts, mu, priors=prior1) # Identical to tsdate.date(ts, mu, population_size=N)
ts2 = tsdate.date(ts, mu, priors=prior2) # Uses a denser timegrid
ts3 = tsdate.date(ts, mu, priors=prior3) # Approximates the discrete-time priors with a gamma
# Equiv to tsdate.date(ts, mutation_rate=mu, population_size=N)
ts1 = tsdate.date(ts, mutation_rate=mu, priors=prior1)
# Uses a denser timegrid
ts2 = tsdate.date(ts, mutation_rate=mu, priors=prior2)
# Approximate discrete-time priors with a gamma
ts3 = tsdate.date(ts, mutation_rate=mu, priors=prior3)
```

See below for more explanation of the interpretation of the parameters passed to
@@ -76,7 +81,7 @@ be larger than this number. For instance
```{code-cell} ipython3
timepoints = 10
prior = tsdate.build_prior_grid(ts, population_size=N, timepoints=timepoints)
dated_ts = tsdate.date(ts, mu, priors=prior)
dated_ts = tsdate.date(ts, mutation_rate=mu, priors=prior)
print(
f"`timepoints`={timepoints}, producing a total of {len(prior.timepoints)}",
2 changes: 1 addition & 1 deletion docs/python-api.md
@@ -27,9 +27,9 @@ This page provides formal documentation for the `tsdate` Python API.
.. autofunction:: tsdate.date
.. autodata:: tsdate.core.estimation_methods
:no-value:
.. autofunction:: tsdate.variational_gamma
.. autofunction:: tsdate.inside_outside
.. autofunction:: tsdate.maximization
.. autofunction:: tsdate.variational_gamma
```

## Prior and Time Discretisation Options
12 changes: 8 additions & 4 deletions docs/usage.md
@@ -215,15 +215,19 @@ per-bp per-year rate, you will also need to modify the effective population size
```{code-cell} ipython3
import numpy as np
popsize = 100 # Diploid population size
mutation_rate_per_gen = 1e-8
mu = 1e-8 # per generation
# By default, dates are in generations
ts_generations = tsdate.date(ts, mutation_rate_per_gen, popsize)
ts_generations = tsdate.date(ts, mutation_rate=mu, population_size=popsize)
# To infer dates in years, adjust both the rates and the population size:
generation_time = 30 # Years
mutation_rate_per_year = mutation_rate_per_gen / generation_time
mutation_rate_per_year = mu / generation_time
ts_years = tsdate.date(
ts, mutation_rate_per_year, popsize * generation_time, time_units="years")
ts,
mutation_rate=mutation_rate_per_year,
population_size=popsize * generation_time,
time_units="years"
)
# Check that the inferred node times are identical, just on different scales
assert np.allclose(ts_generations.nodes_time, ts_years.nodes_time / generation_time, 5)
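The `docs/usage.md` change rescales two inputs together when switching from generations to years. A minimal arithmetic sketch, reusing the values from that snippet, suggests why this works: dividing the mutation rate by the generation time and multiplying the population size by it leaves their product unchanged, so only the units of the inferred node times differ. The name `theta` and the diploid formula 4Nμ are standard coalescent background assumed here, not something this diff states.

```python
# Values reused from the docs/usage.md snippet in this diff
popsize = 100          # diploid population size
mu = 1e-8              # mutation rate per bp per generation
generation_time = 30   # years per generation

# Rescale both inputs, as the docs snippet does for time_units="years"
mutation_rate_per_year = mu / generation_time
popsize_in_years = popsize * generation_time

# The population-scaled mutation rate is the same under either time unit
theta_generations = 4 * popsize * mu
theta_years = 4 * popsize_in_years * mutation_rate_per_year

# A node time of t generations corresponds to t * generation_time years
t_generations = 1_000.0
t_years = t_generations * generation_time
```

One detail worth checking in the snippet above: the third positional parameter of `np.allclose` is `rtol`, so the trailing `5` requests a relative tolerance of 5 rather than agreement to five decimal places.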
4 changes: 2 additions & 2 deletions docs/variable_popsize.md
@@ -64,7 +64,7 @@ get a poor fit to the true times:

```{code-cell} ipython3
import tsdate
redated_ts = tsdate.date(ts, mutation_rate, population_size=1e4)
redated_ts = tsdate.date(ts, mutation_rate=mutation_rate, population_size=1e4)
plot_real_vs_tsdate_times(ts.nodes_time, redated_ts.nodes_time, ts, redated_ts, delta=1000, alpha=0.1)
```

@@ -88,7 +88,7 @@ gives a much better fit to the true times:

```{code-cell} ipython3
prior = tsdate.build_prior_grid(ts, popsize)
redated_ts = tsdate.date(ts, mutation_rate, priors=prior)
redated_ts = tsdate.date(ts, mutation_rate=mutation_rate, priors=prior)
plot_real_vs_tsdate_times(ts.nodes_time, redated_ts.nodes_time, ts, redated_ts, delta=1000, alpha=0.1)
```

4 changes: 2 additions & 2 deletions tests/test_functions.py
@@ -2040,7 +2040,7 @@ def test_historical_samples(self):

samples = tsinfer.formats.SampleData.from_tree_sequence(ts)
inferred = tsinfer.infer(samples).simplify(filter_sites=False)
dated = date(inferred, 10000, 1e-8)
dated = date(inferred, mutation_rate=1e-8, population_size=10000)
sites_time = tsdate.sites_time_from_ts(dated)
# Add in the original individual times
ind_dated_sd = samples.copy()
@@ -2074,7 +2074,7 @@ def test_sampledata(self):
ts, use_sites_time=False
)
inferred = tsinfer.infer(samples).simplify()
dated = date(inferred, 10000, 1e-8)
dated = date(inferred, population_size=10000, mutation_rate=1e-8)
sites_time = tsdate.sites_time_from_ts(dated)
sites_bound = samples.min_site_times(individuals_only=True)
check_sites_time = np.maximum(sites_time, sites_bound)
4 changes: 2 additions & 2 deletions tests/test_inference.py
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2021-23 Tskit Developers
# Copyright (c) 2021-24 Tskit Developers
# Copyright (c) 2020 University of Oxford
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -78,7 +78,7 @@ def test_bad_population_size(self):

def test_both_ne_and_population_size_specified(self):
ts = utility_functions.two_tree_mutation_ts()
with pytest.raises(ValueError, match="may be specified"):
with pytest.raises(ValueError, match="Only provide one of Ne"):
tsdate.date(
ts, mutation_rate=None, population_size=PopulationSizeHistory(1), Ne=1
)
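The updated test in `tests/test_inference.py` expects a `ValueError` whose message matches `"Only provide one of Ne"` when both `Ne` and `population_size` are passed. A hedged sketch of that kind of check follows. The helper `resolve_population_size` and its exact message are hypothetical, chosen only so that the test's `match` pattern (which `pytest.raises` applies with `re.search`) would succeed; they are not taken from tsdate's source, and treating `Ne` as an alternative name for the same quantity is an assumption based on the test.

```python
import re


def resolve_population_size(population_size=None, Ne=None):
    """Hypothetical helper: accept `Ne` as an alternative name, but never both."""
    if Ne is not None and population_size is not None:
        raise ValueError("Only provide one of Ne or population_size")
    # Fall back to `Ne` when only that name was given
    return population_size if population_size is not None else Ne


# Mimic `pytest.raises(ValueError, match=...)`, which applies re.search
message = ""
try:
    resolve_population_size(population_size=1, Ne=1)
except ValueError as err:
    message = str(err)
matched = re.search("Only provide one of Ne", message) is not None
```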
5 changes: 3 additions & 2 deletions tsdate/cli.py
@@ -1,6 +1,7 @@
# MIT License
#
# Copyright (c) 2020 University of Oxford
# Copyright (c) 2024 Tskit Developers
# Copyright (c) 2020-2024 University of Oxford
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -281,7 +282,7 @@ def run_date(args):
params["ignore_oldest_root"] = args.ignore_oldest # For backwards compat
# TODO: remove and error out if ignore_oldest_root is set,
# see https://github.com/tskit-dev/tsdate/issues/262
dated_ts = tsdate.date(ts, args.mutation_rate, **params)
dated_ts = tsdate.date(ts, mutation_rate=args.mutation_rate, **params)
dated_ts.dump(args.output)

