Releases · tskit-dev/tskit

27 Nov 14:27

github-actions

1.0.0

f1b139e

Python 1.0.0 Latest

🎉 Version 1 is here!!! 🥳

tskit development doesn't end here, but this marks the point at which:

Breaking changes will not be made except where they are unavoidable to correct incorrect behaviour, or where they are forced by external factors such as dependencies.

Full credit for this release and for tskit generally goes to the wonderful community of contributors, who you can see here: https://tskit.dev/software/tskit.html

Full changelog:

Breaking changes

The reference_sequence argument to TreeSequence.alignments is now
required to be the same length as the tree sequence. Previously it was
required to be the length of the requested interval.
(@benjeffery, #3317)
TreeSequence.tables now returns a zero-copy immutable view of the tables.
To get a mutable copy, use TreeSequence.dump_tables().
(@benjeffery, #3288, #760)
For a tree sequence to be valid, the mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
Drop Python 3.9 support and require Python >= 3.10.
(#3267, @benjeffery)
ltrim, rtrim, trim and shift raise an error if they are
used on a tree sequence containing a reference sequence.
(@hyanwong, #3210, #2091)

Features

Add tskit.jit.numba.jitwrap and NumbaTreeSequence to allow simplified
use and development of Numba-jitted functions with tree sequences. See the
documentation at https://tskit.dev/tskit/docs/stable/numba.html for details.
(@andrewkern, #3295, #3294)
TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
draw_svg() methods now associate tree branches with edge IDs.
(@hyanwong, #3193, #557)
draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right". (@hyanwong, #3201)
Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177).
Add TreeSequence.mutations_edge, which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
the derived state of mutations and the inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
Tskit now requires NumPy version 2 or later. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
StringDType properties will result in a RuntimeError. If you try to
use another Python module that was compiled against NumPy 1.x with NumPy 2.x
you may see the error "A module that was compiled using NumPy 1.x cannot be
run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
available you will have to use the NumPy 1.x build as above.
Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)
Add all_mutations and all_edges options to TreeSequence.union,
allowing greater flexibility in "disjoint union" situations.
(@hyanwong, @petrelharp, #3181)
Add TreeSequence.divergence_matrix, which was previously undocumented.
TreeSequence.variants, .genotype_matrix, .haplotypes, and .alignments methods
now fully support isolated_as_missing behaviour with internal nodes. .alignments is
also around 10% faster.
(@benjeffery, #3313, #3317, #1896)
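
For the NumPy requirement noted above, it can help to check which NumPy major version an environment has before relying on the StringDType properties. The following is only a sketch using the standard library; the helper name installed_major_version is hypothetical and is not part of tskit or NumPy.

```python
from importlib import metadata


def installed_major_version(package):
    """Return the major version of an installed distribution, or None if absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


numpy_major = installed_major_version("numpy")
if numpy_major is None:
    print("NumPy is not installed")
elif numpy_major < 2:
    print("NumPy 1.x: the StringDType properties will raise RuntimeError")
else:
    print("NumPy 2 or later: the StringDType properties are available")
```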

Bugfixes

For some tables with out-of-order mutations, TableCollection.sort did not reorder
the mutations so that they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
Fix bug in TreeSequence.genetic_relatedness_vector, which ignored
span_normalise and always behaved as if it were set to False. The default
is now True, in agreement with other statistics, so the returned
values will change. (@petrelharp, #3300, #3241)
Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
Fix TableCollection.assert_equals and Table.assert_equals treating metadata
schemas that are equal but have different byte representations as unequal.
(@benjeffery, #3246, #3244)
k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)
Prevent iterating over a TopologyCounter.
(@benjeffery, #3202, #1462)
Fix TreeSequence.concatenate() to work with internal samples by using the
all_mutations and all_edges parameters in union().
(@hyanwong, #3283, #3181)
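
The mutation ordering described for TableCollection.sort above can be pictured as a compound sort key. The sketch below is plain Python and is not tskit's implementation: the Mut class and its fields are hypothetical stand-ins, all times are assumed to be known, and it assumes that older times sort first and that mutations with more descendants sort first.

```python
from dataclasses import dataclass


@dataclass
class Mut:
    """Illustrative stand-in for a mutation row; not a tskit class."""
    label: str
    site: int
    time: float           # mutation time, assumed known in this sketch
    node: int
    node_time: float      # time of the node the mutation sits above
    num_descendants: int  # mutations at this site descending from this one


def sort_key(indexed_mutation):
    index, m = indexed_mutation
    # Assumption: older times first, and more descendants first so that
    # parent mutations come before their children. The original table
    # position is the final tie-breaker.
    return (m.site, -m.time, -m.node_time, -m.num_descendants, m.node, index)


def sorted_mutations(mutations):
    return [m for _, m in sorted(enumerate(mutations), key=sort_key)]


muts = [
    Mut("a", site=1, time=1.0, node=0, node_time=0.0, num_descendants=0),
    Mut("b", site=0, time=2.0, node=4, node_time=1.0, num_descendants=0),
    Mut("c", site=0, time=2.0, node=4, node_time=1.0, num_descendants=1),
    Mut("d", site=0, time=5.0, node=6, node_time=4.0, num_descendants=2),
]
print([m.label for m in sorted_mutations(muts)])  # ['d', 'c', 'b', 'a']
```

Here "b" and "c" share a site, a time and a node, so the descendant count is what places the parent "c" ahead of its child "b".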

15 Oct 12:27

github-actions

1.0.0b3

20d630b

Python 1.0.0b3

Breaking Changes

TreeSequence.tables now returns a zero-copy immutable view of the tables.
To get a mutable copy, use TreeSequence.dump_tables().
(@benjeffery, #3288, #760)
For a tree sequence to be valid, the mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)

Features

TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
draw_svg() methods now associate tree branches with edge IDs
(@hyanwong, #3193, #557)
draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right" (@hyanwong, #3201)
Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177)
Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
derived state of mutations and inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
Tskit now requires NumPy version 2 or later. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
StringDType properties will result in a RuntimeError. If you try to
use another Python module that was compiled against NumPy 1.x with NumPy 2.x
you may see the error "A module that was compiled using NumPy 1.x cannot be
run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
available you will have to use the NumPy 1.x build as above.
Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)

Bugfixes

For some tables with out-of-order mutations, TableCollection.sort did not reorder
the mutations so that they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
Fix TableCollection.assert_equals and Table.assert_equals treating metadata
schemas that are equal but have different byte representations as unequal.
(@benjeffery, #3246, #3244)
k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)
Prevent iterating over a TopologyCounter.
(@benjeffery, #3202, #1462)

Breaking changes

ltrim, rtrim, trim and shift raise an error if used on a tree sequence
containing a reference sequence (@hyanwong, #3210, #2091)

24 Sep 16:22

github-actions

1.0.0b2

647ad96

Python 1.0.0b2

Packaging bugfix release

24 Sep 13:25

github-actions

1.0.0b1

d3f7642

Python 1.0.0b1

Breaking Changes

For a tree sequence to be valid, the mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)

Features

TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
draw_svg() methods now associate tree branches with edge IDs
(@hyanwong, #3193, #557)
draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right" (@hyanwong, #3201)
Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177)
Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
derived state of mutations and inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
Tskit now requires NumPy version 2 or later. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
StringDType properties will result in a RuntimeError. If you try to
use another Python module that was compiled against NumPy 1.x with NumPy 2.x
you may see the error "A module that was compiled using NumPy 1.x cannot be
run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
available you will have to use the NumPy 1.x build as above.
Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)

Bugfixes

For some tables with out-of-order mutations, TableCollection.sort did not reorder
the mutations so that they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
Fix TableCollection.assert_equals and Table.assert_equals treating metadata
schemas that are equal but have different byte representations as unequal.
(@benjeffery, #3246, #3244)
k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)

Breaking changes

ltrim, rtrim, trim and shift raise an error if used on a tree sequence
containing a reference sequence (@hyanwong, #3210, #2091)

24 Sep 13:07

github-actions

C_1.2.0

d3f7642

C API C_1.2.0

Breaking changes

Remove tsk_diff_iter_t and associated functions.
(@benjeffery, #3221, #2797).
tsk_treeseq_init now requires that mutation parents in the table collection
are correct and consistent with the topology of the tree at each mutation site.
Returns TSK_ERR_BAD_MUTATION_PARENT if this is not the case, or
TSK_ERR_MUTATION_PARENT_AFTER_CHILD if the mutations are not in an order
compatible with the correct mutation parent.
(@benjeffery, #2729, #2732, #3212).

Features

Add TSK_TS_INIT_COMPUTE_MUTATION_PARENTS to tsk_treeseq_init
to compute mutation parents from the tree sequence topology.
Note that the mutations must be in the correct order.
(@benjeffery, #2757, #3212).
Add TSK_CHECK_MUTATION_PARENTS option to tsk_table_collection_check_integrity
to check that mutation parents are consistent with the tree sequence topology.
This option implies TSK_CHECK_TREES.
(@benjeffery, #2729, #2732, #3212).
Add the TSK_NO_CHECK_INTEGRITY option to tsk_table_collection_compute_mutation_parents
to skip the integrity checks that are normally run when computing mutation parents.
This is useful for speeding up the computation of mutation parents when the
tree sequence is already known to be valid.
(@benjeffery, #3212).
Mutations returned by tsk_treeseq_get_mutation now include pre-computed
inherited_state and inherited_state_length fields. The inherited state
is computed during tree sequence initialization and represents the state that
existed at the site before each mutation occurred (either the ancestral state
if the mutation is the root mutation or the derived state of the parent mutation).
Note that this breaks ABI compatibility due to the addition of these fields
to the tsk_mutation_t struct.
(@benjeffery, #3277, #2631).
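
The definition of inherited state given above can be written out in a few lines. The sketch below uses Python for brevity and works on plain lists for a single site; it is an illustration of the rule, not the C library's code, and the function name and NULL sentinel are assumptions of the sketch.

```python
NULL = -1  # sentinel for "no parent mutation" in this sketch


def inherited_states(ancestral_state, parents, derived_states):
    """
    Return the inherited state of each mutation at a single site.

    parents[j] is the index of mutation j's parent mutation, or NULL.
    derived_states[j] is the state that mutation j changes the site to.
    """
    result = []
    for parent in parents:
        if parent == NULL:
            # A root mutation inherits the site's ancestral state.
            result.append(ancestral_state)
        else:
            # Otherwise it inherits the derived state of its parent mutation.
            result.append(derived_states[parent])
    return result


# Mutation 1 descends from mutation 0; mutations 0 and 2 are root mutations.
print(inherited_states("A", [NULL, 0, NULL], ["T", "G", "C"]))  # ['A', 'T', 'A']
```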

21 May 18:14

github-actions

0.6.4

eee766c

Python 0.6.4

Breaking changes

TreeSequence.write_vcf now filters non-sample nodes from individuals
by default, instead of raising an error. These nodes can be included using the
new include_non_sample_nodes argument.
By default, individual names (sample IDs) in VCF output are now of the form
tsk_{individual.id}. Previously these were always
"tsk_{j}" for j in range(num_individuals). This may break some downstream
code if individuals are specified. To fix, manually set individual_names
to the required pattern.
(@benjeffery, #3163)
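
The difference between the two naming schemes above only shows when the individuals written out do not have IDs 0, 1, 2 and so on. The following sketch builds both sets of names in plain Python; the individual IDs are hypothetical and the two helper functions are illustrative, not part of tskit.

```python
def new_default_names(individual_ids):
    """Names of the form tsk_{individual.id}, as in 0.6.4."""
    return [f"tsk_{i}" for i in individual_ids]


def old_default_names(num_individuals):
    """Names of the form tsk_{j} for j in range(num_individuals)."""
    return [f"tsk_{j}" for j in range(num_individuals)]


individual_ids = [3, 7, 9]  # hypothetical subset of individuals
print(new_default_names(individual_ids))       # ['tsk_3', 'tsk_7', 'tsk_9']
print(old_default_names(len(individual_ids)))  # ['tsk_0', 'tsk_1', 'tsk_2']
```

Passing the second list as individual_names to TreeSequence.write_vcf is one way to keep the earlier pattern for downstream code that depends on it.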

Features

Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes
in a tree sequence, grouped by a ploidy value.
(@benjeffery, #3157)
Add TreeSequence.individuals_nodes attribute to return the nodes
associated with each individual as a numpy array.
(@benjeffery, #3153)
Add shift method to both TableCollection and TreeSequence classes
allowing the coordinate system to be shifted, and TreeSequence.concatenate
so that a set of tree sequences can be added to the right of an existing one.
(@hyanwong, #3165, #3164)
Add TreeSequence.map_to_vcf_model method to return a mapping of
the tree sequence to the VCF model.
(@benjeffery, #3163)
Use a thin space as the thousands separator in HTML output,
and a comma in CLI output.
(@hossam26644, #3167, #2951)
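
The two thousands separators mentioned above can be reproduced with standard string formatting. This is a sketch of the idea only, not tskit's own formatting code; the function name is hypothetical, and U+2009 is the Unicode thin space.

```python
THIN_SPACE = "\u2009"


def format_thousands(n, separator=","):
    """Format an integer with the given thousands separator."""
    return f"{n:,}".replace(",", separator)


print(format_thousands(1234567))              # CLI style: 1,234,567
print(format_thousands(1234567, THIN_SPACE))  # HTML style, thin spaces
```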

Fixes

Correct assertion message when tables are compared with metadata ignored.
(@benjeffery, #3162, #3161)

28 Apr 16:12

github-actions

0.6.3

4f532bd

Python 0.6.3

Bugfixes

TreeSequence.draw_svg(path=...) was failing due to a missing
import xml.dom.minidom (@petrelharp, #3144, #3145)

01 Apr 16:55

github-actions

0.6.2

ecece30

Python 0.6.2

Bugfixes

Metadata.schema was returning a modified schema; it now returns a copy of
the original schema instead (@benjeffery, #3129, #3130)

31 Mar 15:59

github-actions

0.6.1

d0b470d

Python 0.6.1

Bugfixes

Fix to TreeSequence.pair_coalescence_counts output dimension when
provided with time windows containing no nodes (@nspope,
#3046, #3058)
Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing
span if span_normalise=True. This resolves a bug where
TreeSequence.pair_coalescence_rates would return incorrect values for
intervals with missing trees. (@natep, #3053, #3059)
Fix TreeSequence.pair_coalescence_rates triggering an assertion through
floating point error when all coalescence events are inside a single time
window (@natep, #3035, #3038)

Features

Add support for fixed-length arrays in metadata struct codec using the length property.
(@benjeffery, #3088, #3090)
Add a new TreeSequence.pca method that uses randomized linear algebra
to find the top eigenvectors/values of the genetic relatedness matrix
(@hanbin973, @petrelharp, #3008)
Add methods on TreeSequence to efficiently get table metadata as a
numpy structured array. (@benjeffery, #3098)
Add Python 3.13 support (@benjeffery, #3107)
Add a preamble argument to draw_svg() methods to allow adding arbitrary extra
graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)

31 Mar 15:59

github-actions

C_1.1.4

d0b470d

C API C_1.1.4

Changes

Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library.
This is useful for debugging as errors will print to stderr when set.
(@jeromekelleher, #3095).
