Releases: tskit-dev/tskit
Python 1.0.0
🎉 Version 1 is here!!! 🥳
tskit development doesn't end here, but this marks the point at which:
Breaking changes will not be made except where it is unavoidable to correct incorrect behaviour or where they are forced by external factors such as dependencies
Full credit for this release and for tskit generally goes to the wonderful community of contributors, who you can see here: https://tskit.dev/software/tskit.html
Full changelog:
Breaking changes
- The reference_sequence argument to TreeSequence.alignments is now
required to be the same length as the tree sequence. Previously it was
required to be the length of the requested interval.
(@benjeffery, #3317)
- TreeSequence.tables now returns a zero-copy immutable view of the tables.
To get a mutable copy, use TreeSequence.dump_tables().
(@benjeffery, #3288, #760)
- For a tree sequence to be valid, the mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
- Drop Python 3.9 support and require Python >= 3.10.
(#3267, @benjeffery)
- ltrim, rtrim, trim and shift raise an error if they are
used on a tree sequence containing a reference sequence.
(@hyanwong, #3210, #2091)
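As a rough migration sketch for the reference_sequence change, the helper below embeds an old-style, interval-length reference into a string that spans the whole sequence. The function name, its arguments and the "N" padding character are all hypothetical and not part of tskit; only the length requirement comes from the note above.

```python
def full_length_reference(interval_reference, left, right, sequence_length, fill="N"):
    """Embed an interval-length reference in a string covering [0, sequence_length).

    Hypothetical helper: positions outside [left, right) are padded with `fill`.
    """
    left, right, sequence_length = int(left), int(right), int(sequence_length)
    if not 0 <= left < right <= sequence_length:
        raise ValueError("need 0 <= left < right <= sequence_length")
    if len(interval_reference) != right - left:
        raise ValueError("interval_reference must have length right - left")
    return fill * left + interval_reference + fill * (sequence_length - right)


old_style = "ACGT"  # previously sized to the requested interval [3, 7)
new_style = full_length_reference(old_style, left=3, right=7, sequence_length=10)
print(new_style)  # NNNACGTNNN

# Assumed call site, with `ts` a tree sequence of length 10:
# ts.alignments(reference_sequence=new_style, left=3, right=7)
```

If the full reference is already available, passing it directly is simpler than padding.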
Features
- Add tskit.jit.numba.jitwrap and NumbaTreeSequence to allow simplified
use and development of Numba-jitted functions with tree sequences. See the
documentation at https://tskit.dev/tskit/docs/stable/numba.html for details.
(@andrewkern, #3295, #3294)
- TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
- draw_svg() methods now associate tree branches with edge IDs.
(@hyanwong, #3193, #557)
- draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right". (@hyanwong, #3201)
- Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177).
- Add TreeSequence.mutations_edge, which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
- Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
the derived state of mutations and the inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
- Tskit now requires NumPy version 2 or later. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
StringDType properties will result in a RuntimeError. If you try to
use another Python module that was compiled against NumPy 1.x with NumPy 2.x
you may see the error "A module that was compiled using NumPy 1.x cannot be
run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
available you will have to use the NumPy 1.x build as above.
- Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)
- Add all_mutations and all_edges options to TreeSequence.union,
allowing greater flexibility in "disjoint union" situations.
(@hyanwong, @petrelharp, #3181)
- Add TreeSequence.divergence_matrix, which was previously undocumented.
- TreeSequence.variants, .genotype_matrix, .haplotypes, and .alignments methods
now fully support isolated_as_missing behaviour with internal nodes. .alignments is
also around 10% faster.
(@benjeffery, #3313, #3317, #1896)
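To make "the edge ID for each mutation's edge" concrete, here is a plain-Python sketch of that lookup. It is not how tskit computes TreeSequence.mutations_edge. The tuple layout, the linear scan and the use of -1 (standing in for tskit.NULL) when a mutation sits above a root are assumptions made for illustration.

```python
NULL = -1  # stand-in for tskit.NULL (assumed value for "no edge")


def mutation_edge(edges, node, position):
    """Return the index of the edge above `node` that covers `position`.

    `edges` is a list of (left, right, parent, child) tuples. NULL is
    returned when no edge has `node` as its child at `position`.
    """
    for edge_id, (left, right, parent, child) in enumerate(edges):
        if child == node and left <= position < right:
            return edge_id
    return NULL


edges = [
    (0.0, 10.0, 2, 0),  # edge 0: node 0 below node 2 everywhere
    (0.0, 5.0, 2, 1),   # edge 1: node 1 below node 2 on [0, 5)
    (5.0, 10.0, 3, 1),  # edge 2: node 1 below node 3 on [5, 10)
]
print(mutation_edge(edges, node=1, position=7.0))  # 2
print(mutation_edge(edges, node=2, position=7.0))  # -1
```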
Bugfixes
- In some tables with mutations out-of-order TableCollection.sort did not re-order
the mutations so they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
- Fix bug in TreeSequence.genetic_relatedness_vector that previously ignored
span_normalise: previously, span_normalise was always set to False;
now the default is True in agreement with other statistics, so the returned
values will change. (@petrelharp, #3300, #3241)
- Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
- Fix metadata schemas that are equal but have different byte representations not
being considered equal when using TableCollection.assert_equals and
Table.assert_equals.
(@benjeffery, #3246, #3244)
- k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
- Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)
- Prevent iterating over a TopologyCounter.
(@benjeffery, #3202, #1462)
- Fix TreeSequence.concatenate() to work with internal samples by using the
all_mutations and all_edges parameters in union().
(@hyanwong, #3283, #3181)
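The mutation ordering described for TableCollection.sort can be pictured as a composite sort key. The sketch below is not tskit's implementation. The dictionary field names are made up, and the sort directions (older times first, more descendant mutations first) are assumptions chosen so that parent mutations precede their children. Unknown times are represented as None, and sites that mix known and unknown times are not handled faithfully.

```python
def mutation_sort_key(row):
    """Composite key for one mutation row (illustrative field names)."""
    time = row["time"]
    return (
        row["site"],                        # 1. site ID
        -time if time is not None else 0.0, # 2. mutation time, older first, if known
        -row["node_time"],                  # 3. time of the mutation's node, older first
        -row["num_descendants"],            # 4. more descendant mutations first
        row["node"],                        # 5. node ID
        row["original_index"],              # 6. original order in the table
    )


mutations = [
    {"site": 1, "time": None, "node_time": 0.0, "num_descendants": 0, "node": 0, "original_index": 0},
    {"site": 0, "time": None, "node_time": 1.0, "num_descendants": 0, "node": 4, "original_index": 1},
    {"site": 0, "time": None, "node_time": 1.0, "num_descendants": 1, "node": 4, "original_index": 2},
    {"site": 0, "time": None, "node_time": 2.0, "num_descendants": 2, "node": 5, "original_index": 3},
]
order = [m["original_index"] for m in sorted(mutations, key=mutation_sort_key)]
print(order)  # [3, 2, 1, 0]
```

Rows 1 and 2 share a site and a node, so only the descendant count separates them: the mutation with a descendant (the parent) is placed first.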
Python 1.0.0b3
Breaking Changes
- TreeSequence.tables now returns a zero-copy immutable view of the tables.
To get a mutable copy, use TreeSequence.dump_tables().
(@benjeffery, #3288, #760)
- For a tree sequence to be valid, mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
- Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)
Features
- TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
- draw_svg() methods now associate tree branches with edge IDs
(@hyanwong, #3193, #557)
- draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right" (@hyanwong, #3201)
- Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177)
- Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
- Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
derived state of mutations and inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
- Tskit is now distributed with a requirement of NumPy version 2 or greater. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using pip install tskit --no-binary tskit.
With NumPy 1.x, any use of the new StringDType properties will result in a RuntimeError.
If you try to use another Python module that was compiled against NumPy 1.x with NumPy 2.x you may see
the error "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash.".
If no newer version of the module is available you will have to use the NumPy 1.x build as above.
- Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)
Bugfixes
- In some tables with mutations out-of-order TableCollection.sort did not re-order
the mutations so they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
- Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
- Fix metadata schemas that are equal but have different byte representations not being
considered equal when using TableCollection.assert_equals and Table.assert_equals.
(@benjeffery, #3246, #3244)
- k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
- Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)
- Prevent iterating over a TopologyCounter
(@benjeffery, #3202, #1462)
Python 1.0.0b2
Packaging bugfix release
Python 1.0.0b1
Breaking Changes
- For a tree sequence to be valid, mutation parents in the table collection
must be correct and consistent with the topology of the tree at each mutation site.
TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
is not the case.
(@benjeffery, #2729, #2732, #3212).
- Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)
Features
- TreeSequence.map_to_vcf_model now also returns the transformed positions and
contig length. (@benjeffery, #3174, #3173)
- draw_svg() methods now associate tree branches with edge IDs
(@hyanwong, #3193, #557)
- draw_svg() methods now allow the y-axis to be placed on the right-hand side
using y_axis="right" (@hyanwong, #3201)
- Add contig_id and isolated_as_missing to VcfModelMapping
(@benjeffery, #3219, #3177)
- Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
edge. (@benjeffery, #3226, #3189)
- Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
derived state of mutations and inherited state of mutations as NumPy arrays of
the new NumPy 2.0 StringDType.
(@benjeffery, #3228, #2632, #3276, #2631)
- Tskit is now distributed with a requirement of NumPy version 2 or greater. However, you can still use
tskit with NumPy 1.x by building tskit from source with NumPy 1.x using pip install tskit --no-binary tskit.
With NumPy 1.x, any use of the new StringDType properties will result in a RuntimeError.
If you try to use another Python module that was compiled against NumPy 1.x with NumPy 2.x you may see
the error "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash.".
If no newer version of the module is available you will have to use the NumPy 1.x build as above.
- Add Mutation.inherited_state property which returns the inherited state
for a single mutation. (@benjeffery, #3277, #2631)
Bugfixes
- In some tables with mutations out-of-order TableCollection.sort did not re-order
the mutations so they formed a valid TreeSequence. TableCollection.sort and
TableCollection.canonicalise now sort mutations by site, then time (if known),
then the mutation's node's time, then number of descendant mutations
(ensuring that parent mutations occur before children), then node, then
their original order in the tables. (@benjeffery, #3257, #3253)
- Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
and a window breakpoint falls within an internal missing interval.
(@nspope, #3176, #3175)
- Fix metadata schemas that are equal but have different byte representations not being
considered equal when using TableCollection.assert_equals and Table.assert_equals.
(@benjeffery, #3246, #3244)
- k-way statistics no longer require k sample sets, allowing in particular
"self" comparisons for TreeSequence.genetic_relatedness. This changes the
error code returned in some situations.
(@andrewkern, @petrelharp, #3235, #3055)
- Fix UnboundLocalError in draw_svg() when using numeric max_time
values with mutations over roots.
(@benjeffery, #3274, #3273)
C API C_1.2.0
Breaking changes
- Remove tsk_diff_iter_t and associated functions.
(@benjeffery, #3221, #2797).
- tsk_treeseq_init now requires that mutation parents in the table collection
are correct and consistent with the topology of the tree at each mutation site.
Returns TSK_ERR_BAD_MUTATION_PARENT if this is not the case, or
TSK_ERR_MUTATION_PARENT_AFTER_CHILD if the mutations are not in an order
compatible with the correct mutation parent.
(@benjeffery, #2729, #2732, #3212).
Features
- Add TSK_TS_INIT_COMPUTE_MUTATION_PARENTS to tsk_treeseq_init
to compute mutation parents from the tree sequence topology.
Note that the mutations must be in the correct order.
(@benjeffery, #2757, #3212).
- Add TSK_CHECK_MUTATION_PARENTS option to tsk_table_collection_check_integrity
to check that mutation parents are consistent with the tree sequence topology.
This option implies TSK_CHECK_TREES.
(@benjeffery, #2729, #2732, #3212).
- Add the TSK_NO_CHECK_INTEGRITY option to tsk_table_collection_compute_mutation_parents
to skip the integrity checks that are normally run when computing mutation parents.
This is useful for speeding up the computation of mutation parents when the
tree sequence is already known to be valid.
(@benjeffery, #3212).
- Mutations returned by tsk_treeseq_get_mutation now include pre-computed
inherited_state and inherited_state_length fields. The inherited state
is computed during tree sequence initialization and represents the state that
existed at the site before each mutation occurred (either the ancestral state
if the mutation is the root mutation or the derived state of the parent mutation).
Note that this breaks ABI compatibility due to the addition of these fields
to the tsk_mutation_t struct.
(@benjeffery, #3277, #2631).
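The inherited-state rule stated above (ancestral state for a root mutation, otherwise the parent mutation's derived state) can be written out in a few lines. This is a Python sketch of the rule only, not the C implementation. The function name and the list-based table layout are illustrative, and -1 stands in for TSK_NULL.

```python
NULL = -1  # stand-in for TSK_NULL / tskit.NULL


def inherited_states(site_ancestral_state, mutation_site, mutation_parent,
                     mutation_derived_state):
    """Return the inherited state of every mutation, in table order."""
    result = []
    for site, parent in zip(mutation_site, mutation_parent):
        if parent == NULL:
            # Root mutation: it replaces the site's ancestral state.
            result.append(site_ancestral_state[site])
        else:
            # Otherwise it replaces whatever its parent mutation produced.
            result.append(mutation_derived_state[parent])
    return result


states = inherited_states(
    site_ancestral_state=["A", "G"],
    mutation_site=[0, 0, 1],
    mutation_parent=[NULL, 0, NULL],
    mutation_derived_state=["T", "C", "A"],
)
print(states)  # ['A', 'T', 'G']
```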
Python 0.6.4
Breaking changes
- TreeSequence.write_vcf now filters non-sample nodes from individuals
by default, instead of raising an error. These nodes can be included using the
new include_non_sample_nodes argument.
By default individual names (sample IDs) in VCF output are now of the form
tsk_{individual.id}. Previously these were always
"tsk_{j}" for j in range(num_individuals). This may break some downstream
code if individuals are specified. To fix, set individual_names explicitly
to the required pattern.
(@benjeffery, #3163)
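The naming change only matters when a subset of individuals is written out. The sketch below contrasts the two patterns quoted above. Both helper functions are hypothetical, and the commented-out write_vcf call is an assumed usage rather than a tested one.

```python
def old_default_names(individual_ids):
    """Previous default: names count up from zero, ignoring the actual IDs."""
    return [f"tsk_{j}" for j in range(len(individual_ids))]


def new_default_names(individual_ids):
    """Current default: names carry the individual's ID."""
    return [f"tsk_{individual_id}" for individual_id in individual_ids]


selected = [3, 5]  # an illustrative subset of individual IDs
print(new_default_names(selected))  # ['tsk_3', 'tsk_5']
print(old_default_names(selected))  # ['tsk_0', 'tsk_1']

# Assumed way to keep the old names for downstream tools, with `ts` a tree
# sequence and `output` an open text file:
# ts.write_vcf(output, individuals=selected,
#              individual_names=old_default_names(selected))
```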
Features
- Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes
in a tree sequence, grouped by a ploidy value.
(@benjeffery, #3157)
- Add TreeSequence.individuals_nodes attribute to return the nodes
associated with each individual as a numpy array.
(@benjeffery, #3153)
- Add shift method to both TableCollection and TreeSequence classes
allowing the coordinate system to be shifted, and TreeSequence.concatenate
so a set of tree sequences can be added to the right of an existing one.
(@hyanwong, #3165, #3164)
- Add TreeSequence.map_to_vcf_model method to return a mapping of
the tree sequence to the VCF model.
(@benjeffery, #3163)
- Use a thin space as the thousands separator in HTML output,
and a comma in CLI output.
(@hossam26644, #3167, #2951)
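One way to read "grouped by a ploidy value" is that consecutive sample nodes are chunked into rows of length ploidy. The sketch below shows that interpretation in plain Python. The function name, the list-of-lists return type and the error for a non-divisible sample count are assumptions for illustration, not the behaviour of TreeSequence.sample_nodes_by_ploidy itself.

```python
def group_by_ploidy(sample_nodes, ploidy):
    """Chunk consecutive sample node IDs into groups of size `ploidy`."""
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    if len(sample_nodes) % ploidy != 0:
        raise ValueError("number of sample nodes must be a multiple of ploidy")
    return [
        sample_nodes[start:start + ploidy]
        for start in range(0, len(sample_nodes), ploidy)
    ]


groups = group_by_ploidy([0, 1, 2, 3, 4, 5], ploidy=2)
print(groups)  # [[0, 1], [2, 3], [4, 5]]
```

This kind of grouping is mainly useful when the tree sequence has no individuals defined, so nodes cannot be looked up per individual.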
Fixes
- Correct assertion message when tables are compared with metadata ignored.
(@benjeffery, #3162, #3161)
Python 0.6.3
Bugfixes
- TreeSequence.draw_svg(path=...) was failing due to a missing
import xml.dom.minidom (@petrelharp, #3144, #3145)
Python 0.6.2
Bugfixes
- Metadata.schema was returning a modified schema; this is fixed to return a copy of
the original schema instead (@benjeffery, #3129, #3130)
Python 0.6.1
Bugfixes
- Fix to TreeSequence.pair_coalescence_counts output dimension when
provided with time windows containing no nodes (@nspope,
#3046, #3058)
- Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing
span if span_normalise=True. This resolves a bug where
TreeSequence.pair_coalescence_rates would return incorrect values for
intervals with missing trees. (@natep, #3053, #3059)
- Fix TreeSequence.pair_coalescence_rates triggering an assertion
due to floating point error when all coalescence events are inside
a single time window (@natep, #3035, #3038)
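The effect of normalising by non-missing span can be seen with simple arithmetic. In the sketch below the window, the missing interval and the raw count are made-up numbers, and the helper is hypothetical. It assumes the missing intervals do not overlap one another.

```python
def non_missing_span(window, missing_intervals):
    """Length of `window` left after removing (disjoint) missing intervals."""
    left, right = window
    span = right - left
    for missing_left, missing_right in missing_intervals:
        overlap = min(right, missing_right) - max(left, missing_left)
        if overlap > 0:
            span -= overlap
    return span


window = (0.0, 100.0)
missing = [(40.0, 60.0)]  # no trees on this stretch
raw_count = 160.0         # illustrative span-weighted pair count

print(raw_count / (window[1] - window[0]))            # 1.6, whole window
print(raw_count / non_missing_span(window, missing))  # 2.0, non-missing span
```

Dividing by the full window length dilutes the value wherever trees are missing, which is the kind of discrepancy the fix addresses.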
Features
- Add support for fixed-length arrays in metadata struct codec using the
length property.
(@benjeffery, #3088, #3090)
- Add a new TreeSequence.pca method that uses randomized linear algebra
to find the top eigenvectors/values of the genetic relatedness matrix
(@hanbin973, @petrelharp, #3008)
- Add methods on TreeSequence to efficiently get table metadata as a
numpy structured array. (@benjeffery, #3098)
- Add Python 3.13 support (@benjeffery, #3107)
- Add a preamble argument to draw_svg() methods to allow adding arbitrary extra
graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)
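For the fixed-length array feature, a schema along the following lines declares a three-element array with the length property. The property name "xyz" is invented. The byte-layout comparison uses only the standard struct module and assumes little-endian packing with a 4-byte length prefix for variable-length arrays, so treat it as an illustration of why a fixed length saves space rather than a description of tskit's exact encoding.

```python
import struct

# Illustrative struct-codec schema: "xyz" is a made-up property name.
schema = {
    "codec": "struct",
    "type": "object",
    "properties": {
        "xyz": {
            "type": "array",
            "length": 3,  # fixed number of elements
            "items": {"type": "number", "binaryFormat": "d"},
        },
    },
}

values = [1.0, 2.5, -4.0]

# Fixed length: the element count is known from the schema, so only the
# three doubles need to be stored.
fixed = struct.pack("<3d", *values)

# Variable length (assumed layout): a count is stored ahead of the elements.
prefixed = struct.pack("<L3d", len(values), *values)

print(len(fixed), len(prefixed))  # 24 28
```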
C API C_1.1.4
Changes
- Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library.
This is useful for debugging as errors will print to stderr when set.
(@jeromekelleher, #3095).