Releases: tskit-dev/tskit

Python 1.0.0

27 Nov 14:27
f1b139e

🎉 Version 1 is here!!! 🥳

tskit development doesn't end here, but this marks the point at which:

Breaking changes will not be made unless they are unavoidable to correct incorrect behaviour or are forced by external factors such as dependencies.

Full credit for this release and for tskit generally goes to the wonderful community of contributors, who you can see here: https://tskit.dev/software/tskit.html

Full changelog:

Breaking changes

  • The reference_sequence argument to TreeSequence.alignments is now
    required to be the same length as the tree sequence. Previously it was
    required to be the length of the requested interval.
    (@benjeffery, #3317)

  • TreeSequence.tables now returns a zero-copy immutable view of the tables.
    To get a mutable copy, use TreeSequence.dump_tables().
    (@benjeffery, #3288, #760)

  • For a tree sequence to be valid, the mutation parents in the table collection
    must be correct and consistent with the topology of the tree at each mutation site.
    TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
    is not the case.
    (@benjeffery, #2729, #2732, #3212).

  • Drop Python 3.9 support and require Python >= 3.10.
    (#3267, @benjeffery)

  • ltrim, rtrim, trim and shift raise an error if they are
    used on a tree sequence containing a reference sequence.
    (@hyanwong, #3210, #2091)
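
For code that previously passed a reference covering only the requested interval to TreeSequence.alignments, one possible migration is to embed that reference in a full-length string. The following is a minimal sketch only: the helper name pad_reference, the fill character "N" and the example coordinates are illustrative assumptions and are not part of tskit.

```python
def pad_reference(interval_ref, left, right, sequence_length, fill="N"):
    """Embed an interval-length reference in a full-length one.

    Hypothetical helper (not a tskit function): positions outside
    [left, right) are filled with `fill`, so the result has one
    character per base of the whole tree sequence.
    """
    if not 0 <= left <= right <= sequence_length:
        raise ValueError("interval must lie within [0, sequence_length]")
    if len(interval_ref) != right - left:
        raise ValueError("interval_ref must have length right - left")
    return fill * left + interval_ref + fill * (sequence_length - right)


# Old-style reference for the interval [3, 7) of a length-10 sequence.
full_ref = pad_reference("ACGT", left=3, right=7, sequence_length=10)
print(full_ref)  # NNNACGTNNN
```

The padded string could then be supplied as reference_sequence together with the same interval. Since sequence lengths are stored as floating-point values, the length would need converting to an integer before calling a helper like this.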

Features

  • Add tskit.jit.numba.jitwrap and NumbaTreeSequence to allow simplified
    use and development of Numba-jitted functions with tree sequences. See the
    documentation (https://tskit.dev/tskit/docs/stable/numba.html) for details.
    (@andrewkern, #3295, #3294)

  • TreeSequence.map_to_vcf_model now also returns the transformed positions and
    contig length. (@benjeffery, #3174, #3173)

  • draw_svg() methods now associate tree branches with edge IDs.
    (@hyanwong, #3193, #557)

  • draw_svg() methods now allow the y-axis to be placed on the right-hand side
    using y_axis="right". (@hyanwong, #3201)

  • Add contig_id and isolated_as_missing to VcfModelMapping
    (@benjeffery, #3219, #3177).

  • Add TreeSequence.mutations_edge, which returns the edge ID for each mutation's
    edge. (@benjeffery, #3226, #3189)

  • Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
    TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
    the derived state of mutations and the inherited state of mutations as NumPy arrays of
    the new NumPy 2.0 StringDType.
    (@benjeffery, #3228, #2632, #3276, #2631)

  • Tskit now requires NumPy version 2 or later. However, you can still use
    tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
    pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
    StringDType properties will result in a RuntimeError. If you try to
    use another Python module that was compiled against NumPy 1.x with NumPy 2.x
    you may see the error "A module that was compiled using NumPy 1.x cannot be
    run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
    available you will have to use the NumPy 1.x build as above.

  • Add Mutation.inherited_state property which returns the inherited state
    for a single mutation. (@benjeffery, #3277, #2631)

  • Add all_mutations and all_edges options to TreeSequence.union,
    allowing greater flexibility in "disjoint union" situations.
    (@hyanwong, @petrelharp, #3181)

  • Add TreeSequence.divergence_matrix, which was previously undocumented.

  • TreeSequence.variants, .genotype_matrix, .haplotypes, and .alignments methods
    now fully support isolated_as_missing behaviour with internal nodes. .alignments is
    also around 10% faster.
    (@benjeffery, #3313, #3317, #1896)
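
Code that has to run against both the standard NumPy 2 build and a source build against NumPy 1.x might guard its use of the new array properties. The sketch below is one way this could look; the function name derived_states is hypothetical, and it assumes only what the notes above state, namely that the StringDType properties raise a RuntimeError under NumPy 1.x.

```python
def derived_states(ts):
    """Return the derived state of every mutation in `ts` as a list.

    Hypothetical helper: `ts` is expected to be a tskit TreeSequence.
    """
    try:
        # Array property added in this release (needs NumPy 2 StringDType).
        return list(ts.mutations_derived_state)
    except RuntimeError:
        # NumPy 1.x source build: fall back to per-mutation access.
        return [mutation.derived_state for mutation in ts.mutations()]
```

The fallback path iterates over mutations one at a time, so it is likely to be much slower on large tree sequences than the array property.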

Bugfixes

  • In some tables with out-of-order mutations, TableCollection.sort did not reorder
    the mutations so that they formed a valid TreeSequence. TableCollection.sort and
    TableCollection.canonicalise now sort mutations by site, then time (if known),
    then the mutation's node's time, then number of descendant mutations
    (ensuring that parent mutations occur before children), then node, then
    their original order in the tables. (@benjeffery, #3257, #3253)

  • Fix bug in TreeSequence.genetic_relatedness_vector, which ignored
    span_normalise and always behaved as if it were False. The default is
    now True, in agreement with other statistics, so the returned values
    will change. (@petrelharp, #3300, #3241)

  • Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
    and a window breakpoint falls within an internal missing interval.
    (@nspope, #3176, #3175)

  • Fix metadata schemas that are equal but have different byte representations not
    being considered equal when using TableCollection.assert_equals and
    Table.assert_equals.
    (@benjeffery, #3246, #3244)

  • k-way statistics no longer require k sample sets, allowing in particular
    "self" comparisons for TreeSequence.genetic_relatedness. This changes the
    error code returned in some situations.
    (@andrewkern, @petrelharp, #3235, #3055)

  • Fix UnboundLocalError in draw_svg() when using numeric max_time
    values with mutations over roots.
    (@benjeffery, #3274, #3273)

  • Prevent iterating over a TopologyCounter.
    (@benjeffery, #3202, #1462)

  • Fix TreeSequence.concatenate() to work with internal samples by using the
    all_mutations and all_edges parameters in union().
    (@hyanwong, #3283, #3181)
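
The mutation ordering described for TableCollection.sort above can be illustrated with a small pure-Python comparator. This is a sketch, not tskit's implementation: the dictionary layout is invented for the example, math.nan stands in for an unknown time, and the sort directions (older times first, more descendant mutations first) are assumptions chosen so that parent mutations precede their children.

```python
import math
from functools import cmp_to_key


def _cmp(x, y):
    """Three-way comparison returning -1, 0 or 1."""
    return (x > y) - (x < y)


def compare_mutations(a, b):
    result = _cmp(a["site"], b["site"])
    # Time is only compared when it is known for both mutations.
    if result == 0 and not (math.isnan(a["time"]) or math.isnan(b["time"])):
        result = _cmp(b["time"], a["time"])  # assumed: older first
    if result == 0:
        result = _cmp(b["node_time"], a["node_time"])  # assumed: older node first
    if result == 0:
        # assumed: more descendant mutations first, so parents precede children
        result = _cmp(b["num_descendants"], a["num_descendants"])
    if result == 0:
        result = _cmp(a["node"], b["node"])
    if result == 0:
        result = _cmp(a["id"], b["id"])  # original order in the table
    return result


unknown = math.nan
mutations = [
    {"id": 0, "site": 1, "time": unknown, "node_time": 0.0, "num_descendants": 0, "node": 2},
    {"id": 1, "site": 0, "time": unknown, "node_time": 0.0, "num_descendants": 0, "node": 3},
    {"id": 2, "site": 0, "time": unknown, "node_time": 0.0, "num_descendants": 1, "node": 3},
    {"id": 3, "site": 0, "time": unknown, "node_time": 2.0, "num_descendants": 2, "node": 5},
]
ordered = sorted(mutations, key=cmp_to_key(compare_mutations))
print([m["id"] for m in ordered])  # [3, 2, 1, 0]
```

In this example mutations 1 and 2 sit on the same node with unknown times, so only the descendant count places the parent (2) ahead of its child (1).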

Python 1.0.0b3

15 Oct 12:27

Breaking Changes

  • TreeSequence.tables now returns a zero-copy immutable view of the tables.
    To get a mutable copy, use TreeSequence.dump_tables().
    (@benjeffery, #3288, #760)

  • For a tree sequence to be valid, the mutation parents in the table collection
    must be correct and consistent with the topology of the tree at each mutation site.
    TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
    is not the case.
    (@benjeffery, #2729, #2732, #3212).

  • Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)
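
Code that used to modify TreeSequence.tables in place will need to work on a copy instead. A minimal sketch of that pattern follows; the helper name edited_copy and its callback argument are hypothetical, and only the dump_tables() and tree_sequence() calls named in these notes are used.

```python
def edited_copy(ts, edit):
    """Apply `edit` to a mutable copy of the tables and rebuild.

    Hypothetical helper: `ts` is expected to be a tskit TreeSequence and
    `edit` a function that modifies a TableCollection in place.
    """
    tables = ts.dump_tables()  # mutable copy; ts.tables is a read-only view
    edit(tables)
    # Rebuilding validates the tables, including mutation parents.
    return tables.tree_sequence()
```

Read-only access through ts.tables needs no change and avoids the cost of a copy.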

Features

  • TreeSequence.map_to_vcf_model now also returns the transformed positions and
    contig length. (@benjeffery, #3174, #3173)

  • draw_svg() methods now associate tree branches with edge IDs
    (@hyanwong, #3193, #557)

  • draw_svg() methods now allow the y-axis to be placed on the right-hand side
    using y_axis="right" (@hyanwong, #3201)

  • Add contig_id and isolated_as_missing to VcfModelMapping
    (@benjeffery, #3219, #3177)

  • Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
    edge. (@benjeffery, #3226, #3189)

  • Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
    TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
    derived state of mutations and inherited state of mutations as NumPy arrays of
    the new NumPy 2.0 StringDType.
    (@benjeffery, #3228, #2632, #3276, #2631)

  • Tskit now requires NumPy version 2 or later. However, you can still use
    tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
    pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
    StringDType properties will result in a RuntimeError. If you try to
    use another Python module that was compiled against NumPy 1.x with NumPy 2.x
    you may see the error "A module that was compiled using NumPy 1.x cannot be
    run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
    available you will have to use the NumPy 1.x build as above.

  • Add Mutation.inherited_state property which returns the inherited state
    for a single mutation. (@benjeffery, #3277, #2631)

Bugfixes

  • In some tables with out-of-order mutations, TableCollection.sort did not reorder
    the mutations so that they formed a valid TreeSequence. TableCollection.sort and
    TableCollection.canonicalise now sort mutations by site, then time (if known),
    then the mutation's node's time, then number of descendant mutations
    (ensuring that parent mutations occur before children), then node, then
    their original order in the tables. (@benjeffery, #3257, #3253)

  • Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
    and a window breakpoint falls within an internal missing interval.
    (@nspope, #3176, #3175)

  • Fix metadata schemas that are equal but have different byte representations not being
    considered equal when using TableCollection.assert_equals and Table.assert_equals.
    (@benjeffery, #3246, #3244)

  • k-way statistics no longer require k sample sets, allowing in particular
    "self" comparisons for TreeSequence.genetic_relatedness. This changes the
    error code returned in some situations.
    (@andrewkern, @petrelharp, #3235, #3055)

  • Fix UnboundLocalError in draw_svg() when using numeric max_time
    values with mutations over roots.
    (@benjeffery, #3274, #3273)

  • Prevent iterating over a TopologyCounter
    (@benjeffery, #3202, #1462)

Breaking changes

  • ltrim, rtrim, trim and shift raise an error if used on a tree sequence
    containing a reference sequence (@hyanwong, #3210, #2091)

Python 1.0.0b2

24 Sep 16:22

Packaging bugfix release

Python 1.0.0b1

24 Sep 13:25

Breaking Changes

  • For a tree sequence to be valid, the mutation parents in the table collection
    must be correct and consistent with the topology of the tree at each mutation site.
    TableCollection.tree_sequence() will raise a _tskit.LibraryError if this
    is not the case.
    (@benjeffery, #2729, #2732, #3212).

  • Drop Python 3.9 support, require Python >= 3.10 (#3267, @benjeffery)
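
To make the mutation parent requirement concrete, the sketch below computes the parents of the mutations at a single site from the tree at that site. It is an illustration under stated assumptions rather than tskit's own algorithm: the function name, the dict-based tree representation and the use of list indexes as mutation IDs are all invented for the example, and the mutations are assumed to be listed with parents before children.

```python
def mutation_parents(tree_parent, mutation_nodes):
    """Parents of the mutations at one site, given the tree at that site.

    tree_parent: dict mapping each node to its parent (roots map to -1)
    mutation_nodes: node of each mutation, parents listed before children
    Returns a list whose entry j is the index of mutation j's parent, or -1.
    """
    last_on_node = {}  # node -> index of the most recent mutation on it
    parents = []
    for index, node in enumerate(mutation_nodes):
        u = node
        # Walk towards the root until a node carrying a mutation is found.
        while u != -1 and u not in last_on_node:
            u = tree_parent[u]
        parents.append(last_on_node[u] if u != -1 else -1)
        last_on_node[node] = index
    return parents


# Tree: nodes 0 and 1 join at 3; nodes 3 and 2 join at the root 4.
tree_parent = {0: 3, 1: 3, 2: 4, 3: 4, 4: -1}
print(mutation_parents(tree_parent, [4, 3, 0, 0, 2]))  # [-1, 0, 1, 2, 0]
```

A table whose stored parent column differs from the values derived this way is the kind of inconsistency that the check described above is meant to reject.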

Features

  • TreeSequence.map_to_vcf_model now also returns the transformed positions and
    contig length. (@benjeffery, #3174, #3173)

  • draw_svg() methods now associate tree branches with edge IDs
    (@hyanwong, #3193, #557)

  • draw_svg() methods now allow the y-axis to be placed on the right-hand side
    using y_axis="right" (@hyanwong, #3201)

  • Add contig_id and isolated_as_missing to VcfModelMapping
    (@benjeffery, #3219, #3177)

  • Add TreeSequence.mutations_edge which returns the edge ID for each mutation's
    edge. (@benjeffery, #3226, #3189)

  • Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and
    TreeSequence.mutations_inherited_state properties to return the ancestral state of sites,
    derived state of mutations and inherited state of mutations as NumPy arrays of
    the new NumPy 2.0 StringDType.
    (@benjeffery, #3228, #2632, #3276, #2631)

  • Tskit now requires NumPy version 2 or later. However, you can still use
    tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
    pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new
    StringDType properties will result in a RuntimeError. If you try to
    use another Python module that was compiled against NumPy 1.x with NumPy 2.x
    you may see the error "A module that was compiled using NumPy 1.x cannot be
    run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
    available you will have to use the NumPy 1.x build as above.

  • Add Mutation.inherited_state property which returns the inherited state
    for a single mutation. (@benjeffery, #3277, #2631)

Bugfixes

  • In some tables with out-of-order mutations, TableCollection.sort did not reorder
    the mutations so that they formed a valid TreeSequence. TableCollection.sort and
    TableCollection.canonicalise now sort mutations by site, then time (if known),
    then the mutation's node's time, then number of descendant mutations
    (ensuring that parent mutations occur before children), then node, then
    their original order in the tables. (@benjeffery, #3257, #3253)

  • Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True
    and a window breakpoint falls within an internal missing interval.
    (@nspope, #3176, #3175)

  • Fix metadata schemas that are equal but have different byte representations not being
    considered equal when using TableCollection.assert_equals and Table.assert_equals.
    (@benjeffery, #3246, #3244)

  • k-way statistics no longer require k sample sets, allowing in particular
    "self" comparisons for TreeSequence.genetic_relatedness. This changes the
    error code returned in some situations.
    (@andrewkern, @petrelharp, #3235, #3055)

  • Fix UnboundLocalError in draw_svg() when using numeric max_time
    values with mutations over roots.
    (@benjeffery, #3274, #3273)

Breaking changes

  • ltrim, rtrim, trim and shift raise an error if used on a tree sequence
    containing a reference sequence (@hyanwong, #3210, #2091)

C API C_1.2.0

24 Sep 13:07

Breaking changes

  • Remove tsk_diff_iter_t and associated functions.
    (@benjeffery, #3221, #2797).

  • tsk_treeseq_init now requires that mutation parents in the table collection
    are correct and consistent with the topology of the tree at each mutation site.
    Returns TSK_ERR_BAD_MUTATION_PARENT if this is not the case, or
    TSK_ERR_MUTATION_PARENT_AFTER_CHILD if the mutations are not in an order
    compatible with the correct mutation parent.
    (@benjeffery, #2729, #2732, #3212).

Features

  • Add TSK_TS_INIT_COMPUTE_MUTATION_PARENTS to tsk_treeseq_init
    to compute mutation parents from the tree sequence topology.
    Note that the mutations must be in the correct order.
    (@benjeffery, #2757, #3212).

  • Add TSK_CHECK_MUTATION_PARENTS option to tsk_table_collection_check_integrity
    to check that mutation parents are consistent with the tree sequence topology.
    This option implies TSK_CHECK_TREES.
    (@benjeffery, #2729, #2732, #3212).

  • Add the TSK_NO_CHECK_INTEGRITY option to tsk_table_collection_compute_mutation_parents
    to skip the integrity checks that are normally run when computing mutation parents.
    This is useful for speeding up the computation of mutation parents when the
    tree sequence is certainly known to be valid.
    (@benjeffery, #3212).

  • Mutations returned by tsk_treeseq_get_mutation now include pre-computed
    inherited_state and inherited_state_length fields. The inherited state
    is computed during tree sequence initialization and represents the state that
    existed at the site before each mutation occurred (either the ancestral state
    if the mutation is the root mutation or the derived state of the parent mutation).
    Note that this breaks ABI compatibility due to the addition of these fields
    to the tsk_mutation_t struct.
    (@benjeffery, #3277, #2631).
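
The definition of the inherited state given above can be written out in a few lines. The sketch is in Python rather than C for brevity, and the function name and list-based inputs are illustrative assumptions; it covers the mutations at a single site, with parents given as indexes into the same list.

```python
def inherited_states(ancestral_state, derived_states, parents):
    """Inherited state of each mutation at a single site.

    ancestral_state: the site's ancestral state
    derived_states: derived state of each mutation at the site
    parents: index of each mutation's parent mutation, or -1 for none
    """
    return [
        # No parent: the mutation replaces the site's ancestral state.
        # Otherwise it replaces whatever its parent mutation produced.
        ancestral_state if parent == -1 else derived_states[parent]
        for parent in parents
    ]


print(inherited_states("A", ["C", "G", "A"], [-1, 0, 1]))  # ['A', 'C', 'G']
```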

Python 0.6.4

21 May 18:14

Breaking changes

  • TreeSequence.write_vcf now filters non-sample nodes from individuals
    by default, instead of raising an error. These nodes can be included using the
    new include_non_sample_nodes argument.
    By default, individual names (sample IDs) in VCF output are now of the form
    tsk_{individual.id}. Previously these were always
    "tsk_{j}" for j in range(num_individuals). This may break some downstream
    code if individuals are specified. To fix, set individual_names explicitly
    to the required pattern.
    (@benjeffery, #3163)
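
The difference between the two naming schemes only shows up when a subset of individuals is written. The sketch below builds both sets of names for a hypothetical selection of individual IDs; the two helper functions are invented for illustration and are not part of tskit.

```python
def legacy_individual_names(num_individuals):
    """Names in the previous style: numbered by position in the output."""
    return [f"tsk_{j}" for j in range(num_individuals)]


def id_based_individual_names(individual_ids):
    """Names in the new default style: numbered by individual ID."""
    return [f"tsk_{individual_id}" for individual_id in individual_ids]


selected = [2, 5, 7]  # hypothetical individual IDs chosen for output
print(legacy_individual_names(len(selected)))  # ['tsk_0', 'tsk_1', 'tsk_2']
print(id_based_individual_names(selected))     # ['tsk_2', 'tsk_5', 'tsk_7']
```

Downstream code that depends on the old names could pass the first list as individual_names when calling write_vcf.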

Features

  • Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes
    in a tree sequence, grouped by a ploidy value.
    (@benjeffery, #3157)

  • Add TreeSequence.individuals_nodes attribute to return the nodes
    associated with each individual as a numpy array.
    (@benjeffery, #3153)

  • Add shift method to both TableCollection and TreeSequence classes
    allowing the coordinate system to be shifted, and TreeSequence.concatenate
    so that a set of tree sequences can be added to the right of an existing one.
    (@hyanwong, #3165, #3164)

  • Add TreeSequence.map_to_vcf_model method to return a mapping of
    the tree sequence to the VCF model.
    (@benjeffery, #3163)

  • Use a thin space as the thousands separator in HTML output,
    and a comma in CLI output.
    (@hossam26644, #3167, #2951)
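
As a rough picture of what grouping sample nodes by ploidy means, the sketch below splits a list of sample node IDs into consecutive groups. It assumes that consecutive samples belong together and that a sample count that is not a multiple of the ploidy is an error; both are assumptions of this example, and the function is not the tskit method itself.

```python
def group_by_ploidy(sample_nodes, ploidy):
    """Split sample node IDs into consecutive groups of size `ploidy`."""
    if ploidy < 1 or len(sample_nodes) % ploidy != 0:
        raise ValueError("number of samples must be a multiple of ploidy")
    return [
        sample_nodes[start:start + ploidy]
        for start in range(0, len(sample_nodes), ploidy)
    ]


print(group_by_ploidy([0, 1, 2, 3, 4, 5], 2))  # [[0, 1], [2, 3], [4, 5]]
```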

Fixes

Python 0.6.3

28 Apr 16:12

Bugfixes

  • TreeSequence.draw_svg(path=...) was failing due to a missing
    import xml.dom.minidom (@petrelharp, #3144, #3145)

Python 0.6.2

01 Apr 16:55

Bugfixes

  • Metadata.schema was returning a modified schema; it now returns a copy of
    the original schema instead (@benjeffery, #3129, #3130)

Python 0.6.1

31 Mar 15:59

Bugfixes

  • Fix to TreeSequence.pair_coalescence_counts output dimension when
    provided with time windows containing no nodes (@nspope,
    #3046, #3058)

  • Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing
    span if span_normalise=True. This resolves a bug where
    TreeSequence.pair_coalescence_rates would return incorrect values for
    intervals with missing trees. (@natep, #3053, #3059)

  • Fix to TreeSequence.pair_coalescence_rates causing an assertion to be
    triggered by floating point error when all coalescence events are inside
    a single time window (@natep, #3035, #3038)
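
The effect of normalising by non-missing span can be seen with a small worked example. This is a sketch with an invented data layout: each tree is given as a (left, right, is_missing) tuple, and the count of 12.0 is a made-up value rather than real output.

```python
def non_missing_span(trees, left, right):
    """Length of the window [left, right) covered by non-missing trees.

    trees: iterable of (tree_left, tree_right, is_missing) tuples
    """
    total = 0.0
    for tree_left, tree_right, is_missing in trees:
        overlap = min(right, tree_right) - max(left, tree_left)
        if overlap > 0 and not is_missing:
            total += overlap
    return total


# A length-100 sequence with a missing stretch in the middle.
trees = [(0, 40, False), (40, 60, True), (60, 100, False)]
count = 12.0  # hypothetical unnormalised count for the window
span = non_missing_span(trees, 0, 100)
print(span)          # 80.0
print(count / span)  # 0.15
```

Dividing by the full window length of 100 would instead give 0.12, which understates the value wherever part of the window has no trees.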

Features

  • Add support for fixed-length arrays in metadata struct codec using the length property.
    (@benjeffery, #3088, #3090)

  • Add a new TreeSequence.pca method that uses randomized linear algebra
    to find the top eigenvectors/values of the genetic relatedness matrix
    (@hanbin973, @petrelharp, #3008)

  • Add methods on TreeSequence to efficiently get table metadata as a
    numpy structured array. (@benjeffery, #3098)

  • Add Python 3.13 support (@benjeffery, #3107)

  • Add a preamble argument to draw_svg() methods to allow adding arbitrary extra
    graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)

C API C_1.1.4

31 Mar 15:59

Changes

  • Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library.
    This is useful for debugging as errors will print to stderr when set.
    (@jeromekelleher, #3095).