Releases: tskit-dev/tskit
C API 0.99.8
Minor feature release
New features
- Add tsk_treeseq_genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).
- Exposed tsk_table_collection_set_indexes to the API (@benjeffery, #870, #921).
Breaking changes
- Added an options argument to tsk_table_collection_equals and the table equality methods to allow more flexible equality criteria (e.g., ignoring top-level metadata and schema, or provenance tables). Existing code should add an extra final parameter 0 to retain the current behaviour (@mufernando, @jeromekelleher, #896, #897, #913, #917).
- Changed the default behaviour of tsk_table_collection_clear so that it no longer clears provenances, and added an options argument to optionally clear provenances and schemas (@benjeffery, #929, #1001).
- Renamed tsk_treeseq_trait_regression to tsk_treeseq_trait_linear_model.
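The new options argument follows the usual bit-flag pattern: passing 0 keeps the old strict comparison, and OR-ing flags together relaxes it. The Python sketch below models that pattern with enum.IntFlag. The flag names and the ToyTableCollection class are hypothetical stand-ins chosen for illustration, not the C API's actual constants or structs.

```python
from dataclasses import dataclass, field
from enum import IntFlag


class CmpOptions(IntFlag):
    """Hypothetical comparison flags; 0 means 'compare everything'."""
    NONE = 0
    IGNORE_TS_METADATA = 1 << 0  # skip top-level metadata and its schema
    IGNORE_PROVENANCE = 1 << 1   # skip the provenance table


@dataclass
class ToyTableCollection:
    """A hypothetical, heavily simplified stand-in for a table collection."""
    nodes: list = field(default_factory=list)
    metadata: bytes = b""
    metadata_schema: str = ""
    provenances: list = field(default_factory=list)

    def equals(self, other, options=CmpOptions.NONE):
        # Data tables are always compared.
        if self.nodes != other.nodes:
            return False
        # Each optional part is compared unless its flag is set.
        if not options & CmpOptions.IGNORE_TS_METADATA:
            if (self.metadata, self.metadata_schema) != (
                other.metadata,
                other.metadata_schema,
            ):
                return False
        if not options & CmpOptions.IGNORE_PROVENANCE:
            if self.provenances != other.provenances:
                return False
        return True


a = ToyTableCollection(nodes=[0, 1], provenances=["simulate"])
b = ToyTableCollection(nodes=[0, 1], provenances=["simulate", "simplify"])
print(a.equals(b, 0))                             # False: provenances differ
print(a.equals(b, CmpOptions.IGNORE_PROVENANCE))  # True
```

Passing a plain 0, as existing C callers are asked to do, gives the same strict comparison as the default.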
Python 0.3.2
Minor feature release
Breaking changes
- Changed several methods (simplify(), trees(), Tree()) so that most parameters are keyword-only, not positional. This allows parameters to be reordered, so that deprecated parameters can be moved and the parameter order in similar functions, e.g. TableCollection.simplify() and TreeSequence.simplify(), can be made consistent (@hyanwong, #374, #846, #851).
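In Python, keyword-only parameters are declared with a bare * in the signature. Callers that previously passed these arguments positionally will now get a TypeError and must pass them by name instead. A minimal sketch using a hypothetical simplify-like signature (not tskit's actual parameter list):

```python
def simplify(samples=None, *, filter_sites=True, keep_input_roots=False):
    """Hypothetical signature: every parameter after the bare * is keyword-only."""
    return {
        "samples": samples,
        "filter_sites": filter_sites,
        "keep_input_roots": keep_input_roots,
    }


# Passing the option by name works, whatever order the parameters are declared in.
print(simplify([0, 1], keep_input_roots=True))

# Passing it positionally is rejected.
try:
    simplify([0, 1], False)
except TypeError as err:
    print("TypeError:", err)
```

Because callers must name these arguments, the library can later reorder or deprecate them without silently changing the meaning of existing calls.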
Features
- Tree accessor functions (e.g. ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; root_threshold can also be specified when calling ts.trees() (@hyanwong, #847, #848).
- Genomic intervals returned by Python functions are now namedtuples, allowing .left, .right and .span usage (@hyanwong, #784, #786, #811).
- Added an include_terminal parameter to the edge diffs iterator, to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).
- #832 - Add a metadata_bytes method to allow access to raw TableCollection metadata (@benjeffery, #842).
- tskit.is_unknown_time can now check arrays (@benjeffery, #857).
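A namedtuple is still a tuple, so existing code that unpacks an interval as (left, right) keeps working, while new code can use the named fields. The minimal sketch below uses typing.NamedTuple. The Interval class here is an illustrative stand-in, not tskit's own definition.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a half-open genomic interval [left, right)."""
    left: float
    right: float

    @property
    def span(self):
        # Length of the interval along the genome.
        return self.right - self.left


iv = Interval(10.0, 25.0)
left, right = iv                   # still unpacks like a plain tuple
print(iv.left, iv.right, iv.span)  # 10.0 25.0 15.0
```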
C API 0.99.7
Minor feature release
- Added a TSK_INCLUDE_TERMINAL option to tsk_diff_iter_init to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).
- Added tsk_bug_assert for assertions that should be compiled into release binaries (@benjeffery, #860).
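Without the terminal option, edges that are still present in the last tree are never reported as removed. The option adds one final step that removes them. The sketch below is a naive Python model of that idea, assuming edges are (left, right, parent, child) tuples with half-open intervals. It is not the C implementation, and reporting the terminal step with the interval (L, L) is this sketch's own convention.

```python
def edge_diffs(edges, sequence_length, include_terminal=False):
    """Yield ((left, right), edges_out, edges_in) for each tree.

    A simplified, quadratic-time sketch; it does not reproduce tskit's
    edge ordering.
    """
    breakpoints = sorted(
        {0.0, sequence_length} | {e[0] for e in edges} | {e[1] for e in edges}
    )
    for left, right in zip(breakpoints[:-1], breakpoints[1:]):
        edges_out = [e for e in edges if e[1] == left]  # edges ending here
        edges_in = [e for e in edges if e[0] == left]   # edges starting here
        yield (left, right), edges_out, edges_in
    if include_terminal:
        # Final step: remove every edge still present at the end of the sequence.
        edges_out = [e for e in edges if e[1] == sequence_length]
        yield (sequence_length, sequence_length), edges_out, []


edges = [
    (0.0, 10.0, 3, 0), (0.0, 10.0, 3, 1),
    (0.0, 5.0, 4, 2), (0.0, 5.0, 4, 3),
    (5.0, 10.0, 5, 2), (5.0, 10.0, 5, 3),
]
for interval, out, new in edge_diffs(edges, 10.0, include_terminal=True):
    print(interval, len(out), len(new))
```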
Python 0.3.1
Minor bugfix release
Bugfixes
- #823 - Fix mutation time error when using simplify(keep_input_roots=True) (@petrelharp, #823).
- #821 - Fix mutation rows with unknown time never being equal (@petrelharp, #822).
C API 0.99.6
Bugfixes
- #823 - Fix mutation time error when using tsk_table_collection_simplify with TSK_KEEP_INPUT_ROOTS (@petrelharp, #823).
Python 0.3.0
Major feature release
This release adds metadata schemas, set-like operations, mutation times, SVG drawing improvements and many other features. It also comes with wheels for Windows, OSX and Linux.
❤️ Many thanks go to the tskit community and contributors for their awesome work on this release. ❤️
Breaking changes
- The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
- File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
- Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously, an error was raised only when some operation building the trees was attempted (@jeromekelleher, #709).
- The TableCollection object no longer implements the iterator protocol. Previously, list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
- The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods (@benjeffery, #716, #794).
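The "minlex" ordering means that, at each internal node, children are visited in increasing order of the smallest leaf in their subtree. Two trees with the same topology are therefore traversed (and drawn) identically, regardless of how their children happen to be stored. Below is a minimal Python sketch of that rule. It assumes a tree given as a hypothetical dict mapping each internal node to its children, with leaves compared by node ID. It is not tskit's implementation.

```python
def minlex_postorder(children, root):
    """Postorder traversal visiting children in order of their smallest leaf.

    `children` maps each internal node to a list of child nodes; nodes that
    do not appear as keys are treated as leaves.
    """
    min_leaf = {}

    def smallest_leaf(u):
        kids = children.get(u, [])
        min_leaf[u] = u if not kids else min(smallest_leaf(c) for c in kids)
        return min_leaf[u]

    smallest_leaf(root)
    order = []

    def visit(u):
        # Children with the smallest leaf label are visited first.
        for c in sorted(children.get(u, []), key=min_leaf.__getitem__):
            visit(c)
        order.append(u)

    visit(root)
    return order


# Same topology, children stored in different orders.
t1 = {5: [4, 0], 4: [2, 1]}
t2 = {5: [0, 4], 4: [1, 2]}
print(minlex_postorder(t1, 5))  # [0, 1, 2, 4, 5]
print(minlex_postorder(t2, 5))  # [0, 1, 2, 4, 5]
```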
New features
- New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times; see mutation requirements. Also added the function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
- Add support for trees with internal samples for the Kendall-Colijn tree distance metric (@daniel-goldstein, #610).
- Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
- Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row, and to decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See metadata (@benjeffery, #491, #542, #543, #601).
- The tree sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
- Add a to_nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
- Add an extension of the Kendall-Colijn tree distance metric for tree sequences, computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
- Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
- Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex", which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
- Add _repr_html_ to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
- Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
- Improve the Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)), where n is the number of samples (@daniel-goldstein, #490).
- Add a metadata column to the migrations table. This works similarly to the existing metadata columns on other tables (@benjeffery, #505).
- Add a metadata column to the edges table. This works similarly to the existing metadata columns on other tables (@benjeffery, #496).
- Allow sites with missing data to be output by the haplotypes method, by default replacing missing values with -. Errors are no longer raised for missing data with isolated_as_missing=True; the error types returned for bad alleles (e.g. multiletter or non-ASCII) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
- Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
- User-specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
- New root_threshold option for the Tree class, which allows us to efficiently iterate over 'real' roots when we have missing data (@jeromekelleher, #462).
- Add a tree.as_dict_of_dicts() function to enable use with networkx. See the tutorial (@winni2k, #457).
- Add a tree_sequence.to_macs() function to convert a tree sequence to MACS format (@winni2k, #727).
- Add a keep_input_roots option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
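For orientation, here is how the topology-only form (λ = 0) of the Kendall-Colijn metric works. For every pair of leaves, it records the number of edges from the root to their MRCA. It then appends one entry per leaf for its pendant edge, which is 1 when only topology is considered. The distance is the Euclidean distance between two trees' vectors. The sketch below computes this naively for single trees given as hypothetical child-to-parent dicts. It ignores branch lengths (λ > 0) and is not tskit's optimised O(n^2) implementation.

```python
import itertools
import math


def depth(parent, u):
    """Number of edges between node u and the root."""
    d = 0
    while u in parent:
        u = parent[u]
        d += 1
    return d


def mrca(parent, a, b):
    """Most recent common ancestor of a and b (either may be internal)."""
    ancestors = {a}
    while a in parent:
        a = parent[a]
        ancestors.add(a)
    while b not in ancestors:
        b = parent[b]
    return b


def kc_vector(parent, leaves):
    # One entry per unordered leaf pair: depth of the pair's MRCA ...
    m = [depth(parent, mrca(parent, i, j)) for i, j in itertools.combinations(leaves, 2)]
    # ... followed by one entry per leaf for its pendant edge (topology only).
    return m + [1] * len(leaves)


def kc_distance(parent1, parent2, leaves):
    v1, v2 = kc_vector(parent1, leaves), kc_vector(parent2, leaves)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


leaves = [0, 1, 2]
parent1 = {0: 3, 1: 3, 3: 4, 2: 4}  # ((0,1),2)
parent2 = {0: 3, 2: 3, 3: 4, 1: 4}  # ((0,2),1)
print(kc_distance(parent1, parent2, leaves))  # sqrt(2)
```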
Bugfixes
- #453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
Deprecated
- The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.
- For TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument i...
C API 0.99.5
Breaking changes
- The macro TSK_IMPUTE_MISSING_DATA is renamed to TSK_ISOLATED_NOT_MISSING (@benjeffery, #716, #794).
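The new name describes what the flag controls: whether samples that are isolated in a tree (no parent and no children) are reported as missing data, or as carrying the ancestral state. Below is a simplified Python sketch for a single biallelic site with one mutation. It assumes a hypothetical dict-based tree, -1 as the missing-data code, and 0/1 for ancestral/derived states. It deliberately ignores edge cases such as a mutation placed directly above an isolated sample.

```python
MISSING = -1  # missing-data genotype code (an assumption of this sketch)


def genotypes(parent, children, samples, mutation_node, isolated_not_missing=False):
    """Biallelic genotypes at one site with a single mutation above mutation_node.

    A sample is isolated when it has neither a parent nor children in the
    tree; by default such samples are reported as MISSING.
    """
    result = []
    for s in samples:
        isolated = s not in parent and not children.get(s)
        if isolated and not isolated_not_missing:
            result.append(MISSING)
            continue
        # Walk from the sample towards the root looking for the mutated node.
        u, g = s, 0
        while u is not None:
            if u == mutation_node:
                g = 1
                break
            u = parent.get(u)
        result.append(g)
    return result


parent = {0: 3, 1: 3}
children = {3: [0, 1]}  # sample 2 is isolated in this tree
print(genotypes(parent, children, [0, 1, 2], mutation_node=0))  # [1, 0, -1]
print(genotypes(parent, children, [0, 1, 2], 0, isolated_not_missing=True))  # [1, 0, 0]
```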
New features
- Add a TSK_KEEP_INPUT_ROOTS option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
Python 0.3.0beta2
BETA PRE-RELEASE
Second beta of 0.3.0
Changes from beta 1
- Mutation times can be a mixture of known and unknown as long as, for each individual site, they are either all known or all unknown (@benjeffery, #761).
- Metadata and schemas are stored as canonical JSON to aid byte-wise comparison. Metadata schemas have improved equality methods (@benjeffery, #764).
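Canonical serialisation makes equal metadata produce identical bytes, regardless of dict key order or whitespace, so a byte-wise comparison gives the right answer. One common way to get this in Python is json.dumps with sorted keys and compact separators. This is a sketch of the general idea, not necessarily the exact canonical form tskit writes.

```python
import json


def canonical_json(obj):
    """Serialise obj deterministically: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


a = {"codec": "json", "type": "object", "properties": {"x": {"type": "number"}}}
b = {"type": "object", "properties": {"x": {"type": "number"}}, "codec": "json"}
print(canonical_json(a) == canonical_json(b))  # True
```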
Bugfixes
- Fix a too-small buffer for newick, causing a LibraryError for tree.newick() (@jeromekelleher, #754).
C API 0.99.4
Note
- The TSK_VERSION_PATCH macro was incorrectly set to 4 for 0.99.3, so both 0.99.4 and 0.99.3 have the same value.
Changes
- Mutation times can be a mixture of known and unknown as long as, for each individual site, they are either all known or all unknown (@benjeffery, #761).
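The relaxed rule is a per-site check: within any one site, either every mutation has a time or none does. Below is a sketch of that check, assuming mutations are given as hypothetical parallel lists of site IDs and times. The release notes describe unknown time as a particular NaN value; for simplicity, this sketch treats any NaN as unknown.

```python
import math

UNKNOWN_TIME = math.nan  # stand-in: this sketch treats any NaN as "unknown"


def check_mutation_times(sites, times):
    """Raise ValueError if any site mixes known and unknown mutation times.

    `sites[k]` is the site ID of mutation k and `times[k]` is its time.
    """
    known_by_site = {}
    for site, time in zip(sites, times):
        known = not math.isnan(time)
        # The first mutation seen at a site fixes whether times are known there.
        if known_by_site.setdefault(site, known) != known:
            raise ValueError(f"site {site} mixes known and unknown mutation times")


# Site 0 has all known times, site 1 all unknown: allowed.
check_mutation_times([0, 0, 1, 1], [2.5, 1.0, UNKNOWN_TIME, UNKNOWN_TIME])
```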
Bugfixes
- Fix for including core.h under C++ (@petrelharp, #755).
Python 0.3.0.beta1
BETA PRE-RELEASE
Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many other features. This release comes with wheels for Windows, OSX and Linux.
Breaking changes
- The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
- File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
- Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously, an error was raised only when some operation building the trees was attempted (@jeromekelleher, #709).
- The TableCollection object no longer implements the iterator protocol. Previously, list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
New features
- New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times; see mutation requirements. Also added the function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
- Add support for trees with internal samples for the Kendall-Colijn tree distance metric (@daniel-goldstein, #610).
- Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
- Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row, and to decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See metadata (@benjeffery, #491, #542, #543, #601).
- The tree sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
- Add a nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
- Add an extension of the Kendall-Colijn tree distance metric for tree sequences, computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
- Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
- Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex", which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
- Add _repr_html_ to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
- Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
- Improve the Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)), where n is the number of samples (@daniel-goldstein, #490).
- Add a metadata column to the migrations table. This works similarly to the existing metadata columns on other tables (@benjeffery, #505).
- Add a metadata column to the edges table. This works similarly to the existing metadata columns on other tables (@benjeffery, #496).
- Allow sites with missing data to be output by the haplotypes method, by default replacing missing values with -. Errors are no longer raised for missing data with impute_missing_data=False; the error types returned for bad alleles (e.g. multiletter or non-ASCII) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
- Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
- User-specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
- New root_threshold option for the Tree class, which allows us to efficiently iterate over 'real' roots when we have missing data (@jeromekelleher, #462).
- Add a tree.as_dict_of_dicts() function to enable use with networkx. See the tutorial (@winni2k, #457).
Bugfixes
- #453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
Deprecated
- The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.