twisst2

twisst2 is a tool for topology weighting. Topology weighting summarises genealogies in terms of the relative abundance of different sub-tree topologies. It can be used to explore processes like introgression and it can aid the identification of trait-associated loci.

twisst2 has a number of important improvements over the original twisst tool. Most importantly, twisst2 incorporates inference of the ancestral recombination graph (ARG) or tree sequence - local genealogies and their breakpoints along the chromosome. It does this using sticcs. sticcs is a model-free approach and it does not require phased data, so twisst2 can run on unphased genotypes of any ploidy.

The recommended way to run twisst2 is to start from polarised genotype data. This means you either need to know the ancestral allele at each site, or you need an appropriate outgroup(s) to allow inference of the derived allele.

An alternative way to run it is by first inferring the ancestral recombination graph (ARG) tree sequence using a different tool like Relate or tsinfer. However, this typically requires phased genotypes, and my tests suggest that twisst2+sticcs is more accurate than other methods anyway.

Publications

The general concept of topology weighting is described by Martin and Van Belleghem 2017.
Combining genealogy inference with sticcs and topology weighting with twisst2 is described by Martin 2025.

Installation

First install sticcs by following the intructions there.

If you would like to analyse tree sequence objects from tools like msprime and tsinfer, you will also need to install tskit yourself. To install twisst2:

git clone https://github.com/simonhmartin/twisst2.git

cd twisst2

pip install -e .

Command line tool

Starting from unphased (or phased) genotypes

To perform tree inference and topology weighting, twisst2 takes as input a modified vcf file that contains a DC field, giving the count of derived alleles for each individual at each site.

Once you have a vcf file for your genotype data, make the modified version using sticcs (this needs to be installed, see above):

sticcs prep -i <input vcf> -o <output vcf>  --outgroup <outgroup sample ID>

If the vcf file already has the ancestral allele (provided in the AA field in the INFO section), then you do not need to specifiy outrgoups for polarising.

Now you can run the twisst2 to count sub-tree topologies:

twisst2 sticcstack -i <input_vcf> -o <output_prefix> --max_subtrees 512 --ploidy 2 --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file

Starting from pre-inferred trees or ARG (e.g. Relate, tsinfer, argweaver, Singer)

twisst2 trees -i <input_file> -o <output_prefix> --groups <groupname1> <groupname2> <groupname3> <groupname4> --groups_file

Output

<output_prefix>.topocounts.tsv.gz gives the count of each group tree topology for each interval.
<output_prefix>.intervals.tsv.gz gives the chromosome, start and end position of each interval.

Full command options

Topology weighting from VCF: `twisst2 sticcstack`

Command Option	Value type	Description	Required?
`-i`, `--input_vcf`	FILE	Input VCF file with DC field (make thsi with sticcs prep)	Required
`-o`, `--out_prefix`	CHAR	Output file prefix	Required
`--group_names`	CHAR	Name for each group (separated by spaces)	Required unless `--groups_file` is specified
`--groups`	CHAR	Sample IDs (separated by commas) for each group (separated by spaces)	Required unless `--groups_file` is specified
`--groups_file`	FILE	Optional file with a column for sample ID and group	Required unless `--groups` and `--group_names` are specified
`--max_subtrees`	INT	Maximum number of subtrees to consider; we recommend 512 as a good compromise between speed and accuracy	Required
`--ploidy`	INT	Sample ploidy if all the same. Use `--ploidy_file` if samples differ.	Defaults to 2 if not specified and `--ploidy_file` not specified
`--ploidy_file`	FILE	File with two columns for sample ID and ploidy, separated by whitespace	Required if ploidy differs among individuals
`--output_topos`	FILE	Output file for topologies used	Optional
`--variant_range_only`		Do not extend first and last trees to the ends of the chromosome; use this for region-specific analyses	Optional
`--unrooted`		Unroot topologies (results in fewer topologies and therefore fewer counts in the output)	Optional
`--no_second_chances`		Do not consider SNPs that are separated by incompatible SNPs	Optional, not recommend
`--single_pass`		Single pass when building trees. Only relevant for ploidy > 1	Optional, not recommended
`--verbose`		Verbose output	Optional

Topology weighting from tree sequence: `twisst2 trees`

Command Option	Value type	Description	Required?
`-i`, `--input_file`	FILE	Input trees file	Required
`-f`, `--input_format`	CHAR	Input file format {tskit, argweaver, newick}	Required
`-o`, `--out_prefix`	CHAR	Output file prefix	Required
`--chrom_name`	CHAR	Chromosome name for output intervals file	Defaults to 'unknown_chrom'
`--unrooted`		Unroot topologies (results in fewer topologies and therefore fewer counts in the output)	Optional
`--output_topos`	FILE	Output file for topologies used	Optional
`--group_names`	CHAR	Name for each group (separated by spaces)	Required unless `--groups_file` is specified
`--groups`	CHAR	Sample IDs (separated by commas) for each group (separated by spaces)	Required unless `--groups_file` is specified
`--groups_file`	FILE	Optional file with a column for sample ID and group	Required unless `--groups` and `--group_names` are specified
`--verbose`		Verbose output	Optional

R functions for plotting

Some functions for importing and plotting are provided in the plot_twisst/plot_twisst.R script. For examples of how to use these functions, see the plot_twisst/example_plot.R script.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
example		example
plot_twisst		plot_twisst
twisst2		twisst2
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

twisst2

Publications

Installation

Command line tool

Starting from unphased (or phased) genotypes

Starting from pre-inferred trees or ARG (e.g. Relate, tsinfer, argweaver, Singer)

Output

Full command options

Topology weighting from VCF: `twisst2 sticcstack`

Topology weighting from tree sequence: `twisst2 trees`

R functions for plotting

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

twisst2

Publications

Installation

Command line tool

Starting from unphased (or phased) genotypes

Starting from pre-inferred trees or ARG (e.g. Relate, tsinfer, argweaver, Singer)

Output

Full command options

Topology weighting from VCF: twisst2 sticcstack

Topology weighting from tree sequence: twisst2 trees

R functions for plotting

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Topology weighting from VCF: `twisst2 sticcstack`

Topology weighting from tree sequence: `twisst2 trees`

Packages