layout | title | categories | usemathjax |
---|---|---|---|
page | Species tree inference | jekyll update | true |
Species tree estimation is mainly based on the multispecies coalescent model (MSC; Liu et al. 2021). This model accommodates gene trees within species trees while allowing for incomplete lineage sorting (ILS).
Modified from Liu et al. (2021).
Species tree inference methods can be broadly classified into summary (also termed "heuristic") and full-likelihood approaches. The first class reduces the information in the sequences to summary statistics, while the second performs estimation directly from the alignments. As a result, summary-based approaches are much faster than full-likelihood methods. Here, we will use two summary-based methods: ASTRAL, which takes previously reconstructed gene trees as input, and SVDquartets, which works directly with single-nucleotide polymorphism (SNP) datasets (i.e., gene tree inference is not needed).
ASTRAL belongs to a family of species tree methods known as two-step methods: gene trees are first estimated from the sequence alignments, and the species tree is then estimated from those gene trees. Here, we will use the maximum likelihood trees inferred from the 388 alignments.
Download the software from GitHub, or in Unix you can type in the terminal:
git clone https://github.com/smirarab/ASTRAL.git
This program is written in Java, so you need to install Java first. To run the software, execute:
java -jar astral.5.7.8.jar
This will print the list of available options. If no errors are printed, the installation was successful. The following command estimates a species tree from input gene trees:
java -jar astral.5.7.8.jar -i monitors_trees.tre -o monitor_sptree.tre
- `-i`: file containing the input gene trees in Newick format (a single file where each gene tree is on a different line)
- `-o`: filename for storing the output species tree
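If the maximum likelihood trees are stored as one Newick file per locus, they have to be merged into a single file before running ASTRAL. The following is a minimal sketch, assuming a hypothetical directory `gene_trees/` whose files end in `.treefile`. Both names are illustrative and not part of the tutorial data, so adjust them to your own files.

```shell
#!/usr/bin/env bash
# Merge per-locus Newick files into one ASTRAL input file (one tree per line).
# The directory name and extension are hypothetical. Write the output file
# outside the input directory, or give it a different extension.
combine_trees() {
    local dir="$1" ext="$2" out="$3"
    : > "$out"                                   # create or empty the output file
    for f in "$dir"/*."$ext"; do
        [ -e "$f" ] || continue                  # nothing matched the pattern
        # Copy the trees, dropping blank lines; an all-blank file is not an error.
        grep -v '^[[:space:]]*$' "$f" >> "$out" || true
    done
    grep -c ';' "$out"                           # print the number of lines holding a tree
}

# Example call (hypothetical paths):
# combine_trees gene_trees treefile monitors_trees.tre
```

The number printed should match the number of alignments (388 in this tutorial) if every locus produced exactly one tree.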
As this method estimates an unrooted tree, it is advisable to include a known outgroup species.
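Before relying on an outgroup for rooting, it can be useful to check how many gene trees actually contain it. The sketch below counts the lines of a Newick file in which a given label appears as a tip. It assumes one tree per line and labels made only of letters, digits, and underscores; the label in the example call is a placeholder, since the outgroup name depends on your dataset.

```shell
#!/usr/bin/env bash
# Count the gene trees (lines) in a Newick file that contain a given tip label.
# Assumes one tree per line and labels without special characters.
count_trees_with_taxon() {
    local trees="$1" taxon="$2"
    # A tip label is preceded by "(" or "," and followed by ":", ",", or ")".
    grep -cE "[(,]${taxon}[:,)]" "$trees"
}

# Example call (the outgroup label is a placeholder):
# count_trees_with_taxon monitors_trees.tre Outgroup_species
```

If the count is lower than the total number of gene trees, the outgroup is missing from some loci.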
SVDquartets is an algorithm that computes species trees directly from SNP data. However, it is not a full-likelihood approach, since the data are summarized as pooled site-pattern counts. This algorithm is implemented in PAUP* (it can be downloaded from http://phylosolutions.com/paup-test/).
We will use a SNP matrix collected through RADseq from species belonging to the Liolaemus kingii group. This group comprises lizards distributed in the Patagonian Steppe and is characterized by a complex diversification history resulting from rapid diversification and gene flow between species (Sánchez et al. 2023).
There are three ways to use PAUP:
- Interactively from a GUI (Graphical User Interface; i.e. an interface that uses icons, menus, and the mouse)
- Interactively in the terminal (CLI, command line interface); i.e. entering commands one by one to read the data and execute the analysis
- Including all the necessary commands in the sequence file and calling this file from the terminal to automatically read and execute the analysis
Windows and Linux users can use the GUI version, which provides a friendly interface (command-line users, see below). Open PAUP and load the file `liolaemus_snps.nex` containing the SNPs (go to `File → Open`). Next, define the outgroup, which is `lineomaculatus`: go to `Data → Define outgroup` and select this taxon. To perform the SVDquartets analysis, go to `Analysis → SVDQuartets`; the following window should appear:
Select the same options and execute the analysis; it should finish quickly.
A species tree with support values will appear on the screen. If you want to save this tree to a file, go to `Trees → Save trees to file`, specify a name for the file, and select the Newick format at the bottom of the window.
To use the command-line version instead, open the PAUP executable (double click) and type:
cd /my/path/to/data
execute liolaemus_snps.nex
This will load the input data into PAUP. Then define the outgroup, run SVDquartets with 100 bootstrap pseudoreplicates, root the tree, and save it using the following commands:
outgroup lineomaculatus_0 lineomaculatus_1;
svdq evalQuartets=random nquartets=100000 taxpartition=species bootstrap=standard nreps=100 nthreads=2;
rootTrees rootMethod=outgroup;
savetrees file=SVDquartets.tre format=Newick brlens=yes;
- `outgroup`: define the outgroup samples in the matrix
- `svdq`: calls the SVDquartets algorithm
- `evalQuartets`: use a random sample of quartets (the number is specified in the next option)
- `nquartets`: number of quartets to sample
- `taxpartition`: the partition that specifies the individual-species associations (already included at the bottom of the `.nex` matrix; you can check this in a text editor, e.g. Notepad)
- `bootstrap`: perform standard bootstrap
- `nreps`: number of pseudoreplicates for bootstrap support
- `nthreads`: number of threads to run in parallel
- `rootTrees`: root the tree using the outgroup
- `savetrees`: save the trees under the name `SVDquartets.tre`
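The third way of using PAUP listed earlier (running all commands from a file) can be sketched as follows. This is a variant under stated assumptions: instead of editing the data file, it writes a separate command file containing a `paup` block that repeats the commands from the interactive session and then quits. The command file name `run_svdq.nex`, the executable name `paup`, and the `-n` (non-interactive) flag are assumptions; check the name and help output of your PAUP build before relying on them.

```shell
#!/usr/bin/env bash
# Write a PAUP command file that repeats the interactive session shown above.
# The output file name passed as $1 is arbitrary.
write_paup_batch() {
    local out="$1"
    cat > "$out" <<'EOF'
#NEXUS
begin paup;
    execute liolaemus_snps.nex;
    outgroup lineomaculatus_0 lineomaculatus_1;
    svdq evalQuartets=random nquartets=100000 taxpartition=species bootstrap=standard nreps=100 nthreads=2;
    rootTrees rootMethod=outgroup;
    savetrees file=SVDquartets.tre format=Newick brlens=yes;
    quit;
end;
EOF
}

# Example (executable name and -n flag are assumptions, see the text above):
# cd /my/path/to/data
# write_paup_batch run_svdq.nex && paup -n run_svdq.nex
```

Keeping the commands in a file makes the analysis easy to rerun with different settings, for example a different number of bootstrap pseudoreplicates.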