---
layout: page
title: Standard tree inference
categories: jekyll update
usemathjax: true
---
For gene tree reconstructions, we will use a dataset of Australasian monitor lizards (genus Varanus) from Pavón-Vázquez et al. (2021). It consists of 388 nuclear loci obtained through anchored hybrid enrichment, a technique for capturing orthologous regions of the genome.
To estimate trees from these loci, we will rely on RAxML, a program for efficient tree inference based on maximum likelihood (ML). We will also estimate node supports based on bootstrap calculations.
Download the latest version from GitHub. Alternatively, if you use UNIX (Linux or Mac) and have `git` installed, open the terminal and type:
git clone https://github.com/stamatak/standard-RAxML.git
to download the repository. If you downloaded the `.zip` archive instead, uncompress it. Then move to the newly created folder.
Windows executables are already included in the folder. To run the software, open the command prompt (`cmd.exe`) and type the path to the executable:
REM use the `cd` command to change the directory
cd \path\to\raxml\WindowsExecutables_v8.2.10
.\raxmlHPC
REM The following error message should appear:
REM Error, you must specify a model of substitution with the "-m" option
Be aware that Windows uses the backslash `\` as the path-component separator, while Unix uses the forward slash `/`.
On Unix, the software first needs to be compiled before it can be used.
Install the standard version:
make -f Makefile.gcc # install the standard version
Alternatively, you can install the multicore version, which can use several CPU cores at once:
rm *.o # remove previously compiled files if you installed a different version
make -f Makefile.PTHREADS.gcc
This will create an executable `raxmlHPC` (or `raxmlHPC-PTHREADS`, depending on the compiled version) that can be called:
./raxmlHPC
# The following error message should appear:
# Error, you must specify a model of substitution with the "-m" option
We will first estimate a tree for a single locus.
The following command infers an ML tree and estimates node support in a single run:
./raxmlHPC -s locus177.phylip -n 177.boot -m GTRGAMMA -f a -N 100 -p 2334 -x 563454
- `-s`: name of the sequence file (include the path to the file if it is located in a different folder)
- `-n`: name of the output files (the files generated during the run will have `.177.boot` appended to the end)
- `-m`: substitution model
- `-f`: specifies one of the algorithms available in RAxML. If nothing is specified, RAxML executes the standard hill-climbing algorithm for the tree search by default (equivalent to `-f d`). The `a` option tells RAxML to conduct a rapid bootstrap analysis and search for the best-scoring ML tree in a single run
- `-N`: number of bootstrap pseudoreplicates
- `-p`: random number seed used to generate the parsimony starting tree (can be any integer)
- `-x`: random number seed (an integer) for rapid bootstrapping; specifying it turns rapid bootstrapping on
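If you compiled the multicore version, the same analysis can be spread over several threads with the `-T` option. The sketch below only assembles and prints the command so that it can be inspected first. The thread count and the run name `177.threads` are assumptions made for illustration, not values from the tutorial.

```shell
# Assumed thread count; pick a value no larger than the number of available cores.
threads=4
# Hypothetical run name, chosen so that it does not clash with the 177.boot outputs.
run_name="177.threads"

# Assemble the command as a plain string and print it for inspection.
cmd="./raxmlHPC-PTHREADS -T $threads -s locus177.phylip -n $run_name -m GTRGAMMA -f a -N 100 -p 2334 -x 563454"
echo "$cmd"
```

Once the printed command looks right, copy it to the terminal to start the run.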
Further command options are detailed in the software manual, or can be explored using:
./raxmlHPC -help
The maximum likelihood tree is printed in the `RAxML_bestTree.177.boot` file. We can visualize the tree in FigTree (download from here) and, optionally, export it as an image. To visualize the tree together with its support values, open the `RAxML_bipartitions.177.boot` file in FigTree. On the left-hand side of the screen select: Branch Labels → Display → label.
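As a quick orientation, the helper below lists the file names that a `-f a` run is expected to write for a given run name. It is a minimal sketch: the function name `raxml_outputs` is hypothetical, and the list of prefixes reflects the files RAxML version 8 usually produces for this algorithm, so check it against your own output folder.

```shell
# Print the expected output file names for a rapid-bootstrap (-f a) run.
# $1 is the run name that was passed to -n.
raxml_outputs() {
    for prefix in bestTree bipartitions bipartitionsBranchLabels bootstrap info; do
        echo "RAxML_${prefix}.$1"
    done
}

raxml_outputs 177.boot
```

The `bipartitions` file holds the best-scoring tree with the bootstrap supports drawn on it, which is why it is the one to open when support values are needed.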
Note: If we want output files to be written in a specific folder, we have to execute RAxML in that folder.
Suppose that we want the output files in a folder called `output/`:
cd output/
path/to/raxmlHPC -s path/to/locus177.phylip -n 177.boot -m GTRGAMMA -f a -N 100 -p 2334 -x 563454
To simplify this command, you can add the RAxML executable to the path (follow this guide). This allows you to execute the program from any directory.
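On Unix, adding the executable to the path for the current terminal session could look like the sketch below. The folder `$HOME/standard-RAxML` is an assumed install location used only for illustration; replace it with the folder that actually contains `raxmlHPC`.

```shell
# Hypothetical install location; replace with the folder that holds raxmlHPC.
RAXML_DIR="$HOME/standard-RAxML"

# Append the folder to PATH for the current shell session only.
export PATH="$PATH:$RAXML_DIR"

# Report where the shell now finds the executable, if anywhere.
command -v raxmlHPC || echo "raxmlHPC not found in $RAXML_DIR"
```

The change is lost when the terminal is closed. To make it permanent, the `export` line can be placed in the shell start-up file, such as `~/.bashrc`.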
Let's estimate a tree for a different locus:
./raxmlHPC -s locus256.phylip -n 256.boot -m GTRGAMMA -f a -N 100 -p 2334 -x 563454
Visualize both trees (`RAxML_bipartitions.177.boot` and `RAxML_bipartitions.256.boot`). Some of the phylogenetic relationships differ. What could be the reason(s)?
Varanus komodoensis is the famous Komodo dragon.
It is possible to run the analysis automatically for all 388 gene trees using a `for` loop. Note that all `.phy` files in the dataset folder are named `L_1.phy`, `L_2.phy`, ..., `L_388.phy`. Thus, we can set up a loop with an iterator `i` taking values from 1 to 388 to pass each input `.phy` file to RAxML:
for i in {1..388}
do
./raxmlHPC -s L_$i.phy -n $i.boot -m GTRGAMMA -f a -N 100 -p 2334 -x 563454
done
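A long batch like this is easier to restart if the loop skips loci that cannot or need not be run. The variant below is a sketch under two assumptions: the alignments sit in the current folder, and the presence of a `RAxML_info.<run name>` file means that the locus was already analysed. RAxML is expected to stop when output files with the same run name already exist, so the second check avoids that error. The `RAXML` environment variable is a hypothetical convenience for pointing to the executable.

```shell
# Assumed location of the executable; override by setting RAXML beforehand.
raxml="${RAXML:-./raxmlHPC}"
n_loci=388
skipped=0
i=1
while [ "$i" -le "$n_loci" ]; do
    aln="L_$i.phy"
    if [ ! -f "$aln" ]; then
        # The alignment is not in the current folder.
        echo "skipping locus $i: $aln not found" >&2
        skipped=$((skipped + 1))
    elif [ -e "RAxML_info.$i.boot" ]; then
        # Outputs from an earlier run with this name are already present.
        echo "skipping locus $i: outputs for run $i.boot already exist" >&2
        skipped=$((skipped + 1))
    else
        "$raxml" -s "$aln" -n "$i.boot" -m GTRGAMMA -f a -N 100 -p 2334 -x 563454
    fi
    i=$((i + 1))
done
echo "loci skipped: $skipped"
```

The `while` loop with a counter does the same job as `{1..388}` but also works in shells that lack brace expansion.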