
Validation test

Merly Escalona edited this page Feb 12, 2018 · 6 revisions

Phylogenetic reconstruction from simulated alignments

To test whether NGSphy works as expected, we performed several sanity checks and test runs. Here we describe one experiment that checks whether the simulated alignments have in fact evolved under the user-defined gene tree. The simulation started from the gene tree in Figure 1, using the tip 1_0_0 as the anchor (i.e., providing a known sequence for that tip).


Figure 1: Gene tree with five tips used for the validation. Numbers above the branches are branch lengths, in expected number of substitutions per site.

We ran 100 replicates of NGSphy in inputmode 3 (single gene tree with user-defined anchor sequence). The sequence alignments were simulated under the JC69 model (Jukes and Cantor, 1969), with equal base frequencies and a length of 1000 bp. The simulated alignments were used to reconstruct maximum likelihood (ML) trees with raxml-ng under the (known) JC69 model. Ten heuristic searches were performed per alignment, each starting from a maximum parsimony tree. The Robinson-Foulds (RF) distance (Robinson and Foulds, 1981) and the branch score distance (BSD) (Kuhner and Felsenstein, 1994) were used to compare the input gene tree and the estimated ML trees with respect to topology and branch lengths, respectively. All RF distances were zero, and the BSDs were consistently small (mean = 0.0555, standard deviation = 0.0175), suggesting that the alignment simulation is correct.
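To make the two tree distances concrete, here is a minimal Python sketch. It is not the R code used in the test script. It assumes each unrooted tree has already been reduced to a mapping from bipartition to branch length, and it takes the BSD as the square root of the summed squared branch-length differences (some tools report the unrooted sum instead). The tip names `A` to `E`, the branch lengths, and the helper names `canonical` and `make_tree` are hypothetical.

```python
from math import sqrt

TIPS = frozenset("ABCDE")  # hypothetical tip labels, not the tips of Figure 1


def canonical(side, tips=TIPS):
    """Represent a bipartition by the side that excludes the smallest tip name."""
    side = frozenset(side)
    return tips - side if min(tips) in side else side


def make_tree(branches, tips=TIPS):
    """Build a {bipartition: branch length} mapping from (side, length) pairs."""
    return {canonical(side, tips): length for side, length in branches}


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(set(tree_a) ^ set(tree_b))


def branch_score_distance(tree_a, tree_b):
    """Square root of the summed squared branch-length differences.

    A bipartition missing from one tree contributes a length of 0 there.
    """
    splits = set(tree_a) | set(tree_b)
    return sqrt(sum((tree_a.get(s, 0.0) - tree_b.get(s, 0.0)) ** 2 for s in splits))


# Terminal branches shared by all three hypothetical trees.
terminal = [("A", 0.05), ("B", 0.05), ("C", 0.05), ("D", 0.05), ("E", 0.05)]

# "True" tree ((A,B),C,(D,E)) and an estimate with the same topology.
true_tree = make_tree(terminal + [("AB", 0.10), ("DE", 0.20)])
estimate = make_tree(terminal + [("AB", 0.13), ("DE", 0.16)])

# A tree with a different topology, ((A,C),B,(D,E)).
other = make_tree(terminal + [("AC", 0.10), ("DE", 0.20)])

print(rf_distance(true_tree, estimate))            # same topology
print(branch_score_distance(true_tree, estimate))  # small, driven by branch lengths
print(rf_distance(true_tree, other))               # topologies differ
```

An RF distance of zero means the two trees share every bipartition, so any remaining BSD comes only from differences in branch-length estimates.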

In the input mode used for this test, the anchor tip is used to re-root the tree and is then used by indelible-ngsphy to generate the locus alignment. This process creates a zero-length branch between the anchor tip and what indelible considers the root node. To show that this re-rooting does not introduce any error, and that the generated anchor sequence is identical to the one the user defined as anchor, we measured the p-distance between them. In all cases this distance was zero.
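The p-distance is the proportion of aligned sites at which two sequences differ. A minimal Python sketch, assuming two aligned sequences of equal length and treating every character (including gaps) as a site; the function name and the example sequences are hypothetical:

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return mismatches / len(seq_a)


# Hypothetical anchor sequence and its simulated counterpart.
user_anchor = "ACGTACGTAC"
simulated_anchor = "ACGTACGTAC"
print(p_distance(user_anchor, simulated_anchor))  # identical sequences
```

A p-distance of zero is only possible when the two sequences match at every site, which is the check applied to the anchor sequence here.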

Execution

The script to run this test is under ngsphy/manuscript/supp.material/scripts/supp.test1.sh

To run this test you need the following files:

To execute this script, the following must be installed:

  1. Following the script file, we first have to state:
    • Where the NGSphy repository is located
    • Where the output will be written
    • The name of the output folder
    • A random seed number
  2. Organize the data: generate the output folders and copy the data from the repository to the corresponding output folders.
  3. Assuming NGSphy and its dependencies are properly installed, run 100 NGSphy replicates.
  4. Call raxml-ng for each of the generated replicate alignments.
  5. Generate two files with the paths of the ML trees from raxml-ng and of the re-rooted trees generated by NGSphy.
  6. Finally, use the R code at the end of the script to generate a plot of the branch scores and RF distances.
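The raxml-ng and path-listing steps above can be sketched in Python as follows. This is an illustration, not the contents of supp.test1.sh: the function names, file names and seed are hypothetical, and the raxml-ng options are chosen to match the analysis described earlier (JC model, ten parsimony starting trees).

```python
from pathlib import Path


def raxml_ng_command(alignment, prefix, seed, n_starts=10):
    """Build a raxml-ng call: JC model, n_starts parsimony starting trees."""
    return [
        "raxml-ng",
        "--msa", str(alignment),
        "--model", "JC",
        "--tree", f"pars{{{n_starts}}}",  # expands to e.g. pars{10}
        "--prefix", str(prefix),
        "--seed", str(seed),
    ]


def write_tree_list(tree_paths, list_file):
    """Write one tree path per line (sorted) and return the sorted paths."""
    paths = sorted(str(p) for p in tree_paths)
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths


# Hypothetical replicate alignment and output prefix.
cmd = raxml_ng_command("rep_001/alignment.fasta", "rep_001/ml", seed=12345)
print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. raxml-ng writes the best-scoring tree to `<prefix>.raxml.bestTree`, so those files are the natural input for `write_tree_list`.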

References

  • Jukes, T.H. and Cantor, C.R. (1969) Evolution of Protein Molecules. In: Mammalian Protein Metabolism, pp. 21–132.
  • Kuhner, M.K. and Felsenstein, J. (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol., 11, 459–468.
  • Robinson, D.F. and Foulds, L.R. (1981) Comparison of phylogenetic trees. Math. Biosci., 53, 131–147.