Skip to content

Latest commit

 

History

History
148 lines (107 loc) · 6.33 KB

03_network.md

File metadata and controls

148 lines (107 loc) · 6.33 KB
layout title categories usemathjax
page
Network estimation
jekyll update
true
<script type="text/javascript" charset="utf-8" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML, https://vincenttam.github.io/javascripts/MathJaxLocal.js"></script>

Phylogenetic networks are an extension of phylogenetic trees used to model gene flow events between species or populations (see the figure). Specifically, these events are modelled by reticulation edges that summarizes gene flow that might have occurred over a period of time into a single instantaneous event. These edges have a parameter associated ($\varphi$) that represents the proportion of alleles transferred during the entire period of gene flow.

A popular method to estimate phylogenetic networks is PhyloNetworks (Solís-Lemus et al. 2017). It is based on a pseudolikelihood function over concordance factors (CFs) of quartets of taxa (four taxa), wich increase computational tractability. CFs are calculated as the proportion of gene trees supporting the three possible splits in a given quartet:

CFs calculation (Olave and Meyer 2020)

A CF table has to be obtained first, either from gene trees or SNP datasets. Here, we will go through both options.

Getting started: install Julia and PhyloNetworks

PhyloNetworks is a package of Julia, an interactive programming language (like R). To install Julia, go to http://julialang.org/downloads/.

To install the package, first open a terminal and type:

julia

Now we are inside Julia. To install the package type:

Pkg.add("PhyloNetworks");
using PhyloNetworks; # to load the package, analogous to R's `library(package)`

Let's load a toy tree with the readTopology() function to check that the package is correctly installed

net = readTopology("(A,(B,(C,D)));");
net
# HybridNetwork, Rooted Network
# 6 edges
# 7 nodes: 4 tips, 0 hybrid nodes, 3 internal tree nodes.
# tip labels: A, B, C, D
# (A,(B,(C,D)));

We loaded a tree but is says it is a network. This is not an error, by definition, trees are networks without reticulation edges.

Calculation of Concordance Factors using gene trees

To compute CF from gene trees we will use the 388 gene trees we estimated using RAxML. The file monitors_trees.tre has all 388 gene trees in Newick format. Load them into Julia using:

cd("my/path/here")
genetrees = readMultiTopology("monitors_trees.tre")

Now we can compute the CFs using

CF = readTrees2CF("monitors_trees.tre")

This function automatically computes the CF table from the gene trees and exports a file called tableCF.txt

Calculation of Concordance Factors from SNP data

It is also possible to get a CF table bypassing gene tree reconstructions, by simply having SNP data as input. Note that each SNP is assumed to be unlinked. Thus, if you have a dataset such as RADseq you want to randomly sample one SNP per locus. For whole genome sequences, sample one SNP separated far enough to reduce linkage, for example 1 SNP every 10 Kb or 50 Kb.

Different R functions were developed to compute the table of CFs directly from a SNPs matrix (Olave and Meyer 2020), which we will be using here on the Liolaemus dataset.

First, clone the contents from here and install R and Rstudio, if you don't have them yet.

Now, open Rstudio and run this code

source("my/path/to/functions.R")
setwd("my/path/to/working/directory")

#install packages
install.packages("foreach", "doMC")
library("foreach", "doMC")

# function to convert vcf into phylip format
vcf2phylip(wd=getwd(), vcf.name="liolaemus_snps.vcf", total.SNPs=8645, random.phase = T, replace.missing = T, output.name=NULL, cores=1)

#calculate CF
SNPs2CF(seqMatrix = "liolaemus_snps.phy", ImapName = "Imap.txt", outgroupSp = "lineomaculatus",
         indels.as.fifth.state = FALSE,  
         bootstrap = TRUE, boots.rep = 100, 
         outputName = "SNPs2CF.csv",
         n.quartets = 100, between.sp.only = TRUE,
         save.progress = FALSE,
         cores = 1)

Once you got the CF table, now you are ready to get a network using PhyloNetworks! Return to Julia and run

CF = readTableCF("SNPs2CF.csv")

Network estimation

Now we return to Julia. We will load the CF table and an estimate of the species tree. This can be the tree inferred with svdquartets

sppTree = readTopology("SVDquartets.tre");

Now we are ready to estimate a network. This will be accomplished with the snaq!() function. It estimates a network that fits observed quartet concordance factors (CFs) using maximum pseudo-likelihood. The argument hmax determines the maximum number of hybridizations allowed.

net_h0 = snaq!(sppTree, CF, hmax = 0, filename = "net0", runs = 1)
net_h1 = snaq!(net_h0, CF, hmax = 1, filename = "net1", runs = 1)

Let's take a look to the obtained network

Pkg.add("PhyloPlots") # to visualize the networks
using PhyloPlots
plot(net_h1, showgamma = true, style = :majortree, arrowlen = 0.2)

The networks returned by the method are not rooted, so it is convinient to include an outgroup species in the datset to properly root the networks after the estimation. In our dataset, the outgroup species is Liolaemus lineomaculatus:

net_h1.names # explore the names of the terminals
rootatnode!(net_h1, "lineomaculatus") # root at the outgroup node
plot(net_h1, showgamma = true, style = :majortree, arrowlen = 0.2) # plot again

Compute Bootstrap on networks

To asses uncertainty, we can compute bootstrap calculation based on the confidence intervals at the CF table

using DataFrames, CSV
CF = DataFrame(CSV.File("SNPs2CF.csv"); copycols = false)
bootnet = bootsnaq(net_h1, CF, hmax = 1, filename = "bootstrap_nets")