Skip to content

10. Reticulation Analysis

Waldir M. Berbel-Filho edited this page Jul 20, 2022 · 5 revisions

To infer phylogenetic networks using the PhyloNetworks package v. 0.14.12 in julia, we needed an SNP-based concordant factor file extracted from SNPs2CFs v. 1. 4. (Olave & Meyer, 2020). SNPs2CFs require a phased .phylip file containing unlinked SNPs. Using the .vcf file generated in Dataset III as input, we used Beagle v. 5.2 to phase the data.

To phase a .vcf file using Beagle:

java -Xmx8g -jar beagle.28Jun21.220.jar gt=ES-Article--AllSamples_NoKbraNoKgra_WithWGSs_SNPs.Subset out=ES-Article--AllSamples_NoKbraNoKgra_WithWGSs_SNPs.Subset__PHASED impute=FALSE

To convert a .vcf file into a .phylip (in R):

library(pegas)
vcf <- read.vcf("ES-Article--AllSamples_NoKbraNoKgra_WithWGSs_SNPs.Subset__PHASED.vcf", to=5813)
phased.vcf2phylip(vcf.name="ES-Article--AllSamples_NoKbraNoKgra_WithWGSs_SNPs.Subset__PHASED.vcf", total.SNPs=5813, output.name=NULL)

To run SNPs2CFs (in R):

library("pegas")
source("functions.R")
output <- SNPs2CF(seqMatrix="ES-Article--AllSamples_NoKbraNoKgra_WithWGSs_SNPs.Subset__PHASED.vcf.phy", ImapName = "Imap.txt", outputName="SNPs2CF_nquartets1k.csv", between.sp.only = T, save.progress = TRUE, cores=5, bootstrap = TRUE, boots.rep = 1000, n.quartets=1000)

The SNP-based concordant factors generated using SNPs2CF were then used as input for our reticulation analysis using Phylonetworks v. 0.14.12. Using the tree generated in Dataset IV as our input tree for no reticulation events (hmax=0), we tested a range of 1-7 reticulation events as follows:

using PhyloNetworks
using PhyloPlots

#load .csv file with CF generates in SNPs2CF
SNPs2CF_KryptoTree = joinpath("SNPs2CF_nquartets1k.csv")
#Print SNPs2CF_KryptoTree
SNPs2CF_KryptoTree = readTableCF(SNPs2CF_KryptoTree)

#load .tree file with tree file as the starting point to look for Networks
KryptoTreeFile = joinpath("Krypto_OnlyMangrove.tree")
KryptoTree_tree = readMultiTopology(KryptoTreeFile)[1]
rootatnode!(KryptoTree_tree,"Koce")
# Plot the tree
plot(KryptoTree_tree, :R)

# Run PhyloNetworks for hmax=0 and KryptoTree_tree as the starting tree (hmax =1)
 net0 = snaq!(KryptoTree_tree, SNPs2CF_KryptoTree,hmax=0, filename="net0", seed=1234, runs=20)

# Run PhyloNetworks for hmax=1 and KryptoTree_tree as the starting tree (hmax =1)
 net1 = snaq!(net0, SNPs2CF_KryptoTree,hmax=1, filename="net1", seed=4567, runs=20)
 
# Run PhyloNetworks for hmax=2 and KryptoTree_tree as the starting tree (hmax =2)
 net2 = snaq!(net1, SNPs2CF_KryptoTree,hmax=2, filename="net2", seed=2345, runs=20)
 
 # Run PhyloNetworks for hmax=3 and KryptoTree_tree as the starting tree (hmax =3)
 net3 = snaq!(net2, SNPs2CF_KryptoTree,hmax=3, filename="net3", seed=3456, runs=20)
 
 # Run PhyloNetworks for hmax=4 and KryptoTree_tree as the starting tree (hmax =4)
 net4 = snaq!(net3, SNPs2CF_KryptoTree,hmax=4, filename="net4", seed=4567, runs=20)
 
 # Run PhyloNetworks for hmax=5 and KryptoTree_tree as the starting tree (hmax =5)
 net5 = snaq!(net4, SNPs2CF_KryptoTree,hmax=5, filename="net5", seed=5678, runs=20)
 
 # Run PhyloNetworks for hmax=6 and KryptoTree_tree as the starting tree (hmax =6)
 net6 = snaq!(net5, SNPs2CF_KryptoTree,hmax=6, filename="net6", seed=6789, runs=20)
 
 # Run PhyloNetworks for hmax=7 and KryptoTree_tree as the starting tree (hmax =7)
 net7 = snaq!(net6, SNPs2CF_KryptoTree,hmax=7, filename="net7", seed=7891, runs=20)

We then plotted the log-likelihood values for each reticulation value (hmax) considered.