HomoplasyFinder is an open-source tool for identifying homoplasies, given a phylogeny and its nucleotide alignment. HomoplasyFinder uses the consistency index to identify sites in the nucleotide alignment that are inconsistent with the phylogeny provided. This R package is a wrapper that makes the HomoplasyFinder Java code easy to use from R. Full documentation is provided on the HomoplasyFinder wiki.
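To make the consistency index concrete, here is a minimal sketch using its textbook definition: the minimum number of state changes a site could need on any tree (number of distinct states minus one) divided by the number of changes it needs on the given tree, counted with Fitch parsimony. It assumes a binary tree written as nested lists and no missing data. The helpers `fitch()` and `consistencyIndex()` are hypothetical illustrations, not functions from the homoplasyFinder package, and are not necessarily how the Java code computes the index.

```r
# Hypothetical helpers for illustration only; they are NOT part of homoplasyFinder.
# The tree is a binary tree written as nested lists; leaves are tip names.

# Fitch parsimony: return the candidate state set at a node and the number
# of state changes needed in the subtree below it.
fitch <- function(node, states) {
  if (is.character(node)) {
    return(list(set = states[[node]], changes = 0))
  }
  left <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    list(set = shared, changes = left$changes + right$changes)
  } else {
    # No state in common: take the union and count one extra change
    list(set = union(left$set, right$set),
         changes = left$changes + right$changes + 1)
  }
}

# Consistency index of one site: minimum possible changes on any tree
# (number of distinct states - 1) divided by the changes needed on this tree.
consistencyIndex <- function(tree, states) {
  observed <- fitch(tree, states)$changes
  if (observed == 0) return(1)  # invariant site
  (length(unique(states)) - 1) / observed
}

tree <- list(list("t1", "t2"), list("t3", "t4"))

# The two "T" tips are not sisters, so two changes are needed: CI = 1/2
homoplasticSite <- c(t1 = "A", t2 = "T", t3 = "A", t4 = "T")
# The two "T" tips are sisters, so one change is enough: CI = 1
consistentSite <- c(t1 = "A", t2 = "A", t3 = "T", t4 = "T")

consistencyIndex(tree, homoplasticSite)  # 0.5
consistencyIndex(tree, consistentSite)   # 1
```

A site with a consistency index below 1 needs more changes on the tree than its number of states implies, which is the signature of a possible homoplasy.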
install.packages("devtools")
devtools::install_github("JosephCrispell/homoplasyFinder")
devtools::install_github("JosephCrispell/basicPlotteR") # Optional: makes the annotated phylogeny plot prettier :-)
library(homoplasyFinder)
# Find the FASTA and tree files bundled with the package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Get the current working directory
workingDirectory <- paste0(getwd(), "/")
# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile,
                                                  fastaFile=fastaFile,
                                                  path=workingDirectory)
# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")
# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
# Read in the annotated tree
tree <- readAnnotatedTree(workingDirectory)
# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)
You should get a plot of the phylogeny annotated with the inconsistent positions.
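The report file name in the example above is rebuilt from today's date, which assumes the `%d-%m-%y` pattern and fails if the file is read on a different day from when it was written. A minimal alternative, assuming only that report files start with `consistencyIndexReport_` and end in `.txt`, is to pick the most recently modified match. `findLatestReport()` is a hypothetical helper, not part of homoplasyFinder.

```r
# Hypothetical helper (not part of homoplasyFinder): return the path of the
# most recently modified consistencyIndexReport_*.txt file in a directory.
findLatestReport <- function(directory) {
  reports <- list.files(directory,
                        pattern = "^consistencyIndexReport_.*\\.txt$",
                        full.names = TRUE)
  if (length(reports) == 0) {
    stop("No consistencyIndexReport_*.txt file found in ", directory)
  }
  reports[which.max(file.mtime(reports))]
}
```

With `workingDirectory` set as above, the table could then be read with `read.table(findLatestReport(workingDirectory), header=TRUE, sep="\t", stringsAsFactors=FALSE)`.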
HomoplasyFinder can also calculate the consistency of INDELs (or any other regions) on a phylogeny. To do this, simply replace the FASTA file with a CSV table reporting the presence (1) or absence (0) of each region in each isolate. Each row describes one region by its start and end positions, followed by one column per isolate, for example:
start,end,isolateA,isolateB,isolateC
34802,35208,0,1,0
39068,39069,0,0,1
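If the regions are already in an R data frame, a table in this format can be written with base R's `write.csv()`. This is a sketch: the isolate names are the placeholders from the example above and, as an assumption, would need to match the tip labels in your tree; the output file name is hypothetical.

```r
# Build the example presence/absence table: one row per region,
# one 0/1 column per isolate (placeholder isolate names)
regions <- data.frame(
  start    = c(34802, 39068),
  end      = c(35208, 39069),
  isolateA = c(0, 0),
  isolateB = c(1, 0),
  isolateC = c(0, 1)
)

# Write it without row names or quotes so it matches the format shown above
csvFile <- file.path(tempdir(), "presenceAbsence.csv")
write.csv(regions, csvFile, row.names = FALSE, quote = FALSE)

readLines(csvFile)
# "start,end,isolateA,isolateB,isolateC" "34802,35208,0,1,0" "39068,39069,0,0,1"
```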
Test it out with the example presence/absence file bundled with the package:
# Find the presence/absence and tree files bundled with the package
presenceAbsenceFile <- system.file("extdata", "presenceAbsence_INDELs.csv", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Get the current working directory
workingDirectory <- paste0(getwd(), "/")
# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile,
                                                  presenceAbsenceFile=presenceAbsenceFile,
                                                  path=workingDirectory)
# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")
# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
Java source code is available here and R package (wrapper) code here.
If you use HomoplasyFinder in your research, it would be great if you could cite the following article: Crispell, J., Balaz, D., & Gordon, S. V. (2019). HomoplasyFinder: a simple tool to identify homoplasies on a phylogeny. Microbial Genomics. https://doi.org/10.1099/mgen.0.000245