A Strobemer-Based Profile Hidden Markov Model (sPHMM) Approach to Taxonomic Identification of DNA Barcode Data
To run this pipeline, the following files are needed in the same working directory in R/RStudio (R version 4.4.1 or higher):
sPHMM_Pipeline_Train&CV.R
sPHMM_Functions.R
strobemer_extract.cpp
strobemer_chop.cpp
strobemer_extract.h
strobemer_filter.cpp
strobemer_filter_unique.cpp
HMMER3 (v3.3.2 or higher) must also be installed before using this pipeline.
The working directory must also contain a folder holding the FASTA files to be processed by the pipeline (e.g., seqData/).
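As an optional sanity check before running the pipeline, a helper along the lines below can confirm the setup described above. It is a minimal sketch, not part of this repository: the function name check_sphmm_setup, its arguments, and the accepted FASTA extensions (.fa, .fas, .fasta) are assumptions. It only checks that the listed files exist (not their contents), that R is at least 4.4.1, and that HMMER3's hmmbuild program is on the PATH.

```r
# Hypothetical helper (NOT part of the sPHMM repository): reports setup problems
# for the working-directory layout described in this README.
check_sphmm_setup <- function(dir = ".", fasta_dir = "seqData") {
  # Files the README lists as required in the working directory
  required <- c("sPHMM_Pipeline_Train&CV.R", "sPHMM_Functions.R",
                "strobemer_extract.cpp", "strobemer_chop.cpp",
                "strobemer_extract.h", "strobemer_filter.cpp",
                "strobemer_filter_unique.cpp")
  problems <- character(0)

  # 1. Required pipeline files (presence only, contents are not checked)
  missing <- required[!file.exists(file.path(dir, required))]
  if (length(missing) > 0) {
    problems <- c(problems, paste("missing file:", missing))
  }

  # 2. R version (the README asks for 4.4.1 or higher)
  if (getRversion() < "4.4.1") {
    problems <- c(problems, paste("R version too old:", getRversion()))
  }

  # 3. HMMER3 on the PATH (hmmbuild is one of the HMMER3 programs)
  if (!nzchar(Sys.which("hmmbuild"))) {
    problems <- c(problems, "hmmbuild (HMMER3) not found on PATH")
  }

  # 4. FASTA folder with at least one FASTA file (extensions are an assumption);
  #    list.files() returns character(0) if the folder does not exist
  fasta <- list.files(file.path(dir, fasta_dir),
                      pattern = "\\.(fa|fas|fasta)$", ignore.case = TRUE)
  if (length(fasta) == 0) {
    problems <- c(problems, paste("no FASTA files found in", fasta_dir))
  }

  problems  # empty character vector means every check passed
}
```

Typical use is to stop early with a readable message:

```r
problems <- check_sphmm_setup()
if (length(problems) > 0) stop(paste(problems, collapse = "\n"))
```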
A small barcode dataset (Branchiopoda) from BOLD (https://boldsystems.org/) is included for testing the pipeline.
Load the pipeline's helper functions in R with:
source("sPHMM_Functions.R")
The following files were renamed from the original repository:
strobemer_extract.h (formerly strobemer.h, unmodified)
strobemer_extract.cpp (formerly strobemer.cpp, unmodified)
Original repo: https://github.com/BGI-Qingdao/strobemer_cpptest
License: GPLv3