species identification in rockfishes

19 December, 2022

Preliminaries
Data files
Data filtering

This is the repository with data and analyses for rockfish species identification. This project arose from ambiguous identifications of larval rockfishes in Monterey Bay, California, with expanded coverage of southern and more northern species thanks to collaborators within NOAA science centers.

We use amplicon sequencing on our MiSeq instrument to generate genotype data for samples. Using these samples, we created a VCF file that contains variant sites from 54 species of rockfishes, mostly from the Northeast Pacific.

sebastes_sppID_combined_filtered.recode.vcf

Preliminaries

In order to perform Sebastes species ID…

Process the GTseq run with the samples you wish to identify using the combined Sebastes vcf in MICROHAPLOT. Scripts for processing raw data through microhaplot are available here:

https://github.com/AFSC-Genetics/GTseq_microhaplot

Open .rds files in the microhaplot shiny app. Download diploid genotype tables to import into R.

Data files

.csv files created by the R package microhaplot should be placed in the Rmd_AFSC/microhaplot_outputs directory; these data are read into R in 02-test-pca-w-unknowns.Rmd.
The MiSeq sample sheets associated with these genotype data should be placed in data/sample_sheets. The sample sheets are how sample ID numbers from the sequencing runs are connected back to the NMFS DNA IDs recorded in the metadata and the repository.
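
The link between run sample IDs and NMFS DNA IDs can be illustrated with a minimal Python sketch (the repository's own analyses are R notebooks; Python is used here only for illustration). The column names `Sample_ID` and `NMFS_DNA_ID`, the helper `read_id_map`, and the toy IDs are all assumptions, and the sketch assumes the sample sheet has already been trimmed to its data table.

```python
import csv
import io


def read_id_map(sample_sheet_text, run_id_col="Sample_ID", dna_id_col="NMFS_DNA_ID"):
    """Map each sequencing-run sample ID to its NMFS DNA ID.

    Both column names are hypothetical defaults; change them to match the
    headers actually present in the sample sheet.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    return {row[run_id_col]: row[dna_id_col] for row in reader}


# Toy sample sheet contents (IDs are made up for illustration).
toy_sheet = "Sample_ID,NMFS_DNA_ID\ns1,R000001\ns2,R000002\n"
id_map = read_id_map(toy_sheet)
print(id_map)
```

For a real file, the text passed in would come from reading a sheet in data/sample_sheets.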

Data filtering

Remove the 6 loci from the baseline that were most prone to missing data and/or often have more than 2 haplotypes passing filter across individuals:

tag_id_1166 (often >2 haplotypes)
tag_id_934 (missing data and sometimes >2 haplotypes)
tag_id_2513 (appears to be some sort of repetitive element for some species outside of KGBC)
tag_id_1871 (failed in many individuals)
tag_id_1399 (failed in many individuals)
tag_id_914 (failed in many individuals)
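
The locus filter above could be applied to a long-format genotype table as in the following minimal Python sketch (the repository's own filtering is done in its R notebooks). The column name `locus`, the helper `drop_excluded_loci`, the locus `tag_id_0001`, and the toy rows are assumptions for illustration, not the actual microhaplot export layout.

```python
# The six loci flagged for removal from the baseline.
EXCLUDED_LOCI = {
    "tag_id_1166",
    "tag_id_934",
    "tag_id_2513",
    "tag_id_1871",
    "tag_id_1399",
    "tag_id_914",
}


def drop_excluded_loci(rows, locus_key="locus"):
    """Return only the genotype rows whose locus is not in EXCLUDED_LOCI.

    `rows` is a list of dicts (for example from csv.DictReader); `locus_key`
    is a hypothetical column name and should match the real table.
    """
    return [row for row in rows if row[locus_key] not in EXCLUDED_LOCI]


# Toy rows standing in for a genotype table (values are made up).
example_rows = [
    {"sample": "s1", "locus": "tag_id_1166", "haplo": "ACG"},
    {"sample": "s1", "locus": "tag_id_0001", "haplo": "TTG"},
    {"sample": "s2", "locus": "tag_id_914", "haplo": "CCA"},
]
kept = drop_excluded_loci(example_rows)
print([row["locus"] for row in kept])
```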

Typically we institute a missing data threshold of ~10% for excluding samples, but such a criterion doesn’t make sense here because of the dramatic ascertainment bias: missing data increase with phylogenetic distance from kelp rockfish to less closely related species. We can explore the missing data criterion in more depth, since there might be fixed alleles that make even a small number of loci valid for species ID.
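
To explore the missing data criterion, it can help to report the per-sample fraction of missing loci rather than exclude samples outright. The Python sketch below is one way to do that, under stated assumptions: the column names `sample` and `locus`, the helper `missing_fraction_per_sample`, and the toy data are hypothetical, and a locus counts as genotyped for a sample if at least one row exists for that pair.

```python
from collections import defaultdict


def missing_fraction_per_sample(rows, n_loci, sample_key="sample", locus_key="locus"):
    """Fraction of an n_loci panel with no genotype row, per sample.

    Samples with no rows at all do not appear in the result.
    """
    genotyped = defaultdict(set)
    for row in rows:
        genotyped[row[sample_key]].add(row[locus_key])
    return {s: 1 - len(loci) / n_loci for s, loci in genotyped.items()}


# Toy table: a 4-locus panel, s1 genotyped at 3 loci and s2 at 1 locus.
toy_rows = [
    {"sample": "s1", "locus": "a"},
    {"sample": "s1", "locus": "b"},
    {"sample": "s1", "locus": "c"},
    {"sample": "s2", "locus": "a"},
]
missing = missing_fraction_per_sample(toy_rows, n_loci=4)
for sample, fraction in sorted(missing.items()):
    print(sample, fraction)
```

Comparing these fractions across species, instead of applying a single cutoff, keeps distantly related samples available for assignment.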

Species assignment

An example of species identification of a mixture of samples using rubias is outlined in Rmd_AFSC/05-unknown-species-id-template.Rmd.
z-scores far from zero, whether high or low, are indicative of species (or populations of species) that were not included in the reference baseline.
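
A simple way to screen for such samples is to flag assignments whose z-score is far from zero. The Python sketch below assumes the rubias individual assignments have been exported to a table with `indiv` and `z_score` columns; the helper `flag_outlier_z`, the cutoff of 3, and the toy values are illustrative assumptions, not thresholds from this repository.

```python
def flag_outlier_z(assignments, cutoff=3.0):
    """Return IDs of samples whose absolute z-score exceeds `cutoff`.

    `assignments` is a list of dicts with "indiv" and "z_score" keys;
    `cutoff` is an illustrative value and should be chosen per dataset.
    """
    return [a["indiv"] for a in assignments if abs(float(a["z_score"])) > cutoff]


# Toy assignments (values are made up for illustration).
toy_assignments = [
    {"indiv": "s1", "z_score": "0.4"},
    {"indiv": "s2", "z_score": "-5.2"},
    {"indiv": "s3", "z_score": "3.8"},
]
flagged = flag_outlier_z(toy_assignments)
print(flagged)
```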

Updating the baseline

The reference baseline can be updated by genotyping additional species with the same set of primers, using the same bioinformatic workflow, and then merging the new VCF file with the prior baseline VCF.
The newly merged VCF file can then be used for analysis in microhaplot.

anita-wray/rockfish-species-id
