This repo contains scripts used for de novo tule elk genome assembly and SNP calling as described in Titus' blog post.
Software versions (also listed in hpcc.modules file):
- FastQC v0.11.3
- Trimmomatic v0.30
- khmer v2.0
- bwa v0.7.7.r441
- SAMTools v1.2
- MEGAHIT v1.0.5 (installed by cloning the github repo)
Assembly steps:
- Quality evaluation using FastQC (no script for this, just run fastqc *.fq.gz in the same directory as the files)
- Trimming using Trimmomatic - interleave-to-before-asm.sh (great script name, I know)
- Interleaving reads using khmer - interleave-to-before-asm.sh
- Assembly using MEGAHIT - megahit-asm.sh (I initially used Velvet but found it too slow; those scripts are still up)
- Assembly evaluation using QUAST - quast.sh (Results are available for viewing here)
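For orientation, the assembly steps above can be sketched as one short shell script. This is not a copy of the scripts in this repo, which may do more: the sample prefix, file names, Trimmomatic jar path, adapter file, and trimming thresholds are all placeholders, and the sketch assumes paired-end reads in two gzipped FASTQ files. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the assembly steps for one paired-end sample.
# Every file name and threshold below is a placeholder, not a value from the repo.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them
SAMPLE="${SAMPLE:-elk}"   # hypothetical sample prefix

# Print a command line, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        eval "$1"
    fi
}

assemble() {
    s="$1"
    # 1. Quality evaluation of the raw reads
    run "fastqc ${s}_R1.fq.gz ${s}_R2.fq.gz"
    # 2. Adapter and quality trimming (paired-end mode)
    run "java -jar trimmomatic-0.30.jar PE ${s}_R1.fq.gz ${s}_R2.fq.gz \
${s}_R1.pe.fq.gz ${s}_R1.se.fq.gz ${s}_R2.pe.fq.gz ${s}_R2.se.fq.gz \
ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
    # 3. Interleave the surviving pairs with khmer
    run "interleave-reads.py ${s}_R1.pe.fq.gz ${s}_R2.pe.fq.gz -o ${s}.pe.fq"
    # 4. Assemble the interleaved reads with MEGAHIT
    run "megahit --12 ${s}.pe.fq -o ${s}_megahit"
    # 5. Evaluate the resulting contigs with QUAST
    run "quast.py ${s}_megahit/final.contigs.fa -o ${s}_quast"
}

assemble "$SAMPLE"
```

Saved under a hypothetical name such as assembly-sketch.sh, running it with DRY_RUN=0 SAMPLE=1339 would execute the steps for real instead of printing them. The dry-run default is there so the command lines can be checked before anything is submitted to the cluster.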
SNP calling and determining heterozygous sites:
- Using the assembly as the reference, map reads with bwa and SAMtools, then call polymorphic sites using freebayes - Example for one elk: 1339-mapping-snp-calling.sh
- Optional - generate mapping stats - mapping-stats.sh
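The mapping and SNP-calling steps above follow a common pattern, sketched below. It is an illustration rather than the contents of 1339-mapping-snp-calling.sh: the reference path, read file, and sample name are placeholders, and it assumes the reads are in a single interleaved paired-end FASTQ file (hence bwa mem -p). As written it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of mapping reads to the assembly and calling variants for one elk.
# All paths and names below are placeholders, not values from the repo.
set -eu

DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print, 0 = also execute
REF="${REF:-elk_megahit/final.contigs.fa}"    # hypothetical assembly path
READS="${READS:-elk.pe.fq}"                   # hypothetical interleaved reads
SAMPLE="${SAMPLE:-elk}"                       # hypothetical output prefix

# Print a command line, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        eval "$1"
    fi
}

map_and_call() {
    ref="$1"; reads="$2"; s="$3"
    # Index the assembly so bwa can align against it
    run "bwa index $ref"
    # Align interleaved paired-end reads (-p) to the assembly
    run "bwa mem -p $ref $reads > $s.sam"
    # Convert SAM to BAM, then sort by coordinate and index
    run "samtools view -b -o $s.bam $s.sam"
    run "samtools sort -O bam -T $s.tmp -o $s.sorted.bam $s.bam"
    run "samtools index $s.sorted.bam"
    # Call polymorphic sites against the assembly
    run "freebayes -f $ref $s.sorted.bam > $s.vcf"
    # Optional: summary mapping statistics
    run "samtools flagstat $s.sorted.bam > $s.flagstat.txt"
}

map_and_call "$REF" "$READS" "$SAMPLE"
```

Sorting and indexing come before freebayes because it expects a coordinate-sorted BAM. The samtools sort form shown (with -T and -o) is the newer 1.x syntax; older installs may need the legacy form with an output prefix instead.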
Because this was my first time mapping, I made a diagram for my own reference that might be helpful to others in a similar position:
The tule elk (Cervus elaphus nannodes) is a California-endemic subspecies that underwent a major genetic bottleneck when its numbers were reduced to as few as 3 individuals in the 1870s (McCullough 1969; Meredith et al. 2007). Since then, the population has grown to an estimated 4,300 individuals, which currently occur in 22 distinct herds (Hobbs 2014). Despite their higher numbers today, the historical loss of genetic diversity, combined with the increasing fragmentation of remaining habitat, poses a significant threat to the health and management of contemporary populations. As populations become increasingly fragmented by highways, reservoirs, and other forms of human development, risks intensify for genetic impacts associated with inbreeding. By some estimates, up to 44% of remaining genetic variation could be lost in small isolated herds in just a few generations (Williams et al. 2004). For this reason, the Draft Elk Conservation and Management Plan and California Wildlife Action Plan prioritize research aimed at facilitating habitat connectivity, as well as stemming genetic diversity loss and habitat fragmentation (Hobbs 2014; CDFW 2015).
You can read more about this on Titus' blog until we get the paper written.