4. PREPROCESSING
shandley edited this page Mar 16, 2021 · 3 revisions
Hecatomb is designed to perform rigorous quality control before assembly and taxonomic assignment. This rigor is justified primarily by the principle of Garbage In, Garbage Out (GIGO). Specifically, preprocessing carries out the following steps so that only non-contaminant biological sequence is retained for downstream analysis:
1. Non-biological sequence removal (primers, adapters)
2. Host sequence removal
3. Removal of redundant sequences (clustering)
- Creation of sequence count table
- Calculation of sequence properties (e.g. GC content, tetramer frequencies)
4. Assembly
- Sample assembly
- Population assembly
- Contig abundance estimation
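The count table and sequence properties in step 3 can be illustrated with a small sketch. It is an assumption-laden simplification rather than Hecatomb's implementation: `count_table` collapses only *exact* duplicates, whereas the pipeline clusters near-identical sequences with a dedicated tool, and `tetramer_frequencies` counts tetramers on one strand only (many tools instead merge each k-mer with its reverse complement). All function names here are hypothetical.

```python
from collections import Counter
from itertools import product

# All 256 possible DNA tetramers; windows containing N or other symbols are ignored.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def count_table(seqs):
    """Collapse exact duplicate sequences into a sequence -> count table.

    Simplification: real clustering also merges near-identical sequences.
    """
    return Counter(seqs)


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def tetramer_frequencies(seq):
    """Relative frequency of each ACGT tetramer over all overlapping 4-bp windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "GGGGCCCC"]
    for seq, n in count_table(reads).items():
        print(seq, n, gc_content(seq), tetramer_frequencies(seq)["ACGT"])
```

Keeping one row per unique sequence plus a count means the GC and tetramer calculations run once per representative sequence instead of once per read.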
The preprocessing rule also performs assembly, because contigs are an important prerequisite for many downstream analyses.
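Contig abundance estimation usually starts from the number of reads mapped to each contig. Longer contigs attract more reads, so counts are typically normalised by length. The sketch below shows one common scheme, a TPM-style (transcripts per million) normalisation. It assumes the reads have already been mapped and counted per contig (for example with a read mapper such as minimap2), and it is not necessarily the metric Hecatomb reports. The function name `tpm` is hypothetical.

```python
def tpm(read_counts, lengths):
    """TPM-style contig abundance.

    read_counts: dict mapping contig name -> number of mapped reads
    lengths:     dict mapping contig name -> contig length in bp
    Returns a dict of contig -> abundance; the values sum to 1e6
    unless no reads mapped at all.
    """
    # Reads per base for each contig; contigs absent from read_counts get 0.
    rates = {c: read_counts.get(c, 0) / lengths[c] for c in lengths}
    total = sum(rates.values())
    if total == 0:
        return {c: 0.0 for c in lengths}
    # Rescale so abundances are comparable across samples of different depth.
    return {c: rate / total * 1e6 for c, rate in rates.items()}


if __name__ == "__main__":
    print(tpm({"contig_1": 100, "contig_2": 100},
              {"contig_1": 1000, "contig_2": 2000}))
```

With equal read counts, the 1 kb contig receives twice the abundance of the 2 kb contig, reflecting its higher read density.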
During the library production
- [Official Snakemake documentation](https://snakemake.readthedocs.io/en/stable/)
- [BBTools](https://jgi.doe.gov/data-and-tools/bbtools/)
- [SeqKit](https://bioinf.shenwei.me/seqkit/)
- [minimap2 GitHub](https://github.com/lh3/minimap2)
- [MMseqs2 GitHub](https://github.com/soedinglab/MMseqs2)
- [MEGAHIT GitHub](https://github.com/voutcn/megahit)
- [Flye GitHub](https://github.com/fenderglass/Flye)