6. DATABASES
shandley edited this page Jun 16, 2021 · 1 revision
Examples of common non-biological contaminants found in virome data:
- FASTA file with any amplification primers (e.g. primerB)
- FASTA file with library adapter sequences (e.g. NEBNext Adapters)
- FASTA file with the reverse complement of primerB and adapter truncation (e.g. PrimerB + 6)
- FASTA file of vector contaminants (e.g. Vector Contaminants; note that this is a zipped FASTA file)
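The reverse-complement primer file mentioned above can be produced from a forward primer FASTA with a few lines of Python. The following is a minimal sketch, not the method used to build the files listed here: the function names, the `_rc` header suffix, and the example sequence are all hypothetical, and the adapter-truncation step is not covered.

```python
# Minimal sketch: reverse-complement every record in a FASTA-formatted string.
# All names here are illustrative assumptions, not part of the pipeline.

# Complement table covering the IUPAC nucleotide codes (upper and lower case).
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement_fasta(text, suffix="_rc"):
    """Return FASTA text in which every record is reverse-complemented."""
    out = []
    for header, seq in parse_fasta(text):
        out.append(f">{header}{suffix}")
        out.append(reverse_complement(seq))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical placeholder sequence, not the real primerB.
    print(reverse_complement_fasta(">example_primer\nAAGC\n"), end="")
```

Ambiguity codes are complemented as well, so primers with degenerate positions (for example a run of `N`) keep their length.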
Host reference genomes with viral and other sequences masked (see: http://seqanswers.com/forums/showthread.php?t=42552):
- Human Masked Reference Genome
- Mouse Masked Reference Genome
- Rhesus macaque Masked Reference Genome
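As a quick sanity check on a masked reference, one can measure how much of the sequence is masked. The sketch below assumes hard-masking, where masked bases are replaced with `N`; this page does not state which masking style the genomes use, and the function name is hypothetical.

```python
# Minimal sketch: fraction of bases that are hard-masked (N or n) in FASTA text.
# Assumes hard-masking with N; soft-masked (lowercase) bases are not counted.

def masked_fraction(fasta_text):
    """Return the fraction of sequence characters equal to N or n."""
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and record headers.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += line.count("N") + line.count("n")
    return masked / total if total else 0.0


if __name__ == "__main__":
    example = ">chr_example\nACGTNNNN\nnnAC\n"
    print(f"{masked_fraction(example):.2%} of bases are masked")
```

For a full genome, reading the file line by line instead of loading it into one string would keep memory use low.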
MMseqs2 databases
- mmseqs_pviral_aa.sh: UniProt viral proteins clustered at 99% identity
- mmseqs_pviral_aa_check.sh: Uniclust30 + UniProt viral proteins
- mmseqs_pviral_nt.sh: RefSeq viral genomes + nearest neighbor
- mmseqs_pviral_nt_check.sh: 1,520 Gut Bacterial Genomes masked of viral sequences
- Phage Taxonomic Lineages