
6. DATABASES

shandley edited this page Jun 16, 2021 · 1 revision

Contaminant Removal Databases

Examples of common non-biological contaminants found in virome data:

  1. FASTA file with any amplification primers (ex. primerB)
  2. FASTA file with library adapter sequences (ex. NEBNext adapters)
  3. FASTA file with the reverse complement of primerB and adapter truncation (ex. PrimerB + 6)
  4. FASTA file of vector contaminants (ex. Vector Contaminants; note that this is a zipped FASTA file)
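Items 3 and 4 involve reverse-complementing primer sequences and reading a zipped FASTA file. Below is a minimal Python sketch of both steps. It assumes the zipped file is gzip-compressed (`.gz`); a `.zip` archive would need to be extracted first. The helper names `read_fasta`, `reverse_complement`, and `write_reverse_complements` are hypothetical and not part of any tool described on this page. The sketch does not reproduce the "+ 6" adapter truncation.

```python
import gzip
from typing import Iterator, Tuple

# Complement table covering IUPAC nucleotide codes; S, W, and N are their own complements.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn", "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header, chunks = None, []
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequences may be wrapped over several lines
        if header is not None:
            yield header, "".join(chunks)


def write_reverse_complements(in_path: str, out_path: str) -> None:
    """Write the reverse complement of every record, tagging headers with '_rc'."""
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            out.write(f">{header}_rc\n{reverse_complement(seq)}\n")
```

Contaminant screens typically search both strands, which is why a reverse-complement file like item 3 is kept alongside the forward primers.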

Virus+ Masked Host Reference Genomes

Host reference genomes in which viral and other sequences have been masked (see: http://seqanswers.com/forums/showthread.php?t=42552).

  1. Human Masked Reference Genome
  2. Mouse Masked Reference Genome
  3. Rhesus macaque Masked Reference Genome
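To illustrate what "masked" means here, the sketch below hard-masks a sequence by replacing given intervals with `N`, so reads cannot align to those regions. It is an illustration under assumptions, not the procedure used to build the genomes above: it assumes hard masking with `N`, and the intervals (0-based, half-open) are hypothetical inputs, for example regions where viral sequences aligned to the host genome. The function name `hard_mask` is hypothetical.

```python
from typing import Iterable, Tuple


def hard_mask(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace each 0-based, half-open [start, end) interval of seq with 'N'.

    Intervals may overlap or extend past the sequence ends; they are clipped.
    """
    buf = bytearray(seq, "ascii")
    for start, end in intervals:
        start, end = max(0, start), min(len(buf), end)
        if start < end:
            buf[start:end] = b"N" * (end - start)
    return buf.decode("ascii")
```

For example, `hard_mask("ACGTACGT", [(2, 4)])` returns `"ACNNACGT"`. Masking rather than deleting keeps every coordinate in the host genome unchanged.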

MMseqs2 databases

  1. mmseqs_pviral_aa.sh: UniProt viral proteins clustered at 99% identity
  2. mmseqs_pviral_aa_check.sh: Uniclust30 + UniProt viral
  3. mmseqs_pviral_nt.sh: RefSeq viral genomes + nearest neighbor
  4. mmseqs_pviral_nt_check.sh: 1,520 gut bacterial genomes masked of viral sequences
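Item 1 describes viral proteins clustered at 99% identity. The sketch below shows one way such a clustering could be run with MMseqs2's `easy-cluster` workflow and its `--min-seq-id` option. It is not necessarily how this database was built. The helper names `easy_cluster_cmd` and `run_easy_cluster` and the file names in the usage note are hypothetical, and the sketch assumes `mmseqs` is installed and on `PATH`.

```python
import shutil
import subprocess
from typing import List


def easy_cluster_cmd(fasta: str, out_prefix: str, tmp_dir: str,
                     min_seq_id: float = 0.99) -> List[str]:
    """Build an `mmseqs easy-cluster` command with a minimum sequence identity."""
    if not 0.0 <= min_seq_id <= 1.0:
        raise ValueError("min_seq_id must be a fraction between 0 and 1")
    return ["mmseqs", "easy-cluster", fasta, out_prefix, tmp_dir,
            "--min-seq-id", str(min_seq_id)]


def run_easy_cluster(fasta: str, out_prefix: str, tmp_dir: str,
                     min_seq_id: float = 0.99) -> None:
    """Run the clustering, failing early if the mmseqs binary is missing."""
    if shutil.which("mmseqs") is None:
        raise FileNotFoundError("mmseqs was not found on PATH")
    subprocess.run(easy_cluster_cmd(fasta, out_prefix, tmp_dir, min_seq_id), check=True)
```

Usage: `run_easy_cluster("uniprot_viral.faa", "viral99", "tmp")`. Per the MMseqs2 documentation, `easy-cluster` writes the cluster representatives to `<prefix>_rep_seq.fasta`, and those representatives would form the reduced search database.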

Additional files

  1. Phage Taxonomic Lineages