This repo is part of the Microflora Danica (MFD) project and reproduces the results from "Section VI: MAGs from Danish habitats double the known species fraction in metagenomes" of the MFD manuscript.
Please download the input file from Zenodo.
The SingleM metapackage with GTDB R214 supplemented with other genomes can be downloaded from this Zenodo.
The SingleM metapackage with GTDB R214 supplemented with other genomes AND our short-read (SR) MFD MAGs can be downloaded from this Zenodo.
All SingleM condensed profiles from our big batch samples are merged together using the script /scripts/00-merge_condensed_singlem_outputs.r.
R scripts used for generating the SingleM-related plots in the paper (e.g., Figure 5a, Figure 5b, Supplementary Figure 6, and Supplementary Figure 7) are in the folder /scripts.
Code for generating Figure 5c can be found in this repo.
SingleM example commands:
Taking "sample1" and metapackage "uhgg_smag_spire_oceans_gems.smpkg" as an example:
singlem pipe \
  -1 /PATH_TO_FORWARD_READ/sample1_trimmed.fastq.gz \
  -2 /PATH_TO_REVERSE_READ/sample1_trimmed.fastq.gz \
  --threads 10 \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg \
  --otu-table sample1.otu_table.csv \
  --archive-otu-table sample1.otu_table.csv.archive \
  --taxonomic-profile sample1.condensed_otu_defaultcutoffs
singlem microbial_fraction \
  --input-profile sample1.condensed_otu_defaultcutoffs \
  --output-tsv sample1_readfrac_stdout.tsv \
  --output-per-taxon-read-fractions sample1_readfrac_pertaxon \
  -1 /PATH_TO_FORWARD_READ/sample1_trimmed.fastq.gz \
  -2 /PATH_TO_REVERSE_READ/sample1_trimmed.fastq.gz \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg
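To run the `singlem pipe` step above on many samples, the per-sample command can be generated from a list of sample names. The following is a minimal sketch, not the workflow used for the paper: the function names `singlem_pipe_cmd` and `singlem_pipe_cmds` and the variables `FWD_DIR`, `REV_DIR`, `METAPACKAGE`, and `THREADS` are hypothetical, and the file naming simply mirrors the "sample1" example. It only prints commands, so they can be inspected before anything is run. Paths containing spaces are not handled.

```shell
# Placeholders (assumptions): override them in the environment as needed.
FWD_DIR="${FWD_DIR:-/PATH_TO_FORWARD_READ}"
REV_DIR="${REV_DIR:-/PATH_TO_REVERSE_READ}"
METAPACKAGE="${METAPACKAGE:-/PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg}"
THREADS="${THREADS:-10}"

# Print the `singlem pipe` command line for one sample name.
# Uses only the flags shown in the "sample1" example above.
singlem_pipe_cmd() {
  local sample="$1"
  printf 'singlem pipe -1 %s -2 %s --threads %s --metapackage %s --otu-table %s --archive-otu-table %s --taxonomic-profile %s\n' \
    "${FWD_DIR}/${sample}_trimmed.fastq.gz" \
    "${REV_DIR}/${sample}_trimmed.fastq.gz" \
    "${THREADS}" \
    "${METAPACKAGE}" \
    "${sample}.otu_table.csv" \
    "${sample}.otu_table.csv.archive" \
    "${sample}.condensed_otu_defaultcutoffs"
}

# Read sample names from stdin (one per line, blank lines skipped)
# and print one command per sample.
singlem_pipe_cmds() {
  local sample
  while IFS= read -r sample; do
    if [ -n "$sample" ]; then
      singlem_pipe_cmd "$sample"
    fi
  done
}
```

For example, with a hypothetical `samples.txt` holding one sample name per line, `singlem_pipe_cmds < samples.txt > run_singlem_pipe.sh` writes the commands to a file that can be reviewed and then executed.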
Taking "MAG1" as an example:
singlem pipe \
  --genome-fasta-files /PATH_TO_FILE/MAG1.fa \
  --otu-table MAG1_genome.otu_table.csv \
  --threads 2 \
  --archive-otu-table MAG1_genome.otu_table_archive.csv \
  --taxonomic-profile MAG1_condensed_otu_defaultcutoffs \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg
singlem summarise \
  --input-otu-tables-list /PATH_TO_FILE/ABSOLUTE_FILEPATHS_TO_MAG_OTUTABLES_LIST.txt \
  --output-otu-table genomes.otu_table_combined.csv \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg
singlem summarise \
  --input-otu-tables-list /PATH_TO_FILE/ABSOLUTE_FILEPATHS_TO_METAGENOME_OTUTABLES_LIST.txt \
  --output-otu-table metagenome_otu_table_combinedotu.txt \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg
singlem appraise \
  --metagenome-otu-tables /PATH_TO_FILE/metagenome_otu_table_combinedotu.txt \
  --genome-otu-tables /PATH_TO_FILE/genomes.otu_table_combined.csv \
  --metapackage /PATH_TO_FILE/uhgg_smag_spire_oceans_gems.smpkg \
  --output-binned-otu-table appraise_binned_otu_table.csv \
  --output-unbinned-otu-table appraise_unbinned_otu_table.csv \
  --output-unaccounted-for-otu-table appraise_unaccounted_for.csv \
  --output-found-in &> singlem_appraise.log &
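The two `singlem summarise` calls above take a text file listing OTU table paths. A minimal sketch of one way to build such a list is shown below, assuming the file holds one absolute path per line (as the placeholder file names suggest) and that all OTU tables sit in a single directory. The function name `make_otu_table_list`, the directory, and the file name pattern are hypothetical, and `-maxdepth` assumes GNU or BSD `find`.

```shell
# Write the absolute paths of all files in DIR whose names match PATTERN
# to OUT, one path per line, sorted for reproducibility.
make_otu_table_list() {
  local dir="$1" pattern="$2" out="$3"
  local abs_dir
  # Resolve DIR to an absolute path so the list does not depend on the
  # directory from which singlem is later run.
  abs_dir="$(cd "$dir" && pwd)" || return 1
  find "$abs_dir" -maxdepth 1 -type f -name "$pattern" | sort > "$out"
}
```

For example, `make_otu_table_list /PATH_TO_FILE '*_genome.otu_table.csv' mag_otu_tables_list.txt` would collect tables named like the "MAG1" example while skipping the `*_archive.csv` files. Whether the plain or the archive OTU tables should be listed is not stated here, so adjust the pattern accordingly.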
SingleM results on NCBI datasets come from:
SingleM and Sandpiper: Robust microbial taxonomic profiles from metagenomic data. Ben J. Woodcroft, Samuel T. N. Aroney, Rossen Zhao, Mitchell Cunningham, Joshua A. M. Mitchell, Linda Blackall, Gene W. Tyson. bioRxiv 2024.01.30.578060; doi: https://doi.org/10.1101/2024.01.30.578060
The microbial_fraction module of SingleM:
Eisenhofer, Raphael, Antton Alberdi, and Ben J. Woodcroft. Large-scale estimation of bacterial and archaeal DNA prevalence in metagenomes reveals biome-specific patterns. bioRxiv (2024): 2024-05. https://doi.org/10.1101/2024.05.16.594470
Microflora Danica: the atlas of Danish environmental microbiomes. CM Singleton, TBN Jensen, F Delogu, EA Sørensen, VR Jørgensen, SM Karst, Y Yang, KS Knudsen, M Sereika, F Petriglieri, S Knutsson, SM Dall, RH Kirkegaard, JM Kristensen, BJ Woodcroft, DR Speth, STN Aroney, The Microflora Danica Consortium, M Wagner, MKD Dueholm, PH Nielsen, M Albertsen. bioRxiv 2024.06.27.600767; doi: https://doi.org/10.1101/2024.06.27.600767