On the limits of 16S rRNA gene-based metagenome prediction and functional profiling

This repository accompanies our study evaluating tools for functional profiling based on 16S rRNA gene sequencing. It contains the scripts and resources used to assess PICRUSt2, Tax4Fun2, PanFP, and MetGEMs toolbox. The primary objective of the project is to highlight the limitations of these tools when inferring functional profiles from 16S rRNA gene sequencing data.

Datasets Folder

The "datasets" folder contains the output of the four functional inference tools (PICRUSt2, Tax4Fun2, PanFP, and MetGEMs toolbox) for both real and simulated datasets, together with the corresponding mapping files.

Simulation Datasets

We evaluated the performance of these functional inference tools using simulated metagenomic samples obtained from the 2nd Critical Assessment of Metagenome Interpretation (CAMI) Challenge. The "simulation" folder under "datasets" contains both the downloaded shotgun metagenome datasets and the derived full-length 16S rRNA sequences for each body site. The original data can be downloaded from https://frl.publisso.de/data/frl:6425518/.

Real Datasets

Individual folders are designated for each population cohort, with each file named according to the respective tool used. For instance, "CRC_PICRUST2_KO.tsv" denotes KO terms retrieved from PICRUSt2, while files with the suffix "REL_KO" indicate relative abundance. Files with "CUSTOM_KO_REL" signify the use of a customized normalization method. This naming convention applies to other cohorts such as POPGEN, FOCUS, and KORA.
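
The naming convention above can be sketched in a few lines of Python. This is only an illustration, not a script from this repository: it assumes every file follows the pattern `<COHORT>_<TOOL>_<SUFFIX>.tsv` and that neither the cohort nor the tool name contains an underscore. The function name `parse_result_name`, the `KNOWN_SUFFIXES` table, and the second file name in the usage example are hypothetical.

```python
from pathlib import PurePath

# Suffixes described in this README; the descriptions are paraphrased.
KNOWN_SUFFIXES = {
    "KO": "KO terms retrieved from the tool",
    "REL_KO": "relative abundance",
    "CUSTOM_KO_REL": "customized normalization method",
}


def parse_result_name(filename):
    """Split a result file name into cohort, tool, and suffix.

    Assumes the pattern <COHORT>_<TOOL>_<SUFFIX>.tsv, where cohort and
    tool contain no underscore (an assumption, not a documented rule).
    """
    stem = PurePath(filename).stem             # drop the ".tsv" extension
    cohort, tool, suffix = stem.split("_", 2)  # suffix keeps its own underscores
    return {
        "cohort": cohort,
        "tool": tool,
        "suffix": suffix,
        "meaning": KNOWN_SUFFIXES.get(suffix, "unknown"),
    }


if __name__ == "__main__":
    # The first name is taken from this README; the second is a hypothetical example.
    for name in ["CRC_PICRUST2_KO.tsv", "KORA_PICRUST2_REL_KO.tsv"]:
        print(parse_result_name(name))
```

A file name with fewer than three underscore-separated parts raises a `ValueError`, which makes deviations from the assumed pattern easy to spot.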

Processing Script Folder

The "processing_script" folder contains the scripts used to derive KO abundance from each of the four tools. It also contains the script used to filter 16S rRNA gene sequences from the simulated metagenome sequences using the FilterRead pipeline.

R Script Folder

This folder contains the R scripts for all downstream analyses, including differential abundance testing for each cohort and for the simulated datasets.
