daisybio/16S-rRNA-gene-Functional_benchmark_profiling

On the limits of 16S rRNA gene-based metagenome prediction and functional profiling

This repository accompanies our study evaluating tools for functional profiling based on 16S rRNA gene sequencing. It contains the scripts and resources used to assess four prominent tools: PICRUSt2, Tax4Fun2, PanFP, and the MetGEMs toolbox. The primary objective of the project is to highlight the limitations of these tools when inferring functional profiles from 16S rRNA gene sequencing data.

Datasets Folder

The "Datasets" folder contains the output generated by the four functional inference tools (PICRUSt2, Tax4Fun2, PanFP, and the MetGEMs toolbox) for both real and simulated datasets, together with the corresponding mapping files.

Simulation Datasets

We evaluated the performance of these functional inference tools using simulated metagenomic samples from the 2nd Critical Assessment of Metagenome Interpretation (CAMI) Challenge. The "simulation" folder under "datasets" holds both the downloaded shotgun metagenome datasets and the full-length 16S rRNA sequences derived from them for each body site. The data can be downloaded from https://frl.publisso.de/data/frl:6425518/.

Real Datasets

Each population cohort has its own folder, and each file is named after the tool that produced it. For instance, "CRC_PICRUST2_KO.tsv" contains KO terms retrieved from PICRUSt2, files with the suffix "REL_KO" contain relative abundances, and files with "CUSTOM_KO_REL" were produced with a customized normalization method. The same naming convention applies to the other cohorts, such as POPGEN, FOCUS, and KORA.
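As an illustration of the naming convention, the following minimal Python sketch splits a file name into cohort, tool, and suffix, and shows the standard definition of relative abundance (each KO count divided by the sample total). It is not part of the repository. The function names `parse_dataset_name` and `to_relative_abundance` are hypothetical, the assumed layout `<COHORT>_<TOOL>_<SUFFIX>.tsv` is inferred from the single example above, and the repository's own scripts may compute relative abundance differently. The customized normalization is not described here, so it is not sketched.

```python
from pathlib import PurePath

# Suffix meanings as described in the README. Where exactly the suffix sits
# inside a file name is an assumption based on "CRC_PICRUST2_KO.tsv".
SUFFIX_KINDS = {
    "KO": "ko_abundance",
    "REL_KO": "relative_abundance",
    "CUSTOM_KO_REL": "custom_normalized",
}


def parse_dataset_name(filename):
    """Split '<COHORT>_<TOOL>_<SUFFIX>.tsv' (assumed layout) into its parts."""
    stem = PurePath(filename).stem          # drop the directory and ".tsv"
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    cohort, tool = parts[0], parts[1]
    suffix = "_".join(parts[2:])            # e.g. "KO" or "CUSTOM_KO_REL"
    return {
        "cohort": cohort,
        "tool": tool,
        "suffix": suffix,
        "kind": SUFFIX_KINDS.get(suffix, "unknown"),
    }


def to_relative_abundance(counts):
    """Divide each KO count by the sample total (standard definition)."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions; return zeros.
        return {ko: 0.0 for ko in counts}
    return {ko: value / total for ko, value in counts.items()}


if __name__ == "__main__":
    print(parse_dataset_name("CRC_PICRUST2_KO.tsv"))
    print(to_relative_abundance({"K00001": 3, "K00002": 1}))
```

File names that do not follow the assumed three-part layout raise a `ValueError`, and unrecognized suffixes are reported as `"unknown"` rather than guessed.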

Processing Script Folder

The "processing_script" folder contains the scripts used to derive KO abundances from each of the four tools. It also contains the script used to filter 16S rRNA gene sequences from the simulated metagenome sequences using the FilterRead pipeline.

R Script Folder

This folder contains the R scripts for all downstream analyses, including differential abundance testing for each cohort and for the simulated datasets.
