This repository contains the data analysis code used to generate the results in the paper "Species-level verification of Phascolarctobacterium association to colorectal cancer". For details, see our preprint at medRxiv.
The script used for dataset generation and preprocessing is /scripts/data_generation.R, and the script used to analyse the data is /scripts/data_analyses.R. Required R packages are tidyverse, rstatix, vegan, enrichplot and MicrobiomeProfiler.
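Before running the scripts, it can help to check which of the required packages are already installed. The snippet below is a minimal sketch, not part of the repository. It assumes that enrichplot and MicrobiomeProfiler are installed from Bioconductor via BiocManager, and the variable names `required` and `missing_pkgs` are illustrative. It only reports what is missing and does not install anything itself.

```r
# Packages named in this README as required by the analysis scripts.
required <- c("tidyverse", "rstatix", "vegan", "enrichplot", "MicrobiomeProfiler")

# Required packages that are not yet installed in the current R library.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  # Assumption: enrichplot and MicrobiomeProfiler come from Bioconductor;
  # BiocManager::install() can also install the CRAN packages.
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  message("To install them, run:")
  message('  install.packages("BiocManager")')
  message("  BiocManager::install(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```

Printing the install command rather than running it leaves the choice of library location and package versions to the user.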
In this study, we used four cohorts to verify the association between Phascolarctobacterium species and colorectal cancer previously reported by Bucher-Johannessen et al. (2023) and Senthakumaran et al. (2023). One of these cohorts was the publicly available curatedMetagenomicData.
Pangenome analyses of the CRCbiome samples were run using a Snakemake pipeline defined in the script pangenome.smk. Average nucleotide identity was estimated using the Python script genomes_for_ANI.py.
See our webpage for more information about the CRCbiome and NORCCAP studies.