3d-omics/mg_quant

Snakemake workflow: mg_quant

A Snakemake workflow for assessing the detection limit from laser-microdissected samples.

Usage

  1. Requirements

    1. miniconda / mamba
    2. snakemake
  2. Clone the repository and set it as the working directory.

git clone --recursive https://github.com/3d-omics/mg_quant.git
cd mg_quant
  3. Run the pipeline with the test data (takes about 5 minutes to download the required software)
snakemake \
    --use-conda \
    --conda-frontend mamba \
    --jobs 8
  4. Edit the following files:

    1. config/samples.tsv: the control file listing the sequencing libraries, their file locations and their adapter sequences.

      sample_id	library_id	forward_filename	reverse_filename	forward_adapter	reverse_adapter
      sample1	lib1	resources/reads/sample1_1.fq.gz	resources/reads/sample1_2.fq.gz	AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
      sample2	lib1	resources/reads/sample2_1.fq.gz	resources/reads/sample2_2.fq.gz	AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
      
    2. config/features.yml: the references and databases against which to screen the libraries: hosts and MAG catalogues.

      references:  # Reads will be mapped sequentially
         human: resources/reference/human_22_sub.fa.gz
         chicken: resources/reference/chicken_39_sub.fa.gz
      
      mag_catalogues:
         mag1: resources/reference/mags_sub.fa.gz
         # mag2: resources/reference/mags_sub.fa.gz
      
      databases:
         kraken2:
            mock1: resources/databases/kraken2/kraken2_RefSeqV205_Complete_500GB
            # refseq500: resources/databases/kraken2/kraken2_RefSeqV205_Complete_500GB
         singlem: resources/databases/singlem/S3.2.1.GTDB_r214.metapackage_20231006.smpkg.zb
      
    3. config/params.yml: parameters for every program. The defaults are reasonable.

  5. Run the pipeline and go for a walk:

snakemake --use-conda --profile profile/default --jobs 100 --cores 24 `#--executor slurm`
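
Before launching a long run, it can help to sanity-check config/samples.tsv. The following is a minimal sketch, not part of the workflow: the column names are taken from the example table above, `check_samples` is a hypothetical helper, and the rule that each (sample_id, library_id) pair appears only once is an assumption rather than something the workflow is known to enforce.

```python
import csv
import io

# Column names copied from the example config/samples.tsv shown above.
REQUIRED_COLUMNS = [
    "sample_id",
    "library_id",
    "forward_filename",
    "reverse_filename",
    "forward_adapter",
    "reverse_adapter",
]


def check_samples(handle):
    """Return a list of problems found in a samples.tsv-like file.

    Hypothetical helper: the workflow itself may validate differently.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        key = (row["sample_id"], row["library_id"])
        # Assumption: a sample/library pair should be listed only once.
        if key in seen:
            problems.append(f"line {line_number}: duplicated sample/library pair {key}")
        seen.add(key)
        # Short rows yield None for the missing fields, hence the `or ""`.
        empty = [col for col in REQUIRED_COLUMNS if not (row[col] or "").strip()]
        if empty:
            problems.append(f"line {line_number}: empty fields: {', '.join(empty)}")
    return problems


# Toy table with the same layout as the example above.
example = (
    "sample_id\tlibrary_id\tforward_filename\treverse_filename\tforward_adapter\treverse_adapter\n"
    "sample1\tlib1\tresources/reads/sample1_1.fq.gz\tresources/reads/sample1_2.fq.gz\t"
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA\tAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT\n"
    "sample2\tlib1\tresources/reads/sample2_1.fq.gz\tresources/reads/sample2_2.fq.gz\t"
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA\tAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT\n"
)
print(check_samples(io.StringIO(example)))
```

To check a real file, pass an open handle instead of the toy string, for example `check_samples(open("config/samples.tsv", newline=""))`.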

Rulegraph

rulegraph

Brief description

  1. Trim reads and remove adapters with fastp
  2. Map reads sequentially to the host references (human, chicken / pig) and then to the MAG catalogue:
    1. Map to the reference with bowtie2
    2. Extract the read pairs that have one or both ends unmapped with samtools
    3. Map those extracted reads to the next reference
  3. Generate MAG-based statistics with coverm
  4. Generate MAG-independent statistics with singlem and nonpareil
  5. Assign reads taxonomically with kraken2
  6. Generate lots of reports in the reports/ folder
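
The sequential mapping in step 2 can be illustrated with a small sketch. It assumes the "one or both ends unmapped" criterion corresponds to the standard SAM flag bits 0x4 (read unmapped) and 0x8 (mate unmapped). The workflow's actual samtools invocation is not shown in this README, so `has_unmapped_end`, `sequential_screen` and `flag_lookup` are hypothetical names, and the flags are toy data rather than real alignments.

```python
# SAM flag bits as defined in the SAM specification.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def has_unmapped_end(flag):
    """True if the read or its mate is unmapped (one or both ends unmapped)."""
    return bool(flag & (READ_UNMAPPED | MATE_UNMAPPED))


def sequential_screen(read_names, reference_order, flag_lookup):
    """Pass reads through the references in order.

    After each reference, only reads with at least one unmapped end are
    carried over to the next one. `flag_lookup(reference, read_name)` is a
    hypothetical stand-in for the aligner: it returns the SAM flag the read
    would receive against that reference.
    """
    remaining = list(read_names)
    carried_over = {}
    for reference in reference_order:
        remaining = [
            name for name in remaining
            if has_unmapped_end(flag_lookup(reference, name))
        ]
        carried_over[reference] = list(remaining)
    return carried_over


# Toy flags: 99 = properly paired and mapped, 77 = both ends unmapped,
# 73 = read mapped but mate unmapped.
toy_flags = {
    "human": {"r1": 99, "r2": 77, "r3": 73},
    "chicken": {"r2": 77, "r3": 99},
}
result = sequential_screen(
    ["r1", "r2", "r3"],
    ["human", "chicken"],
    lambda reference, name: toy_flags[reference][name],
)
print(result)
```

In this toy run, r1 maps fully to the human reference and is dropped there, r3 is dropped after mapping fully to chicken, and only r2 would continue on to the MAG catalogue.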

References