Bulk TCR-beta chain sequencing workflow - with DNA as starting material
The workflow consists of the following steps and scripts:
- fastp - QC and pre-processing of fastq files, followed by multiqc to create a multi-sample report:
run_fastp_multiqc.sh
- mixcr - alignment and assembly of clonotypes from fastq files:
run_mixcr_v1.sh, fix_TRBfiles.R
- vdjtools - postprocessing/graphical and text file results for interpretation:
run_vdjtools_single_samples.sh, run_vdjtools_custom_samples.sh, vdjtools-patch.sh
fastp, multiqc, mixcr and vdjtools are currently installed on the Galaxy server. FastQC has to be installed for your own user:
conda install -c bioconda fastqc
INFO for a test run:
Test data:
~schavan/projects/bulk_tcr_seq/data/EXP21001376_FFPE
Input files:
~schavan/projects/bulk_tcr_seq/inputs/samplesheet_EXP21001376.tsv
~schavan/projects/bulk_tcr_seq/inputs/metadataToConvert_EXP21001376_FFPE.txt
Output files for Mixcr:
~schavan/projects/bulk_tcr_seq/inputs/
Output files for VDJTools:
~schavan/projects/bulk_tcr_seq/scripts/batch2.2
- SAMPLESHEET1: Create a tab-delimited samplesheet per experiment as input to mixcr, e.g. samplesheet_EXP21001293.tsv, as shown below. No header. Specify complete, absolute paths to the fastq files.
DSCO28-MTC-1 /data/DSCO28-MTC-1_S7_L001_R1_001.fastq.gz /data/DSCO28-MTC-1_S7_L001_R2_001.fastq.gz
DSCO28-MTC-2 /data/DSCO28-MTC-2_S1_L001_R1_001.fastq.gz /data/DSCO28-MTC-2_S1_L001_R2_001.fastq.gz
DSCO28-MTC-3 /data/DSCO28-MTC-3_S3_L001_R1_001.fastq.gz /data/DSCO28-MTC-3_S3_L001_R2_001.fastq.gz
DSCO28-TRF-1 /data/DSCO28-TRF-1_S6_L001_R1_001.fastq.gz /data/DSCO28-TRF-1_S6_L001_R2_001.fastq.gz
DSCO28-TRF-2 /data/DSCO28-TRF-2_S4_L001_R1_001.fastq.gz /data/DSCO28-TRF-2_S4_L001_R2_001.fastq.gz
DSCO28-TRF-3 /data/DSCO28-TRF-3_S5_L001_R1_001.fastq.gz /data/DSCO28-TRF-3_S5_L001_R2_001.fastq.gz
FFPE-9G7045 /data/FFPE-9G7045_S2_L001_R1_001.fastq.gz /data/FFPE-9G7045_S2_L001_R2_001.fastq.gz
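A malformed samplesheet is easy to miss before a long mixcr run, so a quick sanity check can help. The sketch below is not part of the workflow scripts; `check_samplesheet` is a hypothetical helper name. It assumes the format described above (three tab-separated fields per line, no header, absolute fastq paths) and only reports problems without changing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the workflow scripts): sanity-check a mixcr samplesheet.
# Assumes: no header, three tab-separated fields (sample ID, R1 path, R2 path).
check_samplesheet() {
    local sheet="$1" tab lineno=0 status=0 sample r1 r2 extra fq
    tab=$(printf '\t')
    while IFS="$tab" read -r sample r1 r2 extra || [ -n "$sample" ]; do
        lineno=$((lineno + 1))
        [ -n "$sample$r1$r2" ] || continue          # skip blank lines
        if [ -z "$r1" ] || [ -z "$r2" ] || [ -n "$extra" ]; then
            echo "line $lineno: expected 3 tab-separated fields" >&2
            status=1
            continue
        fi
        for fq in "$r1" "$r2"; do
            case "$fq" in
                /*) ;;                               # absolute path, as required
                *) echo "line $lineno: not an absolute path: $fq" >&2; status=1 ;;
            esac
            [ -f "$fq" ] || { echo "line $lineno: file not found: $fq" >&2; status=1; }
        done
    done < "$sheet"
    return "$status"
}
```

Example call: `check_samplesheet samplesheet_EXP21001293.tsv && echo "samplesheet OK"`. The function returns a non-zero status if any line has a problem.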
- SAMPLESHEET2: Create another tab-delimited samplesheet that provides the mixcr outputs as input to VDJtools, e.g. metadataToConvert_EXP21001293.txt, as shown below. Header present. IMPORTANT: Do not specify complete paths. Instead, place the TRB files in the inputs folder, because VDJtools expects the TRB files to be in the same folder as the inputs (weird bug!). If needed, create symbolic links in the inputs folder pointing to the mixcr output files. The names must match the "file_name" entries in the file below exactly.
DSCO28-MTC-1analysis.clonotypes.TRB.fixed.txt DSCO28-MTC-1
DSCO28-MTC-2analysis.clonotypes.TRB.fixed.txt DSCO28-MTC-2
DSCO28-MTC-3analysis.clonotypes.TRB.fixed.txt DSCO28-MTC-3
DSCO28-TRF-1analysis.clonotypes.TRB.fixed.txt DSCO28-TRF-1
DSCO28-TRF-2analysis.clonotypes.TRB.fixed.txt DSCO28-TRF-2
DSCO28-TRF-3analysis.clonotypes.TRB.fixed.txt DSCO28-TRF-3
FFPE-9G7045analysis.clonotypes.TRB.fixed.txt FFPE-9G7045
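Creating the symbolic links by hand is error-prone when there are many samples. The sketch below shows one way to do it from the samplesheet itself. `link_trb_files` is a hypothetical helper, not part of the workflow scripts. It assumes that the first line of the samplesheet is the header, that the first tab-separated column holds the file name, and that the mixcr output directory is given as an absolute path (otherwise the links would not resolve).

```shell
#!/usr/bin/env bash
# Hypothetical helper: link mixcr TRB outputs into the inputs folder under the
# exact names listed in the first column of the VDJtools samplesheet.
# Usage: link_trb_files <samplesheet> <absolute_mixcr_output_dir> <inputs_dir>
link_trb_files() {
    local meta="$1" src_dir="$2" dest_dir="$3" tab header file_name rest status=0
    tab=$(printf '\t')
    {
        read -r header || true                      # discard the header row
        while IFS="$tab" read -r file_name rest || [ -n "$file_name" ]; do
            [ -n "$file_name" ] || continue         # skip blank lines
            if [ ! -f "$src_dir/$file_name" ]; then
                echo "missing mixcr output: $src_dir/$file_name" >&2
                status=1
                continue
            fi
            # -f replaces an existing link of the same name
            ln -sf "$src_dir/$file_name" "$dest_dir/$file_name"
        done
    } < "$meta"
    return "$status"
}
```

The function returns a non-zero status if any listed file is missing from the mixcr output directory, so a typo in the samplesheet shows up before VDJtools is run.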
- Make sure that VDJtools automatically creates a file called metadata.txt. It should look like the example below:
VDJtools.HNSCC-15396-1.txt HNSCC-15396-1 conv:MiXcr
VDJtools.HNSCC-15396-2.txt HNSCC-15396-2 conv:MiXcr
VDJtools.HNSCC-15396-3.txt HNSCC-15396-3 conv:MiXcr
VDJtools.HNSCC-6827-1.txt HNSCC-6827-1 conv:MiXcr
VDJtools.HNSCC-6827-2.txt HNSCC-6827-2 conv:MiXcr
VDJtools.HNSCC-6827-3.txt HNSCC-6827-3 conv:MiXcr
- mixcr
- *.TRB.txt
- *.clna
- *.vdjca
- vdjtools (depending on the type of plots, read more in vdjtools documentation)
- *.summary.txt
- *.txt
- metadata.txt
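After a run, the output file types listed above can be checked with a few lines of shell. This is only a sketch: `has_match` and `check_outputs` are hypothetical helper names, and the VDJtools check for `*.summary.txt` is an assumption that may need adjusting, since the files produced depend on which VDJtools routines and plots were run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: confirm that the expected output file types exist.

# True if at least one non-directory entry in directory $1 matches glob pattern $2.
has_match() {
    find "$1" -maxdepth 1 ! -type d -name "$2" 2>/dev/null | grep -q .
}

# Usage: check_outputs <mixcr_output_dir> <vdjtools_output_dir>
check_outputs() {
    local mixcr_dir="$1" vdj_dir="$2" pattern status=0
    for pattern in '*.TRB.txt' '*.clna' '*.vdjca'; do
        has_match "$mixcr_dir" "$pattern" \
            || { echo "mixcr: no $pattern in $mixcr_dir" >&2; status=1; }
    done
    # Assumption: a summary file is expected; adjust to the routines you ran.
    for pattern in '*.summary.txt' 'metadata.txt'; do
        has_match "$vdj_dir" "$pattern" \
            || { echo "vdjtools: no $pattern in $vdj_dir" >&2; status=1; }
    done
    return "$status"
}
```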
- Log on to the Galaxy server and run the commands below:
conda create --name tcrbeta
conda activate tcrbeta
The commands above create and activate a conda environment called "tcrbeta". You can then install specific R libraries (ggplot2 etc.) inside "tcrbeta" with conda install commands, so that this setup stays specific to TCR-seq and does not conflict with anything else you use your shell for.
- Install R. The version I have is R 4.0.5.
conda install -c conda-forge r-base
- Install R libraries
conda install -c conda-forge r-ggplot2
conda install -c conda-forge r-gplots
conda install -c conda-forge r-rcolorbrewer
conda install -c conda-forge r-venndiagram
conda install -c conda-forge r-reshape2
conda install -c conda-forge r-ape
conda install -c conda-forge r-plotrix
Install any other missing R library in the same way. If you cannot find a library in the conda-forge channel, try "-c bioconda" instead of "-c conda-forge".
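The channel fallback described above can be wrapped in a small loop. This is a minimal sketch, not part of the workflow scripts: `install_r_packages` and the `CONDA_CMD` variable are hypothetical names introduced here for illustration. It assumes the "tcrbeta" environment is already active and uses `-y` so that conda does not prompt for confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install each package from conda-forge, falling back to bioconda.
# CONDA_CMD is an illustrative override (e.g. for a dry run); it defaults to conda.
# Usage: install_r_packages r-ggplot2 r-gplots r-rcolorbrewer
install_r_packages() {
    local conda_cmd="${CONDA_CMD:-conda}" pkg failed=""
    for pkg in "$@"; do
        "$conda_cmd" install -y -c conda-forge "$pkg" \
            || "$conda_cmd" install -y -c bioconda "$pkg" \
            || failed="$failed $pkg"
    done
    if [ -n "$failed" ]; then
        echo "could not install:$failed" >&2
        return 1
    fi
}
```

Packages that cannot be installed from either channel are collected and reported at the end, so one missing library does not stop the rest from being installed.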
For more background, read: