Analysis workflow for bulk-tcr-beta sequencing data

Bulk TCR-beta chain sequencing workflow - with DNA as starting material

The workflow includes the following steps and scripts:

fastp - QC and pre-processing of fastq files, with multiqc to create a multi-sample report: run_fastp_multiqc.sh
mixcr - alignment and assembly of clonotypes from fastq files: run_mixcr_v1.sh, fix_TRBfiles.R
vdjtools - postprocessing/graphical and text file results for interpretation: run_vdjtools_single_samples.sh, run_vdjtools_custom_samples.sh, vdjtools-patch.sh

fastp, multiqc, mixcr and vdjtools are currently installed on the galaxy server. You still need to install FastQC for your own user:

conda install -c bioconda fastqc

INFO for a test run:

Test data:
s3://seqcore/fastqs/tcrseq/EXP21001376_FFPE

Input files:
~schavan/projects/bulk_tcr_seq/inputs_trb/samplesheet_EXP21001376.tsv ## Mixcr + fastq only
~schavan/projects/bulk_tcr_seq/inputs/metadataToConvert_EXP21001376_FFPE.txt ## VDJTools only

Output files for Mixcr:
~schavan/projects/bulk_tcr_seq/inputs/

Output files for VDJTools:
~schavan/projects/bulk_tcr_seq/scripts/batch2.2

Inputs

SAMPLESHEET1: Create a tab-delimited samplesheet per experiment as input to mixcr, e.g. samplesheet_EXP21001293.tsv, laid out as below. No header. Each row holds the sample name, the R1 fastq file and the R2 fastq file. Specify complete, absolute paths to the fastq files.

DSCO28-MTC-1	/data/DSCO28-MTC-1_S7_L001_R1_001.fastq.gz	/data/DSCO28-MTC-1_S7_L001_R2_001.fastq.gz
DSCO28-MTC-2	/data/DSCO28-MTC-2_S1_L001_R1_001.fastq.gz	/data/DSCO28-MTC-2_S1_L001_R2_001.fastq.gz
DSCO28-MTC-3	/data/DSCO28-MTC-3_S3_L001_R1_001.fastq.gz	/data/DSCO28-MTC-3_S3_L001_R2_001.fastq.gz
DSCO28-TRF-1	/data/DSCO28-TRF-1_S6_L001_R1_001.fastq.gz	/data/DSCO28-TRF-1_S6_L001_R2_001.fastq.gz
DSCO28-TRF-2	/data/DSCO28-TRF-2_S4_L001_R1_001.fastq.gz	/data/DSCO28-TRF-2_S4_L001_R2_001.fastq.gz
DSCO28-TRF-3	/data/DSCO28-TRF-3_S5_L001_R1_001.fastq.gz	/data/DSCO28-TRF-3_S5_L001_R2_001.fastq.gz
FFPE-9G7045	/data/FFPE-9G7045_S2_L001_R1_001.fastq.gz	/data/FFPE-9G7045_S2_L001_R2_001.fastq.gz
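
A malformed samplesheet (spaces instead of tabs, a missing column, a relative path) is easy to miss by eye. The following is a minimal sketch of a pre-flight check, assuming the three-column layout shown above. `check_samplesheet` is a hypothetical helper and not one of the repository scripts.

```shell
# Hypothetical helper (not part of the repository): validate a mixcr samplesheet.
# Assumes three tab-separated columns: sample name, R1 fastq, R2 fastq.
check_samplesheet() {
    awk -F'\t' '
        NF != 3 {
            printf "line %d: expected 3 tab-separated fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^\// || $3 !~ /^\// {
            printf "line %d: fastq paths must be absolute\n", NR
            bad = 1
        }
        $2 == $3 {
            printf "line %d: R1 and R2 point to the same file\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1" >&2
}
```

Calling `check_samplesheet samplesheet_EXP21001293.tsv` prints any problems to stderr and returns a non-zero status if at least one line fails, so it can gate the mixcr run. It does not check that the fastq files exist.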

SAMPLESHEET2: Create another tab-delimited samplesheet that passes the mixcr outputs to VDJtools, e.g. metadataToConvert_EXP21001293.txt, laid out as below. Header present. IMPORTANT: Do not specify complete paths. Instead, place this file in the inputs folder, because VDJtools expects the TRB files to be in the same folder as the samplesheet (weird bug!). If needed, create symbolic links in the inputs folder pointing to the mixcr output files. The link names must match the "file_name" entries in the file below exactly.

	DSCO28-MTC-1analysis.clonotypes.TRB.fixed.txt	DSCO28-MTC-1
	DSCO28-MTC-2analysis.clonotypes.TRB.fixed.txt	DSCO28-MTC-2
	DSCO28-MTC-3analysis.clonotypes.TRB.fixed.txt	DSCO28-MTC-3
	DSCO28-TRF-1analysis.clonotypes.TRB.fixed.txt	DSCO28-TRF-1
	DSCO28-TRF-2analysis.clonotypes.TRB.fixed.txt	DSCO28-TRF-2
	DSCO28-TRF-3analysis.clonotypes.TRB.fixed.txt	DSCO28-TRF-3
	FFPE-9G7045analysis.clonotypes.TRB.fixed.txt	FFPE-9G7045
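
The symbolic links mentioned above can be created in one pass over the samplesheet. This is a minimal sketch, assuming the file has one header line and that the first column holds the "file_name" entries. `link_trb_files` and its three arguments are hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): link mixcr outputs into
# the inputs folder under the names listed in the VDJtools samplesheet.
#   $1 = samplesheet (header line, file_name assumed to be the first column)
#   $2 = absolute path of the folder holding the mixcr output files
#   $3 = inputs folder that VDJtools will read from
link_trb_files() {
    awk -F'\t' 'NR > 1 && $1 != "" { print $1 }' "$1" |
    while IFS= read -r trb_file; do
        if [ -e "$2/$trb_file" ]; then
            # -f replaces a stale link left over from an earlier run
            ln -sf "$2/$trb_file" "$3/$trb_file"
        else
            echo "missing mixcr output: $2/$trb_file" >&2
        fi
    done
}
```

The mixcr output folder should be given as an absolute path so that the links resolve regardless of where VDJtools is started from.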

Outputs

Make sure VDJtools automatically creates a file called metadata.txt. It should look like this:

VDJtools.HNSCC-15396-1.txt	HNSCC-15396-1	conv:MiXcr
VDJtools.HNSCC-15396-2.txt	HNSCC-15396-2	conv:MiXcr
VDJtools.HNSCC-15396-3.txt	HNSCC-15396-3	conv:MiXcr
VDJtools.HNSCC-6827-1.txt	HNSCC-6827-1	conv:MiXcr
VDJtools.HNSCC-6827-2.txt	HNSCC-6827-2	conv:MiXcr
VDJtools.HNSCC-6827-3.txt	HNSCC-6827-3	conv:MiXcr

mixcr

*.TRB.txt
*.clna
*.vdjca

vdjtools (depending on the type of plots, read more in vdjtools documentation)

*.pdf
*.summary.txt
*.txt
metadata.txt

Setting up your own user conda environment

Log on to the Galaxy server and run the commands below:

conda create --name tcrbeta
conda activate tcrbeta

The commands above create and activate a conda environment called "tcrbeta". You can then install R libraries such as ggplot2 inside "tcrbeta" with conda install, so that the setup stays specific to tcrseq and does not conflict with anything else you use your shell for.

Install R (the version I have is R 4.0.5)

conda install -c conda-forge r-base

Install R libraries

conda install -c conda-forge r-ggplot2
conda install -c conda-forge r-gplots
conda install -c conda-forge r-rcolorbrewer
conda install -c conda-forge r-venndiagram
conda install -c conda-forge r-reshape2
conda install -c conda-forge r-ape
conda install -c conda-forge r-plotrix

Install any other missing R library the same way. If you cannot find a library on the conda-forge channel, try "-c bioconda" instead of "-c conda-forge".

For more background, read the Extra Info below.

Extra Info

https://milaboratory.com Pirogov Russian National Research Medical University, Moscow, Russia

MiXCR (Java): https://mixcr.readthedocs.io/en/master/
a. Align raw sequencing reads to reference V, D, J, and C genes of TCRs
b. Assemble clonotypes using alignments based on the region of interest (CDR3)
c. Export alignments and clones to human-readable format

VDJtools (post-analysis, Java/R): https://vdjtools-doc.readthedocs.io/en/master/
a. Compute a wide set of statistics
b. Perform various forms of cross-sample analysis

VDJtools clonotype specification

Count: number of reads
Frequency: the share of the clonotype in the sample
Complementarity determining region 3 nucleotide sequence (CDR3nt). CDR3 starts with the Variable region reference point (conserved Cys residue) and ends with the Joining segment reference point (conserved Phe/Trp)
Translated CDR3 sequence (CDR3aa)
Variable (V) segment name.
Diversity (D) segment name for the receptor chains (TRB)
Joining (J) segment name.
Vend, Dstart, Dend, and Jstart marking V, D and J segment boundaries within CDR3 nucleotide sequence (inclusive)
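
To make the relation between Count and Frequency concrete, the sketch below recomputes each clonotype's share from the read counts. It assumes a tab-delimited table with one header line whose columns follow the order listed above (Count first, CDR3aa fourth), and that "share" means the clonotype's count divided by the total count in the sample. `clonotype_frequencies` is a hypothetical helper, not part of VDJtools or the repository.

```shell
# Hypothetical helper: print CDR3aa, count and recomputed frequency per clonotype.
# Assumes a tab-delimited table with a header line, Count in column 1 and
# CDR3aa in column 4. The file is read twice: once to sum, once to print.
clonotype_frequencies() {
    awk -F'\t' '
        NR == FNR { if (FNR > 1) total += $1; next }   # first pass: sum read counts
        FNR > 1 && total > 0 { printf "%s\t%d\t%.4f\n", $4, $1, $1 / total }
    ' "$1" "$1"
}
```

Comparing the recomputed values with the Frequency column is a quick sanity check that a table was not truncated or filtered after the frequencies were written.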

Analysis workflow setup

Turn on VPN, Cisco AnyConnect
Log on to galaxy server (10.0.31.135), create directories in your home folder - inputs, data, scripts
Check installations and versions:
which mixcr          # /usr/local/bin/mixcr
which vdjtools       # /usr/local/bin/vdjtools
mixcr --version      # MiXCR v3.0.13
vdjtools --version   # VDJtools V1.2.1
Transfer the fastq files for a given experiment, e.g. EXP21001293, from Illumina to your local machine (install Illumina BaseSpace Downloader on your local machine first). Then transfer them from the local machine to the galaxy server (10.0.31.135), replacing <local fastq folder> with the folder you downloaded to:
rsync -r -e ssh <local fastq folder> user@10.0.31.135:~/projects/bulk_tcr_seq/data/EXP21001293
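
The installation check in the setup steps can also be scripted so that it fails loudly when a tool is missing. This is a minimal sketch using only the shell built-in `command -v`. `check_tools` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): report which of the
# given tools are on PATH and return non-zero if any of them is missing.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'missing: %s\n' "$tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

For this workflow the call would be `check_tools fastp multiqc mixcr vdjtools`. It only confirms that the executables are on the PATH, so the version commands are still needed to confirm which release is installed.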
