This repository includes all the scripts used for the paper "SAMURAI: Shallow Analysis of copy nuMber Using a Reproducible And Integrated bioinformatics pipeline".
Case Study 1 - Evaluation of SAMURAI on simulated data: Download and dilution of Test data from Smolander et al.
The original simulated sample files (simulated_L001_R1_001.fastq.gz, simulated_L001_R2_001.fastq.gz) can be downloaded from Zenodo.
Align the downloaded FASTQ files to hg38 using BWA-MEM. You can run BWA-MEM locally or from a Singularity container.
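A minimal alignment sketch, assuming bwa and samtools are available (locally or inside a container) and that the hg38 reference has already been indexed with `bwa index`; file names are placeholders:

```shell
# Align the simulated paired-end reads to hg38 and produce a sorted, indexed BAM.
# REF must point to a BWA-indexed hg38 FASTA (path is an assumption).
REF=hg38.fa
bwa mem -t 8 -R '@RG\tID:simulated\tSM:simulated\tPL:ILLUMINA' "$REF" \
    simulated_L001_R1_001.fastq.gz simulated_L001_R2_001.fastq.gz \
  | samtools sort -@ 4 -o simulated.sorted.bam -
samtools index simulated.sorted.bam
```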
Downsampling is performed using Picard DownsampleSam. You can install Picard locally or run it from a Singularity container.
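If you go the container route, one possible way to obtain and run Picard is shown below; the Docker Hub image and the in-container jar path are assumptions and may need adjusting for the image you use:

```shell
# Pull the Broad Institute Picard image and run DownsampleSam from it
# (jar path inside the container is an assumption).
singularity pull picard.sif docker://broadinstitute/picard
singularity exec picard.sif java -jar /usr/picard/picard.jar DownsampleSam --version
```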
To produce diluted samples, use the following command, changing the parameter P to simulate different coverages (e.g., 0.1, 0.3, 0.5, 0.7):
java -jar picard.jar DownsampleSam \
I=input.bam \
O=downsampled.bam \
P=0.5
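The whole dilution series can be produced in one pass by looping over the P values; input and output file names below are placeholders:

```shell
# Generate one downsampled BAM per dilution fraction.
for P in 0.1 0.3 0.5 0.7; do
  java -jar picard.jar DownsampleSam \
      I=input.bam \
      O=downsampled_P${P}.bam \
      P=${P}
done
```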
Case Study 1 - Evaluation of SAMURAI on simulated data: Dilution of normal samples to build the Panel of normals (PoN) for liquid biopsy test
The script download_normal_gatk.sh can be used to download GATK data to build a simulated panel of normals.
Data need to be downsampled at different coverages.
The script automatically downloads three Singularity images (sambamba, samtools, and bedtools) that are needed for the in-silico dilution.
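A sketch of what that download step might look like; the exact image URIs and tags used by download_normal_gatk.sh may differ, and the biocontainers URIs below are assumptions:

```shell
# Pull the three tool images from the biocontainers registry
# (tags are examples only; check the script for the versions it pins).
singularity pull sambamba.sif docker://quay.io/biocontainers/sambamba:1.0--h98b6b92_0
singularity pull samtools.sif docker://quay.io/biocontainers/samtools:1.17--h00cdaf9_0
singularity pull bedtools.sif docker://quay.io/biocontainers/bedtools:2.31.0--hf5e1c6e_2
```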
The function Subsample takes as input:
- input_bam: original downloaded normal BAM file (SM-74NEG.bam)
- desired_read_count: desired read count for subsampling
- output_bam: final diluted normal BAM file
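A hedged sketch of a Subsample-style function (the actual implementation lives in download_normal_gatk.sh and may differ): count the reads in the input BAM, compute the fraction to keep, and let samtools draw that fraction:

```shell
# Subsample a BAM down to approximately desired_read_count reads.
# Uses samtools here for illustration; the script itself relies on
# sambamba/samtools/bedtools and may do this differently.
subsample() {
  local input_bam=$1 desired_read_count=$2 output_bam=$3
  local total fraction
  total=$(samtools view -c "$input_bam")
  # bash has no floating-point arithmetic, so compute the fraction with awk
  fraction=$(awk -v d="$desired_read_count" -v t="$total" 'BEGIN { printf "%.6f", d / t }')
  # samtools view -s keeps roughly that fraction of read pairs
  samtools view -b -s "$fraction" -@ "${CORES:-4}" "$input_bam" > "$output_bam"
}
```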
Within the script, you can adjust the following parameters:
- CORES: number of cores to use
- READ_COUNT: number of reads for subsampling
- NUM_SAMPLES: number of samples to generate
The script then converts the diluted samples from BAM to FASTQ format.
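A typical samtools-based BAM-to-FASTQ conversion is sketched below; file names are placeholders, and the script may use different tools or options:

```shell
# Group mates together, then split the reads into R1/R2 FASTQ files.
# Singleton and unpaired reads are discarded here for simplicity.
samtools collate -u -O downsampled.bam \
  | samtools fastq -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
      -0 /dev/null -s /dev/null -n -
```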
You can run the script with bash download_normal_gatk.sh after adjusting the parameters as needed. Alternatively, you can download the data and Singularity images yourself and run the different parts of the script separately.