Running AltAnalyze on an HPC
To run AltAnalyze from the command line, first install AltAnalyze and make sure the source code is in the main AltAnalyze program directory. This is already the case if you downloaded the Python source code or the Linux version; otherwise, copy the contents of the "Source_code" folder to the parent AltAnalyze directory. Before running the commands below, open a command prompt and change to the directory containing the AltAnalyze source code. The instructions below are written for an LSF cluster.
Generate BAM files from FASTQ files using STAR - ideally with strand predictions
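#!/bin/bash
# Usage: ./STARhg38.sh <read1 FASTQ file>; the read2 file name is derived from it
FASTQ1="$1"
FASTQ2="${FASTQ1/_read1/_read2}"
SAMPLE=$(basename "$FASTQ1" .fastq.gz)
DIR=$(pwd)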
cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 10:00
#BSUB -n 4
#BSUB -R "span[ptile=4]"
#BSUB -M 98000
#BSUB -J $SAMPLE
cd $DIR
module load STAR/2.6.1
STAR --genomeDir /data/Hs/Grch38-STAR-index \
  --readFilesIn $FASTQ1 $FASTQ2 \
  --readFilesCommand gunzip -c \
  --outFileNamePrefix $DIR/$SAMPLE \
  --runThreadN 4 \
  --outSAMstrandField intronMotif \
  --outSAMtype BAM SortedByCoordinate \
  --sjdbGTFfile /data/Hs/Star-Index-GRCH38/Homo_sapiens.GRCh38.85.gtf \
  --limitBAMsortRAM 97417671648
EOF
### Run as: for i in *_read1_*.fastq.gz; do ./STARhg38.sh $i | bsub; done
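As a quick check of the naming logic used by the script above, the following sketch reproduces its two derivations on a hypothetical file name (TumorA_read1_001.fastq.gz and the helper names mate_of and sample_of are assumptions for illustration, not part of AltAnalyze or STAR).

```shell
#!/bin/bash
# Derive the read2 file name the same way the STAR wrapper does:
# replace the first occurrence of "_read1" with "_read2".
mate_of() {
    local fastq1="$1"
    printf '%s\n' "${fastq1/_read1/_read2}"
}

# Derive the sample/job name: strip the directory and the .fastq.gz suffix.
sample_of() {
    basename "$1" .fastq.gz
}

# Hypothetical input file name, for illustration only.
mate_of "TumorA_read1_001.fastq.gz"
sample_of "/data/fastq/TumorA_read1_001.fastq.gz"
```

Note that the derived sample name keeps the _read1 portion of the file name. Because STAR appends Aligned.sortedByCoord.out.bam to the value of --outFileNamePrefix, the BAM file for this example would be named TumorA_read1_001Aligned.sortedByCoord.out.bam.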
Downloading and installing a species-specific database (human)
module load python/2.7.5
python AltAnalyze.py --species Hs --update Official --version EnsMart100 --additional all
Exporting a Junction and Intron BED reference file for BedTools
#!/bin/bash
# Usage: ./BAMtoBEDhg38.sh <BAM file>
BAM="$1"
SAMPLE=$(basename "$BAM" .bam)
DIR=$(pwd)
cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 10:00
#BSUB -n 2
#BSUB -R "span[ptile=2]"
#BSUB -M 16000
#BSUB -J $SAMPLE
cd $DIR
module load python/2.7.5
module load samtools
#Export exon-exon junction counts
python /data/AltAnalyze/import_scripts/BAMtoJunctionBED.py --i $BAM --species Hs --r /data/AltAnalyze/AltDatabase/EnsMart100/ensembl/Hs/Hs_Ensembl_exon.txt
#Export exon-intron junction counts
python /data/AltAnalyze/import_scripts/BAMtoExonBED.py --i $BAM --r /data/AltAnalyze/AltDatabase/EnsMart100/ensembl/Hs/Hs_Ensembl_exon.txt --s Hs
EOF
### Run as: for i in *.bam; do ./BAMtoBEDhg38.sh $i | bsub; done
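Because the wrapper scripts only print a job script to standard output, a job can be inspected before submission by running the wrapper without the pipe to bsub. The sketch below uses a reduced, hypothetical stand-in (the function make_job and the file TumorA.bam are assumptions) to show that the shell expands variables in the here-document when the job script is generated, so LSF receives concrete values rather than variable names.

```shell
#!/bin/bash
# Reduced stand-in for the wrapper scripts: prints a small LSF job script.
# The here-document delimiter is unquoted, so $sample and $bam are expanded
# at generation time, before anything is sent to bsub.
make_job() {
    local bam="$1"
    local sample
    sample=$(basename "$bam" .bam)
    cat <<EOF
#BSUB -J $sample
echo "processing $bam"
EOF
}

# Dry run: print the generated job script instead of piping it to bsub.
make_job /data/experiment/TumorA.bam
```

The first generated line for this input is `#BSUB -J TumorA`, which is the job name LSF would see.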
Create Sample Groups and Comparison Files
See the instructions here. The file names must be consistent with the expname used below, that is, prefixed with groups. and comps. (for example, groups.TumorsAndControls.txt and comps.TumorsAndControls.txt). Ideally, store these files in the same directory as the BAM files to allow automated SashimiPlot creation.
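The naming requirement can be checked before submitting the analysis job. The following is a minimal sketch, assuming the groups.<expname>.txt and comps.<expname>.txt pattern seen in the command in the next step; check_design_files is a hypothetical helper, not part of AltAnalyze.

```shell
#!/bin/bash
# Return 0 if both design files for the given expname exist in the directory,
# otherwise report each missing file on stderr and return 1.
check_design_files() {
    local dir="$1" expname="$2" missing=0 f
    for f in "groups.$expname.txt" "comps.$expname.txt"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to gate the submission on the check, for example `check_design_files /data/experiment TumorsAndControls && ./AltAnalyze.sh | bsub`.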
Perform Differential Gene and Splicing Analyses
cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 60:00
#BSUB -n 4
#BSUB -R "span[ptile=4]"
#BSUB -M 96000
module load python/2.7.5
module load R
python /data/AltAnalyze/AltAnalyze.py --species Hs --platform RNASeq \
  --bedDir "/data/experiment/" \
  --groupdir "/data/experiment/groups.TumorsAndControls.txt" \
  --compdir "/data/experiment/comps.TumorsAndControls.txt" \
  --output "/data/experiment" --expname "TumorsAndControls" \
  --GEelitefold 1.5 --GEelitepval 0.05 --GEeliteptype "adjp" \
  --multiProcessing yes
EOF
### Run as: ./AltAnalyze.sh | bsub
The primary outputs of AltAnalyze include:
- Gene expression quantification as gene-level junction RPKMs (ExpressionInput/exp.TumorsAndControls-steady-state.txt)
- Junction-level counts (ExpressionInput/counts.TumorsAndControls.txt)
- Differential expression analysis results (ExpressionOutput/DATASET-TumorsAndControls.txt)
- Gene-set and pathway enrichment results from GO-Elite (GO-Elite)
- Transcriptional regulatory networks (GO-Elite/regulated/networks)
- Alternative splicing PSI values (AltResults/AlternativeOutput/Hs_RNASeq_top_alt_junctions-PSI_EventAnnotation.txt)
- Differential splicing results (AltResults/AlternativeOutput/Events-dPSI_0.1_rawp)
- MarkerGenes (DataPlots/MarkerFinder)
- QC results (DataPlots)
- SashimiPlots (SashimiPlots)
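
After a run finishes, the list above can be turned into a simple completeness check. This is a sketch only: missing_outputs is a hypothetical helper, the paths are taken from the list above relative to the --output directory, and only the file names that follow directly from the expname are checked at file level (the remaining entries are checked as directories).

```shell
#!/bin/bash
# Print every expected output (relative to the --output directory) that does
# not exist. Prints nothing when all of the listed outputs are present.
missing_outputs() {
    local out="$1" expname="$2" path
    for path in \
        "ExpressionInput/exp.$expname-steady-state.txt" \
        "ExpressionInput/counts.$expname.txt" \
        "ExpressionOutput/DATASET-$expname.txt" \
        "GO-Elite" \
        "AltResults/AlternativeOutput" \
        "DataPlots" \
        "SashimiPlots"; do
        [ -e "$out/$path" ] || echo "$path"
    done
}
```

For the example run above this would be called as `missing_outputs /data/experiment TumorsAndControls`.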