# FAQ
sprokopec edited this page Oct 22, 2021
- Why do my jobs keep failing?
- How do I run variant calling on a single chromosome?
- I want to annotate my variants (SNV and INDELs) with OncoKB - how do I do this?
- I want to filter my structural variants against known blacklists - how can I do this?
- I have single-end data; can pipeline-suite process this?
## Why do my jobs keep failing?

- make sure you are not using `--dry-run`
- check the available disk space; if there is no space available, jobs cannot write to file and will fail
- navigate to your `PROJECTNAME/logs/run_DNA_pipeline_DATE/` directory and run the following:
```shell
# collect the job IDs from all slurm output files into a comma-separated list
joblist=""
for i in */slurm/s*out; do
    jobid=$(basename $i | sed 's/slurm-//' | sed 's/.out//');
    joblist="${joblist}${joblist:+,}$jobid";
done

# query the final state of each job
sacct -j $joblist --format='JobID,State,JobName' -P
```
- from here, you can see which job specifically failed. Navigate to that tool's log directory (`PROJECTNAME/TOOL/logs`) and view the `slurm_job_metrics.out` file to see which job failed and why
- if any are OUT_OF_MEMORY or TIMEOUT, simply update the requested time/memory parameters in the `dna_pipeline_config.yaml`
- otherwise, if any are FAILED or CANCELLED, read the specific slurm output for that job (`./*/slurm-JOBID.out`)
Other common reasons for failures:
- if no jobs ran successfully, there is probably an error in your YAML files (check the whitespace!)
- if some jobs ran successfully, it is possible that you've tried to run too many jobs at once (especially if running WGS with VarDict/NovoBreak/Pindel/Mavis) - only 10000 jobs can be submitted at a time. In this case, you may need to run these steps one at a time
- it's entirely possible that there is a bug in my code. Reach out and let me know!
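To triage a long `sacct` listing quickly, you can save its output to a file and summarize the job states. A minimal sketch - the `sacct_output.txt` records below are an invented sample for illustration; in practice, redirect the `sacct` command above into the file instead:

```shell
# Invented sample of sacct -P output (JobID|State|JobName); replace with:
#   sacct -j $joblist --format='JobID,State,JobName' -P > sacct_output.txt
cat > sacct_output.txt <<'EOF'
JobID|State|JobName
1001|COMPLETED|bwa_mem
1002|OUT_OF_MEMORY|mutect2
1003|TIMEOUT|vardict
1004|FAILED|pindel
EOF

# count jobs per state (skip the header line)
tail -n +2 sacct_output.txt | cut -d'|' -f2 | sort | uniq -c

# list only the jobs that need attention
awk -F'|' 'NR > 1 && $2 != "COMPLETED" {print $1, $2, $3}' sacct_output.txt
```

The per-state counts point you straight at memory/time problems (OUT_OF_MEMORY, TIMEOUT) versus genuine failures (FAILED, CANCELLED).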
## How do I run variant calling on a single chromosome?

- Some tools allow you to specify a set of desired chromosomes within the `dna_pipeline_config.yaml` file; indicate the desired chromosome, and set `seq_type: wgs`
    - MuTect2, VarScan, GATK:CNV, NovoBreak (in development)
- For any other tool, the easiest approach is to subset your BAMs to the desired chromosome and create a new `data_config.yaml` file. Again, be sure to set `seq_type: wgs` within the config file.
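As a rough sketch of the subsetting step (the paths and chromosome name are placeholders; assumes a coordinate-sorted, indexed BAM and samtools available on your path):

```shell
# subset a coordinate-sorted, indexed BAM to a single chromosome,
# then index the result for downstream callers
samtools view -b /path/to/sample.bam chr7 > sample_chr7.bam
samtools index sample_chr7.bam
```

Point the new `data_config.yaml` at the subset BAMs and run as usual.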
## I want to annotate my variants (SNV and INDELs) with OncoKB - how do I do this?

- The OncoKB annotator requires internet access, which means it can't be run directly on H4H. However, you can mount H4H and run it locally, or transfer your mutation data to Samwise and run it there.
- From H4H or Samwise, you can run the version of OncoKB located in `/pughlab/bin`.
- Or, download the most recent version from git: oncokb-annotator
- Obtain an OncoKB access key: apiAccess
```shell
# move from H4H to Samwise
ssh h4huhndata1
scp /path/to/PROJECTNAME/Report/plots/date_PROJECTNAME_ensemble_mutation_data.tsv xfer:/mnt/work1/users/home/username
exit

# from Samwise (ONCOKB_KEY is your OncoKB access key)
module load python3
python3 /mnt/work1/users/pughlab/bin/oncokb-annotator/MafAnnotator.py \
    -t ONCOTREE_CODE \
    -b ONCOKB_KEY \
    -r reference \
    -i date_PROJECTNAME_ensemble_mutation_data.tsv \
    -o date_PROJECTNAME_ensemble_oncokb_annotated.tsv

# move it back to H4H
ssh xfer
scp date_PROJECTNAME_ensemble_oncokb_annotated.tsv h4huhndata1:/cluster/home/username
exit
```
You can then pass this data into `filter_ensemble_mutations.R` to perform filtering and update the SNV landscape plot.

```shell
module load R/3.6.1

# use -t wgs instead of -t exome for whole-genome data
Rscript /path/to/pipeline-suite/scripts/report/filter_ensemble_mutations.R \
    -p PROJECTNAME \
    -o /path/to/PROJECTNAME/Report/plots \
    -t exome \
    -i /path/to/PROJECTNAME/Report/plots/date_PROJECTNAME_ensemble_oncokb_annotated.tsv \
    -c /path/to/PROJECTNAME/Report/data/total_bases_covered.tsv \
    -m /path/to/PROJECTNAME/Report/data/msi_estimates.tsv
```
## I want to filter my structural variants against known blacklists - how can I do this?

- FusionAnnotator uses the CTAT_HumanFusionLib source to annotate known fusions and to flag entries found in known blacklists. See here for more information.
- Below is a sample command using annotations for GRCh38 and filtering fusions from all probable blacklists.
```shell
# annotate fusions using FusionAnnotator
module load FusionAnnotator
FusionAnnotator \
    --genome_lib_dir /cluster/projects/pughlab/references/STAR-Fusion/GRCh38_gencode_v31_CTAT_lib_Oct012019ctat_genome_lib_build_dir/ \
    --annotate /path/to/mavis/output/DATE_PROJECTNAME_sv_data_for_cbioportal.tsv \
    --fusion_name_col 34 \
    --full > /path/to/mavis/output/DATE_PROJECTNAME_sv_data_for_cbioportal_annotated.tsv

# remove entries annotated with known blacklists
grep -v -e 'Babiceanu' -e 'Greger' -e 'BodyMap' -e 'GTEx_recurrent_StarF2019' -e 'DGD_PARALOGS' -e 'HGNC_GENEFAM' -e 'ConjoinG' \
    DATE_PROJECTNAME_sv_data_for_cbioportal_annotated.tsv > DATE_PROJECTNAME_sv_data_for_cbioportal_annotated_filtered.tsv
```
## I have single-end data; can pipeline-suite process this?

- Not all tools included in this pipeline can accept single-end data (see below). To include fastqs with SE reads in the alignment step, see the instructions here: creating your fastq.yaml
- For variant calling, if you have a mix of SE and PE data, note that some tools will fail when encountering SE reads. You can either run as normal and deal with errors as they come, or exclude the SE input when running these tools. If you only have SE reads, set these tools to `run: N` in the data config to avoid errors.
- Tools that will fail with SE data:
- Manta/Strelka (require paired-end reads)
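If you are unsure whether a BAM contains paired-end reads, `samtools view -c -f 1 sample.bam` counts reads with the paired flag set (a count of 0 means the data is single-end). The flag logic itself can be sketched in plain shell - `is_paired` is a hypothetical helper, not part of pipeline-suite:

```shell
# decode the "read paired" bit (0x1) of a SAM flag, as printed in
# column 2 of `samtools view` output
is_paired() { [ $(( $1 & 1 )) -eq 1 ] && echo paired || echo single; }

is_paired 99    # → paired (a typical properly-paired forward read)
is_paired 16    # → single (an unpaired read on the reverse strand)
```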