Installfix #26

Open
wants to merge 189 commits into
base: master
189 commits
812f07b
Create README.md
bintriz Jan 20, 2018
9fd1b60
Minor fix.
Jan 20, 2018
39f271b
Edit README.
Jan 20, 2018
19fe1f7
a small change in removing files.
Feb 14, 2018
eb333f4
Adjust the threaded option.
Feb 14, 2018
2ede371
Add an analysis script caculating VAF.
Feb 14, 2018
dd2c8be
Add nda related utils.
Feb 14, 2018
41df67f
Add a script for strand bias analysis.
Feb 19, 2018
ecac128
Change VAF script.
Feb 21, 2018
b205469
Refactoring.
Feb 21, 2018
d6e79d9
Bug fix: import sys was omitted.
Feb 22, 2018
ad1ac80
Change p-val format into scientific notation.
Feb 24, 2018
881c6db
Change binomial test into one-sided (smaller).
Apr 16, 2018
757b309
Move library location.
bintriz Jun 17, 2018
c46b413
Add germline_filter.py
bintriz Jun 17, 2018
23e8f5c
Change germline filter script
bintriz Jun 29, 2018
5b9b60a
change location of the library and conf file in genome mapping pipeline
bintriz Jun 30, 2018
99443b2
change library/job_queue.py location
bintriz Jun 30, 2018
ab23d3c
Add configurations for root and cnvnator.
bintriz Jul 11, 2018
69a7b2b
Add variant_calling pipeline.
bintriz Jul 11, 2018
f100c00
find samtools path when config not working
Aug 21, 2018
fdcefcc
Add missing variable.
bintriz Oct 1, 2018
bb5ac5c
Create requirements.txt
Oct 1, 2018
77cca2c
Update requirements.txt
Oct 1, 2018
56a1e3b
Update README.md
Oct 1, 2018
f8c93c8
Update README.md
Oct 1, 2018
b339d7e
Update README.md
Oct 1, 2018
44e99f2
Bug fix.
bintriz Oct 19, 2018
3eb07c9
Add an automatic tool installer.
bintriz Oct 19, 2018
05e9822
Merge pull request #8 from bsmn/kdaily-add-py-requirements
bintriz Oct 19, 2018
c7dd415
Merge pull request #9 from bsmn/kdaily-add-contrib
bintriz Oct 19, 2018
10b1e2e
Merge pull request #10 from bsmn/kdaily-add-setup-notes
bintriz Oct 19, 2018
975470f
Merge branch 'master' of github.com:bsmn/bsmn_pipeline
bintriz Oct 19, 2018
28d358c
Merge pull request #13 from bsmn/tool_installer
Oct 19, 2018
cb07f36
Bug fix: install_tools.sh
bintriz Nov 24, 2018
844f81f
Add a script for resource download
bintriz Nov 24, 2018
a843671
Change configuration system
bintriz Nov 25, 2018
73dfd02
pileup.py upadte as configuration system changes
bintriz Nov 25, 2018
d6b1279
synpase_login refactoring
bintriz Nov 26, 2018
f67e494
Add nda_login
bintriz Nov 26, 2018
475d6f9
Refactoring: config, run_info
bintriz Nov 26, 2018
6ae30d5
Library name change: utils -> misc
bintriz Nov 26, 2018
7787b73
Refactoring sample list parser & Fix sample list format
bintriz Nov 26, 2018
1a8c017
Add synapse login in the download_resources.sh
bintriz Nov 26, 2018
31161f3
Add NDA download funtion
bintriz Nov 26, 2018
213aceb
Minor change in job scripts
bintriz Nov 26, 2018
3d25701
Refactoring: rename analysis_utils -> utils
bintriz Nov 26, 2018
4c1e75c
Add an option for turning off bam upload to synapse
bintriz Nov 28, 2018
aad38a2
Add wrapper scripts for pipeline runners
bintriz Nov 28, 2018
2818e1c
Bug_fix: NDA credential setting
bintriz Nov 28, 2018
9675dea
minor change of the argument help in runners
bintriz Nov 28, 2018
99d7a68
Simplify aws download log
bintriz Nov 28, 2018
936b310
Removing a conflict of tmp collate file names when running bam2fq on …
bintriz Nov 28, 2018
7ea731b
Refactoring: run_info, log_dir
bintriz Dec 4, 2018
e64ab24
Change location to save run_info
bintriz Dec 4, 2018
11f638f
Improve error handling in job_scripts
bintriz Dec 4, 2018
6c379fb
Add user's S3Uri and LocalPath options for data source; Multiple try …
bintriz Dec 4, 2018
380ab5e
Bug fix: Not updating the return code of multiple download tries
bintriz Dec 4, 2018
b449050
Bug fix: Make sample dir before saving run_info
bintriz Dec 4, 2018
1216c22
Adjust threaded and java Xmx to match with m5.12xlarge
bintriz Dec 5, 2018
8f2cf6b
Java memory adjustment
bintriz Dec 9, 2018
3b45165
Update the README file
bintriz Dec 10, 2018
69347c5
Update the README file
bintriz Dec 10, 2018
4468f20
Merge pull request #14 from bsmn/bug_fix
Dec 10, 2018
6e22d2f
Merge pull request #15 from bsmn/pull-request-tj-update
Dec 10, 2018
2b92eb1
Small edits on README.md
attilagk Jan 15, 2019
2a81cf1
Merge pull request #16 from attilagk/edits
bintriz Jan 15, 2019
512ad27
install_tools.sh now downloads and extracts GATK
attilagk Jan 15, 2019
daa03a7
Removed manual download of GATK from README.md
attilagk Jan 15, 2019
3aec4de
Merge pull request #17 from attilagk/edits
bintriz Jan 16, 2019
7085fc5
Recommended install location in README.md
attilagk Jan 16, 2019
d301c36
Bug fix:NDA api url change
bintriz Apr 30, 2019
e19be4b
Bug fix:sample list input parser change
bintriz Apr 30, 2019
4063144
Add templates for new job scripts
bintriz Apr 30, 2019
fad95f0
Small change PON filter
bintriz Apr 30, 2019
e3edcda
Merge pull request #22 from bsmn/pull-request-tj-update
Apr 30, 2019
580673d
Add mutect2 single sample workflow
sean-cho May 6, 2019
f603f9f
Update resources
sean-cho May 6, 2019
467cd40
Merge pull request #23 from sean-cho/scho-updates
bintriz May 8, 2019
3303403
add alt_bq_sum.py with modification of pileup library
bintriz May 15, 2019
b588012
bug fix: pileup
bintriz May 28, 2019
8be53e5
update
douym May 29, 2019
d05e102
update
douym Jun 4, 2019
4dbf3b0
Merge pull request #18 from attilagk/master
bintriz Jun 18, 2019
f090b72
Update README.md
bintriz Jun 18, 2019
0099748
Update the root and cnvnator part of the install script
bintriz Jun 18, 2019
877efa3
Merge branch 'master' of https://github.com/bsmn/bsmn_pipeline
bintriz Jun 18, 2019
035a96a
Merge pull request #24 from douym/douym-update-install
bintriz Jul 31, 2019
cde391b
Fix: root installation
bintriz Aug 1, 2019
c490284
Merge branch 'master' of https://github.com/bsmn/bsmn_pipeline
bintriz Aug 1, 2019
e730c4b
Remove conda dependendy in installing the depending tools of MosaicFo…
bintriz Aug 2, 2019
6b6b0a2
Add codes installing RetroSom dependencies
bintriz Aug 3, 2019
fcfa65a
Modify to skip installing installed tools
bintriz Aug 10, 2019
c2b4ac6
Update README.md
bintriz Aug 12, 2019
351f541
Separate filtering steps from calling.
bintriz Aug 14, 2019
009f0ae
Remove template job scripts not in use
bintriz Aug 14, 2019
07abb8d
Rename the cnvnator job script
bintriz Aug 14, 2019
c984785
Change output directory structure & disable indel quals and original …
bintriz Aug 14, 2019
667ebb2
Add cram conversion step
bintriz Aug 14, 2019
7a58a0c
Add a step extracting unmapped reads
bintriz Aug 14, 2019
7933a10
Add mutect-sinle job scripts
bintriz Aug 14, 2019
d19425b
Add a job submitter for mutect-single
bintriz Aug 14, 2019
672a5cd
Add new gatk hc job scripts implemented using gatk4
bintriz Aug 14, 2019
6d49c49
Create a new job submitter to separate gatk-hc by copying run.py
bintriz Aug 14, 2019
f568766
Merge branch 'master' of https://github.com/bsmn/bsmn_pipeline
bintriz Aug 14, 2019
56c468f
Fix a typo
bintriz Aug 17, 2019
617acec
Rename run_aln_jobs
bintriz Aug 17, 2019
c1f3ac2
Add MASTER_SERVER into run_info
bintriz Aug 17, 2019
e31080e
Change the final file format of output from bam to cram
bintriz Aug 17, 2019
eabd192
Rollback file names: submit_aln_jobs
bintriz Aug 18, 2019
b466ec4
Reflect the rollback of submit_aln_jobs
bintriz Aug 18, 2019
31a20a5
Change upload logic. Allow to run genome mapping together with varian…
bintriz Aug 18, 2019
b5b407b
Bug fix
bintriz Aug 18, 2019
9a8593f
Change the alignment working directory from bam to alignment
bintriz Aug 19, 2019
6dd26b5
Change java path
bintriz Aug 19, 2019
a49dd77
Update wrapper scripts
bintriz Aug 20, 2019
e250db8
Rename wrapper scripts
bintriz Aug 20, 2019
88d921d
Remove blank lines in run_info
bintriz Aug 20, 2019
4b69ff7
Bug fix: download script
bintriz Aug 20, 2019
7ab9ed5
Reorganize job files & some bug fixes
bintriz Aug 20, 2019
4229e29
Bug fixes: wrong variable names
bintriz Aug 20, 2019
76ddb7c
Bug fix: GATK-HC jobs
bintriz Aug 25, 2019
eb5dd64
Rearrange job orders
bintriz Aug 25, 2019
50e8308
Bug fix: upload_cram
bintriz Aug 27, 2019
48ade49
Add run_jid to track submitted jids
bintriz Aug 28, 2019
856c8fd
Add the feature stamping run_status when finishing a job
bintriz Aug 29, 2019
60e3164
Bug fix: run_status stamp
bintriz Aug 31, 2019
fe96642
Add a feature of skipping to submit jobs for already running samples
bintriz Aug 31, 2019
08102d2
Change in printing maessage
bintriz Aug 31, 2019
a896377
Better logging for the download job
bintriz Sep 3, 2019
5496599
Add the limit of concurrent download jobs
bintriz Sep 3, 2019
ad0d0e5
Add con-down-limit
bintriz Sep 6, 2019
0f4df19
Fix bug: Skipping jobs for gatk-hc ploidy_12,50
bintriz Sep 6, 2019
19c7413
Bug fix: symbolic link problem when running in a local cluster
bintriz Sep 6, 2019
21d32dd
Bug fix: ssh StricHostKeyChecking option in the jobs of submitting ot…
bintriz Sep 9, 2019
2ab14a6
Bug fix: run ssh excute remote commands as a login shell
bintriz Sep 9, 2019
a1fca07
Turn on the rerun otpion (-r y) of all jobs
bintriz Sep 9, 2019
898656b
Change threaded option for downloading
bintriz Sep 9, 2019
aa64b64
Bug fix: aln_2.merge error when reruning it
bintriz Sep 12, 2019
717933b
Bug fix: Add a feature of deleting failed download files
bintriz Sep 12, 2019
af407b4
Adjust thread num in download job
bintriz Sep 12, 2019
1b113ab
Adjust default con-down-limit value according to the thread num of th…
bintriz Sep 12, 2019
5f6f159
Increase the java memeory of the markdup job
bintriz Sep 12, 2019
8f11aec
Add trap in gatk-hc jobs
bintriz Sep 20, 2019
98879ed
Bug fix: correct log messages of aln_5.bqsr.sh
bintriz Sep 20, 2019
ef8dc78
gatk-hc vqsr: increase the parameter of maxum-training-variants to 50…
bintriz Sep 21, 2019
1a8ba08
Bug fix: duplicate jobs in running when the pre_2.submit_aln_jobs.sh …
bintriz Oct 7, 2019
3b14b90
Separate checkpoint per each step in a same job.
bintriz Oct 7, 2019
e3ec9b4
Incorporating bam2cram conversion steps to variant calling.
bintriz Oct 9, 2019
c3880b8
Bug fix
bintriz Oct 10, 2019
1dc66ef
Bug fix: wrong bash variable name
bintriz Oct 10, 2019
65e95a0
Bug fix & refactoring of run_genome_mapping.py
bintriz Oct 10, 2019
86b8ed1
Bug fix: Specify tmp path in bam2cram
bintriz Oct 11, 2019
4b8b386
Remove the existing sam when bam2cram starts
bintriz Oct 11, 2019
923460b
Add a new script that monitoring and resubmitting hanging jobs in queue
bintriz Oct 11, 2019
2471bd5
Change the logic of split fq files per lane: processing files by R1 a…
bintriz Oct 18, 2019
299a645
Change Picard version 2.17.4 for error correction
bintriz Nov 14, 2019
2a2b3ec
Minor fix in log output
bintriz Nov 14, 2019
c24702e
Bug fix in the mapping job submitter
bintriz Nov 14, 2019
db25ff0
Bug fix: creating tmp directory which running variant calling jobs only
bintriz Nov 14, 2019
31b4898
Bug fix: no bai file of the merge bam when only one read group exists
bintriz Nov 14, 2019
de2afb3
Bug fix: error in markdup due to not enough tmp space
bintriz Nov 14, 2019
a574195
Bug fix: flagstat isn't excuted
bintriz Nov 18, 2019
a7fa1e1
Add a python util for calculating STR info
bintriz Dec 3, 2019
a9b6b84
Bug fix: changing regex for grabbing fastq file name
bintriz Dec 27, 2019
e3f0c1e
Increasing the max bam file size to handle up to 2TB
bintriz Dec 27, 2019
0e91a12
Merge branch 'master' of https://github.com/bsmn/bsmn-pipeline
bintriz Dec 27, 2019
b6b0045
downloaded archived tools
attilagk Mar 11, 2020
7caff9c
Downloads to synapse started
attilagk Mar 17, 2020
9f2701d
downloads upload to synapse now works; TODO: documentation
attilagk Mar 17, 2020
3dffa9d
downloads to synapse script documented
attilagk Mar 17, 2020
aa70da8
added diagnostic status messages
attilagk Mar 18, 2020
5cc10c8
fixing install_tools.sh
attilagk Mar 18, 2020
4c0cc79
removed Python3 and pip installation from install_tools.sh
Mar 18, 2020
d606a46
installfix in progress
attilagk Mar 19, 2020
c6649b6
completed install_tools.sh fix; some tools could not be fixed
attilagk Mar 19, 2020
6a3b4b7
zlib to install_tools
attilagk Mar 19, 2020
aaca406
Made download_resources.sh more efficient and simple
attilagk Mar 19, 2020
069fd34
started documenting the installfix branch
attilagk Mar 20, 2020
c1d5e36
updated README.md with Versions section
attilagk Mar 21, 2020
d494cc4
Fixed misformatting in README
attilagk Mar 23, 2020
2e2f650
adding tests directory
attilagk Mar 25, 2020
9ca86d9
removed duplicate args to pip
Mar 26, 2020
79bd5d0
updated config.read function with tools not installed under bsmn-pipe…
attilagk Apr 1, 2020
d6aa92c
cython installation is added to install_tools
attilagk Apr 6, 2020
b9a4f67
quick fix for config allowed mapping to run down to aln_1.align_sort.sh
attilagk Apr 8, 2020
7d6582d
mapping ran until indel realignment
attilagk Apr 13, 2020
c05ebb2
NDAR credentials to be modified
attilagk Apr 16, 2020
d53f6cb
fixing path to awscli
attilagk Apr 20, 2020
9 changes: 9 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,12 @@
__pycache__/
*.py[cod]
*$py.class
*.swq
*.swp

# tools / resources directories
/tools/
/resources/
/downloads/
/dependencies/
/tests/
132 changes: 132 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# bsmn_pipeline
BSMN common data processing pipeline

# Setup and installation
This pipeline can be run on any cluster system that uses the SGE job scheduler. We recommend setting up your own cluster in AWS using AWS ParallelCluster.

## AWS ParallelCluster
For installing and setting up `parallelcluster`, please see the [`Getting Started Guide`](https://aws-parallelcluster.readthedocs.io/en/latest/getting_started.html) for AWS ParallelCluster.

## Installing pipeline
Clone this repository where you want it installed in your cluster. If you work with an m5.large type AWS EC2 instance, we recommend cloning into one of the file systems mounted at `/shared` or `/efs`.
```
$ cd /shared
$ git clone https://github.com/bsmn/bsmn_pipeline
```

Install software dependencies into `bsmn_pipeline/tools` by running the following script.
```
$ cd bsmn_pipeline
$ ./install_tools.sh
```

Download the required resource files, including the reference sequence. This step requires a Synapse account that has access to the Synapse page syn17062535.
```
$ ./download_resources.sh
```

## Extra set up for SGE
The pipeline requires a parallel environment named "threaded" in your SGE system. If your SGE system doesn't have this parallel environment, you should add it.
```
$ cat >threaded.conf <<END
pe_name threaded
slots 99999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule \$pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary TRUE
qsort_args NONE
END
```
```
$ sudo su
# qconf -Ap threaded.conf
# qconf -mattr queue pe_list threaded all.q
```

# Usage
## genome_mapping
Run the pipeline using a wrapper shell script.
```bash
genome_mapping.sh sample_list.txt
```

### sample_list.txt format
Lines starting with # are treated as comments and ignored. The header line should start with # as well. For example:
```
#sample_id file_name location
5154_brain-BSMN_REF_brain-534-U01MH106876 bulk_sorted.bam syn10639574
5154_fibroblast-BSMN_REF_fibroblasts-534-U01MH106876 fibroblasts_sorted.bam syn10639575
5154_NeuN_positive-BSMN_REF_NeuN+_E12-677-U01MH106876 E12_MDA_common_sorted.bam s3://nda-bsmn/abyzova_1497485007384/data/E12_MDA_common_sorted.bam
5154_NeuN_positive-BSMN_REF_NeuN+_C12-677-U01MH106876 C12_MDA_common_sorted.bam /efs/data/C12_MDA_common_sorted.bam
```
The "location" column can be a Synapse ID, an S3Uri (of the NDA or of a user), or a LocalPath. To obtain the data, the synapse client, the aws client, or a symbolic link is used, respectively.
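
To illustrate the format, here is a minimal sketch of how such a list could be parsed and each location classified. It is not the pipeline's own parser: the function names are hypothetical, and it assumes the three columns are separated by whitespace and that Synapse IDs look like `syn` followed by digits.

```python
import re


def classify_location(location):
    """Guess the data source type from the 'location' column."""
    if re.fullmatch(r"syn\d+", location):
        return "synapse"
    if location.startswith("s3://"):
        return "s3"
    return "local"


def parse_sample_list(text):
    """Yield (sample_id, file_name, location, source) for each data line."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines, including the '#' header.
        if not line or line.startswith("#"):
            continue
        sample_id, file_name, location = line.split()
        yield sample_id, file_name, location, classify_location(location)


example = """#sample_id file_name location
s1 a.bam syn10639574
s2 b.bam s3://nda-bsmn/abyzova_1497485007384/data/E12_MDA_common_sorted.bam
s3 c.bam /efs/data/C12_MDA_common_sorted.bam
"""
for row in parse_sample_list(example):
    print(row)
```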

### options
```
--parentid syn123
```
With the `--parentid` option, you can specify the Synapse ID of the project or folder to which the result bam files are uploaded. If it is set, the result bam files are uploaded to Synapse and the local copies are deleted. Otherwise, they are kept locally.

# Contributing

The `master` branch is protected. To introduce changes:

1. Fork this repository.
2. Open a branch named with your GitHub username and a short descriptive statement (like `kdaily-update-readme`). If there is an open issue on this repository, name your branch after the issue (like `kdaily-issue-7`).
3. Open a pull request and request a review.

# Versions

## v1.10 (installfix)

This version fixes broken URLs in the previous versions of `install_tools.sh` and `download_resources.sh` that prevented installing or fetching the dependencies of the pipeline.

To deal with the impermanence of URLs pointing to some of the dependencies, the "installfix" branch gathered all resources and stored them on Synapse in a single folder called [bsmn-pipeline-dependencies](https://www.synapse.org/#!Synapse:syn21782058) (syn21782058). Under this main folder, resources and tools are stored in their corresponding subfolders (resources: syn21782062, tools: syn21782261). `install_tools.sh` and `download_resources.sh` now need to refer only to the bsmn-pipeline-dependencies Synapse folder and its two subfolders instead of many volatile URLs pointing to individual tools/resources.

The `install_tools.sh` and `download_resources.sh` scripts have been successfully tested under Amazon Linux AMI 2018.03 and Ubuntu 18.04.4 LTS (both the desktop and server editions). They were also tested under Debian GNU/Linux 10 (buster), but some of the tools failed to build from source when invoking `install_tools.sh`.

### TODOs

* The following tools are currently omitted from `install_tools.sh` due to issues with building from source (mostly under Ubuntu 18.04.4 LTS):
    1. [cnvnator](https://github.com/abyzovlab/CNVnator), because its dependency, the [root](https://root.cern.ch/) framework, failed to build from source
    1. R and MosaicForecast
    1. Perl
    1. exonerate
* Some resources for MosaicForecast are currently missing from `download_resources.sh`.
* Test alignment and variant calling with GATK HaplotypeCaller and update the documentation.
* Implement variant calling with MuTect2. This will likely use Sentieon Tools' TNhaplotyper, which is a faster reimplementation of GATK's original MuTect2.
* Implement filtering of the raw callsets of HaplotypeCaller and MuTect2 by either or both of the following alternatives:
    1. the BSMN best practices heuristic filters
    1. MosaicForecast



The `dependencies_to_synapse.py` script was used to upload the dependencies to Synapse. Its usage is as follows:

```
> dependencies_to_synapse.py maindir synapseParentID
```

"maindir" is the path to a local directory with its resources and tools
subdirectories, each containing dependencies packaged in file archives.
"synapseParentID" is the Synapse project or folder where the
bsmn-pipeline-dependencies Synapse folder will be created with its own
resources and tools Synapse subfolders.

```
maindir
|--resources
|--tools
```
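
The expected layout of "maindir" can be checked locally before any upload. The following is a minimal sketch using only the standard library. `check_maindir` is a hypothetical helper, not a function of `dependencies_to_synapse.py`, and it makes no Synapse calls.

```python
from pathlib import Path

# Subdirectories that maindir is expected to contain.
EXPECTED_SUBDIRS = ("resources", "tools")


def check_maindir(maindir):
    """Return {subdir: sorted file names}; raise if a subdirectory is missing."""
    root = Path(maindir)
    contents = {}
    for name in EXPECTED_SUBDIRS:
        sub = root / name
        if not sub.is_dir():
            raise FileNotFoundError(f"missing subdirectory: {sub}")
        # Only regular files (the packaged archives) are listed.
        contents[name] = sorted(p.name for p in sub.iterdir() if p.is_file())
    return contents
```

Calling `check_maindir("maindir")` returns the archive names found under `resources` and `tools`, which can be reviewed before running the upload.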

## v1.00

This version was used by Taejeong Bae to produce the first batch of AWS cloud based results for the entire BSMN consortium. Its two main functionalities are:
* alignment of reads to the hs37d5 reference genome to produce BAM and/or CRAM files
* calling somatic variants with GATK HaplotypeCaller in its polyploid mode
30 changes: 30 additions & 0 deletions config.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
[TOOLS]
PYTHON3 = /usr/local/pyenv/shims/python3
SYNAPSE = ~/.local/bin/synapse
AWS = ~/.local/bin/aws
JAVA = tools/java/jdk1.8.0_222/bin/java
BWA = tools/bwa/0.7.16a/bin/bwa
SAMTOOLS = tools/samtools/1.7/bin/samtools
SAMBAMBA = tools/sambamba/0.6.7/bin/sambamba
GATK = tools/gatk/3.7-0/GenomeAnalysisTK.jar
GATK4 = tools/gatk/4.1-2/gatk
PICARD = tools/picard/2.17.4/picard.jar
BGZIP = tools/htslib/1.7/bin/bgzip
TABIX = tools/htslib/1.7/bin/tabix
VT = tools/vt/2018-06-07/bin/vt
BCFTOOLS = tools/bcftools/1.7/bin/bcftools
ROOTSYS = tools/root/6.14.00
CNVNATOR = tools/cnvnator/0.4/bin/cnvnator

[RESOURCES]
REFDIR = resources
REF = resources/hs37d5.fa
DBSNP = resources/dbsnp_138.b37.vcf
MILLS = resources/Mills_and_1000G_gold_standard.indels.b37.vcf
INDEL1KG = resources/1000G_phase1.indels.b37.vcf
OMNI = resources/1000G_omni2.5.b37.vcf
HAPMAP = resources/hapmap_3.3.b37.vcf
SNP1KG = resources/1000G_phase1.snps.high_confidence.b37.vcf
KNOWN_GERM_SNP = resources/gnomAD.1KG.ExAC.ESP6500.Kaviar.snps.txt.gz
MASK1KG = resources/20141020.strict_mask.whole_genome.fasta.gz
GNOMAD = resources/af-only-gnomad.raw.sites.b37.vcf.gz
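
Most `[TOOLS]` and `[RESOURCES]` values in this `config.ini` are relative paths (for example `tools/bwa/0.7.16a/bin/bwa`), while a few are absolute or start with `~`. The sketch below shows one way such a file could be read. It assumes that relative entries are meant to be resolved against the pipeline's install directory, which the diff does not state. `read_config` and `pipeline_home` are hypothetical names, not the pipeline's own config reader.

```python
import configparser
import os


def read_config(config_path, pipeline_home):
    """Read config.ini and resolve each value to a full path.

    Values starting with '~' are expanded to the user's home directory,
    absolute paths are kept, and relative paths are joined to pipeline_home.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys such as PYTHON3 in upper case
    with open(config_path) as handle:
        parser.read_file(handle)

    resolved = {}
    for section in parser.sections():
        resolved[section] = {}
        for key, value in parser[section].items():
            path = os.path.expanduser(value)
            if not os.path.isabs(path):
                path = os.path.join(pipeline_home, path)
            resolved[section][key] = path
    return resolved
```

For instance, `read_config("config.ini", "/shared/bsmn_pipeline")["TOOLS"]["BWA"]` would give `/shared/bsmn_pipeline/tools/bwa/0.7.16a/bin/bwa` under this assumption.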
93 changes: 93 additions & 0 deletions download_resources.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
#!/bin/bash

# Sources:

# GNOMAD sites
# ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz

# UMAP mappability score:
# https://bismap.hoffmanlab.org/raw/hg19.umap.tar.gz
# https://bismap.hoffmanlab.org/raw/hg19.umap.tar.gz

# Segmental Duplication regions (should be removed before calling all kinds of mosaics):
# http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz

# Simple repeats (should be removed before calling mosaic INDELS):
# http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz

# Regions enriched for SNPs with >=3 haplotypes (should be removed before calling all kinds of mosaics):
# the link seems broken
# https://raw.githubusercontent.com/parklab/MosaicForecast/master/resources/predictedhap3ormore_cluster.bed

maindir=$(dirname $(realpath $0))
resdir=$maindir/resources

function do_download() {
    wd=`pwd`
    cd $resdir
    # Synapse login
    # Ensure your credentials are in ~/.synapseConfig!
    synapse login
    # Download all resources
    synapse get syn21782062 --recursive
    cd $wd
}

if test ! -d $resdir; then
    mkdir -p $resdir
    do_download
else
    if test ! -f $resdir/SYNAPSE_METADATA_MANIFEST.tsv; then
        do_download
    fi
fi

cd $resdir


# Extract the human ref genome
test -f hs37d5.fa || gunzip hs37d5.fa.gz

# Extract all VCFs except for GNOMAD
mv af-only-gnomad.raw.sites.b37.vcf.gz{,~}
for F in *.vcf.*gz; do
    gunzip $F 2> /dev/null
done
mv af-only-gnomad.raw.sites.b37.vcf.gz{~,}

if test ! -f chr1.fa; then
    # Split the ref genome by chromosome
    awk '{
        r = match($1, "^>");
        if (r != 0) {
            filename = "chr"substr($1, 2, length($1))".fa";
            print $0 > filename;
        }
        else {
            print $0 >> filename;
        }
    }' hs37d5.fa
    rm chrGL* chrhs37d5.fa chrNC_007605.fa
fi

# Exiting here; the resources below are for MosaicForecast
exit

# Download UMAP mappability score:
cd resources
wget -qO- https://bismap.hoffmanlab.org/raw/hg19.umap.tar.gz | tar xvz
cd ..
tools/ucsc/fetchChromSizes hg19 > resources/hg19/hg19.chrom.sizes
tools/ucsc/wigToBigWig <(zcat resources/hg19/k24.umap.wg.gz) resources/hg19/hg19.chrom.sizes resources/hg19/k24.umap.wg.bw

# Download repeat regions:
## Segmental Duplication regions (should be removed before calling all kinds of mosaics):
wget -P resources http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz
gunzip resources/genomicSuperDups.txt.gz

## Regions enriched for SNPs with >=3 haplotypes (should be removed before calling all kinds of mosaics):
wget -P resources https://raw.githubusercontent.com/parklab/MosaicForecast/master/resources/predictedhap3ormore_cluster.bed

## Simple repeats (should be removed before calling mosaic INDELS):
wget -P resources http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz
gunzip resources/simpleRepeat.txt.gz
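
The awk step in `download_resources.sh` that splits `hs37d5.fa` into one file per chromosome can be easier to follow in Python. The following is a minimal sketch of the same idea, not a replacement used by the pipeline. The function name `split_fasta_by_chromosome` is hypothetical. As in the awk code, the output name is `chr` plus the first field of the header line without the leading `>`.

```python
import os


def split_fasta_by_chromosome(fasta_path, out_dir):
    """Write each FASTA record to out_dir/chr<name>.fa; return the file names."""
    written = []
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # A header starts a new record, so switch the output file.
                if out is not None:
                    out.close()
                name = line.split()[0][1:]  # first field without '>'
                filename = "chr" + name + ".fa"
                out = open(os.path.join(out_dir, filename), "w")
                written.append(filename)
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

The returned list makes it simple to remove unwanted outputs afterwards, as the shell script does with `rm chrGL* chrhs37d5.fa chrNC_007605.fa`.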
27 changes: 0 additions & 27 deletions genome_mapping/job_scripts/aln_1.align_sort.sh

This file was deleted.

25 changes: 0 additions & 25 deletions genome_mapping/job_scripts/aln_2.merge_bam.sh

This file was deleted.

28 changes: 0 additions & 28 deletions genome_mapping/job_scripts/aln_3.markdup.sh

This file was deleted.

35 changes: 0 additions & 35 deletions genome_mapping/job_scripts/aln_4.indel_realign.sh

This file was deleted.
