A Snakemake pipeline for genomic reference management
- Build mamba/conda environment:
Note - installing `gget` with `conda` has given me issues; I recommend using `pip`
mamba env create --name ref_snake --file envs/ref_snake.yml
mamba activate ref_snake
pip install gget snakemake
mamba install -c conda-forge pigz
Another note - if you already have these tools installed on your system, you can update the executable paths in `config.yml` (`EXEC`) instead, since the pipeline calls every tool through those paths
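Before running the pipeline, it can help to confirm that the tools installed above are actually on your `PATH`. This is a minimal sketch assuming a POSIX shell; `check_tools` is a hypothetical helper and is not part of this repo:

```shell
# check_tools: report which of the given executables can be found on PATH.
# Returns 0 if all are found, 1 if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the `ref_snake` environment active, you could call it as `check_tools gget snakemake pigz wget`. If a tool lives outside the environment, point the matching `EXEC` entry at its full path instead.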
- Modify `config.yaml` to include the species you want to build refs for
SPECIES:
mus_musculus
homo_sapiens
- Run pipeline:
snakemake --use-conda --conda-frontend mamba -j 16
Run pipeline w/ slurm:
snakemake --cluster-config slurm_config.yml --cluster "sbatch -p {cluster.partition} -t {cluster.time} -N {cluster.nodes} --mem {cluster.mem} -o {cluster.output} --cpus-per-task={cluster.threads}" -j 32 --cluster-cancel scancel --use-conda --conda-frontend mamba
See `out/README.md` for info on the files output for each species
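The `sbatch` template in the slurm command reads `partition`, `time`, `nodes`, `mem`, `output`, and `threads` from `slurm_config.yml`. As a rough sketch of what such a file could contain, assuming Snakemake's `__default__` cluster-config key, the snippet below writes an example to a separate file (`slurm_config.example.yml`, a hypothetical name) so the repo's real config is left untouched. Every value is a placeholder to adjust for your cluster:

```shell
# Write an example cluster config; keys mirror the {cluster.*} placeholders
# in the sbatch template. All values are illustrative assumptions.
cat > slurm_config.example.yml <<'EOF'
__default__:
  partition: general          # placeholder partition name
  time: "04:00:00"            # wall-clock limit (HH:MM:SS)
  nodes: 1
  mem: 16G
  output: "logs/slurm-%j.out" # %j expands to the Slurm job ID
  threads: 4
EOF

# sbatch does not create the directory for its -o path, so make it up front.
mkdir -p logs
```

Entries keyed by rule name can be added next to `__default__` to override individual rules, though the rule names depend on this repo's Snakefile.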
- Download GENCODE references w/ `wget`
Mus musculus:
wget -e robots=off --recursive --no-parent https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/latest_release/
optionally, run in the background b/c this takes a while...
nohup wget -e robots=off --recursive --no-parent https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/latest_release/ > wget_output.log 2>&1 &
Homo sapiens:
wget -e robots=off --recursive --no-parent https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/
optionally, run in the background b/c this takes a while...
nohup wget -e robots=off --recursive --no-parent https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/ > wget_output.log 2>&1 &
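Both background commands above write to `wget_output.log`, so starting them from the same directory makes the second log overwrite the first. One way around that is sketched below, assuming a POSIX shell. `gencode_url` and `fetch_gencode` are hypothetical helpers, and mapping the `SPECIES` names to the GENCODE mouse/human directories is my assumption rather than something the pipeline does:

```shell
# gencode_url: print the GENCODE latest_release URL for a species name.
gencode_url() {
  case "$1" in
    mus_musculus) echo "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/latest_release/" ;;
    homo_sapiens) echo "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/" ;;
    *) echo "unknown species: $1" >&2; return 1 ;;
  esac
}

# fetch_gencode: start the recursive download in the background,
# logging to a per-species file (wget_<species>.log).
fetch_gencode() {
  url="$(gencode_url "$1")" || return 1
  nohup wget -e robots=off --recursive --no-parent "$url" > "wget_$1.log" 2>&1 &
}
```

Nothing is downloaded until you call a helper, e.g. `fetch_gencode mus_musculus`; progress can then be followed with `tail -f wget_mus_musculus.log`.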