
Commit

avantonder committed Jul 19, 2024
2 parents 99aa486 + 66f67a8 commit 2ec8010
Showing 7 changed files with 167 additions and 26 deletions.
4 changes: 2 additions & 2 deletions course_files/scripts/M_tuberculosis/02-run_bacqc.sh
@@ -14,6 +14,6 @@ nextflow run avantonder/bacQC \
--max_memory '16.GB' --max_cpus 8 \
--input FIX_SAMPLESHEET \
--outdir results/bacqc \
--kraken2db databases/minikraken2_v1_8GB \
--brackendb databases/minikraken2_v1_8GB \
--kraken2db databases/k2_standard_08gb_20240605 \
--brackendb databases/k2_standard_08gb_20240605 \
--genome_size FIX_GENOME_SIZE
4 changes: 2 additions & 2 deletions course_files/scripts/S_aureus/01-run_assemblebac.sh
@@ -9,6 +9,6 @@ nextflow run avantonder/assembleBAC \
--max_memory '16.GB' --max_cpus 8 \
--input SAMPLESHEET \
--outdir results/assemblebac \
--baktadb databases/db-light \
--baktadb databases/bakta_light_20240119 \
--genome_size GENOME_SIZE \
--checkm2db databases/CheckM2_database/uniref100.KO.1.dmnd
--checkm2db databases/checkm2_v2_20210323/uniref100.KO.1.dmnd
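These database paths (`k2_standard_08gb_20240605`, `bakta_light_20240119`, `checkm2_v2_20210323`) are used throughout the updated scripts. As an optional sketch (not part of the course materials), you can confirm they exist before launching Nextflow:

```bash
# Optional sanity check (hypothetical helper, not part of the course scripts):
# confirm the renamed database directories exist before running the pipelines.
for db in databases/k2_standard_08gb_20240605 \
          databases/bakta_light_20240119 \
          databases/checkm2_v2_20210323/uniref100.KO.1.dmnd; do
  [ -e "$db" ] || echo "Missing: $db" >&2
done
```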
8 changes: 4 additions & 4 deletions materials/09-bacqc.md
@@ -74,8 +74,8 @@ nextflow run avantonder/bacQC \
--max_memory '16.GB' --max_cpus 8 \
--input SAMPLESHEET \
--outdir results/bacqc \
--kraken2db databases/minikraken2_v1_8GB \
--brackendb databases/minikraken2_v1_8GB \
--kraken2db databases/k2_standard_08gb_20240605 \
--brackendb databases/k2_standard_08gb_20240605 \
--genome_size GENOME_SIZE
```

@@ -113,8 +113,8 @@ nextflow run avantonder/bacQC \
--max_memory '16.GB' --max_cpus 8 \
--input samplesheet.csv \
--outdir results/bacqc \
--kraken2db databases/minikraken2_v1_8GB \
--brackendb databases/minikraken2_v1_8GB \
--kraken2db databases/k2_standard_08gb_20240605 \
--brackendb databases/k2_standard_08gb_20240605 \
--genome_size 4300000
```
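
The `--input samplesheet.csv` option points the pipeline at a CSV listing each sample and its FASTQ files. The sketch below assumes the common nf-core-style `sample,fastq_1,fastq_2` layout; this layout is an assumption rather than taken from the course materials, so check the bacQC documentation for the exact column names.

```bash
# Hypothetical samplesheet; the sample,fastq_1,fastq_2 layout is an assumption
# based on nf-core conventions, and the FASTQ paths are placeholders.
cat > samplesheet.csv << 'EOF'
sample,fastq_1,fastq_2
isolate01,data/reads/isolate01_1.fastq.gz,data/reads/isolate01_2.fastq.gz
isolate02,data/reads/isolate02_1.fastq.gz,data/reads/isolate02_2.fastq.gz
EOF
```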

6 changes: 3 additions & 3 deletions materials/20-assemblebac.md
@@ -46,7 +46,7 @@ nextflow run avantonder/assembleBAC \
--max_memory '16.GB' --max_cpus 8 \
--input SAMPLESHEET \
--outdir results/assemblebac \
--baktadb databases/db-light \
--baktadb databases/bakta_light_20240119 \
--genome_size GENOME_SIZE \
--checkm2db databases/checkme2/uniref100.KO.1.dmnd
```
@@ -101,9 +101,9 @@ nextflow run avantonder/assembleBAC \
--max_memory '16.GB' --max_cpus 8 \
--input samplesheet.csv \
--outdir results/assemblebac \
--baktadb databases/db-light \
--baktadb databases/bakta_light_20240119 \
--genome_size 2M \
--checkm2db databases/CheckM2_database/uniref100.KO.1.dmnd
--checkm2db databases/checkm2_v2_20210323/uniref100.KO.1.dmnd
```

After activating the software environment, we ran the script as instructed using:
44 changes: 29 additions & 15 deletions setup.md
@@ -307,10 +307,10 @@ You can follow the same instructions as for "Ubuntu".

## Data

The data used in these materials is provided as an archive file (`bacterial-genomics-data.tar`).
The data used in these materials is provided as an archive file (`bact-data.tar.gz`).
You can download it from the link below and extract the files from the archive into a directory of your choice.

<a href="https://www.dropbox.com/scl/fo/k2jyjgfsblfxjcktwlsmg/h?rlkey=6qov67ani513j2tom8pjl9ncm&dl=0">
<a href="https://www.dropbox.com/scl/fi/osjpmst8i2919fv3by3eh/bact-data.tar.gz?rlkey=iddkexnnm6ccsx8prfcay259u&dl=0">
<button class="btn"><i class="fa fa-download"></i> Download</button>
</a>

@@ -322,8 +322,8 @@ datadir="$HOME/Desktop/bacterial_genomics"

# download and extract to directory
mkdir $datadir
wget -O $datadir/bact-data.tar "https://www.dropbox.com/scl/fi/ba1ws6jx045jjq96m4bum/bacterial-genomics-data.tar?rlkey=thssczgyl9n32gvtdjwi1673f&dl=1"
tar -xvf $datadir/bact-data.tar -C $datadir
wget -O $datadir/bact-data.tar.gz "https://www.dropbox.com/scl/fi/osjpmst8i2919fv3by3eh/bact-data.tar.gz?rlkey=iddkexnnm6ccsx8prfcay259u&dl=1"
tar -xzvf $datadir/bact-data.tar.gz -C $datadir
rm $datadir/bact-data.tar
```

@@ -333,7 +333,7 @@ rm $datadir/bact-data.tar
We include a copy of public databases used in the exercises in the dropbox link above.
However, for your analyses you should always download the most up-to-date databases.

Our convention is to download these databases into a directory called `resources`.
In the code below we download these databases into a directory called `databases`.
This is optional: you can download the databases wherever is most convenient for you.
If you work in a research group, it's a good idea to have shared storage where everyone can access the same copy of the databases.

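For example, if your group already keeps a copy at a shared location, a symlink avoids downloading everything again (the shared path below is hypothetical):

```bash
# Minimal sketch, assuming a pre-existing shared copy at /shared/databases (hypothetical path)
ln -s /shared/databases databases
```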
@@ -345,36 +345,50 @@ cd databases

#### Kraken2

We use a small version of the database for teaching purposes, whereas you may want to use the full version in your work.
Look at the [Kraken2 indexes page](https://benlangmead.github.io/aws-indexes/k2) for the latest versions available.

```bash
wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v1_8GB_201904.tgz
tar -xzf minikraken2_v1_8GB_201904.tgz
rm minikraken2_v1_8GB_201904.tgz
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz
mkdir k2_standard_08gb_20240605
tar -xzvf k2_standard_08gb_20240605.tar.gz -C k2_standard_08gb_20240605
rm k2_standard_08gb_20240605.tar.gz
```

#### Bakta

We use the "light" version of the database for teaching purposes, whereas you may want to use the full version in your work.
Look at the [Bakta Zenodo repository](https://zenodo.org/records/10522951) for the latest versions available.

```bash
wget https://zenodo.org/record/7669534/files/db-light.tar.gz
tar -xzf db-light.tar.gz
wget https://zenodo.org/records/10522951/files/db-light.tar.gz
tar -xzvf db-light.tar.gz
mv db-light bakta_light_20240119
rm db-light.tar.gz

# make sure to activate bakta environment
mamba activate bakta
amrfinder_update --force_update --database db-light/amrfinderplus-db/
amrfinder_update --force_update --database bakta_light_20240119/amrfinderplus-db/
```

#### CheckM2

CheckM2 also provides a command `checkm2 database --download` to download the latest version of the database [from Zenodo](https://zenodo.org/records/5571251).

```bash
wget https://zenodo.org/records/5571251/files/checkm2_database.tar.gz?download=1
tar -xzf checkm2_database.tar.gz
wget https://zenodo.org/records/5571251/files/checkm2_database.tar.gz
tar -xzvf checkm2_database.tar.gz
mv CheckM2_database checkm2_v2_20210323
rm checkm2_database.tar.gz CONTENTS.json
```
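
As a hedged alternative to the manual `wget` above, the `checkm2 database --download` command mentioned earlier can fetch and place the database for you. This assumes CheckM2 itself is installed (it is not one of the course software environments), and `--path` is the option CheckM2 uses for the destination directory:

```bash
# Alternative sketch, assuming CheckM2 is installed, e.g. in its own environment:
# mamba create -y -n checkm2 checkm2   # hypothetical environment, not part of the course setup
checkm2 database --download --path databases/
```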

#### GPSCs

```bash
wget https://gps-project.cog.sanger.ac.uk/GPS_v8_ref.tar.gz
tar -xzf GPS_v8_ref.tar.gz
mkdir poppunk
tar -xzvf GPS_v8_ref.tar.gz -C poppunk
rm GPS_v8_ref.tar.gz

wget https://gps-project.cog.sanger.ac.uk/GPS_v8_external_clusters.csv
wget -O poppunk/GPS_v8_external_clusters.csv https://gps-project.cog.sanger.ac.uk/GPS_v8_external_clusters.csv
```
74 changes: 74 additions & 0 deletions utils/Dockerfile
@@ -0,0 +1,74 @@
# Use the official Ubuntu image as a base
FROM ubuntu:24.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive

# Update and install required packages
RUN apt update && \
apt install -y wget git default-jre

# create user
RUN useradd -ms /bin/bash participant

# switch to participant
USER participant
WORKDIR /home/participant

# Install Miniforge
RUN wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-$(uname -m).sh" && \
bash Miniforge3-Linux-$(uname -m).sh -b -p /home/participant/miniforge3 && \
rm Miniforge3-Linux-$(uname -m).sh && \
/home/participant/miniforge3/bin/mamba init

ENV PATH="/home/participant/miniforge3/bin:$PATH"


# Setup Conda channels and config
RUN conda config --add channels defaults && \
conda config --add channels bioconda && \
conda config --add channels conda-forge && \
conda config --set remote_read_timeout_secs 1000

# Install conda packages and create environments
RUN mamba install -y -n base pandas && \
mamba create -y -n bakta bakta && \
mamba create -y -n gubbins gubbins && \
mamba create -y -n iqtree iqtree snp-sites biopython && \
mamba create -y -n mlst mlst && \
mamba create -y -n nextflow nextflow && \
mamba create -y -n pairsnp pairsnp && \
mamba create -y -n panaroo python=3.9 "panaroo>=1.3" snp-sites && \
mamba create -y -n poppunk python=3.10 poppunk && \
mamba create -y -n remove_blocks python=2.7 && \
/home/participant/miniforge3/envs/remove_blocks/bin/pip install git+https://github.com/sanger-pathogens/remove_blocks_from_aln.git && \
mamba create -y -n seqtk seqtk pandas && \
mamba create -y -n tb-profiler tb-profiler pandas && \
mamba create -y -n treetime treetime seqkit biopython

# Setup Nextflow config
RUN mkdir -p /home/participant/.nextflow && \
echo "\
conda { \
conda.enabled = true \
singularity.enabled = false \
docker.enabled = false \
useMamba = true \
createTimeout = '4 h' \
cacheDir = '/home/participant/.nextflow-conda-cache/' \
} \
singularity { \
singularity.enabled = true \
conda.enabled = false \
docker.enabled = false \
pullTimeout = '4 h' \
cacheDir = '/home/participant/.nextflow-singularity-cache/' \
} \
docker { \
docker.enabled = true \
singularity.enabled = false \
conda.enabled = false \
}" >> /home/participant/.nextflow/config

# Set the default command
CMD ["/bin/bash"]
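
A hedged usage example for this Dockerfile (the image tag and mounted path below are assumptions, not part of the repository):

```bash
# Hypothetical build-and-run commands; the image name "bacterial-genomics" is an assumption.
docker build -t bacterial-genomics -f utils/Dockerfile .
docker run -it --rm -v "$PWD":/home/participant/course bacterial-genomics
```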
53 changes: 53 additions & 0 deletions utils/download_data.sh
@@ -0,0 +1,53 @@
#!/bin/bash

# exit on error
set -e

# check for amrfinder_update
if ! command -v "amrfinder_update" &> /dev/null; then
echo "Error: amrfinder_update is not available on the PATH. You can install it with: mamba create -n bakta bakta" >&2
exit 1
fi

# Download and extract course data
echo "Downloading and extracting course files"
wget -O bact-data.tar.gz "https://www.dropbox.com/scl/fi/osjpmst8i2919fv3by3eh/bact-data.tar.gz?rlkey=iddkexnnm6ccsx8prfcay259u&dl=1"
tar -xzf bact-data.tar.gz
rm bact-data.tar.gz

# Download and extract public databases
mkdir databases
cd databases

# kraken2
echo "Downloading and extracting Kraken2 database"
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz
mkdir k2_standard_08gb_20240605
tar -xzf k2_standard_08gb_20240605.tar.gz -C k2_standard_08gb_20240605
rm k2_standard_08gb_20240605.tar.gz

# bakta
echo "Downloading and extracting Bakta database"
wget https://zenodo.org/records/10522951/files/db-light.tar.gz
tar -xzf db-light.tar.gz
mv db-light bakta_light_20240119
rm db-light.tar.gz

# update amrfinder database
amrfinder_update --force_update --database bakta_light_20240119/amrfinderplus-db/

# checkm2
echo "Downloading and extracting CheckM2 database"
wget https://zenodo.org/records/5571251/files/checkm2_database.tar.gz
tar -xzf checkm2_database.tar.gz
mv CheckM2_database checkm2_v2_20210323
rm checkm2_database.tar.gz CONTENTS.json

# poppunk
echo "Downloading and extracting PopPunk database"
wget https://gps-project.cog.sanger.ac.uk/GPS_v8_ref.tar.gz
mkdir poppunk
tar -xzf GPS_v8_ref.tar.gz -C poppunk
rm GPS_v8_ref.tar.gz

wget -O poppunk/GPS_v8_external_clusters.csv https://gps-project.cog.sanger.ac.uk/GPS_v8_external_clusters.csv
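
A possible way to run this script (a sketch; the target directory matches the `setup.md` example above, and the `bakta` environment is assumed to provide `amrfinder_update`, which the script checks for):

```bash
# Hypothetical usage; run from the directory that should hold the course data and databases.
mkdir -p ~/Desktop/bacterial_genomics && cd ~/Desktop/bacterial_genomics
mamba activate bakta                    # provides amrfinder_update, which the script checks for
bash /path/to/download_data.sh          # adjust to wherever you saved utils/download_data.sh
```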
