Change code overflow and update sample list
johnne committed Nov 26, 2024
1 parent afe2e8b commit fc63851
Showing 1 changed file with 29 additions and 29 deletions.
58 changes: 29 additions & 29 deletions pages/snakemake.qmd
@@ -908,7 +908,7 @@ In the MRSA workflow most of the programs are run with default settings and
don't use the `params:` directive. However, the `get_SRA_by_accession` rule
is an exception. Let's take a look at this part of the workflow:
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
def get_sample_url(wildcards):
samples = {
"SRR935090": "https://figshare.scilifelab.se/ndownloader/files/39539767",
@@ -961,7 +961,7 @@ need help, click to show the solution below.
::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule get_SRA_by_accession:
"""
Retrieve a single-read FASTQ file
@@ -993,7 +993,7 @@ be static, they can be any Python expression. In particular, Snakemake provides
a global dictionary of configuration parameters called `config`. Let's modify
`get_SRA_by_accession` in order to make use of this dictionary:
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule get_SRA_by_accession:
"""
Retrieve a single-read FASTQ file
@@ -1151,7 +1151,7 @@ need help with how to redirect output to the log file.
If you need help, click to show the solution below for the rules.
::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule multiqc:
"""
Aggregate all FastQC reports into a MultiQC report.
@@ -1260,7 +1260,7 @@ terminal.
::: {.callout-tip}
If you have a rule with a shell directive in which several commands are run
and you want to save stdout and stderr for all commands into the same log file
-you can add `exec &{log}` as the first line of the shell directive.
+you can add `exec &>{log}` as the first line of the shell directive.
:::
If you run with `-D` (or `-S` for a simpler version) you will see that the
@@ -1482,7 +1482,7 @@ See if you can update the `generate_count_table` rule in the same manner. If you
need help, click the solution below.
::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule generate_count_table:
"""
Generate a count table using featureCounts.
@@ -1676,7 +1676,7 @@ flexible by moving the sample ids and URLs to a configuration file and turning t
Remove the `sample_ids = ["SRR935090", "SRR935091", "SRR935092"]` line we added
to the top of `snakefile_mrsa.smk` and add the following to `config.yml`:
-```{.yaml filename="config.yml"}
+```{.yaml filename="config.yml" .code-overflow-scroll}
samples:
SRR935090: "https://figshare.scilifelab.se/ndownloader/files/39539767"
SRR935091: "https://figshare.scilifelab.se/ndownloader/files/39539770"
@@ -1706,7 +1706,7 @@ Now change the `multiqc` and `generate_count_table` rules that use the `expand`
Try to change the `generate_count_table` rule in the same way. Check the solution below if you need help.
::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule generate_count_table:
"""
Generate a count table using featureCounts.
@@ -1753,7 +1753,7 @@ workflow.
In the `get_genome_fasta` and `get_genome_gff3` rules we have hard-coded FTP
paths to the FASTA and GFF annotation file for the genome `NCTC8325`. Let's move this information to the configuration file, and also add information for another genome, `ST398`. Add the following to `config.yml`:
-```{.yaml filename="config.yml"}
+```{.yaml filename="config.yml" .code-overflow-scroll}
genomes:
NCTC8325:
fasta: ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/fasta/bacteria_18_collection/staphylococcus_aureus_subsp_aureus_nctc_8325/dna//Staphylococcus_aureus_subsp_aureus_nctc_8325.ASM1342v1.dna_rm.toplevel.fa.gz
@@ -1771,7 +1771,7 @@ Let's now look at how to do the mapping from genome id to FASTA path in the
rule `get_genome_fasta`. This is how the rule currently looks (if you have
added the log section as previously described).
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule get_genome_fasta:
"""
Retrieve the sequence in fasta format for a genome.
@@ -1894,19 +1894,19 @@ Now we can resolve the `genome_id` wildcard from the config. For
`align_to_genome` we change the input directive to:
```python
-input:
-"data/{sample_id}.fastq.gz",
-expand("results/bowtie2/{genome_id}.{substr}.bt2",
-genome_id = config["genome_id"],
-substr = ["1", "2", "3", "4", "rev.1", "rev.2"])
+input:
+"data/{sample_id}.fastq.gz",
+expand("results/bowtie2/{genome_id}.{substr}.bt2",
+genome_id = config["genome_id"],
+substr = ["1", "2", "3", "4", "rev.1", "rev.2"])
```
Here the `substr` wildcard gets expanded from a list while `genome_id` gets
expanded from the config dictionary. Also change the hard-coded `NCTC8325` in
the `shell:` directive of `align_to_genome` so that the genome_id is inserted
directly from the config, like this:
-```python
+```{.python .code-overflow-scroll}
shell:
"""
bowtie2 -x results/bowtie2/{config[genome_id]} -U {input[0]} > {output} 2>{log}
@@ -1915,7 +1915,7 @@ shell:
The final rule should look like this:
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule align_to_genome:
"""
Align a fastq file to a genome index using Bowtie 2.
@@ -1938,7 +1938,7 @@ rule align_to_genome:
Now let's change the hard-coded genome id in the `generate_count_table` input in
a similar manner. The final rule should look like this:
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
rule generate_count_table:
"""
Generate a count table using featureCounts.
@@ -2006,12 +2006,12 @@ So far we've specified the samples to use in the workflow either as a hard-coded
list in the Snakefile, or as a list in the configuration file. This is of course
impractical for large real-world examples. Here we'll just quickly show how you
could supply the samples to the MRSA workflow via a tab-separated file. For
-example you could create a file called `samples.tsv` with the following content:
+example you could create a file called `samples.csv` with the following content:
-```{.txt filename="samples.tsv"}
-SRR935090 https://figshare.scilifelab.se/ndownloader/files/39539767
-SRR935091 https://figshare.scilifelab.se/ndownloader/files/39539770
-SRR935092 https://figshare.scilifelab.se/ndownloader/files/39539773
+```{.txt filename="samples.csv"}
+SRR935090,https://figshare.scilifelab.se/ndownloader/files/39539767
+SRR935091,https://figshare.scilifelab.se/ndownloader/files/39539770
+SRR935092,https://figshare.scilifelab.se/ndownloader/files/39539773
```
The first column has the sample id and the second column has the url to the
@@ -2023,11 +2023,11 @@ Snakemake we'll just add the following lines to the top of the Snakefile:
# define an empty 'samples' dictionary
samples = {}
# read the sample list file and populate the dictionary
-with open("samples.tsv", "r") as fhin:
+with open("samples.csv", "r") as fhin:
for line in fhin:
# strip the newline character from the end of the line
# then split by tab character to get the sample id and url
-sample_id, url = line.strip().split("\t")
+sample_id, url = line.strip().split(",")
# store the url in the dictionary with the sample id as key
samples[sample_id] = url
```
@@ -2050,11 +2050,11 @@ We can also use the `samples` dictionary in `expand()`, for example in the `mult
sample_id = samples.keys())
```
-Now this depends on there being a `samples.tsv` file in the working directory.
+Now this depends on there being a `samples.csv` file in the working directory.
To make this a configurable parameter we can add it to the config file:
```yaml
-sample_list: "samples.tsv"
+sample_list: "samples.csv"
```
and update the code for populating the `samples` dictionary:
@@ -2067,7 +2067,7 @@ with open(config["sample_list"], "r") as fhin:
for line in fhin:
# strip the newline character from the end of the line
# then split by tab character to get the sample id and url
-sample_id, url = line.strip().split("\t")
+sample_id, url = line.strip().split(",")
# store the url in the dictionary with the sample id as key
samples[sample_id] = url
```
@@ -2098,7 +2098,7 @@ Here is an example for a rule and its execution:
```python
rule align_to_genome:
output:
-temp("results/bam/{sample_id,\w+}.bam")
+temp("results/bam/{sample_id,\\w+}.bam")
input:
fastq = "data/{sample_id}.fastq.gz",
index = expand("results/bowtie2/{genome_id}.{substr}.bt2",
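
For reference, here is a minimal sketch of how the comma-separated `samples.csv` introduced in this commit could be parsed with Python's standard `csv` module rather than a manual `split(",")`. The helper name `read_samples` and the in-memory example file are assumptions made for illustration and are not part of the commit; in the Snakefile the handle would instead come from `open(config["sample_list"])`. Note that the context comments in the two parsing hunks still say "split by tab character" although the code now splits on a comma.

```python
import csv
import io


def read_samples(handle):
    """Return a {sample_id: url} dictionary from comma-separated rows.

    `read_samples` is a hypothetical helper, not a function from the commit.
    """
    samples = {}
    for row in csv.reader(handle):
        # csv.reader yields an empty list for blank lines; skip those
        if not row:
            continue
        # first column is the sample id, second column is the url
        sample_id, url = row
        samples[sample_id.strip()] = url.strip()
    return samples


# In-memory stand-in for samples.csv, using two rows from the diff
example = io.StringIO(
    "SRR935090,https://figshare.scilifelab.se/ndownloader/files/39539767\n"
    "SRR935091,https://figshare.scilifelab.se/ndownloader/files/39539770\n"
)
samples = read_samples(example)
```

The resulting `samples` dictionary has the same shape as the one built in the diff, so it could be used with `samples.keys()` in `expand()` in the same way.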