Change code overflow and update sample list

NBISweden · Nov 26, 2024 · fc63851
1 parent afe2e8b
Showing 1 changed file with 29 additions and 29 deletions.
diff --git a/pages/snakemake.qmd b/pages/snakemake.qmd
@@ -908,7 +908,7 @@ In the MRSA workflow most of the programs are run with default settings and
 don't use the `params:` directive. However, the `get_SRA_by_accession` rule
 is an exception. Let's take a look at this part of the workflow:
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 def get_sample_url(wildcards):
     samples = {
         "SRR935090": "https://figshare.scilifelab.se/ndownloader/files/39539767",
@@ -961,7 +961,7 @@ need help, click to show the solution below.
 
 ::: {.callout-tip collapse="true" title="Click to show"}
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule get_SRA_by_accession:
     """
     Retrieve a single-read FASTQ file
@@ -993,7 +993,7 @@ be static, they can be any Python expression. In particular, Snakemake provides
 a global dictionary of configuration parameters called `config`. Let's modify
 `get_SRA_by_accession` in order to make use of this dictionary:
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule get_SRA_by_accession:
     """
     Retrieve a single-read FASTQ file
@@ -1151,7 +1151,7 @@ need help with how to redirect output to the log file.
 If you need help, click to show the solution below for the rules.
 
 ::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule multiqc:
     """
     Aggregate all FastQC reports into a MultiQC report.
@@ -1260,7 +1260,7 @@ terminal.
 ::: {.callout-tip}
 If you have a rule with a shell directive in which several commands are run
 and you want to save stdout and stderr for all commands into the same log file
-you can add `exec &{log}` as the first line of the shell directive.
+you can add `exec &>{log}` as the first line of the shell directive.
 :::
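
As a rough illustration of why the corrected `exec &>{log}` form matters, the sketch below runs a small bash script from Python and checks that output from both streams ends up in one file. It assumes `bash` is available on the `PATH`; the function name `run_with_shared_log` and the file name `example.log` are made up for this example and are not part of the workflow.

```python
import subprocess
import tempfile
from pathlib import Path


def run_with_shared_log(log_path):
    """Run two commands in bash, sending all stdout and stderr to log_path."""
    script = "\n".join([
        'exec &>"$1"',                   # redirect stdout and stderr for the rest of the script
        "echo 'written to stdout'",      # would normally go to the terminal
        "echo 'written to stderr' >&2",  # would normally go to the terminal as well
    ])
    # "_" fills $0, so log_path arrives as $1 inside the script
    subprocess.run(["bash", "-c", script, "_", str(log_path)], check=True)
    return Path(log_path).read_text()


# hypothetical log file in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    log_text = run_with_shared_log(Path(tmpdir) / "example.log")
```

In a Snakemake rule the same effect is obtained by putting `exec &>{log}` on the first line of the `shell:` block, so that every later command in that block writes to the rule's log file.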
 
 If you run with `-D` (or `-S` for a simpler version) you will see that the
@@ -1482,7 +1482,7 @@ See if you can update the `generate_count_table` rule in the same manner. If you
 need help, click the solution below.
 
 ::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule generate_count_table:
     """
     Generate a count table using featureCounts.
@@ -1676,7 +1676,7 @@ flexible by moving the sample ids and URLs to a configuration file and turning t
 Remove the `sample_ids = ["SRR935090", "SRR935091", "SRR935092"]` line we added
 to the top of `snakefile_mrsa.smk` and add the following to `config.yml`:
 
-```{.yaml filename="config.yml"}
+```{.yaml filename="config.yml" .code-overflow-scroll}
 samples:
   SRR935090: "https://figshare.scilifelab.se/ndownloader/files/39539767"
   SRR935091: "https://figshare.scilifelab.se/ndownloader/files/39539770"
@@ -1706,7 +1706,7 @@ Now change the `multiqc` and `generate_count_table` rules that use the `expand`
 Try to change the `generate_count_table` rule in the same way. Check the solution below if you need help.
 
 ::: {.callout-tip collapse="true" title="Click to show"}
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule generate_count_table:
     """
     Generate a count table using featureCounts.
@@ -1753,7 +1753,7 @@ workflow.
 In the `get_genome_fasta` and `get_genome_gff3` rules we have hard-coded FTP
 paths to the FASTA and GFF annotation file for the genome `NCTC8325`. Let's move this information to the configuration file, and also add information for another genome, `ST398`. Add the following to `config.yml`:
 
-```{.yaml filename="config.yml"}
+```{.yaml filename="config.yml" .code-overflow-scroll}
 genomes:
   NCTC8325:
     fasta: ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/fasta/bacteria_18_collection/staphylococcus_aureus_subsp_aureus_nctc_8325/dna//Staphylococcus_aureus_subsp_aureus_nctc_8325.ASM1342v1.dna_rm.toplevel.fa.gz
@@ -1771,7 +1771,7 @@ Let's now look at how to do the mapping from genome id to FASTA path in the
 rule `get_genome_fasta`. This is how the rule currently looks (if you have
 added the log section as previously described).
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule get_genome_fasta:
     """
     Retrieve the sequence in fasta format for a genome.
@@ -1894,19 +1894,19 @@ Now we can resolve the `genome_id` wildcard from the config. For
 `align_to_genome` we change the input directive to:
 
 ```python
-input:
-    "data/{sample_id}.fastq.gz",
-    expand("results/bowtie2/{genome_id}.{substr}.bt2",
-           genome_id = config["genome_id"],
-           substr = ["1", "2", "3", "4", "rev.1", "rev.2"])
+    input:
+        "data/{sample_id}.fastq.gz",
+        expand("results/bowtie2/{genome_id}.{substr}.bt2",
+            genome_id = config["genome_id"],
+            substr = ["1", "2", "3", "4", "rev.1", "rev.2"])
 ```
 
 Here the `substr` wildcard gets expanded from a list while `genome_id` gets
 expanded from the config dictionary. Also change the hard-coded `NCTC8325` in
 the `shell:` directive of `align_to_genome` so that the genome_id is inserted
 directly from the config, like this:
 
-```python
+```{.python .code-overflow-scroll}
 shell:
     """
     bowtie2 -x results/bowtie2/{config[genome_id]} -U {input[0]} > {output} 2>{log}
@@ -1915,7 +1915,7 @@ shell:
 
 The final rule should look like this:
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule align_to_genome:
     """
     Align a fastq file to a genome index using Bowtie 2.
@@ -1938,7 +1938,7 @@ rule align_to_genome:
 Now let's change the hard-coded genome id in the `generate_count_table` input in
 a similar manner. The final rule should look like this:
 
-```{.python filename="snakefile_mrsa.smk"}
+```{.python filename="snakefile_mrsa.smk" .code-overflow-scroll}
 rule generate_count_table:
     """
     Generate a count table using featureCounts.
@@ -2006,12 +2006,12 @@ So far we've specified the samples to use in the workflow either as a hard-coded
 list in the Snakefile, or as a list in the configuration file. This is of course
 impractical for large real-world examples. Here we'll just quickly show how you
 could supply the samples to the MRSA workflow via a comma-separated file. For
-example you could create a file called `samples.tsv` with the following content:
+example you could create a file called `samples.csv` with the following content:
 
-```{.txt filename="samples.tsv"}
-SRR935090	https://figshare.scilifelab.se/ndownloader/files/39539767
-SRR935091	https://figshare.scilifelab.se/ndownloader/files/39539770
-SRR935092	https://figshare.scilifelab.se/ndownloader/files/39539773
+```{.txt filename="samples.csv"}
+SRR935090,https://figshare.scilifelab.se/ndownloader/files/39539767
+SRR935091,https://figshare.scilifelab.se/ndownloader/files/39539770
+SRR935092,https://figshare.scilifelab.se/ndownloader/files/39539773
 ```
 
 The first column has the sample id and the second column has the url to the
@@ -2023,11 +2023,11 @@ Snakemake we'll just add the following lines to the top of the Snakefile:
 # define an empty 'samples' dictionary
 samples = {}
 # read the sample list file and populate the dictionary
-with open("samples.tsv", "r") as fhin:
+with open("samples.csv", "r") as fhin:
     for line in fhin:
         # strip the newline character from the end of the line
         # then split by comma to get the sample id and url
-        sample_id, url = line.strip().split("\t")
+        sample_id, url = line.strip().split(",")
         # store the url in the dictionary with the sample id as key
         samples[sample_id] = url
 ```
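
One possible variation, shown here only as a sketch, is to parse the file with Python's standard `csv` module instead of splitting each line by hand. This tolerates blank lines, which would otherwise fail when unpacking into `sample_id, url`. The helper name `read_samples` is hypothetical, and the example reads from an in-memory string rather than a real `samples.csv`.

```python
import csv
import io


def read_samples(handle):
    """Return a {sample_id: url} dict from a two-column, comma-separated handle."""
    samples = {}
    for row in csv.reader(handle):
        # csv.reader yields an empty list for a blank line, so skip those
        if not row:
            continue
        sample_id, url = row
        samples[sample_id.strip()] = url.strip()
    return samples


# in-memory stand-in for samples.csv, including a stray blank line
example = io.StringIO(
    "SRR935090,https://figshare.scilifelab.se/ndownloader/files/39539767\n"
    "\n"
    "SRR935091,https://figshare.scilifelab.se/ndownloader/files/39539770\n"
)
samples = read_samples(example)
```

In the Snakefile this could be called as `samples = read_samples(fhin)` inside a `with open("samples.csv", newline="") as fhin:` block.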
@@ -2050,11 +2050,11 @@ We can also use the `samples` dictionary in `expand()`, for example in the `mult
             sample_id = samples.keys())
 ```
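
To make the effect of passing `samples.keys()` to `expand()` concrete, here is a plain-Python approximation that does not require Snakemake. The file pattern is an assumption chosen for illustration, and the list comprehension only mimics what `expand()` does for a single wildcard: one path per sample id, in dictionary order.

```python
samples = {
    "SRR935090": "https://figshare.scilifelab.se/ndownloader/files/39539767",
    "SRR935091": "https://figshare.scilifelab.se/ndownloader/files/39539770",
    "SRR935092": "https://figshare.scilifelab.se/ndownloader/files/39539773",
}

# hypothetical pattern, used only for this illustration
pattern = "results/fastqc/{sample_id}_fastqc.zip"

# rough stand-in for expand(pattern, sample_id = samples.keys())
paths = [pattern.format(sample_id=sample_id) for sample_id in samples.keys()]
```

Only the dictionary keys are used here; the URLs stored as values play no role in building the list of paths.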
 
-Now this depends on there being a `samples.tsv` file in the working directory.
+Now this depends on there being a `samples.csv` file in the working directory.
 To make this a configurable parameter we can add it to the config file:
 
 ```yaml
-sample_list: "samples.tsv"
+sample_list: "samples.csv"
 ```
 
 and update the code for populating the `samples` dictionary:
@@ -2067,7 +2067,7 @@ with open(config["sample_list"], "r") as fhin:
     for line in fhin:
         # strip the newline character from the end of the line
         # then split by comma to get the sample id and url
-        sample_id, url = line.strip().split("\t")
+        sample_id, url = line.strip().split(",")
         # store the url in the dictionary with the sample id as key
         samples[sample_id] = url
 ```
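
If the `sample_list` key is missing from the config, `config["sample_list"]` raises a `KeyError`. A minimal sketch of a more forgiving lookup is shown below, assuming a fallback to `samples.csv` is acceptable; the helper name `resolve_sample_list` is hypothetical and not part of Snakemake.

```python
from pathlib import Path


def resolve_sample_list(config, default="samples.csv"):
    """Return the sample list path from config, falling back to a default."""
    # dict.get avoids a KeyError when 'sample_list' is not set in the config
    path = Path(config.get("sample_list", default))
    if not path.is_file():
        # fail early with a clear message instead of a bare open() error
        raise FileNotFoundError(f"Sample list not found: {path}")
    return path
```

The returned path can then be passed to `open()` in place of `config["sample_list"]`.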
@@ -2098,7 +2098,7 @@ Here is an example for a rule and its execution:
 ```python
 rule align_to_genome:
     output:
-        temp("results/bam/{sample_id,\w+}.bam")
+        temp("results/bam/{sample_id,\\w+}.bam")
     input:
         fastq = "data/{sample_id}.fastq.gz",
         index = expand("results/bowtie2/{genome_id}.{substr}.bt2",