Commit f6da636: Merge branch 'main' into update_gitpod
mahesh-panchal authored Aug 26, 2024
2 parents 2c44837 + 5069330
Showing 7 changed files with 113 additions and 96 deletions.
18 changes: 9 additions & 9 deletions episodes/02-workflow_parameters.md
@@ -6,17 +6,17 @@ exercises: 5

::::::::::::::::::::::::::::::::::::::: objectives

- - "Use pipeline parameters to change the input to a workflow."
- - "Add a pipeline parameters to a Nextflow script."
- - "Understand how to create and use a parameter file."
+ - Use pipeline parameters to change the input to a workflow.
+ - Add a pipeline parameter to a Nextflow script.
+ - Understand how to create and use a parameter file.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- - "How can I change the data a workflow uses?"
- - "How can I parameterise a workflow?"
- - "How can I add my parameters to a file?"
+ - How can I change the data a workflow uses?
+ - How can I parameterise a workflow?
+ - How can I add my parameters to a file?

::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -304,7 +304,7 @@ f5ef7b7a01 executor > local (1) [f3/4fa480] process > NUM_LINES


:::::::::::::::::::::::::::::::::::::::: keypoints
- - "Pipeline parameters are specified by prepending the prefix `params` to a variable name, separated by dot character."
- - "To specify a pipeline parameter on the command line for a Nextflow run use `--variable_name` syntax."
- - "You can add parameters to a JSON formatted file and pass them to the script using option `-params-file`."
+ - Pipeline parameters are specified by prepending the prefix `params` to a variable name, separated by a dot character.
+ - To specify a pipeline parameter on the command line for a Nextflow run, use the `--variable_name` syntax.
+ - You can add parameters to a JSON-formatted file and pass them to the script using the option `-params-file`.
::::::::::::::::::::::::::::::::::::::::::::::::::
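The third keypoint can be made concrete with a sketch — the file name `params.json` and the parameter name `input` below are illustrative, not taken from this lesson:

```json
{
    "input": "data/yeast/reads/ref1_1.fq.gz"
}
```

This would be passed as `nextflow run main.nf -params-file params.json`. Note the single dash: `-params-file` is an option to Nextflow itself, while double-dash flags such as `--input` set individual pipeline parameters.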
58 changes: 7 additions & 51 deletions episodes/03-channels.md
@@ -380,13 +380,19 @@ Available fromPath options:

We can change the default options for the `fromPath` method to give an error if the file doesn't exist using the `checkIfExists` parameter. In Nextflow, method parameters are separated by a `,` and parameter values specified with a colon `:`.

- If we execute a Nextflow script with the contents below, it will run and not produce an output. This is likely not what we want.
+ If we execute a Nextflow script with the contents below, it will run but produce neither an output nor an error message that the file does not exist. This is likely not what we want.

```groovy
read_ch = Channel.fromPath( 'data/chicken/reads/*.fq.gz' )
read_ch.view()
```

```output
N E X T F L O W ~ version 20.10.0
Launching `channels.nf` [scruffy_swartz] DSL2 - revision: 2c8f18ab48
```

Add the argument `checkIfExists` with the value `true`.

```groovy
read_ch = Channel.fromPath( 'data/chicken/reads/*.fq.gz', checkIfExists: true )
read_ch.view()
```

@@ -544,56 +550,6 @@ Launching `channels.nf` [stupefied_lumiere] - revision: a3741edde2

::::::::::::::::::::::::::::::::::::::::::::::::::

### The **fromSRA** Channel factory

Another useful factory method is `fromSRA`. The `fromSRA` method makes it possible to query the [NCBI SRA](https://www.ncbi.nlm.nih.gov/sra) archive and returns a queue channel emitting the FASTQ files matching the specified selection criteria.

The queries can be project IDs or accession numbers supported by the [NCBI ESearch API](https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch).

If you want to use this functionality, you will need an [NCBI API KEY](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/), and to set the environment variable `NCBI_API_KEY` to its value.

```groovy
sra_ch = Channel.fromSRA('SRP043510')
sra_ch.view()
```

This will print a tuple for every fastq file associated with that SRA project accession.

```output
[SRR1448794, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448794/SRR1448794.fastq.gz]
[SRR1448795, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448795/SRR1448795.fastq.gz]
[SRR1448792, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/002/SRR1448792/SRR1448792.fastq.gz]
[SRR1448793, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/003/SRR1448793/SRR1448793.fastq.gz]
[SRR1910483, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/003/SRR1910483/SRR1910483.fastq.gz]
[SRR1910482, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/002/SRR1910482/SRR1910482.fastq.gz]
(remaining omitted)
```

Multiple accession IDs can be specified using a list object:

```groovy
ids = ['ERR908507', 'ERR908506', 'ERR908505']
sra_ch = Channel.fromSRA(ids)
sra_ch.view()
```

```output
[ERR908507, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz]]
[ERR908506, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz]]
[ERR908505, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_2.fastq.gz]]
```

::::::::::::::::::::::::::::::::::::::::: callout

## Read pairs from SRA

Read pairs are implicitly managed, and are returned as a list of files.


::::::::::::::::::::::::::::::::::::::::::::::::::



:::::::::::::::::::::::::::::::::::::::: keypoints

- Channels must be used to import data into Nextflow.
92 changes: 89 additions & 3 deletions episodes/04-processes-part1.md
@@ -289,6 +289,21 @@ workflow {
}
```

```bash
$ nextflow run process_python.nf -process.debug
```

```output
N E X T F L O W ~ version 24.04.4
Launching `process_python.nf` [mad_montalcini] DSL2 - revision: ee25d49465
executor > local (1)
[b4/a100c3] PROCESS_READS [100%] 1 of 1 ✔
reads 14677
bases 1482377
```

This allows the use of different programming languages, which may better fit a particular task. However, for large chunks of code it is suggested to save them into separate files and invoke them from the process script.
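The counting logic such an embedded script performs can be sketched in plain Python — a minimal sketch, with the helper name and the tiny in-memory FASTQ invented for illustration:

```python
import gzip
import io

def count_reads_and_bases(handle):
    """Count records and sequence bases in a FASTQ stream (4 lines per record)."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
            reads += 1
            bases += len(line.strip())
    return reads, bases

# Two illustrative records, gzip-compressed in memory to mimic a .fq.gz file
fastq = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n"
data = gzip.compress(fastq.encode())
with gzip.open(io.BytesIO(data), "rt") as fh:
    reads, bases = count_reads_and_bases(fh)
print(reads, bases)  # 2 7
```

This mirrors the kind of read/base counting done by the lesson's `process_reads.py`.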

## Associated scripts
Expand All @@ -303,8 +318,8 @@ chmod 755 bin/process_reads.py
```

```python
-# process_reads.py
 #!/usr/bin/env python
+# process_reads.py
import gzip
import sys
reads = 0
```

@@ -867,7 +882,8 @@ nextflow run process_exercise_input.nf -process.debug
```
::::::::::::::: solution

## Solution

```groovy
@@ -1052,7 +1068,7 @@ And include the command below in the script directive
```
::::::::::::::: solution

## Solution
```groovy
// process_exercise_combine_answer.nf
@@ -1075,6 +1091,20 @@ And include the command below in the script directive
}
```

```bash
$ nextflow run process_exercise_combine.nf -process.debug
```

```output
N E X T F L O W ~ version 24.04.4
Launching `process_exercise_combine.nf` [fabulous_kare] DSL2 - revision: 1eade0a2e9
executor > local (1)
[e0/b05fe7] COMBINE (1) [100%] 1 of 1 ✔
118
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -1192,6 +1222,62 @@ $ nextflow run process_exercise_repeat.nf -process.debug

This process runs 16 times.

```output
N E X T F L O W ~ version 24.04.4
Launching `process_exercise_repeat.nf` [ecstatic_turing] DSL2 - revision: 17891a7528
executor > local (16)
[6d/f803e5] COMBINE (9) [100%] 16 of 16 ✔
Number of sequences for chromosome J: 398
Number of sequences for chromosome G: 583
Number of sequences for chromosome O: 597
Number of sequences for chromosome N: 435
Number of sequences for chromosome B: 456
Number of sequences for chromosome E: 323
Number of sequences for chromosome K: 348
Number of sequences for chromosome H: 321
Number of sequences for chromosome C: 186
Number of sequences for chromosome M: 505
Number of sequences for chromosome L: 580
Number of sequences for chromosome A: 118
Number of sequences for chromosome D: 836
Number of sequences for chromosome F: 140
Number of sequences for chromosome P: 513
Number of sequences for chromosome I: 245
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
2 changes: 2 additions & 0 deletions episodes/10-workflow_checkpoint_caching.md
@@ -86,6 +86,8 @@ You will see that the execution of the process `NUMLINES` is actually skipped

## How does resume work?

Nextflow stores all intermediate files and task results generated during the execution of a workflow in the `work` directory. It acts as a scratch space where all the temporary data required for the workflow's execution is kept. Within the work directory, Nextflow creates subdirectories named with unique hashes (e.g., work/ab/cd1234...). Each of these subdirectories corresponds to a specific process or task in the pipeline. The hashed directory names ensure that each task's outputs are isolated and uniquely identified.
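A toy illustration of this directory layout — not Nextflow's actual hashing, which folds in more of the task's identity than this sketch (script body, container, and so on):

```python
import hashlib

def task_dir(process_name, inputs):
    """Sketch: map a task's identity to a work/<2-hex>/<30-hex> directory via a 128-bit hash."""
    key = process_name + "|" + "|".join(map(str, inputs))
    digest = hashlib.md5(key.encode()).hexdigest()  # MD5 digests are 128 bits
    return "work/{}/{}".format(digest[:2], digest[2:])

d1 = task_dir("NUM_LINES", ["data/ref1.fq.gz"])
d2 = task_dir("NUM_LINES", ["data/ref2.fq.gz"])
print(d1)        # a work/xx/yyyy... path, deterministic for the same inputs
print(d1 != d2)  # different inputs give a different task directory
```

Because the hash is deterministic, re-running with unchanged inputs resolves to the same directory, which is what lets `-resume` reuse cached results.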

The mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, within the `work` directory, where the tasks are executed and the results stored. A task's unique ID is generated as a 128-bit hash number obtained from a composition of the task's:

- Input values
5 changes: 3 additions & 2 deletions episodes/data/environment.yml
@@ -3,8 +3,9 @@ channels:
- conda-forge
- bioconda
dependencies:
-  - nextflow=20.10.0
+  - nextflow
   - salmon=1.5
   - fastqc=0.11
-  - multiqc=1.10
+  - multiqc
- nf-core
- graphviz
29 changes: 0 additions & 29 deletions episodes/files/scripts/rnaseq_pipeline/script8.nf

This file was deleted.

5 changes: 3 additions & 2 deletions learners/setup.md
@@ -57,10 +57,11 @@ To install conda see [here](https://carpentries-incubator.github.io/introduction
An environment file is provided here [environment.yml](https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml)

```bash
-wget
+# You can use either wget or curl to download content from the web via the command line.
+# wget
 wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml

-# or curl
+# curl
 curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
```

