Commit f6da636: Merge branch 'main' into update_gitpod
mahesh-panchal authored Aug 26, 2024
2 parents 2c44837 + 5069330
Showing 7 changed files with 113 additions and 96 deletions.
18 changes: 9 additions & 9 deletions episodes/02-workflow_parameters.md
@@ -6,17 +6,17 @@ exercises: 5

::::::::::::::::::::::::::::::::::::::: objectives

- - "Use pipeline parameters to change the input to a workflow."
- - "Add a pipeline parameters to a Nextflow script."
- - "Understand how to create and use a parameter file."
+ - Use pipeline parameters to change the input to a workflow.
+ - Add a pipeline parameter to a Nextflow script.
+ - Understand how to create and use a parameter file.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- - "How can I change the data a workflow uses?"
- - "How can I parameterise a workflow?"
- - "How can I add my parameters to a file?"
+ - How can I change the data a workflow uses?
+ - How can I parameterise a workflow?
+ - How can I add my parameters to a file?

::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -304,7 +304,7 @@ f5ef7b7a01 executor > local (1) [f3/4fa480] process > NUM_LINES


:::::::::::::::::::::::::::::::::::::::: keypoints
- - "Pipeline parameters are specified by prepending the prefix `params` to a variable name, separated by dot character."
- - "To specify a pipeline parameter on the command line for a Nextflow run use `--variable_name` syntax."
- - "You can add parameters to a JSON formatted file and pass them to the script using option `-params-file`."
+ - Pipeline parameters are specified by prepending the prefix `params` to a variable name, separated by a dot character.
+ - To specify a pipeline parameter on the command line for a Nextflow run, use the `--variable_name` syntax.
+ - You can add parameters to a JSON-formatted file and pass them to the script using the option `-params-file`.
::::::::::::::::::::::::::::::::::::::::::::::::::
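The third keypoint can be made concrete with a sketch — the file name `params.json` and the parameter name `input` below are illustrative, not taken from this lesson:

```json
{
    "input": "data/yeast/reads/ref1_1.fq.gz"
}
```

This would be passed as `nextflow run main.nf -params-file params.json`. Note the single dash: `-params-file` is an option to Nextflow itself, while double-dash flags such as `--input` set individual pipeline parameters.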
58 changes: 7 additions & 51 deletions episodes/03-channels.md
@@ -380,13 +380,19 @@ Available fromPath options:

We can change the default options for the `fromPath` method to give an error if the file doesn't exist using the `checkIfExists` parameter. In Nextflow, method parameters are separated by a `,` and parameter values specified with a colon `:`.

- If we execute a Nextflow script with the contents below, it will run and not produce an output. This is likely not what we want.
+ If we execute a Nextflow script with the contents below, it will run but produce neither an output nor an error message that the file does not exist. This is likely not what we want.

```groovy
read_ch = Channel.fromPath( 'data/chicken/reads/*.fq.gz' )
read_ch.view()
```

```output
N E X T F L O W ~ version 20.10.0
Launching `channels.nf` [scruffy_swartz] DSL2 - revision: 2c8f18ab48
```

Add the argument `checkIfExists` with the value `true`.

```groovy
read_ch = Channel.fromPath( 'data/chicken/reads/*.fq.gz', checkIfExists: true )
read_ch.view()
```

@@ -544,56 +550,6 @@ Launching `channels.nf` [stupefied_lumiere] - revision: a3741edde2

::::::::::::::::::::::::::::::::::::::::::::::::::

### The **fromSRA** Channel factory

Another useful factory method is `fromSRA`. The `fromSRA` method makes it possible to query the [NCBI SRA](https://www.ncbi.nlm.nih.gov/sra) archive and returns a queue channel emitting the FASTQ files matching the specified selection criteria.

The queries can be project IDs or accession numbers supported by the [NCBI ESearch API](https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch).

If you want to use this functionality, you will need an [NCBI API KEY](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/), and to set the environment variable `NCBI_API_KEY` to its value.

```groovy
sra_ch = Channel.fromSRA('SRP043510')
sra_ch.view()
```

This will print a tuple for every fastq file associated with that SRA project accession.

```output
[SRR1448794, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448794/SRR1448794.fastq.gz]
[SRR1448795, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448795/SRR1448795.fastq.gz]
[SRR1448792, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/002/SRR1448792/SRR1448792.fastq.gz]
[SRR1448793, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/003/SRR1448793/SRR1448793.fastq.gz]
[SRR1910483, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/003/SRR1910483/SRR1910483.fastq.gz]
[SRR1910482, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/002/SRR1910482/SRR1910482.fastq.gz]
(remaining omitted)
```

Multiple accession IDs can be specified using a list object:

```groovy
ids = ['ERR908507', 'ERR908506', 'ERR908505']
sra_ch = Channel.fromSRA(ids)
sra_ch.view()
```

```output
[ERR908507, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz]]
[ERR908506, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz]]
[ERR908505, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_2.fastq.gz]]
```

::::::::::::::::::::::::::::::::::::::::: callout

## Read pairs from SRA

Read pairs are implicitly managed, and are returned as a list of files.


::::::::::::::::::::::::::::::::::::::::::::::::::



:::::::::::::::::::::::::::::::::::::::: keypoints

- Channels must be used to import data into Nextflow.
92 changes: 89 additions & 3 deletions episodes/04-processes-part1.md
@@ -289,6 +289,21 @@ workflow {
}
```

```bash
$ nextflow run process_python.nf -process.debug
```

```output
N E X T F L O W ~ version 24.04.4
Launching `process_python.nf` [mad_montalcini] DSL2 - revision: ee25d49465
executor > local (1)
[b4/a100c3] PROCESS_READS [100%] 1 of 1 ✔
reads 14677
bases 1482377
```

This allows the use of different programming languages, which may better fit a particular task. However, for large chunks of code it is suggested to save them into separate files and invoke them from the process script.
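The counting logic such an embedded script performs can be sketched in plain Python — a minimal sketch, with the helper name and the tiny in-memory FASTQ invented for illustration:

```python
import gzip
import io

def count_reads_and_bases(handle):
    """Count records and sequence bases in a FASTQ stream (4 lines per record)."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
            reads += 1
            bases += len(line.strip())
    return reads, bases

# Two illustrative records, gzip-compressed in memory to mimic a .fq.gz file
fastq = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n"
data = gzip.compress(fastq.encode())
with gzip.open(io.BytesIO(data), "rt") as fh:
    reads, bases = count_reads_and_bases(fh)
print(reads, bases)  # 2 7
```

This mirrors the kind of read/base counting done by the lesson's `process_reads.py`.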

## Associated scripts
Expand All @@ -303,8 +318,8 @@ chmod 755 bin/process_reads.py
```

```python
-# process_reads.py
 #!/usr/bin/env python
+# process_reads.py
import gzip
import sys
reads = 0
```

@@ -867,7 +882,8 @@ nextflow run process_exercise_input.nf -process.debug
```
::::::::::::::: solution

## Solution

```groovy
@@ -1052,7 +1068,7 @@ And include the command below in the script directive
```
::::::::::::::: solution

## Solution
```groovy
// process_exercise_combine_answer.nf
@@ -1075,6 +1091,20 @@ And include the command below in the script directive
}
```

```bash
$ nextflow run process_exercise_combine.nf -process.debug
```

```output
N E X T F L O W ~ version 24.04.4
Launching `process_exercise_combine.nf` [fabulous_kare] DSL2 - revision: 1eade0a2e9
executor > local (1)
[e0/b05fe7] COMBINE (1) [100%] 1 of 1 ✔
118
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -1192,6 +1222,62 @@ $ nextflow run process_exercise_repeat.nf -process.debug

This process runs 16 times.

```output
N E X T F L O W ~ version 24.04.4
Launching `process_exercise_repeat.nf` [ecstatic_turing] DSL2 - revision: 17891a7528
executor > local (16)
[6d/f803e5] COMBINE (9) [100%] 16 of 16 ✔
Number of sequences for chromosome J: 398
Number of sequences for chromosome G: 583
Number of sequences for chromosome O: 597
Number of sequences for chromosome N: 435
Number of sequences for chromosome B: 456
Number of sequences for chromosome E: 323
Number of sequences for chromosome K: 348
Number of sequences for chromosome H: 321
Number of sequences for chromosome C: 186
Number of sequences for chromosome M: 505
Number of sequences for chromosome L: 580
Number of sequences for chromosome A: 118
Number of sequences for chromosome D: 836
Number of sequences for chromosome F: 140
Number of sequences for chromosome P: 513
Number of sequences for chromosome I: 245
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
2 changes: 2 additions & 0 deletions episodes/10-workflow_checkpoint_caching.md
@@ -86,6 +86,8 @@ You will see that the execution of the process `NUMLINES` is actually skipped

## How does resume work?

Nextflow stores all intermediate files and task results generated during the execution of a workflow in the `work` directory. It acts as a scratch space where all the temporary data required for the workflow's execution is kept. Within the work directory, Nextflow creates subdirectories named with unique hashes (e.g., work/ab/cd1234...). Each of these subdirectories corresponds to a specific process or task in the pipeline. The hashed directory names ensure that each task's outputs are isolated and uniquely identified.
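A toy illustration of this directory layout — not Nextflow's actual hashing, which folds in more of the task's identity than this sketch (script body, container, and so on):

```python
import hashlib

def task_dir(process_name, inputs):
    """Sketch: map a task's identity to a work/<2-hex>/<30-hex> directory via a 128-bit hash."""
    key = process_name + "|" + "|".join(map(str, inputs))
    digest = hashlib.md5(key.encode()).hexdigest()  # MD5 digests are 128 bits
    return "work/{}/{}".format(digest[:2], digest[2:])

d1 = task_dir("NUM_LINES", ["data/ref1.fq.gz"])
d2 = task_dir("NUM_LINES", ["data/ref2.fq.gz"])
print(d1)        # a work/xx/yyyy... path, deterministic for the same inputs
print(d1 != d2)  # different inputs give a different task directory
```

Because the hash is deterministic, re-running with unchanged inputs resolves to the same directory, which is what lets `-resume` reuse cached results.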

The mechanism works by assigning a unique ID to each task. This unique ID is used to create a separate execution directory, within the `work` directory, where the tasks are executed and the results stored. A task's unique ID is generated as a 128-bit hash number obtained from a composition of the task's:

- Input values
5 changes: 3 additions & 2 deletions episodes/data/environment.yml
@@ -3,8 +3,9 @@ channels:
- conda-forge
- bioconda
dependencies:
-  - nextflow=20.10.0
+  - nextflow
   - salmon=1.5
   - fastqc=0.11
-  - multiqc=1.10
+  - multiqc
- nf-core
- graphviz
29 changes: 0 additions & 29 deletions episodes/files/scripts/rnaseq_pipeline/script8.nf

This file was deleted.

5 changes: 3 additions & 2 deletions learners/setup.md
@@ -57,10 +57,11 @@ To install conda see [here](https://carpentries-incubator.github.io/introduction
An environment file is provided here [environment.yml](https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml)

```bash
-wget
+# You can use either wget or curl to download content from the web via the command line.
+# wget
 wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml

-# or curl
+# curl
 curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
```

