- Learn Groovy scripting basics
- Understand channel creation
- Use operators to transform channels
Nextflow is built on the Groovy language, so much of Nextflow scripting is written in Groovy.
Numeric Operations
x = 2                       // integer variable
y = 2.5                     // decimal variable (a BigDecimal in Groovy)
z = 0.99                    // decimal variable
assert x * y == 5           // assertion: throws an error if not true
assert Math.round(z) == 1   // round to nearest integer
assert [x, y].max() == 2.5  // maximum
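Groovy treats decimal literals such as 2.5 as java.math.BigDecimal, not as binary floating-point numbers. This is why exact comparisons like the ones above work. The sketch below shows this and can be run with groovy or pasted into groovysh. The variable names are only for illustration.

```groovy
// Decimal literals are BigDecimal in Groovy, not float or double
y = 2.5
assert y instanceof BigDecimal

// BigDecimal arithmetic is exact, so this comparison holds
assert 0.1 + 0.2 == 0.3

// A d suffix gives a binary floating-point Double instead
d = 2.5d
assert d instanceof Double

// Dividing two integers with / also gives a BigDecimal;
// intdiv() performs integer division
half = 7 / 2
assert half == 3.5
assert 7.intdiv(2) == 3
```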
String Operations
s1 = 'foo'              // String variable
s2 = 'bar'              // String variable
println(s1)             // print value to standard output
c1 = s1 + '-' + s2      // String concatenation
c2 = "$s1-$s2"          // String interpolation of variables
assert c1 == c2
s3 = "1 + 2 = ${1 + 2}" // String interpolation of an expression
assert s3 == "1 + 2 = 3"
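Only double-quoted strings are interpolated. Single-quoted strings keep `$` as a literal character. A plain variable can be written as `$name`, but method calls and other expressions need the `${...}` form. The sketch below reuses `s1` and `s2` from above; the other variable names are illustrative.

```groovy
s1 = 'foo'
s2 = 'bar'

// Double quotes interpolate; single quotes keep the $ characters as-is
interpolated = "$s1-$s2"
literal = '$s1-$s2'
assert interpolated == 'foo-bar'
assert literal.startsWith('$s1')

// Method calls and other expressions need the ${...} form
expr = "${s1.toUpperCase()}-${s2.size()}"
assert expr == 'FOO-3'
```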
Logic Operations
isValid = true  // boolean variable

// ------------ if else statement ------------ //
if (isValid) {
    println("valid")
} else {
    println("not valid")
}

// ------------ ternary operator ------------ //
isValid ? println("valid") : println("not valid")
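Conditions in Groovy are not limited to booleans. Under "Groovy truth", null, empty strings, empty collections and zero all count as false, and the corresponding non-empty values count as true. The sketch below illustrates this; the `samples` list is a hypothetical example.

```groovy
// Groovy truth: values other than booleans can be used as conditions
assert !null        // null is false
assert !''          // empty string is false
assert !0           // zero is false
assert ![]          // empty list is false
assert ![:]         // empty map is false
assert 'foo'        // non-empty string is true
assert [1, 2]       // non-empty list is true

// Hypothetical example: choose a message depending on whether a list is empty
samples = []
message = samples ? "${samples.size()} samples" : 'no samples'
assert message == 'no samples'
```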
See example code snippets in Groovy and Python: https://programming-idioms.org/cheatsheet/Groovy/Python
- Lists
- Nextflow/Groovy:
l = [1, 2, 3]
l = l + 4
l = l.collect { it * 2 }  // <-- closure
println(l)
println(l[0])             // (0-based index)
- R:
l = list(1, 2, 3)
l = c(l, 4)
l = lapply(l, function(x) x * 2)
print(l)
print(l[[1]])  # (1-based index)
- Python:
l = [1, 2, 3]
l.append(4)
l = list(map(lambda x: x * 2, l))
print(l)
print(l[0])  # (0-based index)
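The Groovy and Python versions above differ in one respect. `l + 4` builds a new list, whereas Python's `append()` modifies the list in place. In Groovy, the `<<` operator appends in place. The sketch below contrasts the two; `l2` and `doubled` are illustrative names.

```groovy
l = [1, 2, 3]

// + returns a new list and leaves the original unchanged
l2 = l + 4
assert l == [1, 2, 3]
assert l2 == [1, 2, 3, 4]

// << appends in place, like Python's append()
l << 4
assert l == [1, 2, 3, 4]

// Negative indices count back from the end of the list
assert l[-1] == 4

// collect also returns a new list rather than modifying l
doubled = l.collect { it * 2 }
assert doubled == [2, 4, 6, 8]
assert l == [1, 2, 3, 4]
```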
- Maps
- Equivalent to named lists in R or dicts in Python
- Nextflow/Groovy:
x = [foo: 1, bar: 2]
println(x.bar)
x.baz = 3
- R:
x = list(foo = 1, bar = 2)
print(x$bar)
x$baz = 3
- Python:
x = dict(foo=1, bar=2)
print(x['bar'])
x['baz'] = 3
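Here are a few more ways to work with the map `x` from above, as a sketch. Groovy map literals create a `LinkedHashMap`, so keys keep their insertion order. The `pairs` variable is illustrative.

```groovy
x = [foo: 1, bar: 2]

// Keys in a map literal are Strings, so these lookups are equivalent
assert x.bar == 2
assert x['bar'] == 2
assert x.get('bar') == 2

// Looking up a missing key returns null instead of throwing an error
assert x.qux == null

// New keys are added after the existing ones
x.baz = 3
assert x.keySet().toList() == ['foo', 'bar', 'baz']

// A closure with two parameters receives each key and value
pairs = x.collect { k, v -> k + '=' + v }
assert pairs == ['foo=1', 'bar=2', 'baz=3']
```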
- Try out some of the above Groovy examples using the Groovy shell:
module load java/1.8.0_92 groovy/4.0.0
groovysh
- Closures in Groovy act as functions that can be passed to other functions
- For example, the .collect() method can be called on lists:
l = [1, 2, 3]
println( l.collect { it + 1 } )
- The default variable name in a closure is it, but this can be overridden:
l = [1, 2, 3]
l.collect { x -> x + 1 }
- In Groovy and Nextflow we often deal with nested lists
nested = [[1, 'A'], [2, 'B'], [3, 'C']]
- When we call collect on a nested list such as this, the variable it will be a list. For example, here we concatenate the elements of each inner list together:
nested.collect { "${it[0]}-${it[1]}" }
[1-A, 2-B, 3-C]
- We can also unpack the values of each inner list and assign them to their own variables, which makes the closure easier to read:
nested.collect { number, letter -> "$number-$letter" }
- If we wanted to add to the inner list, we could instead do this:
nested.collect { number, letter -> [number, letter, "$number-$letter"] }
[[1, A, 1-A], [2, B, 2-B], [3, C, 3-C]]
- See https://www.nextflow.io/docs/latest/script.html#closures
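The sketch below puts the closure examples together. It stores a closure in a variable (`addOne` is an illustrative name) and passes it to `collect`. It then checks that indexing with `it[0]`/`it[1]` and unpacking into named parameters give the same result on the nested list.

```groovy
// A closure can be stored in a variable and passed to methods such as collect
addOne = { x -> x + 1 }
assert addOne(1) == 2
assert [1, 2, 3].collect(addOne) == [2, 3, 4]

// Indexing with it and unpacking into named parameters give the same result
nested = [[1, 'A'], [2, 'B'], [3, 'C']]
viaIt = nested.collect { "${it[0]}-${it[1]}" }
viaUnpack = nested.collect { number, letter -> "$number-$letter" }

// *.toString() turns the interpolated results into plain Strings for comparison
assert viaIt*.toString() == ['1-A', '2-B', '3-C']
assert viaUnpack*.toString() == ['1-A', '2-B', '3-C']
```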
- In the Groovy shell, define the variable data as below:
data = [['foo', 1, 2], ['bar', 3, 4], ['baz', 5, 6]]
Using collect, transform the data to the following list:
["foo: 2", "bar: 12", "baz: 30"]
(the number is equal to the product of the numbers in each inner list)
Solution
data.collect { s, n1, n2 -> "$s: ${n1 * n2}" }
- Using collect, transform data to the following nested list:
[['foo', 1, 2, 2], ['bar', 3, 4, 12], ['baz', 5, 6, 30]]
Solution
data.collect { s, n1, n2 -> [s, n1, n2, n1 * n2] }
Implicit Variables:
- A number of variables are available in all Nextflow scripts:
  - params: a map storing the workflow parameters
  - projectDir: the directory containing the Nextflow script being run; useful for accessing workflow resource files
  - launchDir: the directory from which the workflow was launched
- See https://www.nextflow.io/docs/latest/script.html#implicit-variables
Files:
- For creating file-type variables, Nextflow provides the function file()
- This must be used to convert a file path string into a file object before passing it to a process
- We can use the optional checkIfExists argument to make sure the file exists:
input = file('/path/to/my_file.txt', checkIfExists: true)
- Nextflow files also support the FTP and HTTP(S) protocols. Remote files are downloaded automatically when used by a process:
input = file('https://www.wehi.edu.au/sites/default/files/wehi-logo-2020.png')
- Nextflow includes a number of ways to create channels
channel.of(...)
- create a channel that emits each of the arguments one at a time.
channel.of('A', 'B', 'C')
channel.fromList(list)
- given a list, create a channel that emits each element of the list one at a time
list = ['A', 'B', 'C']
channel.fromList(list)
channel.fromPath(path)
- Create a channel from a file path, emitting a file variable. Functions similarly to the file() method but returns a channel:
channel.fromPath('/path/to/sample.bam')
- Optional argument checkIfExists will throw an error if the file does not exist:
channel.fromPath('/path/to/sample.bam', checkIfExists: true)
- Files may be located on the web and will be downloaded when needed:
channel.fromPath('https://www.wehi.edu.au/sites/default/files/wehi-logo-2020.png')
- See https://www.nextflow.io/docs/latest/channel.html#channel-factory for more ways to create channels
- Open languages.nf and look at the CSV files in the data directory
- Create channels authors_ch and homepage_ch in the same way as year_created_ch, and call .view() on them.
- Run languages.nf:
nextflow run ~/wehi-nextflow-training/module_3/languages.nf
Solution
workflow {
    year_created_ch = Channel.fromPath("$projectDir/data/year_created.csv", checkIfExists: true)
    year_created_ch.view()

    authors_ch = Channel.fromPath("$projectDir/data/authors.csv", checkIfExists: true)
    authors_ch.view()

    homepage_ch = Channel.fromPath("$projectDir/data/homepage.csv", checkIfExists: true)
    homepage_ch.view()
}
map
- One of the most commonly used Nextflow operators
- Works like collect {}, but is applied to channels instead of lists
- Similar to R's lapply() or Python's map()
- The default variable name for a closure is it, e.g.:
channel.of(1, 2, 3).map { it * 2 }.view()
will print:
2
4
6
view
- Takes an input channel and prints its contents to the terminal
- We can optionally provide a closure, similarly to map, in which case view prints the result of applying the closure to each item:
channel.of(1, 2, 3).view { x -> "$x * 2 = ${x * 2}" }
will print:
1 * 2 = 2
2 * 2 = 4
3 * 2 = 6
- view also emits the unmodified input items, so it can be chained with other operators; this makes it useful for developing and debugging
splitCsv
- Converts the contents of a CSV (or TSV) file into channel items, emitting each row as a list of values (use sep: '\t' for TSV files)
- Optional argument skip may be used to skip a number of lines (e.g. the header)
- Most often used on the output of channel.fromPath(), e.g.:
channel.fromPath("$projectDir/data/year_created.csv")
    .splitCsv(skip: 1)
    .view()
will print:
[Python, 1991]
[R, 1993]
[Nextflow, 2013]
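To see roughly what `splitCsv(skip: 1)` emits, here is a plain-Groovy approximation. It is only a sketch. The CSV text is hypothetical, and so is its header line `language,year`; the real data/year_created.csv may differ. Unlike splitCsv, it does not handle quoted fields or other CSV edge cases.

```groovy
// Hypothetical CSV content; the header line is an assumption
csvText = '''language,year
Python,1991
R,1993
Nextflow,2013'''

// Rough analogue of .splitCsv(skip: 1): drop the header line,
// then split each remaining line on commas into a list of Strings.
// Unlike splitCsv, this does not handle quoted fields.
lines = csvText.readLines().drop(1)
rows = lines.collect { line -> line.split(',') as List }

assert rows == [['Python', '1991'], ['R', '1993'], ['Nextflow', '2013']]
```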
- Update all channels in languages.nf with splitCsv as follows:
workflow {
    year_created_ch = Channel
        .fromPath("$projectDir/data/year_created.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    year_created_ch.view()
}
- Run languages.nf
Solution
workflow {
    year_created_ch = Channel
        .fromPath("$projectDir/data/year_created.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    year_created_ch.view()

    authors_ch = Channel
        .fromPath("$projectDir/data/authors.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    authors_ch.view()

    homepage_ch = Channel
        .fromPath("$projectDir/data/homepage.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    homepage_ch.view()
}
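join
- Join two channels by a matching key (by default, the first element of each item):
left = channel.of(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right = channel.of(['Z', 6], ['Y', 5], ['X', 4])
left.join(right).view()
will print:
[Z, 3, 6]
[Y, 2, 5]
[X, 1, 4]
- By default, items without a match in the other channel, such as ['P', 7], are discarded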
- This operation is roughly equivalent to R's dplyr::inner_join() and Python's pandas pd.merge(). For example, in R:
library(dplyr)
left = data.frame(V1 = c('X', 'Y', 'Z', 'P'), V2 = c(1, 2, 3, 7))
right = data.frame(V1 = c('Z', 'Y', 'X'), V3 = c(6, 5, 4))
left %>% inner_join(right, by = 'V1') %>% print()
  V1 V2 V3
1  X  1  4
2  Y  2  5
3  Z  3  6
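The following plain-Groovy sketch mimics the default behaviour of join on ordinary lists, using the left and right values from the Nextflow example above. It assumes each key appears at most once in each input. It also keeps the order of left. The channel version emits each match as soon as both items have arrived, so its output order can differ.

```groovy
left = [['X', 1], ['Y', 2], ['Z', 3], ['P', 7]]
right = [['Z', 6], ['Y', 5], ['X', 4]]

// Index the right-hand items by their key (the first element of each item)
rightByKey = right.collectEntries { item -> [(item[0]): item.drop(1)] }

// Keep left items whose key also appears on the right, then append the
// right-hand values; unmatched items such as ['P', 7] are dropped
matched = left.findAll { item -> rightByKey.containsKey(item[0]) }
joined = matched.collect { item -> item + rightByKey[item[0]] }

assert joined == [['X', 1, 4], ['Y', 2, 5], ['Z', 3, 6]]
```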
- Many more operators are available; see https://www.nextflow.io/docs/latest/operator.html
- Remove calls to the view() operator for channels year_created_ch, authors_ch and homepage_ch
- Using the join() operator, create a new channel joined_ch that joins year_created_ch, authors_ch and homepage_ch, and view() the output
Solution
workflow {
    year_created_ch = Channel
        .fromPath("$projectDir/data/year_created.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    authors_ch = Channel
        .fromPath("$projectDir/data/authors.csv", checkIfExists: true)
        .splitCsv(skip: 1)
    homepage_ch = Channel
        .fromPath("$projectDir/data/homepage.csv", checkIfExists: true)
        .splitCsv(skip: 1)

    joined_ch = year_created_ch
        .join(authors_ch)
        .join(homepage_ch)
        .view()
}
- Using the view {} operator on joined_ch, print the following:
R was created in 1993 by Ross Ihaka and Robert Gentleman. To learn more visit https://www.r-project.org/
Python ...
Nextflow ...
Solution
joined_ch = year_created_ch
    .join(authors_ch)
    .join(homepage_ch)
    .view { lang, year, auth, url -> "$lang was created in $year by $auth. To learn more visit $url" }
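The view {} solution relies on the same unpacking seen earlier with collect. Each joined item is a four-element list, and Groovy spreads it across the closure's four parameters. Below is a plain-Groovy sketch using the R row from the expected output. The `row`, `describe` and `messages` names are illustrative.

```groovy
// One joined row, built from the expected output above
row = ['R', '1993', 'Ross Ihaka and Robert Gentleman', 'https://www.r-project.org/']

// Same four-parameter closure as in the view {} solution
describe = { lang, year, auth, url ->
    "$lang was created in $year by $auth. To learn more visit $url"
}

// collect spreads each four-element list across the four closure parameters
messages = [row].collect(describe)
assert messages[0] == 'R was created in 1993 by Ross Ihaka and Robert Gentleman. To learn more visit https://www.r-project.org/'
```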