List process container images in preview mode #4069
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Personally, I quite like the idea of JSON. It's likely that containers could be shared between multiple processes in a lot of pipelines. If we use JSON then we could structure the output so that we have a deduplicated list of container URLs, each with an array of process names that correspond to it. Having said that, it's not too difficult to deduplicate a CSV for a user. So... maybe both? |
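The deduplicated structure described above could be sketched roughly as follows. This is only an illustration in Python, not code from the PR: the function name `group_by_container` is hypothetical, and the sample mapping reuses shortened process names from examples later in this thread.

```python
from collections import defaultdict


def group_by_container(process_containers):
    """Invert a {process name: container} mapping into {container: [process names]}."""
    grouped = defaultdict(list)
    for process, container in process_containers.items():
        grouped[container].append(process)
    # Sort both levels so the output is stable and easy to diff
    return {container: sorted(names) for container, names in sorted(grouped.items())}


# Hypothetical sample input; names shortened for readability
sample = {
    "PREPARE_GENOME:GUNZIP_GTF": "nf-core/ubuntu:20.04",
    "PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA": "nf-core/ubuntu:20.04",
    "PREPARE_GENOME:CAT_ADDITIONAL_FASTA": "biocontainers/python:3.9--1",
}
deduplicated = group_by_container(sample)
```

Each container URL then appears once, with the list of processes that share it, which is the shape a deployment script would most likely want.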
Instead of making a |
That's nice! I think at this point it could make sense to make it generic and allow the user to provide the list of directives they want to preview
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Okay, I added some config options for preview. Now it will write a JSON file as a report, and you can specify which process directives to preview. Here is a sample of the default preview with {
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF": {
"container": "nf-core/ubuntu:20.04",
"cpus": 1,
"memory": "6 GB",
"time": "4h"
},
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA": {
"container": "nf-core/ubuntu:20.04",
"cpus": 1,
"memory": "6 GB",
"time": "4h"
},
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA": {
"container": "biocontainers/python:3.9--1",
"cpus": 1,
"memory": "6 GB",
"time": "4h"
},
// ...
}
@pditommaso if you want to enable it separately, we could add a
modules/nextflow/src/main/groovy/nextflow/trace/PreviewReportWriter.groovy
What if we call this inspect and turn on the preview flag implicitly?
I asked @bentsherman to summarise this for me:
For the use cases we had been thinking about, I think a CLI flag makes most sense. This is information that the end user wants in a one-off way (usually for deployment). It's not a report that would normally need to be generated for every run. It's also not really specific to the pipeline, but rather to the user, so it wouldn't make sense to put it in a pipeline config. Sure, we can use a user config, but one-off user-level spells CLI flag to me. So my +1 is for either
I agree that it should be a CLI option. If it's going to be separate from Calling it |
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
To move on this, I think we should focus on the primary use case, that is, the preview of the pipeline container configuration. To this end, I'd suggest the following:
Okay, the report is now just containers, and it uses a fake task run to preview the container so that it can be Wave-aware. The report can be JSON or Nextflow config based on the file extension.
{
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF": "quay.io/nf-core/ubuntu:20.04",
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA": "quay.io/nf-core/ubuntu:20.04",
"NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA": "quay.io/biocontainers/python:3.9--1",
// ...
}
process { withName: 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF' { container = 'quay.io/nf-core/ubuntu:20.04' } }
process { withName: 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA' { container = 'quay.io/nf-core/ubuntu:20.04' } }
process { withName: 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA' { container = 'quay.io/biocontainers/python:3.9--1' } }
// ...
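To illustrate the "format follows file extension" behaviour described above, here is a minimal Python sketch. It is not the Groovy implementation from this PR: the function name `render_report` and the assumption that a `.config` suffix selects the Nextflow config format are hypothetical. Only the two output shapes are taken from the samples above.

```python
import json
from pathlib import Path


def render_report(containers, filename):
    """Render a {process: container} mapping as JSON or Nextflow config.

    The format is picked from the file extension (assumed here: .json or .config).
    """
    suffix = Path(filename).suffix.lower()
    if suffix == ".json":
        return json.dumps(containers, indent=4) + "\n"
    if suffix == ".config":
        # One withName selector per process, mirroring the sample above
        lines = [
            f"process {{ withName: '{name}' {{ container = '{image}' }} }}"
            for name, image in containers.items()
        ]
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported report extension: {suffix!r}")


containers = {
    "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF": "quay.io/nf-core/ubuntu:20.04",
}
as_config = render_report(containers, "containers.config")
as_json = render_report(containers, "containers.json")
```

The config form can be dropped into a pipeline run as an extra config file, while the JSON form is easier to consume from scripts.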
This is awesome 🤩 |
…o/nextflow into 3340-preview-container-images
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Nice! This certainly looks like what we want, and I like the separate command 👍🏻 Some quick questions:
It would be great to test some of the larger / more weird nf-core pipelines with it. Ben's initial PR comment had a good list. |
modules/nextflow/src/main/groovy/nextflow/cli/CliOptions.groovy
modules/nextflow/src/main/groovy/nextflow/container/inspect/ContainersInspector.groovy
modules/nextflow/src/main/groovy/nextflow/container/inspect/ContainersInspector.groovy
plugins/nf-wave/src/main/io/seqera/wave/plugin/config/WaveConfig.groovy
Yes, it is independent of the container runtime so it works with all of them.
You can specify config files and profiles, but not CLI params or params file. Currently params can only be included through config files.
I think that's a great idea... |
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
nf-core pipelines appear to be working as before. Paolo didn't change any of the core logic, just the user interface |
Yes. Above all, we are going to have Singularity native builds via Wave. Local conversion is not going to be needed any more!
Currently, only the profile, but adding the support for params is straightforward. I'm going to add it
Not in the very short term, but the use of a dedicated command would make it possible to easily extend this functionality. |
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Added support for I'm curious about the feedback from @drpatelh. I remember he was working on a Python script doing something similar. I wonder if this addresses his problem. [
{
"name": "NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC",
"container": "quay.io/biocontainers/fastqc:0.11.9--0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD:UCSC_BEDCLIP",
"container": "quay.io/biocontainers/ucsc-bedclip:377--h0b8a92a_2"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:DESEQ2_QC_STAR_SALMON",
"container": "quay.io/biocontainers/mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91:ab110436faf952a33575c64dd74615a84011450b-0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:MULTIQC_CUSTOM_BIOTYPE",
"container": "quay.io/biocontainers/python:3.9--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:UNTAR_SALMON_INDEX",
"container": "quay.io/nf-core/ubuntu:20.04"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN",
"container": "quay.io/biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:1df389393721fc66f3fd8778ad938ac711951107-0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS",
"container": "quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE",
"container": "quay.io/biocontainers/trim-galore:0.6.7--hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD:UCSC_BEDGRAPHTOBIGWIG",
"container": "quay.io/biocontainers/ucsc-bedgraphtobigwig:377--h446ed27_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:CAT_FASTQ",
"container": "quay.io/nf-core/ubuntu:20.04"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_JUNCTIONANNOTATION",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CUSTOM_GETCHROMSIZES",
"container": "quay.io/biocontainers/samtools:1.16.1--h6899075_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE",
"container": "quay.io/biocontainers/python:3.9--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_QUANT",
"container": "quay.io/biocontainers/salmon:1.10.1--h7e5ed60_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_SE_GENE_SCALED",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_REVERSE:UCSC_BEDCLIP",
"container": "quay.io/biocontainers/ucsc-bedclip:377--h0b8a92a_2"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_JUNCTIONSATURATION",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:DESEQ2_QC_SALMON",
"container": "quay.io/biocontainers/mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91:ab110436faf952a33575c64dd74615a84011450b-0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF",
"container": "quay.io/nf-core/ubuntu:20.04"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:DUPRADAR",
"container": "quay.io/biocontainers/bioconductor-dupradar:1.28.0--r42hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE",
"container": "quay.io/biocontainers/fq:0.9.1--h9ee0642_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:STRINGTIE_STRINGTIE",
"container": "quay.io/biocontainers/stringtie:2.2.1--hecb563c_2"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:PICARD_MARKDUPLICATES",
"container": "quay.io/biocontainers/picard:3.0.0--hdfd78af_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:SUBREAD_FEATURECOUNTS",
"container": "quay.io/biocontainers/subread:2.0.1--hed695b0_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TXIMPORT",
"container": "quay.io/biocontainers/bioconductor-tximeta:1.12.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BEDTOOLS_GENOMECOV",
"container": "quay.io/biocontainers/bedtools:2.30.0--hc088bd4_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_INNERDISTANCE",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_ADDITIONAL_FASTA",
"container": "quay.io/nf-core/ubuntu:20.04"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_QUANT",
"container": "quay.io/biocontainers/salmon:1.10.1--h7e5ed60_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT",
"container": "quay.io/biocontainers/bioconductor-tximeta:1.12.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_SE_GENE",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TX2GENE",
"container": "quay.io/biocontainers/python:3.9--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_BAMSTAT",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_READDUPLICATION",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BBMAP_BBSPLIT",
"container": "quay.io/biocontainers/bbmap:39.01--h5c4e2a8_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK",
"container": "quay.io/biocontainers/python:3.9--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:BBMAP_BBSPLIT",
"container": "quay.io/biocontainers/bbmap:39.01--h5c4e2a8_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_SE_GENE_LENGTH_SCALED",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUALIMAP_RNASEQ",
"container": "quay.io/biocontainers/qualimap:2.2.2d--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_READDISTRIBUTION",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_SCALED",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT",
"container": "quay.io/biocontainers/salmon:1.10.1--h7e5ed60_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:STAR_GENOMEGENERATE",
"container": "quay.io/biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:1df389393721fc66f3fd8778ad938ac711951107-0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF2BED",
"container": "quay.io/biocontainers/perl:5.26.2"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_INFEREXPERIMENT",
"container": "quay.io/biocontainers/rseqc:3.0.1--py37h516909a_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:MULTIQC",
"container": "quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_SE_TRANSCRIPT",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CAT_ADDITIONAL_FASTA",
"container": "quay.io/biocontainers/python:3.9--1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT",
"container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_LENGTH_SCALED",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_REVERSE:UCSC_BEDGRAPHTOBIGWIG",
"container": "quay.io/biocontainers/ucsc-bedgraphtobigwig:377--h446ed27_1"
},
{
"name": "NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_TRANSCRIPT",
"container": "quay.io/biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0"
}
]
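A report in this array form is straightforward to post-process. The Python sketch below assumes only the `name` and `container` keys shown above. The helper name `unique_images` is hypothetical, and the embedded report is a four-record excerpt of the output above.

```python
import json
from collections import Counter


def unique_images(report_json):
    """Return the sorted set of container images referenced in the report."""
    records = json.loads(report_json)
    return sorted({record["container"] for record in records})


# Short excerpt of the report above
report = """
[
    {"name": "NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC",
     "container": "quay.io/biocontainers/fastqc:0.11.9--0"},
    {"name": "NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX",
     "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"},
    {"name": "NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT",
     "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"},
    {"name": "NFCORE_RNASEQ:RNASEQ:CAT_FASTQ",
     "container": "quay.io/nf-core/ubuntu:20.04"}
]
"""

images = unique_images(report)
# How many processes share each image
usage = Counter(record["container"] for record in json.loads(report))
```

For deployment, the deduplicated `images` list is usually what needs to be pulled or mirrored ahead of time.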
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Yup - there are several. nf-core/tools is a Python script that does it, and I'm aware of a few others. The problem is that none of them really understand the code and mostly work by just trying to parse the text strings. So whenever we change anything (such as the recent adoption of
Hopefully! That was the motivation for the issue and this resulting PR. It's certainly looking that way, I'm happy! 😅 |
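To make the fragility of text-based scraping concrete, here is a deliberately naive Python sketch. It is not taken from nf-core/tools or any real script. The regular expression, the function name `scrape_containers`, the two simplified module snippets and the `params.registry` variable are all hypothetical.

```python
import re

# Naive pattern: matches only a container directive followed by a plain quoted literal
STATIC_CONTAINER = re.compile(
    r"""^\s*container\s+['"]([^'"$]+)['"]\s*$""", re.MULTILINE
)


def scrape_containers(module_source):
    """Extract container images by pattern-matching the module source text."""
    return STATIC_CONTAINER.findall(module_source)


# A static directive is found without trouble
static_module = '''
process FASTQC {
    container 'quay.io/biocontainers/fastqc:0.11.9--0'
}
'''

# A directive that interpolates a variable is silently missed
dynamic_module = '''
process FASTQC {
    container "${ params.registry }/biocontainers/fastqc:0.11.9--0"
}
'''

found_static = scrape_containers(static_module)
found_dynamic = scrape_containers(dynamic_module)
```

The second call returns an empty list because the scraper cannot evaluate the interpolated expression, which is the kind of gap that resolving the directive inside Nextflow itself avoids.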
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Made a few changes:
Think it's ready |
Thinking more on this, I'm not super convinced about adding a prompt confirmation when using Wave for building the container images. A better solution could be to perform a dry-run request to Wave (currently not existing) by default, and submit a real build request when a specific option is provided.
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Thanks Ben for updating the docs. Let's merge this |
Sorry, just catching up with this. Thank you 🙏 It is indeed the perfect replacement for the custom Python scripts we have cobbled together to scrape container information. |
This commit introduces a new nextflow command named `inspect`. The inspect command resolves a pipeline script or project and reports all container images used by the pipeline execution. The main advantage of this command over the existing `config` command is that it can resolve container names that are defined "dynamically", as well as Wave containers that are only determined at execution time. The command option `-concretise`, when used along with the Wave freeze option, allows building ahead of time all the container images required by the pipeline execution.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Close #3340
Currently, this PR just lists the container image for each process during a preview run. It creates a basic task config for each process and tries to resolve the container directive. Even if the directive is dynamic, as long as it's defined in terms of variables that are defined at the pipeline level (e.g. workflow, ext directive from config), it will work. If for some reason the container is defined in terms of some task-specific property, the resolution will fail and just print null. But in practice I think this is extremely rare.
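The resolution behaviour described above can be modelled with a small Python toy. This is only a conceptual sketch, not how Nextflow implements it: the function `resolve_container`, the dictionary-based context, the `ext` tag value and the `quay.io/example/tool` image are all hypothetical.

```python
def resolve_container(directive, pipeline_context):
    """Resolve a container directive using pipeline-level variables only.

    `directive` is either a plain string or a callable that takes a context
    dict. Returns None when the directive needs a value the context lacks,
    for example a task-specific property.
    """
    if not callable(directive):
        return directive
    try:
        return directive(pipeline_context)
    except KeyError:
        return None


# Only pipeline-level information is available during a preview run
context = {
    "workflow": {"containerEngine": "docker"},
    "ext": {"tag": "1.17--h00cdaf9_0"},  # hypothetical ext setting from config
}

static = "quay.io/nf-core/ubuntu:20.04"
dynamic = lambda ctx: f"quay.io/biocontainers/samtools:{ctx['ext']['tag']}"
task_specific = lambda ctx: f"quay.io/example/tool:{ctx['task']['attempt']}"

resolved = [
    resolve_container(static, context),
    resolve_container(dynamic, context),
    resolve_container(task_specific, context),
]
```

The first two directives resolve to an image name, while the third yields `None` because it depends on a task-level value that does not exist in a preview.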
Successful tests:
Remaining questions:
download command for this?