update to version 2.0.0

avantonder · Nov 6, 2024 · d55915f
1 parent 4090a29
commit d55915f

Showing 5 changed files with 80 additions and 46 deletions.
diff --git a/assets/bacQC_metromap.png b/assets/bacQC_metromap.png
diff --git a/docs/.DS_Store b/docs/.DS_Store
diff --git a/docs/output.md b/docs/output.md
@@ -17,6 +17,7 @@ and processes data using the following steps:
 - [`Assign taxonomy to reads`](#assign-taxonomy-to-reads)
 - [`Re-estimate taxonomy`](#re-estimate-taxonomy)
 - [`Extract reads`](#extract-reads)
+- [`Visualize taxonomy`](#visualize-taxonomy)
 - [`Species composition`](#calculate-species-composition)
 - [`Sequencing statistics`](#sequencing-statistics)
 - [`MultiQC`](#multiqc) 
@@ -113,6 +114,18 @@ and processes data using the following steps:
 
 [KrakenTools](https://github.com/jenniferlu717/KrakenTools) is a suite of scripts to be used for post-analysis of Kraken/KrakenUniq/Kraken2/Bracken results.
 
+### Visualize taxonomy
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `krona/`
+  - `*.html`: HTML files containing taxonomy visualizations
+
+</details>
+
+[Krona](https://pubmed.ncbi.nlm.nih.gov/21961884/) creates interactive metagenomic visualizations in a Web browser.
+
 ### Calculate species composition
 
 <details markdown="1">

diff --git a/docs/parameters.md b/docs/parameters.md
@@ -8,46 +8,19 @@ Define where the pipeline should find input data and save output data.
 
 | Parameter | Description | Type | Default | Required | Hidden |
 |-----------|-----------|-----------|-----------|-----------|-----------|
-| `input` | Path to comma-separated file containing information about the samples in the experiment. <details><summary>Help</summary><small>You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. </small></details>| `string` |  |  |  |
-| `kraken2db` | Path to Kraken 2 database | `string` | None |  |  |
-| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | `string` |  |  |  |
-| `email` | Email address for completion summary. <details><summary>Help</summary><small>Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the 
-workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.</small></details>| `string` |  |  |  |
+| `input` | Path to comma-separated file containing information about the samples in the experiment. <details><summary>Help</summary><small>You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.</small></details>| `string` |  | True |  |
+| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. | `string` |  | True |  |
+| `email` | Email address for completion summary. <details><summary>Help</summary><small>Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.</small></details>| `string` |  |  |  |
 | `multiqc_title` | MultiQC report title. Printed as page header, used for filename if not otherwise specified. | `string` |  |  |  |
 
-## Quality Control options
-
-
-
-| Parameter | Description | Type | Default | Required | Hidden |
-|-----------|-----------|-----------|-----------|-----------|-----------|
-| `skip_fastp` | Skip the fastp trimming step. | `boolean` |  |  |  |
-| `skip_fastqc` | Skip the fastQC step. | `boolean` |  |  |  |
-| `save_trimmed_fail` | Save failed trimmed reads. | `boolean` |  |  |  |
-| `skip_multiqc` | Skip MultiQC. | `boolean` |  |  |  |
-| `adapter_file` | Path to file containing adapters in FASTA format. | `string` | '${baseDir}/assets/adapters.fas' |  |  |
-| `skip_kraken2` | Skip Kraken 2 and Bracken. | `boolean` |  |  |  |
-| `genome_size` | Specify a genome size to be used by fastq-scan to calculate coverage | `integer` |  |  |  |
-
-## Extract reads options
-
-
-
-| Parameter | Description | Type | Default | Required | Hidden |
-|-----------|-----------|-----------|-----------|-----------|-----------|
-| `kraken_extract` | Extract reads from fastq files based on taxon id | `boolean` |  |  |  |
-| `tax_id` | If --kraken_extract is used, --tax_is specifies the taxon id to be used to extract reads | `string` |  |  |  |
-
 ## Institutional config options
 
 Parameters used to describe centralised config profiles. These should not be edited.
 
 | Parameter | Description | Type | Default | Required | Hidden |
 |-----------|-----------|-----------|-----------|-----------|-----------|
 | `custom_config_version` | Git commit id for Institutional configs. | `string` | master |  | True |
-| `custom_config_base` | Base directory for Institutional configs. <details><summary>Help</summary><small>If you're running offline, Nextflow will not be able to fetch the institutional config files 
-from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this 
-parameter.</small></details>| `string` | https://raw.githubusercontent.com/nf-core/configs/master |  | True |
+| `custom_config_base` | Base directory for Institutional configs. <details><summary>Help</summary><small>If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.</small></details>| `string` | https://raw.githubusercontent.com/nf-core/configs/master |  | True |
 | `config_profile_name` | Institutional config name. | `string` |  |  | True |
 | `config_profile_description` | Institutional config description. | `string` |  |  | True |
 | `config_profile_contact` | Institutional config contact information. | `string` |  |  | True |
@@ -59,12 +32,9 @@ Set the top limit for requested resources for any single job.
 
 | Parameter | Description | Type | Default | Required | Hidden |
 |-----------|-----------|-----------|-----------|-----------|-----------|
-| `max_cpus` | Maximum number of CPUs that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the CPU requirement for each process. Should be an 
-integer e.g. `--max_cpus 1`</small></details>| `integer` | 16 |  | True |
-| `max_memory` | Maximum amount of memory that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the memory requirement for each process. Should 
-be a string in the format integer-unit e.g. `--max_memory '8.GB'`</small></details>| `string` | 128.GB |  | True |
-| `max_time` | Maximum amount of time that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the time requirement for each process. Should be a 
-string in the format integer-unit e.g. `--max_time '2.h'`</small></details>| `string` | 240.h |  | True |
+| `max_cpus` | Maximum number of CPUs that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`</small></details>| `integer` | 16 |  | True |
+| `max_memory` | Maximum amount of memory that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`</small></details>| `string` | 128.GB |  | True |
+| `max_time` | Maximum amount of time that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`</small></details>| `string` | 240.h |  | True |
 
 ## Generic options
 
@@ -73,17 +43,59 @@ Less common options for the pipeline, typically set in a config file.
 | Parameter | Description | Type | Default | Required | Hidden |
 |-----------|-----------|-----------|-----------|-----------|-----------|
 | `help` | Display help text. | `boolean` |  |  | True |
-| `publish_dir_mode` | Method used to save pipeline results to output directory. <details><summary>Help</summary><small>The Nextflow `publishDir` option specifies which intermediate files should be 
-saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for 
-details.</small></details>| `string` | copy |  | True |
-| `email_on_fail` | Email address for completion summary, only when pipeline fails. <details><summary>Help</summary><small>An email address to send a summary email to when the pipeline is completed - 
-ONLY sent if the pipeline does not exit successfully.</small></details>| `string` |  |  | True |
+| `version` | Display version and exit. | `boolean` |  |  | True |
+| `publish_dir_mode` | Method used to save pipeline results to output directory. <details><summary>Help</summary><small>The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.</small></details>| `string` | copy |  | True |
+| `email_on_fail` | Email address for completion summary, only when pipeline fails. <details><summary>Help</summary><small>An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.</small></details>| `string` |  |  | True |
 | `plaintext_email` | Send plain-text email instead of HTML. | `boolean` |  |  | True |
 | `max_multiqc_email_size` | File size limit when attaching MultiQC reports to summary emails. | `string` | 25.MB |  | True |
 | `monochrome_logs` | Do not use coloured log outputs. | `boolean` |  |  | True |
+| `hook_url` | Incoming hook URL for messaging service <details><summary>Help</summary><small>Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.</small></details>| `string` |  |  | True |
 | `multiqc_config` | Custom config file to supply to MultiQC. | `string` |  |  | True |
-| `tracedir` | Directory to keep pipeline Nextflow logs and reports. | `string` | ${params.outdir}/pipeline_info |  | True |
+| `multiqc_logo` | Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file | `string` |  |  | True |
+| `multiqc_methods_description` | Custom MultiQC yaml file containing HTML including a methods description. | `string` |  |  |  |
 | `validate_params` | Boolean whether to validate parameters against the schema at runtime | `boolean` | True |  | True |
-| `show_hidden_params` | Show all params when using `--help` <details><summary>Help</summary><small>By default, parameters set as _hidden_ in the schema are not shown on the command line when a user 
-runs with `--help`. Specifying this option will tell the pipeline to show all parameters.</small></details>| `boolean` |  |  | True |
-| `enable_conda` | Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter. | `boolean` |  |  | True |
+| `validationShowHiddenParams` | Show all params when using `--help` <details><summary>Help</summary><small>By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters.</small></details>| `boolean` |  |  | True |
+| `validationFailUnrecognisedParams` | Validation of parameters fails when an unrecognised parameter is found. <details><summary>Help</summary><small>By default, when an unrecognised parameter is found, it returns a warning.</small></details>| `boolean` |  |  | True |
+| `validationLenientMode` | Validation of parameters in lenient mode. <details><summary>Help</summary><small>Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode).</small></details>| `boolean` |  |  | True |
+| `pipelines_testdata_base_path` | Base URL or local path to location of pipeline test dataset files | `string` | https://raw.githubusercontent.com/nf-core/test-datasets/ |  |  |
+
+## FastQC/fastp options
+
+
+
+| Parameter | Description | Type | Default | Required | Hidden |
+|-----------|-----------|-----------|-----------|-----------|-----------|
+| `skip_fastqc` | Skip the fastQC step. | `boolean` |  |  |  |
+| `skip_fastp` | Skip the fastp trimming step. | `boolean` |  |  |  |
+| `adapter_fasta` | File in FASTA format containing possible adapters to remove. Accepted formats: *.{fasta,fna,fas,fa} | `string` | None |  |  |
+| `save_trimmed_fail` | Specify `true` to save files that failed to pass trimming thresholds ending in *.fail.fastq.gz | `boolean` |  |  |  |
+| `save_merged` | Specify `true` to save all merged reads to a file ending in *.merged.fastq.gz | `boolean` |  |  |  |
+| `extra_fastp_args` | Extra arguments for fastp. For example, `--trim_front1 15 --trim_front2 15 --trim_tail1 5 --trim_tail2 5{:bash}` | `string` | None |  |  |
+
+## fastq-scan options
+
+
+
+| Parameter | Description | Type | Default | Required | Hidden |
+|-----------|-----------|-----------|-----------|-----------|-----------|
+| `genome_size` | Specify a genome size to be used by fastq-scan to calculate coverage | `integer` | 2000000 |  |  |
+
+## Kraken 2 options
+
+
+
+| Parameter | Description | Type | Default | Required | Hidden |
+|-----------|-----------|-----------|-----------|-----------|-----------|
+| `skip_kraken2` | Skip Kraken 2 and Bracken. | `boolean` |  |  |  |
+| `kraken2db` | Path to Kraken 2 database | `string` | None |  |  |
+| `save_output_fastqs` | Turn on saving of Kraken2-aligned reads | `boolean` |  |  |  |
+| `save_reads_assignment` | Turn on saving of Kraken2 per-read taxonomic assignment file | `boolean` | True |  |  |
+| `kraken_extract` | Extract reads from fastq files based on taxon id | `boolean` |  |  |  |
+
+## Krona options
+
+
+
+| Parameter | Description | Type | Default | Required | Hidden |
+|-----------|-----------|-----------|-----------|-----------|-----------|
+| `kronadb` | Path to Krona taxonomy file | `string` | None |  |  |
diff --git a/docs/usage.md b/docs/usage.md
@@ -52,6 +52,14 @@ The pipeline can be provided with a path to a Kraken 2 database which is used, a
 
 The Kraken 2 and Bracken steps can be skipped by specifying the `--skip_kraken2` parameter.
 
+## Krona taxonomy file
+
+The pipeline can be provided with a path to a Krona taxonomy file, which Krona uses to create HTML visualizations of the Bracken results. Use the `--kronadb` parameter to specify the location of the Krona taxonomy file:
+
+```console
+--kronadb '[path to Krona taxonomy file]'
+```
+
 ## Genome size
 
 The pipeline can be provided with a genome size which will be used by fastq-scan to calculate an approximate read coverage. Use the `--genome_size` parameter to specify the genome size of the species being analysed:
@@ -69,6 +77,7 @@ nextflow run avantonder/bacQC \
   --input samplesheet.csv \
   -profile singularity \
   --kraken2db path/to/kraken2/dir \
+  --kronadb path/to/kronataxonomy \
   --genome_size <ESTIMATED GENOME SIZE> \
   --outdir <OUTDIR> \
   -resume
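
The updated command above can be assembled conditionally when the Krona taxonomy file is optional for a given run. The following is a minimal sketch, not part of the pipeline: `build_bacqc_cmd` is a hypothetical helper, the paths in the usage note are placeholders, and only flags that appear in this diff are used. It prints the command rather than running it, and it makes no assumption about how the pipeline behaves when `--kronadb` is left out.

```bash
# Hypothetical helper (not shipped with bacQC): print a bacQC command line,
# adding --kronadb only when a fifth argument is supplied.
# Note: arguments are joined unquoted, so paths containing spaces are not handled.
build_bacqc_cmd() {
  input=$1
  kraken2db=$2
  genome_size=$3
  outdir=$4
  kronadb=${5:-}

  cmd="nextflow run avantonder/bacQC --input $input -profile singularity --kraken2db $kraken2db"

  # Append the Krona taxonomy path only if one was given.
  if [ -n "$kronadb" ]; then
    cmd="$cmd --kronadb $kronadb"
  fi

  cmd="$cmd --genome_size $genome_size --outdir $outdir -resume"

  # Print instead of executing so the command can be reviewed first.
  printf '%s\n' "$cmd"
}
```

For example, `build_bacqc_cmd samplesheet.csv path/to/kraken2/dir 2000000 results path/to/kronataxonomy` prints a command that includes `--kronadb path/to/kronataxonomy`, while dropping the last argument prints the same command without that flag.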