Merge pull request #26 from avantonder/dsl2
Dsl2
avantonder authored Nov 6, 2024
2 parents a6eeb6d + 32b086a commit 7dcc740
Showing 73 changed files with 3,314 additions and 2,457 deletions.
Binary file modified .DS_Store
Binary file not shown.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,16 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.0.0 - [05/11/24]

- Significant recoding of the pipeline to bring it more in line with the current nf-core template.
- Add Krona to produce graphical outputs of Bracken results. The path to the Krona taxonomy file must be specified with `--kronadb`.
- Update FastQC from version 0.11.9 to version 0.12.1.
- Update fastp from version 0.23.2 to version 0.23.4.
- Update Kraken 2 from version 2.1.2 to version 2.1.3.
- Update Bracken from version 2.7 to version 2.9.
- Update MultiQC from version 1.13 to version 1.25.1. Report now includes Bracken outputs and Kraken 2 outputs.

## v1.2 - [30/01/24]

- Remove `--brackendb` parameter as redundant. Bracken will now use the database location specified with `--krakendb`.
3 changes: 3 additions & 0 deletions CITATIONS.md
@@ -23,6 +23,9 @@
- [Kraken 2](https://www.ncbi.nlm.nih.gov/pubmed/31779668/)
> Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0. PubMed PMID: 31779668; PubMed Central PMCID: PMC6883579.
- [Krona](https://pubmed.ncbi.nlm.nih.gov/21961884/)
> Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: 10.1186/1471-2105-12-385. PMID: 21961884; PMCID: PMC3190407.
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
9 changes: 9 additions & 0 deletions README.md
@@ -20,6 +20,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
3. Trim reads for quality and adapter sequence ([`fastp`](https://github.com/OpenGene/fastp))
4. Assign taxonomic labels to sequence reads ([`Kraken 2`](https://ccb.jhu.edu/software/kraken2/))
5. Re-estimate taxonomic abundance of samples analyzed by Kraken 2 ([`Bracken`](https://ccb.jhu.edu/software/bracken/))
6. Visualize Bracken reports ([`Krona`](https://github.com/marbl/Krona))
7. Extract reads using Taxon ID ([`KrakenTools`](https://github.com/jenniferlu717/KrakenTools)) (OPTIONAL)
8. Present QC and visualisation for raw read, trimmed read and Kraken 2/Bracken results ([`MultiQC`](http://multiqc.info/))

@@ -36,6 +37,12 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

tar xvfz minikraken2_v1_8GB_201904.tgz
```
4. Download the taxonomy file for Krona (this requires Krona to be installed, e.g. with Conda):

```console
ktUpdateTaxonomy.sh .
```

5. Download the pipeline and test it on a minimal dataset with a single command:

```bash
@@ -65,6 +72,7 @@ Alternatively the samplesheet.csv file created by nf-core/fetchngs can also be u
-profile <docker/singularity/podman/conda/institute> \
--input samplesheet.csv \
--kraken2db minikraken2_v1_8GB \
--kronadb taxonomy.tab \
--genome_size 4300000 \
--outdir <OUTDIR>
```
@@ -76,6 +84,7 @@ Alternatively the samplesheet.csv file created by nf-core/fetchngs can also be u
-profile <docker/singularity/podman/conda/institute> \
--input samplesheet.csv \
--kraken2db minikraken2_v1_8GB \
--kronadb taxonomy.tab \
--genome_size 4300000 \
--kraken_extract \
--tax_id <TAXON_ID> \
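The options added to the README commands above can be assembled and sanity-checked before launching a run. This is a minimal sketch, not part of bacQC: the helper names `missing_inputs` and `build_flags` are hypothetical, and only the flags visible in the diff (`-profile`, `--input`, `--kraken2db`, `--kronadb`, `--genome_size`, `--outdir`) are used. The leading part of the command is not visible in the diff, so it is deliberately left out.

```python
from pathlib import Path


def missing_inputs(*paths):
    """Return the given paths that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


def build_flags(samplesheet, kraken2db, kronadb, genome_size, outdir, profile):
    """Return the README options as an argument list (hypothetical helper)."""
    return [
        "-profile", profile,
        "--input", samplesheet,
        "--kraken2db", kraken2db,
        # New in v2.0.0: path to the Krona taxonomy file.
        "--kronadb", kronadb,
        "--genome_size", str(genome_size),
        "--outdir", outdir,
    ]
```

Checking `missing_inputs("samplesheet.csv", "minikraken2_v1_8GB", "taxonomy.tab")` first would catch a forgotten `ktUpdateTaxonomy.sh` step before any pipeline work starts.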
Binary file modified assets/bacQC_metromap.png
27 changes: 27 additions & 0 deletions assets/methods_description_template.yml
@@ -0,0 +1,27 @@
id: "avantonder-bacQC-methods-description"
description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication."
section_name: "avantonder/bacQC Methods Description"
section_href: "https://github.com/avantonder/bacQC"
plot_type: "html"
data: |
<h4>Methods</h4>
<p>Data was processed using avantonder/bacQC v${workflow.manifest.version} ${doi_text} utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<p>${tool_citations}</p>
<h4>References</h4>
<ul>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
<li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
<li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
${tool_bibliography}
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
<ul>
${nodoi_text}
<li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
<li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
</ul>
</div>
102 changes: 81 additions & 21 deletions assets/multiqc_config.yml
@@ -1,37 +1,97 @@
report_comment: >
This report has been generated by the <a href="https://github.com/avantonder/bacQC" target="_blank">bacQC</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/avantonder/bacQC" target="_blank">documentation</a>.
<a href="https://github.com/avantonder/bacQC/blob/main/docs/output.md" target="_blank">documentation</a>.
data_format: "yaml"
max_table_rows: 10000
report_section_order:
"avantonder-bacQC-methods-description":
order: -1000
software_versions:
order: -1001
"avantonder-bacqc-summary":
order: -1002
general_stats:
order: 1000
fastqc:
order: 900
fastp:
order: 800
kraken:
order: 700
bracken:
order: 600

export_plots: true

run_modules:
- custom_content
- fastqc
- fastp
- kraken
- custom_content

module_order:
- fastqc:
name: "PREPROCESS: FastQC (raw reads)"
top_modules:
- "fastqc":
name: "FastQC (raw reads)"
info: "This section of the report shows FastQC results for the raw reads before adapter trimming."
path_filters:
- "./fastqc/*.zip"
- fastp:
name: "PREPROCESS: fastp (adapter trimming)"
- "*_fastqc.zip"
- "fastp":
name: "fastp (adapter trimming)"
info: "This section of the report shows fastp results for reads after adapter and quality trimming."
- kraken:
name: "PREPROCESS: Kraken 2"
info: "This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp."
- "kraken":
name: "Kraken"
path_filters:
- "*.kraken2.report.txt"
- "kraken":
name: "Bracken"
anchor: "bracken"
target: "Bracken"
doi: "10.7717/peerj-cs.104"
info: "Estimates species abundances in metagenomics samples by probabilistically re-distributing reads in the taxonomic tree."
path_filters:
- "*.kraken2.report_bracken.txt"

report_section_order:
summary_assembly_metrics:
before: summary_variants_metrics
software_versions:
order: -1001
"nf-core-bacqc-summary":
order: -1002
table_columns_placement:
FastQC (raw reads):
total_sequences: 100
avg_sequence_length: 110
median_sequence_length: 120
percent_duplicates: 130
percent_gc: 140
percent_fails: 150
fastp (adapter trimming):
pct_adapter: 300
pct_surviving: 310
pct_duplication: 320
after_filtering_gc_content: 330
after_filtering_q30_rate: 340
after_filtering_q30_bases: 350
filtering_result_passed_filter_reads: 360
Bracken:
"% Unclassified": 1200
"% Top 5": 1210
Kraken:
"% Unclassified": 1600
"% Top 5": 1610

table_columns_visible:
FastQC (raw reads):
total_sequences: True
avg_sequence_length: True
percent_duplicates: True
percent_gc: True
percent_fails: False
Kraken: False
Bracken: False

table_columns_name:
FastQC (raw reads):
total_sequences: "Nr. Input Reads"
avg_sequence_length: "Length Input Reads"
percent_gc: "% GC Input Reads"
percent_duplicates: "% Dups Input Reads"
percent_fails: "% Failed Input Reads"

export_plots: true
section_comments:
general_stats: "By default, all read count columns are displayed as millions (M) of reads."
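The two `kraken` entries above rely on their `path_filters` globs to keep Kraken 2 reports and Bracken reports in separate sections. The sketch below illustrates that the two patterns do not overlap. It assumes shell-style glob matching and uses Python's `fnmatch` as a stand-in rather than MultiQC's own matching code; the `route` helper and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the path_filters entries in the config above.
KRAKEN_FILTER = "*.kraken2.report.txt"
BRACKEN_FILTER = "*.kraken2.report_bracken.txt"


def route(filename):
    """Return the report section(s) whose filter matches a file name."""
    sections = []
    if fnmatchcase(filename, KRAKEN_FILTER):
        sections.append("Kraken")
    if fnmatchcase(filename, BRACKEN_FILTER):
        sections.append("Bracken")
    return sections


for name in ("sampleA.kraken2.report.txt", "sampleA.kraken2.report_bracken.txt"):
    print(name, "->", route(name))
```

Because the Bracken file name ends in `_bracken.txt`, it cannot match the Kraken pattern, which requires the name to end in `.report.txt`.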
33 changes: 33 additions & 0 deletions assets/schema_input.json
@@ -0,0 +1,33 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/avantonder/bacQC/main/assets/schema_input.json",
"title": "avantonder/bacQC pipeline - params.input schema",
"description": "Schema for the file provided with params.input",
"type": "array",
"items": {
"type": "object",
"properties": {
"sample": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Sample name must be provided and cannot contain spaces",
"meta": ["id"]
},
"fastq_1": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
},
"fastq_2": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
}
},
"required": ["sample", "fastq_1"]
}
}
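The regular expressions in this schema can be tried outside the pipeline. The sketch below is illustrative only: the two patterns are copied from the schema, while the `check_row` helper and the example file names are hypothetical and not part of bacQC. It does not reproduce the schema's `exists` check, so files are never opened.

```python
import re

# Patterns copied from the schema above.
SAMPLE_RE = re.compile(r"^\S+$")
FASTQ_RE = re.compile(r"^\S+\.f(ast)?q\.gz$")


def check_row(row):
    """Return a list of problems for one samplesheet row given as a dict."""
    problems = []
    if not SAMPLE_RE.match(row.get("sample", "")):
        problems.append("sample: must be provided and cannot contain spaces")
    if not FASTQ_RE.match(row.get("fastq_1", "")):
        problems.append("fastq_1: must end in .fq.gz or .fastq.gz, no spaces")
    # fastq_2 is not listed under "required", so only check it when present.
    fastq_2 = row.get("fastq_2", "")
    if fastq_2 and not FASTQ_RE.match(fastq_2):
        problems.append("fastq_2: must end in .fq.gz or .fastq.gz, no spaces")
    return problems


print(check_row({"sample": "s1", "fastq_1": "s1_R1.fastq.gz"}))
print(check_row({"sample": "bad name", "fastq_1": "s1_R1.fastq"}))
```

A single-end row passes with only `sample` and `fastq_1`, which mirrors the schema's `required` list.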
44 changes: 22 additions & 22 deletions assets/sendmail_template.txt
@@ -9,21 +9,21 @@ Content-Type: text/html; charset=utf-8
$email_html

--nfcoremimeboundary
Content-Type: image/png;name="nf-core-bacqc_logo.png"
Content-Type: image/png;name="bacqc_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <nfcorepipelinelogo>
Content-Disposition: inline; filename="nf-core-bacqc_logo.png"
Content-Disposition: inline; filename="bacqc_logo.png"

<% out << new File("$projectDir/assets/nf-core-bacqc_logo.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>
<% out << new File("$projectDir/assets/bacqc_logo.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>

<%
if (mqcFile){
@@ -37,17 +37,17 @@ Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>

--nfcoremimeboundary--
--nfcoremimeboundary--
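The Groovy chains in this template base64-encode a file and break the result into lines of at most 76 characters, the line length limit for base64 bodies in MIME. A rough Python equivalent is sketched below for illustration. The `encode_attachment` name is hypothetical, and this is not the code the pipeline runs.

```python
import base64


def encode_attachment(data, width=76):
    """Base64-encode bytes and wrap the text at `width` characters per line."""
    text = base64.b64encode(data).decode("ascii")
    # Slice the encoded string into fixed-width chunks, like collate(76) above.
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(lines)


wrapped = encode_attachment(b"\x00" * 100)
print(wrapped)
```

Decoding the wrapped text after removing the newlines returns the original bytes, so the wrapping only affects how the attachment is laid out in the message.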