Merge pull request #1476 from nf-core/dev
Dev -> Master for 3.18.0
pinin4fjords authored Dec 20, 2024
2 parents 00f924c + 324dcdf commit b96a753
Showing 97 changed files with 6,676 additions and 517 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -68,6 +68,11 @@ jobs:
- name: Check out pipeline code
  uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- uses: actions/setup-java@8df1039502a15bceb9433410b1a100fbe190c53b # v4
  with:
    distribution: "temurin"
    java-version: "17"

- name: Set up Nextflow
  uses: nf-core/setup-nextflow@v2
  with:
108 changes: 78 additions & 30 deletions CHANGELOG.md
@@ -3,6 +3,54 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.18.0 - 2024-12-19

### Credits

Special thanks to the following for their contributions to the release:

- [Caitlin Winkler](https://github.com/oligomyeggo)
- [Jonathan Manning](https://github.com/pinin4fjords)
- [Lorenzo Sola](https://github.com/LorenzoS96)
- [Maxime Garcia](https://github.com/maxulysse)
- [Siddhartha Bagaria](https://github.com/siddharthab)

### Enhancements & fixes

- [PR #1369](https://github.com/nf-core/rnaseq/pull/1369) - Add umicollapse as an alternative to umi-tools
- [PR #1461](https://github.com/nf-core/rnaseq/pull/1461) - Add FASTQ linting during preprocessing
- [PR #1463](https://github.com/nf-core/rnaseq/pull/1463) - Move channel operations outside of the onComplete() block
- [PR #1467](https://github.com/nf-core/rnaseq/pull/1467) - Add test suite for UMI handling functionality
- [PR #1466](https://github.com/nf-core/rnaseq/pull/1466) - Factor out UMI handling
- [PR #1470](https://github.com/nf-core/rnaseq/pull/1470) - Update subworkflow to account for fix to bad argument handling
- [PR #1469](https://github.com/nf-core/rnaseq/pull/1469) - Minor docs fix
- [PR #1459](https://github.com/nf-core/rnaseq/pull/1459) - Remove reference to unused "skip_sample_count" value in email templates
- [PR #1471](https://github.com/nf-core/rnaseq/pull/1471) - Fix prepare_genome subworkflow for sortmerna
- [PR #1473](https://github.com/nf-core/rnaseq/pull/1473) - Bump STAR modules
- [PR #1474](https://github.com/nf-core/rnaseq/pull/1474) - Bump versions to 3.18.0
- [PR #1475](https://github.com/nf-core/rnaseq/pull/1475) - Fix log publishing around umitools/umicollapse
- [PR #1447](https://github.com/nf-core/rnaseq/pull/1447) - Add tutorial series for analysing count data

### Parameters

| Old parameter | New parameter |
| ------------- | --------------------- |
| | `--skip_linting` |
| | `--extra_fqlint_args` |
| | `--umi_dedup_tool` |

### Software dependencies

| Dependency | Old version | New version |
| ------------- | ----------- | ----------- |
| `UMICollapse` | | 1.1.0 |

> **NB:** Dependency has been **updated** if both old and new version information is present.
>
> **NB:** Dependency has been **added** if just the new version information is present.
>
> **NB:** Dependency has been **removed** if new version information isn't present.
## [[3.17.0](https://github.com/nf-core/rnaseq/releases/tag/3.17.0)] - 2024-10-23

### Credits
@@ -1007,14 +1055,14 @@ Note, since the pipeline is now using Nextflow DSL2, each process will be run wi

### Parameters

| Old parameter | New parameter |
| -------------------------- | ------------------------------------- |
| `--fc_extra_attributes` | `--gtf_extra_attributes` |
| `--fc_group_features` | `--gtf_group_features` |
| `--fc_count_type` | `--gtf_count_type` |
| `--fc_group_features_type` | `--gtf_group_features_type` |
| | `--singularity_pull_docker_container` |
| `--skip_featurecounts` | |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
> **NB:** Parameter has been **added** if just the new parameter information is present.
@@ -1092,28 +1140,28 @@ Note, since the pipeline is now using Nextflow DSL2, each process will be run wi

#### Updated

| Old parameter | New parameter |
| ---------------------------- | -------------------------- |
| `--reads` | `--input` |
| `--igenomesIgnore` | `--igenomes_ignore` |
| `--removeRiboRNA` | `--remove_ribo_rna` |
| `--rRNA_database_manifest` | `--ribo_database_manifest` |
| `--save_nonrRNA_reads` | `--save_non_ribo_reads` |
| `--saveAlignedIntermediates` | `--save_align_intermeds` |
| `--saveReference` | `--save_reference` |
| `--saveTrimmed` | `--save_trimmed` |
| `--saveUnaligned` | `--save_unaligned` |
| `--skipAlignment` | `--skip_alignment` |
| `--skipBiotypeQC` | `--skip_biotype_qc` |
| `--skipDupRadar` | `--skip_dupradar` |
| `--skipFastQC` | `--skip_fastqc` |
| `--skipMultiQC` | `--skip_multiqc` |
| `--skipPreseq` | `--skip_preseq` |
| `--skipQC` | `--skip_qc` |
| `--skipQualimap` | `--skip_qualimap` |
| `--skipRseQC` | `--skip_rseqc` |
| `--skipTrimming` | `--skip_trimming` |
| `--stringTieIgnoreGTF` | `--stringtie_ignore_gtf` |

#### Added

19 changes: 0 additions & 19 deletions assets/email_template.html
@@ -34,25 +34,6 @@ <h4 style="margin-top: 0; color: inherit">nf-core/rnaseq execution completed uns
<p>The full error message was:</p>
<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0">${errorReport}</pre>
</div>
""" } else if(skip_sample_count > 0) { out << """
<div
style="
color: #856404;
background-color: #fff3cd;
border-color: #ffeeba;
padding: 15px;
margin-bottom: 20px;
border: 1px solid transparent;
border-radius: 4px;
"
>
<h4 style="margin-top: 0; color: inherit">nf-core/rnaseq execution completed with warnings!</h4>
<p>
The pipeline finished successfully, but samples were skipped. Please check warnings at the top of the MultiQC report.
</p>
<p></p>
</div>

""" } else { out << """
<div
style="
7 changes: 0 additions & 7 deletions assets/email_template.txt
@@ -17,13 +17,6 @@ The full error message was:

${errorReport}
"""
} else if (skip_sample_count > 0) {
out << """##################################################
## nf-core/rnaseq execution completed with warnings ##
##################################################
The pipeline finished successfully, but samples were skipped.
Please check warnings at the top of the MultiQC report.
"""
} else {
out << "## nf-core/rnaseq execution completed successfully! ##"
}
Binary file added docs/images/mqc_fastqc_adapter.png
Binary file added docs/images/mqc_fastqc_counts.png
Binary file added docs/images/mqc_fastqc_quality.png
23 changes: 19 additions & 4 deletions docs/output.md
@@ -19,6 +19,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Pipeline overview](#pipeline-overview)
- [Preprocessing](#preprocessing)
- [cat](#cat)
- [fq lint](#fq-lint)
- [FastQC](#fastqc)
- [UMI-tools extract](#umi-tools-extract)
- [TrimGalore](#trimgalore)
@@ -73,6 +74,20 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

If multiple libraries/runs have been provided for the same sample in the input samplesheet (e.g. to increase sequencing depth) then these will be merged at the very beginning of the pipeline in order to have consistent sample naming throughout the pipeline. Please refer to the [usage documentation](https://nf-co.re/rnaseq/usage#samplesheet-input) to see how to specify these samples in the input samplesheet.

### fq lint

<details markdown="1">
<summary>Output files</summary>

- `fq_lint/*`
- `*.fq_lint.txt`: Linting report per library from `fq lint`.

> **NB:** You will see subdirectories here based on the stage of preprocessing for the files that have been linted, for example `raw`, `trimmed`.
</details>

[fq lint](https://github.com/stjude-rust-labs/fq) runs several checks on input FASTQ files. When it finds a problem it exits with a non-zero code, which terminates the workflow execution. When linting succeeds, it produces the logs you will find here.
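To make the "lint, then fail with a non-zero code" behaviour concrete, here is a minimal Python sketch of the kind of structural checks a FASTQ linter performs. It is an illustration only and not how `fq lint` is implemented: the function names `lint_fastq_lines` and `exit_code` are hypothetical, and the three checks shown are a small assumed subset chosen for the example.

```python
from typing import Iterable, List


def lint_fastq_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in 4-line FASTQ records (illustrative checks only)."""
    records = [line.rstrip("\n") for line in lines]
    problems: List[str] = []
    if len(records) % 4 != 0:
        problems.append("line count is not a multiple of 4")
    # Only inspect complete records; a trailing partial record is reported above.
    complete = len(records) - len(records) % 4
    for start in range(0, complete, 4):
        name, seq, plus, qual = records[start:start + 4]
        record_no = start // 4 + 1
        if not name.startswith("@"):
            problems.append(f"record {record_no}: name line does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {record_no}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record_no}: sequence and quality lengths differ")
    return problems


def exit_code(problems: List[str]) -> int:
    """Non-zero when any problem was found, mirroring the behaviour described above."""
    return 1 if problems else 0


good = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
bad = ["@r1\n", "ACGT\n", "+\n", "III\n"]  # quality string is one character short

print(exit_code(lint_fastq_lines(good)))  # 0
print(exit_code(lint_fastq_lines(bad)))   # 1
```

In the pipeline, a non-zero exit from the linting step is what stops the run; the sketch only mimics that contract.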

### FastQC

<details markdown="1">
@@ -105,7 +120,7 @@ If multiple libraries/runs have been provided for the same sample in the input s

</details>

[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI-tools dedup](#umi-tools-dedup) section.
[UMI-tools](https://github.com/CGATOxford/UMI-tools) and [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse) deduplicate reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI dedup](#umi-dedup) section.

To facilitate processing of input data which has the UMI barcode already embedded in the read name from the start, `--skip_umi_extract` can be specified in conjunction with `--with_umi`.
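The extraction step described above (remove the UMI from the read sequence, add it to the read name) can be sketched in a few lines of Python. This is a simplified illustration, not UMI-tools code: the function `extract_umi` is hypothetical, and it assumes a fixed-length UMI at the 5' end of the read and an underscore separator in the read name.

```python
from typing import Tuple


def extract_umi(name: str, sequence: str, quality: str, umi_length: int) -> Tuple[str, str, str]:
    """Move the first `umi_length` bases of a read into its name (assumed 5' fixed-length UMI)."""
    if len(sequence) < umi_length:
        raise ValueError("read is shorter than the UMI")
    umi = sequence[:umi_length]
    # Keep any comment after the first space intact; the UMI is appended to the read ID.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {comment}" if comment else "")
    # Trim the UMI bases from both the sequence and the matching quality string.
    return new_name, sequence[umi_length:], quality[umi_length:]


name, seq, qual = extract_umi("@read1 1:N:0", "AACCGGTTACGT", "IIIIIIIIIIII", umi_length=4)
print(name, seq, qual)  # @read1_AACC 1:N:0 GGTTACGT IIIIIIII
```

Data that already carries the UMI in the read name is in the state this function produces, which is the case `--skip_umi_extract` is meant for.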

@@ -290,7 +305,7 @@ The original BAM files generated by the selected alignment algorithm are further

![MultiQC - SAMtools mapped reads per contig plot](images/mqc_samtools_idxstats.png)

### UMI-tools dedup
### UMI dedup

<details markdown="1">
<summary>Output files</summary>
@@ -299,7 +314,7 @@ The original BAM files generated by the selected alignment algorithm are further
- `<SAMPLE>.umi_dedup.sorted.bam`: If `--save_umi_intermeds` is specified the UMI deduplicated, coordinate sorted BAM file containing read alignments will be placed in this directory.
- `<SAMPLE>.umi_dedup.sorted.bam.bai`: If `--save_umi_intermeds` is specified the BAI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
- `<SAMPLE>.umi_dedup.sorted.bam.csi`: If `--save_umi_intermeds --bam_csi_index` is specified the CSI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
- `<ALIGNER>/umitools/`
- `<ALIGNER>/umitools/` (UMI-tools only)
- `*_edit_distance.tsv`: Reports the (binned) average edit distance between the UMIs at each position.
- `*_per_umi.tsv`: UMI-level summary statistics.
- `*_per_umi_per_position.tsv`: Tabulates the counts for unique combinations of UMI and position.
@@ -308,7 +323,7 @@ The content of the files above is explained in more detail in the [UMI-tools doc

</details>

After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information using the UMI-tools `dedup` command. This will generate a filtered BAM file after the removal of PCR duplicates.
After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information. UMI deduplication can be carried out with either [UMI-tools](https://github.com/CGATOxford/UMI-tools) or [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse), selected via the `--umi_dedup_tool` parameter. The output BAM files are the same, though UMI-tools produces some additional outputs, as described above. Either method will generate a filtered BAM file after the removal of PCR duplicates.
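As a rough illustration of deduplicating on both mapping and UMI information, the Python sketch below keeps one alignment per combination of contig, position, strand and UMI. It is a simplification under stated assumptions and is not the algorithm of either tool: the `Alignment` type and `dedup_by_umi` function are hypothetical, UMIs are matched exactly (real tools can also group UMIs that differ by sequencing errors), and the UMI is assumed to follow the last underscore in the read name.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    read_name: str  # assumed to end in "_<UMI>"
    contig: str
    position: int
    strand: str
    mapq: int


def dedup_by_umi(alignments: List[Alignment]) -> List[Alignment]:
    """Keep one alignment per (contig, position, strand, UMI) group, using exact UMI matches."""
    best: Dict[Tuple[str, int, str, str], Alignment] = {}
    for aln in alignments:
        umi = aln.read_name.rsplit("_", 1)[-1]
        key = (aln.contig, aln.position, aln.strand, umi)
        # Prefer the highest mapping quality; the first alignment seen wins ties.
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


alignments = [
    Alignment("r1_AACC", "chr1", 100, "+", 30),
    Alignment("r2_AACC", "chr1", 100, "+", 60),  # same position and UMI as r1: a duplicate
    Alignment("r3_GGTT", "chr1", 100, "+", 60),  # same position, different UMI: kept
    Alignment("r4_AACC", "chr1", 250, "+", 60),  # same UMI, different position: kept
]
kept = dedup_by_umi(alignments)
print([aln.read_name for aln in kept])  # ['r2_AACC', 'r3_GGTT', 'r4_AACC']
```

The point of the sketch is that reads sharing a position are only collapsed when their UMIs also agree, which is what distinguishes UMI deduplication from position-only duplicate marking.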

### picard MarkDuplicates

6 changes: 6 additions & 0 deletions docs/usage.md
@@ -27,6 +27,12 @@ CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,a
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
```

### Linting

By default, the pipeline will run [fq lint](https://github.com/stjude-rust-labs/fq) on all input FASTQ files, both at the start of preprocessing and after each preprocessing step that manipulates FASTQ files. If errors are found, an error will be reported and the workflow will stop.

The `extra_fqlint_args` parameter can be used to disable [any validator](https://github.com/stjude-rust-labs/fq?tab=readme-ov-file#validators) from `fq`. For example, we have found that checks on the names of paired reads are prone to failure, so that check is disabled by default (`extra_fqlint_args` is set to `--disable-validator P001`).
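For readers unfamiliar with what a check on paired read names involves, the following Python sketch compares the names of mates at the same position in two FASTQ files. It is an illustration of the general idea only and does not reproduce the `fq` validator that `--disable-validator P001` turns off: the helper names `base_name` and `mismatched_pairs` are hypothetical, and stripping a trailing `/1` or `/2` plus any comment is an assumed normalisation.

```python
from typing import Iterable, List, Tuple


def base_name(name_line: str) -> str:
    """Strip the leading '@', any comment, and a trailing /1 or /2 mate suffix."""
    text = name_line[1:] if name_line.startswith("@") else name_line
    fields = text.split()
    read_id = fields[0] if fields else ""
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def mismatched_pairs(r1_names: Iterable[str], r2_names: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (record number, R1 name, R2 name) for mates whose base names differ."""
    return [
        (number, r1, r2)
        for number, (r1, r2) in enumerate(zip(r1_names, r2_names), start=1)
        if base_name(r1) != base_name(r2)
    ]


r1 = ["@read1/1", "@read2 1:N:0", "@read3/1"]
r2 = ["@read1/2", "@read2 2:N:0", "@read4/2"]
print(mismatched_pairs(r1, r2))  # [(3, '@read3/1', '@read4/2')]
```

A check of this kind depends on the naming convention of the input files, which is one reason it may need to be switched off for some data.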

### Strandedness Prediction

If you set the strandedness value to `auto`, the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to automatically infer the strandedness, and then propagate this information through the rest of the pipeline. This behavior is controlled by the `--stranded_threshold` and `--unstranded_threshold` parameters, which are set to 0.8 and 0.1 by default, respectively. This means: