diff --git a/.editorconfig b/.editorconfig
index dd9ffa53..e1058815 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -11,6 +11,7 @@ indent_style = space
[*.{md,yml,yaml,html,css,scss,js}]
indent_size = 2
+
# These files are edited and tested upstream in nf-core/modules
[/modules/nf-core/**]
charset = unset
@@ -25,13 +26,12 @@ insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset
+
+
[/assets/email*]
indent_size = unset
-# ignore Readme
-[README.md]
-indent_style = unset
-# ignore python
+# ignore python and markdown
[*.{py,md}]
indent_style = unset
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 0779fb9b..09ba835d 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/smrnaseq, the standard workflow is
1. Check that there isn't already an issue about your idea in the [nf-core/smrnaseq issues](https://github.com/nf-core/smrnaseq/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/smrnaseq repository](https://github.com/nf-core/smrnaseq) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
-4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 3.0).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
@@ -40,7 +40,7 @@ There are typically two types of tests that run:
### Lint tests
`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
-To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command.
+To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint` command.
If any failures or warnings are encountered, please follow the listed URL for more documentation.
@@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards:
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new parameters to `nextflow.config` with a default (see below).
-5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool).
+5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool).
6. Add sanity checks and validation for all relevant parameters.
7. Perform local tests to validate that the new code works as expected.
8. If applicable, add a new test command in `.github/workflow/ci.yml`.
@@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards:
Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
-Once there, use `nf-core schema build` to add to `nextflow_schema.json`.
+Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`.
### Default processes resource requirements
-Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
@@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g
### Nextflow version bumping
-If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
+If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]`
### Images and figures
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index ef59ff45..2544ad43 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -17,8 +17,8 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/smrn
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/smrnaseq/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on the nf-core/smrnaseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
-- [ ] Make sure your code lints (`nf-core lint`).
-- [ ] Ensure the test suite passes (`nf-test test main.nf.test -profile test,docker`).
+- [ ] Make sure your code lints (`nf-core pipelines lint`).
+- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
index 99c96e77..36fdbad6 100644
--- a/.github/workflows/awsfulltest.yml
+++ b/.github/workflows/awsfulltest.yml
@@ -1,19 +1,36 @@
name: nf-core AWS full size tests
-# This workflow is triggered on published releases.
+# This workflow is triggered on PRs opened against the master branch.
# It can be additionally triggered manually with GitHub actions workflow dispatch button.
# It runs the -profile 'test_full' on AWS batch
on:
- release:
- types: [published]
+ pull_request:
+ branches:
+ - master
workflow_dispatch:
+ pull_request_review:
+ types: [submitted]
+
jobs:
- run-tower:
+ run-platform:
name: Run AWS full tests
- if: github.repository == 'nf-core/smrnaseq'
+ # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered
+ if: (github.repository == 'nf-core/smrnaseq' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master') || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
steps:
- - name: Launch workflow via tower
+ - uses: octokit/request-action@v2.x
+ id: check_approvals
+ with:
+ route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ - id: test_variables
+ if: github.event_name != 'workflow_dispatch'
+ run: |
+ JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}'
+ CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length')
+ test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required
+ - name: Launch workflow via Seqera Platform
uses: seqeralabs/action-tower-launch@v2
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
@@ -32,7 +49,7 @@ jobs:
- uses: actions/upload-artifact@v4
with:
- name: Tower debug log file
+ name: Seqera Platform debug log file
path: |
- tower_action_*.log
- tower_action_*.json
+ seqera_platform_action_*.log
+ seqera_platform_action_*.json
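The approval gate in the `test_variables` step boils down to counting review objects whose `state` contains `APPROVED` and requiring at least two. A minimal Python sketch of the same check, assuming it receives the JSON array returned by the GitHub "list reviews for a pull request" endpoint; the function names and sample data are hypothetical:

```python
import json


def count_approvals(reviews_json: str) -> int:
    # Same idea as the jq filter:
    #   [.[] | select(.state | contains("APPROVED"))] | length
    reviews = json.loads(reviews_json)
    # jq's `contains` on strings is a substring test, so match the same way.
    return sum(1 for review in reviews if "APPROVED" in (review.get("state") or ""))


def has_enough_approvals(reviews_json: str, required: int = 2) -> bool:
    # Equivalent of `test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1`.
    return count_approvals(reviews_json) >= required


if __name__ == "__main__":
    sample = json.dumps([
        {"user": {"login": "alice"}, "state": "APPROVED"},
        {"user": {"login": "bob"}, "state": "COMMENTED"},
        {"user": {"login": "carol"}, "state": "APPROVED"},
    ])
    print(count_approvals(sample), has_enough_approvals(sample))  # prints: 2 True
```

Like the jq filter, this counts review submissions rather than distinct reviewers, so one person approving twice would be counted twice; deduplicating on `user.login` would be a stricter variant.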
diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml
index 5386cbc0..398ec4cc 100644
--- a/.github/workflows/awstest.yml
+++ b/.github/workflows/awstest.yml
@@ -5,13 +5,13 @@ name: nf-core AWS test
on:
workflow_dispatch:
jobs:
- run-tower:
+ run-platform:
name: Run AWS tests
if: github.repository == 'nf-core/smrnaseq'
runs-on: ubuntu-latest
steps:
- # Launch workflow using Tower CLI tool action
- - name: Launch workflow via tower
+ # Launch workflow using Seqera Platform CLI tool action
+ - name: Launch workflow via Seqera Platform
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
@@ -23,11 +23,11 @@ jobs:
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/smrnaseq/results-test-${{ github.sha }}"
}
- profiles: test
+ profiles: test,illumina
- uses: actions/upload-artifact@v4
with:
- name: Tower debug log file
+ name: Seqera Platform debug log file
path: |
- tower_action_*.log
- tower_action_*.json
+ seqera_platform_action_*.log
+ seqera_platform_action_*.json
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index cdadbf16..dc67eea0 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -4,12 +4,23 @@ on:
push:
branches:
- dev
+ - master
pull_request:
+ branches:
+ - dev
+ - master
release:
types: [published]
+ workflow_dispatch:
env:
NXF_ANSI_LOG: false
+ NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity
+ NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity
+ NFT_VER: "0.9.0"
+ NFT_WORKDIR: "~"
+ NFT_DIFF: "pdiff"
+ NFT_DIFF_ARGS: "--line-numbers --expand-tabs=2"
concurrency:
group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
@@ -17,32 +28,56 @@ concurrency:
jobs:
test:
- name: Run pipeline with test data
+ name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.shard }} | ${{ matrix.profile }})"
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/smrnaseq') }}"
runs-on: ubuntu-latest
strategy:
+ fail-fast: false
matrix:
+ shard: [1, 2, 3, 4]
NXF_VER:
- - "23.04.0"
- - "latest-everything"
- profile:
- - "test"
- - "test_no_genome"
- - "test_umi"
- - "test_index"
+ - "24.04.2"
+ profile: ["docker"]
+ env:
+ SHARDS: "4"
steps:
- name: Check out pipeline code
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
+ uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
+ with:
+ fetch-depth: 0
+
+ - uses: actions/setup-python@v4
+ with:
+ python-version: "3.11"
+ architecture: "x64"
- - name: Install Nextflow
- uses: nf-core/setup-nextflow@v1
+ - name: Install pdiff to see diff between nf-test snapshots
+ run: |
+ python -m pip install --upgrade pip
+ pip install pdiff
+
+ - uses: nf-core/setup-nextflow@v2
with:
version: "${{ matrix.NXF_VER }}"
- - name: Disk space cleanup
- uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
+ - uses: nf-core/setup-nf-test@v1
+ with:
+ version: ${{ env.NFT_VER }}
- - name: Run pipeline with test data
+ - name: Run Tests (Shard ${{ matrix.shard }}/${{ env.SHARDS }})
run: |
- nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker --outdir ./results
+ nf-test test \
+ --ci \
+ --shard ${{ matrix.shard }}/${{ env.SHARDS }} \
+ --changed-since HEAD^ \
+ --profile "+${{ matrix.profile }},ci" \
+ --filter pipeline \
+ --junitxml=test.xml
+
+ - name: Publish Test Report
+ uses: mikepenz/action-junit-report@v3
+ if: always() # always run even if the previous step fails
+ with:
+ report_paths: test.xml
+ annotate_only: true
diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml
index 08622fd5..1552cf2e 100644
--- a/.github/workflows/download_pipeline.yml
+++ b/.github/workflows/download_pipeline.yml
@@ -1,4 +1,4 @@
-name: Test successful pipeline download with 'nf-core download'
+name: Test successful pipeline download with 'nf-core pipelines download'
# Run the workflow when:
# - dispatched manually
@@ -8,12 +8,14 @@ on:
workflow_dispatch:
inputs:
testbranch:
- description: "The specific branch you wish to utilize for the test execution of nf-core download."
+ description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download."
required: true
default: "dev"
pull_request:
types:
- opened
+ - edited
+ - synchronize
branches:
- master
pull_request_target:
@@ -28,15 +30,20 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install Nextflow
- uses: nf-core/setup-nextflow@v1
+ uses: nf-core/setup-nextflow@v2
- - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
+ - name: Disk space cleanup
+ uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
+
+ - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
- python-version: "3.11"
+ python-version: "3.12"
architecture: "x64"
- - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7
+
+ - name: Setup Apptainer
+ uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0
with:
- singularity-version: 3.8.3
+ apptainer-version: 1.3.4
- name: Install dependencies
run: |
@@ -49,24 +56,64 @@ jobs:
echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV}
echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV}
+ - name: Make a cache directory for the container images
+ run: |
+ mkdir -p ./singularity_container_images
+
- name: Download the pipeline
env:
- NXF_SINGULARITY_CACHEDIR: ./
+ NXF_SINGULARITY_CACHEDIR: ./singularity_container_images
run: |
- nf-core download ${{ env.REPO_LOWERCASE }} \
+ nf-core pipelines download ${{ env.REPO_LOWERCASE }} \
--revision ${{ env.REPO_BRANCH }} \
--outdir ./${{ env.REPOTITLE_LOWERCASE }} \
--compress "none" \
--container-system 'singularity' \
- --container-library "quay.io" -l "docker.io" -l "ghcr.io" \
+ --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \
--container-cache-utilisation 'amend' \
- --download-configuration
+ --download-configuration 'yes'
- name: Inspect download
run: tree ./${{ env.REPOTITLE_LOWERCASE }}
- - name: Run the downloaded pipeline
+ - name: Count the downloaded number of container images
+ id: count_initial
+ run: |
+ image_count=$(ls -1 ./singularity_container_images | wc -l | xargs)
+ echo "Initial container image count: $image_count"
+ echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV}
+
+ - name: Run the downloaded pipeline (stub)
+ id: stub_run_pipeline
+ continue-on-error: true
env:
- NXF_SINGULARITY_CACHEDIR: ./
+ NXF_SINGULARITY_CACHEDIR: ./singularity_container_images
NXF_SINGULARITY_HOME_MOUNT: true
- run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results
+ run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity,illumina --outdir ./results
+ - name: Run the downloaded pipeline (stub run not supported)
+ id: run_pipeline
+ if: ${{ steps.stub_run_pipeline.outcome == 'failure' }}
+ env:
+ NXF_SINGULARITY_CACHEDIR: ./singularity_container_images
+ NXF_SINGULARITY_HOME_MOUNT: true
+ run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results
+
+ - name: Count the downloaded number of container images
+ id: count_afterwards
+ run: |
+ image_count=$(ls -1 ./singularity_container_images | wc -l | xargs)
+ echo "Post-pipeline run container image count: $image_count"
+ echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV}
+
+ - name: Compare container image counts
+ run: |
+ if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then
+ initial_count=${{ env.IMAGE_COUNT_INITIAL }}
+ final_count=${{ env.IMAGE_COUNT_AFTER }}
+ difference=$((final_count - initial_count))
+ echo "$difference additional container images were downloaded at runtime. The pipeline has no support for offline runs!"
+ tree ./singularity_container_images
+ exit 1
+ else
+ echo "The pipeline can be downloaded successfully!"
+ fi
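The new download checks derive the on-disk directory from the branch name with `sed 's/\W/_/g'` and fail the job if running the pipeline adds images to the container cache. A rough Python equivalent of that logic, for illustration only; the helper names and image file names are hypothetical:

```python
import re
import tempfile
from pathlib import Path


def download_dir_name(branch: str) -> str:
    # Same as `sed 's/\W/_/g' <<< "$REPO_BRANCH"`: every character that is not a
    # letter, digit or underscore becomes "_" (ASCII branch names behave alike).
    return re.sub(r"\W", "_", branch)


def count_images(cache_dir: Path) -> int:
    # Same as `ls -1 DIR | wc -l`: visible entries only, since ls hides dotfiles.
    return sum(1 for entry in cache_dir.iterdir() if not entry.name.startswith("."))


def check_offline_ready(initial: int, after: int) -> str:
    # Any image pulled during the run means the downloaded bundle was incomplete.
    if initial != after:
        raise RuntimeError(
            f"{after - initial} additional container images were downloaded at runtime. "
            "The pipeline has no support for offline runs!"
        )
    return "The pipeline can be downloaded successfully!"


if __name__ == "__main__":
    print(download_dir_name("feature/new-ci"))  # prints: feature_new_ci
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        (cache / "quay.io-example-fastp.img").touch()
        before = count_images(cache)
        (cache / "docker.io-example-tool.img").touch()  # simulate a runtime pull
        after = count_images(cache)
        print(before, after)  # prints: 1 2
```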
diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml
index 56151e57..5dbcd658 100644
--- a/.github/workflows/fix-linting.yml
+++ b/.github/workflows/fix-linting.yml
@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
steps:
# Use the @nf-core-bot token to check out so we can push later
- - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
+ - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
with:
token: ${{ secrets.nf_core_bot_auth_token }}
@@ -32,9 +32,9 @@ jobs:
GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }}
# Install and run pre-commit
- - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
+ - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
- python-version: 3.11
+ python-version: "3.12"
- name: Install pre-commit
run: pip install pre-commit
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
index 073e1876..a502573c 100644
--- a/.github/workflows/linting.yml
+++ b/.github/workflows/linting.yml
@@ -1,6 +1,6 @@
name: nf-core linting
# This workflow is triggered on pushes and PRs to the repository.
-# It runs the `nf-core lint` and markdown lint tests to ensure
+# It runs the `nf-core pipelines lint` and markdown lint tests to ensure
# that the code meets the nf-core guidelines.
on:
push:
@@ -14,13 +14,12 @@ jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
+ - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
- - name: Set up Python 3.11
- uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
+ - name: Set up Python 3.12
+ uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
- python-version: 3.11
- cache: "pip"
+ python-version: "3.12"
- name: Install pre-commit
run: pip install pre-commit
@@ -32,27 +31,42 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out pipeline code
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
+ uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
- name: Install Nextflow
- uses: nf-core/setup-nextflow@v1
+ uses: nf-core/setup-nextflow@v2
- - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
+ - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
- python-version: "3.11"
+ python-version: "3.12"
architecture: "x64"
+ - name: read .nf-core.yml
+ uses: pietrobolcato/action-read-yaml@1.1.0
+ id: read_yml
+ with:
+ config: ${{ github.workspace }}/.nf-core.yml
+
- name: Install dependencies
run: |
python -m pip install --upgrade pip
- pip install nf-core
+ pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }}
+
+ - name: Run nf-core pipelines lint
+ if: ${{ github.base_ref != 'master' }}
+ env:
+ GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
+ run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md
- - name: Run nf-core lint
+ - name: Run nf-core pipelines lint --release
+ if: ${{ github.base_ref == 'master' }}
env:
GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
- run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md
+ run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md
- name: Save PR number
if: ${{ always() }}
@@ -60,7 +74,7 @@ jobs:
- name: Upload linting log file artifact
if: ${{ always() }}
- uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4
+ uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4
with:
name: linting-logs
path: |
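The lint job now pins `nf-core` to the `nf_core_version` recorded in `.nf-core.yml` and adds `--release` only when the PR targets `master`. A small sketch of that decision, assuming a deliberately naive line scan in place of the `action-read-yaml` step (it is not a YAML parser); the function names are hypothetical:

```python
def lint_command(base_ref: str, workspace: str) -> list[str]:
    # Mirrors the two conditional steps: `--release` only for PRs into master.
    cmd = ["nf-core", "-l", "lint_log.txt", "pipelines", "lint"]
    if base_ref == "master":
        cmd.append("--release")
    cmd += ["--dir", workspace, "--markdown", "lint_results.md"]
    return cmd


def pinned_nf_core(config_text: str) -> str:
    # Simplified stand-in for reading `nf_core_version` from .nf-core.yml:
    # only matches the top-level key, ignores nesting and YAML features.
    for line in config_text.splitlines():
        if line.startswith("nf_core_version:"):
            version = line.split(":", 1)[1].strip().strip("'\"")
            return f"nf-core=={version}"
    raise KeyError("nf_core_version not found in .nf-core.yml")


if __name__ == "__main__":
    print(pinned_nf_core("repository_type: pipeline\nnf_core_version: 3.0.2\n"))
    print(" ".join(lint_command("master", "/workspace")))
```

On `push` events `github.base_ref` is empty, so those runs take the non-release path, matching the `!= 'master'` condition.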
diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml
index b706875f..42e519bf 100644
--- a/.github/workflows/linting_comment.yml
+++ b/.github/workflows/linting_comment.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download lint results
- uses: dawidd6/action-download-artifact@f6b0bace624032e30a85a8fd9c1a7f8f611f5737 # v3
+ uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6
with:
workflow: linting.yml
workflow_conclusion: completed
diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml
index d468aeaa..c6ba35df 100644
--- a/.github/workflows/release-announcements.yml
+++ b/.github/workflows/release-announcements.yml
@@ -12,7 +12,7 @@ jobs:
- name: get topics and convert to hashtags
id: get_topics
run: |
- curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ' >> $GITHUB_OUTPUT
+ echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT
- uses: rzr/fediverse-action@master
with:
@@ -25,13 +25,13 @@ jobs:
Please see the changelog: ${{ github.event.release.html_url }}
- ${{ steps.get_topics.outputs.GITHUB_OUTPUT }} #nfcore #openscience #nextflow #bioinformatics
+ ${{ steps.get_topics.outputs.topics }} #nfcore #openscience #nextflow #bioinformatics
send-tweet:
runs-on: ubuntu-latest
steps:
- - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
+ - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
python-version: "3.10"
- name: Install dependencies
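The topic step was changed to write a named `topics=` output and to strip hyphens, which is why the message now reads `steps.get_topics.outputs.topics`. The equivalent transformation in Python, with a made-up `pipelines.json` payload; the function name is hypothetical:

```python
import json


def topic_hashtags(pipelines_json: str, full_name: str) -> str:
    # Mirrors: jq -r '.remote_workflows[] | select(.full_name == REPO) | .topics[]'
    #          | awk '{print "#"$0}' | tr '\n' ' '  followed by  sed 's/-//g'
    data = json.loads(pipelines_json)
    topics = [
        topic
        for workflow in data["remote_workflows"]
        if workflow["full_name"] == full_name
        for topic in workflow["topics"]
    ]
    joined = "".join(f"#{topic} " for topic in topics)  # awk + tr leave a trailing space
    return joined.replace("-", "")  # sed 's/-//g' drops every hyphen


if __name__ == "__main__":
    payload = json.dumps({"remote_workflows": [
        {"full_name": "nf-core/smrnaseq", "topics": ["small-rna", "mirna"]},
    ]})
    print(repr(topic_hashtags(payload, "nf-core/smrnaseq")))  # prints: '#smallrna #mirna '
```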
diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml
new file mode 100644
index 00000000..e8aafe44
--- /dev/null
+++ b/.github/workflows/template_version_comment.yml
@@ -0,0 +1,46 @@
+name: nf-core template version comment
+# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version.
+# It posts a comment to the PR, even if it comes from a fork.
+
+on: pull_request_target
+
+jobs:
+ template_version:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Check out pipeline code
+ uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
+ with:
+ ref: ${{ github.event.pull_request.head.sha }}
+
+ - name: Read template version from .nf-core.yml
+ uses: nichmor/minimal-read-yaml@v0.0.2
+ id: read_yml
+ with:
+ config: ${{ github.workspace }}/.nf-core.yml
+
+ - name: Install nf-core
+ run: |
+ python -m pip install --upgrade pip
+ pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }}
+
+ - name: Check nf-core outdated
+ id: nf_core_outdated
+ run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV}
+
+ - name: Post nf-core template version comment
+ uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
+ if: |
+ contains(env.OUTPUT, 'nf-core')
+ with:
+ repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }}
+ allow-repeats: false
+ message: |
+ > [!WARNING]
+ > Newer version of the nf-core template is available.
+ >
+ > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}.
+ > Please update your pipeline to the latest version.
+ >
+ > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync).
+ #
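`template_version_comment.yml` installs the pinned `nf-core` release and relies on `pip list --outdated` to spot a newer one. A simplified sketch of the underlying comparison, assuming plain `X.Y.Z` version strings with no pre-release suffixes (pip itself follows PEP 440, which this does not implement); the names are hypothetical:

```python
def parse_version(version: str) -> tuple[int, ...]:
    # "3.0.2" -> (3, 0, 2); numeric parts only.
    return tuple(int(part) for part in version.split("."))


def template_is_outdated(pinned: str, latest: str) -> bool:
    # Pad with zeros so "3.0" and "3.0.0" compare as equal.
    a, b = parse_version(pinned), parse_version(latest)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    print(template_is_outdated("3.0.2", "3.1.0"))  # prints: True
```

Comparing integer tuples avoids the string-comparison pitfall where `"3.10.0" < "3.9.1"` would be true.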
diff --git a/.gitignore b/.gitignore
index 4109b5c9..24fc8b91 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,4 +6,6 @@ results/
testing/
testing*
*.pyc
+null/
execution_trace*
+.nf-test*
diff --git a/.gitpod.yml b/.gitpod.yml
index 105a1821..46118637 100644
--- a/.gitpod.yml
+++ b/.gitpod.yml
@@ -4,17 +4,14 @@ tasks:
command: |
pre-commit install --install-hooks
nextflow self-update
- - name: unset JAVA_TOOL_OPTIONS
- command: |
- unset JAVA_TOOL_OPTIONS
vscode:
extensions: # based on nf-core.nf-core-extensionpack
- - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
+ #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
- EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
- Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
- mechatroner.rainbow-csv # Highlight columns in csv files in different colors
- # - nextflow.nextflow # Nextflow syntax highlighting
+ - nextflow.nextflow # Nextflow syntax highlighting
- oderwat.indent-rainbow # Highlight indentation level
- streetsidesoftware.code-spell-checker # Spelling checker for source code
- charliermarsh.ruff # Code linter Ruff
diff --git a/.nf-core.yml b/.nf-core.yml
index 8a74623b..98e8cdf5 100644
--- a/.nf-core.yml
+++ b/.nf-core.yml
@@ -1,5 +1,19 @@
-repository_type: pipeline
+bump_version: null
lint:
nextflow_config:
- config_defaults:
- params.fastp_known_mirna_adapters
+nf_core_version: 3.0.2
+org_path: null
+repository_type: pipeline
+template:
+ author: "P. Ewels, C. Wang, R. Hammar\xE9n, L. Pantano, A. Peltzer"
+ description: Small RNA-Seq Best Practice Analysis Pipeline.
+ force: false
+ is_nfcore: true
+ name: smrnaseq
+ org: nf-core
+ outdir: .
+ skip_features: null
+ version: 2.4.0
+update: null
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index af57081f..9e9f0e1c 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -3,8 +3,11 @@ repos:
rev: "v3.1.0"
hooks:
- id: prettier
+ additional_dependencies:
+ - prettier@3.2.5
+
- repo: https://github.com/editorconfig-checker/editorconfig-checker.python
- rev: "2.7.3"
+ rev: "3.0.3"
hooks:
- id: editorconfig-checker
alias: ec
diff --git a/.prettierignore b/.prettierignore
index 437d763d..610e5069 100644
--- a/.prettierignore
+++ b/.prettierignore
@@ -1,3 +1,4 @@
+
email_template.html
adaptivecard.json
slackreport.json
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c42d00e3..32568137 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,64 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## v2.4.0 - 2024-10-14 - Navy Iron Boxer
+
+- [[#349]](https://github.com/nf-core/smrnaseq/pull/349) - Fix [MIRTOP_QUANT conda issue](https://github.com/nf-core/smrnaseq/issues/347) - change conda-base to conda-forge channel.
+- [[#350]](https://github.com/nf-core/smrnaseq/pull/350) - Fix [MIRTOP_QUANT conda issue](https://github.com/nf-core/smrnaseq/issues/347) - set python version to 3.7 to fix pysam issue.
+- [[#361]](https://github.com/nf-core/smrnaseq/pull/361) - Fix [[#332]](https://github.com/nf-core/smrnaseq/issues/332) - Fix documentation to use only single-end.
+- [[#364]](https://github.com/nf-core/smrnaseq/pull/364) - Fix [Protocol inheritance issue](https://github.com/nf-core/smrnaseq/issues/351) - fixing protocol inheritance from subworkflow with move to config profile(s) for different protocols.
+- [[#372]](https://github.com/nf-core/smrnaseq/pull/372) - Fix [Plain test profile](https://github.com/nf-core/smrnaseq/issues/371) - Updated default protocol value to "custom".
+- [[#374]](https://github.com/nf-core/smrnaseq/pull/374) - Fix [default tests](https://github.com/nf-core/smrnaseq/issues/375) so that they do not require additional profiles in CI. Change GitHub CI fail-fast strategy to false.
+- [[#375]](https://github.com/nf-core/smrnaseq/pull/375) - Test [technical repeats](https://github.com/nf-core/smrnaseq/issues/212) - Test merging of technical repeats.
+- [[#377]](https://github.com/nf-core/smrnaseq/pull/377) - Fix [Linting](https://github.com/nf-core/smrnaseq/issues/369) - Fixed linting warnings and updated modules & subworkflows.
+- [[#378]](https://github.com/nf-core/smrnaseq/pull/378) - Fix [`--mirtrace_species` bug](https://github.com/nf-core/smrnaseq/issues/348) - Make `MIRTRACE` process conditional. Add mirgenedb test.
+- [[#380]](https://github.com/nf-core/smrnaseq/pull/380) - Fix [edgeR_mirBase.R](https://github.com/nf-core/smrnaseq/issues/187) - Fix checking number of samples which causes error in plotMDS. Add nf-tests for local modules using custom R scripts.
+- [[#381]](https://github.com/nf-core/smrnaseq/pull/381) - Update [Convert tests to nf-tests](https://github.com/nf-core/smrnaseq/issues/379) - CI tests to nf-tests.
+- [[#382]](https://github.com/nf-core/smrnaseq/pull/382) - Add [collapse_mirtop.R](https://github.com/nf-core/smrnaseq/issues/174) - Add nf-tests for local modules using custom R scripts.
+- [[#383]](https://github.com/nf-core/smrnaseq/pull/383) - Fix [parameter `--skip_fastp` throws an error](https://github.com/nf-core/smrnaseq/issues/263) - Fix parameter `--skip_fastp`.
+- [[#384]](https://github.com/nf-core/smrnaseq/pull/384) - Fix [filter status bug fix](https://github.com/nf-core/smrnaseq/issues/360) - Fix filter stats module and add filter contaminants test profile.
+- [[#386]](https://github.com/nf-core/smrnaseq/pull/386) - Fix [Nextflex trimming support](https://github.com/nf-core/smrnaseq/issues/365) - Fix Nextflex trimming support.
+- [[#387]](https://github.com/nf-core/smrnaseq/pull/387) - Add [contaminant filter failure because the Docker image for BLAT cannot be pulled](https://github.com/nf-core/smrnaseq/issues/354) - Add nf-test to local module `blat_mirna` and related fixes. Adds a small test profile to test contaminant filter results.
+- [[#388]](https://github.com/nf-core/smrnaseq/pull/388) - Fix [igenomes fix](https://github.com/nf-core/smrnaseq/issues/360) - Fix workflow scripts so that they can use igenome parameters.
+- [[#391]](https://github.com/nf-core/smrnaseq/pull/391) - Fix [error because of large chromosomes](https://github.com/nf-core/smrnaseq/issues/132) - Change `.bai` index to `.csi` index in `samtools_index` to support large chromosomes.
+- [[#392]](https://github.com/nf-core/smrnaseq/pull/392) - Update [Reduce tests](https://github.com/nf-core/smrnaseq/issues/389) - Combine and optimize tests, and reduce samplesheets sizes.
+- [[#397]](https://github.com/nf-core/smrnaseq/pull/397) - Fix [contaminant filter failure because of the Docker image for BLAT](https://github.com/nf-core/smrnaseq/issues/354) - Improvements to contaminant filter subworkflow and replacement with nf-core modules.
+- [[#398]](https://github.com/nf-core/smrnaseq/pull/398) - Update [Input channels](https://github.com/nf-core/smrnaseq/issues/390) - Updated channel and params handling through workflows.
+- [[#405]](https://github.com/nf-core/smrnaseq/pull/405) - Fix [Umicollapse algo wrong set](https://github.com/nf-core/smrnaseq/issues/404) - Fix potential bug in Umicollapse (not effective as we do not allow PE data in smrnaseq - but for consistency)
+- [[#420]](https://github.com/nf-core/smrnaseq/pull/420) - Fix [mirTrace produces an error in test nextflex](https://github.com/nf-core/smrnaseq/issues/419) - Allow config mode to be used in mirtrace/qc
+- [[#425]](https://github.com/nf-core/smrnaseq/pull/425) - Raise [minimum required NXF version for pipeline](https://github.com/nf-core/smrnaseq/issues/424) - usage of `arity` in some modules now requires this
+- [[#426]](https://github.com/nf-core/smrnaseq/pull/426) - Add [nf-core mirtop](https://github.com/nf-core/smrnaseq/issues/426) - replace local module with nf-core `mirtop`
+- [[#427]](https://github.com/nf-core/smrnaseq/pull/427) - Add [nf-core pigz uncompress](https://github.com/nf-core/smrnaseq/issues/422) - replace local `mirdeep_pigz`
+- [[#429]](https://github.com/nf-core/smrnaseq/pull/429) - Make [saving of intermediate files optional](https://github.com/nf-core/smrnaseq/issues/424) - Allows user to choose whether to save intermediate files or not. Replaces several params that referred to the same such as `params.save_aligned` and `params.save_aligned_mirna_quant`.
+- [[#430]](https://github.com/nf-core/smrnaseq/pull/430) - Emit a [warning if paired-end data is used](https://github.com/nf-core/smrnaseq/issues/423) - pipeline handles SE data
+- [[#432]](https://github.com/nf-core/smrnaseq/pull/432) - Update [MultiQC and all modules to latest version](https://github.com/nf-core/smrnaseq/issues/428) - Include UMIcollapse module in MultiQC.
+- [[#435]](https://github.com/nf-core/smrnaseq/pull/435) - Replace local instances of bowtie with nf-core [`bowtie2`](https://github.com/nf-core/smrnaseq/issues/434) and [`bowtie1`](https://github.com/nf-core/smrnaseq/issues/433) - Additionally adds a `bioawk` module that cleans fasta files.
+- [[#438]](https://github.com/nf-core/smrnaseq/pull/438) - Update [Mirtop to latest version](https://github.com/nf-core/smrnaseq/issues/437) - Process samples separately and join results with `CSVTK_JOIN`.
+- [[#439]](https://github.com/nf-core/smrnaseq/pull/439) - Fix [Fix paired end samples processing](https://github.com/nf-core/smrnaseq/issues/415) - Fix paired end sample handling and add test profile.
+- [[#441]](https://github.com/nf-core/smrnaseq/pull/441) - Migrate [local contaminant bowtie to nf-core](https://github.com/nf-core/smrnaseq/issues/436) - Replace local processes with `BOWTIE2_ALIGN`.
+- [[#443]](https://github.com/nf-core/smrnaseq/pull/443) - Migrate [mirna and genome_quant bowtie to nf-core](https://github.com/nf-core/smrnaseq/issues/436) - Replace local processes with `BOWTIE_ALIGN`.
+- [[#447]](https://github.com/nf-core/smrnaseq/pull/447) - Fix [Minor fixes and general pipeline cleanup](https://github.com/nf-core/smrnaseq/issues/400) - Update variable and processes names, update channel comments, remove unused modules and params.
+- [[#448]](https://github.com/nf-core/smrnaseq/pull/448) - Migrate local mirdeep to [nf-core mirdeep2 modules and subworkflow](https://github.com/nf-core/smrnaseq/issues/443) and generate [test profile for mirdeep2](https://github.com/nf-core/smrnaseq/issues/399).
+- [[#452]](https://github.com/nf-core/smrnaseq/pull/452) - Fix [ch_bowtie_index channel structure](https://github.com/nf-core/smrnaseq/issues/451) and [replace untarfiles with untar](https://github.com/nf-core/smrnaseq/issues/449).
+- [[#457]](https://github.com/nf-core/smrnaseq/pull/457) - QC all input [fasta files and clean them](https://github.com/nf-core/smrnaseq/issues/455).
+- [[#459]](https://github.com/nf-core/smrnaseq/pull/459) - Update modules and subworkflows [and fix linting](https://github.com/nf-core/smrnaseq/issues/458).
+- [[#462]](https://github.com/nf-core/smrnaseq/pull/462) - Remove automatic wrapping of fasta files by `seqkit replace`. Minor documentation updates.
+- [[#464]](https://github.com/nf-core/smrnaseq/pull/464) - Add [proper licences and authorship information to scripts in `bin` folder](https://github.com/nf-core/smrnaseq/issues/465)
+
+### Software dependencies
+
+| Dependency | Old version | New version |
+| ---------- | ----------- | ----------- |
+| `bioawk` | - | 1.0 |
+| `bowtie` | 1.3.1 | 1.3.0 |
+| `bowtie2` | 2.4.5 | 2.5.2 |
+| `csvtk` | - | 0.30 |
+| `gawk` | - | 5.3.0 |
+| `mirtop` | 0.4.25 | 0.4.28 |
+| `multiqc` | 1.21 | 1.25.1 |
+| `samtools` | 1.19.2 | 1.21 |
+| `seqkit` | 2.6.1 | 2.8.1 |
+
## v2.3.1 - 2024-04-18 - Gray Zinc Dalmation Patch
- [[#328]](https://github.com/nf-core/smrnaseq/pull/328) - Fix [casting issue](https://github.com/nf-core/smrnaseq/issues/327) in mirtrace module
diff --git a/README.md b/README.md
index ccb136f0..a546cc1b 100644
--- a/README.md
+++ b/README.md
@@ -9,11 +9,11 @@
[![GitHub Actions Linting Status](https://github.com/nf-core/smrnaseq/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/smrnaseq/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/smrnaseq/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.10696391?labelColor=000000)](https://doi.org/10.5281/zenodo.10696391)
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
-[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
+[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
-[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/smrnaseq)
+[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/smrnaseq)
[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23smrnaseq-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/smrnaseq)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
@@ -78,7 +78,15 @@ You can find numerous talks on the nf-core events page from various topics inclu
> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
-First, prepare a samplesheet with your input data that looks as follows:
+You can test the pipeline as follows:
+
+```bash
+nextflow run nf-core/smrnaseq \
+ -profile test,docker \
+ --outdir
+```
+
+In order to use the pipeline with your own data, first prepare a samplesheet with your input data that looks as follows:
`samplesheet.csv`:
@@ -100,17 +108,18 @@ Now, you can run the pipeline using:
```bash
nextflow run nf-core/smrnaseq \
- -profile \
+ -profile , \
--input samplesheet.csv \
--genome 'GRCh37' \
--mirtrace_species 'hsa' \
- --protocol 'illumina' \
--outdir
```
+> [!IMPORTANT]
+> Remember to add a protocol as an additional profile (such as `illumina`, `nextflex`, `qiaseq` or `cats`) when running with your own data. If no protocol is given via `-profile`, the pipeline will likely fail. Alternatively, to run a custom protocol, set the corresponding parameters (`clip_r1`, `three_prime_clip_r1`, `three_prime_adapter`) manually; adapter auto-detection (`three_prime_adapter` set to `auto-detect`) is also available. See the [usage documentation](https://nf-co.re/smrnaseq/usage) for more details about these profiles.
+
> [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
-> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
+> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/smrnaseq/usage) and the [parameter documentation](https://nf-co.re/smrnaseq/parameters).
@@ -124,9 +133,14 @@ For more details about the output files and reports, please refer to the
nf-core/smrnaseq was originally written by P. Ewels, C. Wang, R. Hammarén, L. Pantano, A. Peltzer.
+Lorena Pantano ([@lpantano](https://github.com/lpantano)) from MIT updated the pipeline to Nextflow DSL2.
+
We thank the following people for their extensive assistance in the development of this pipeline:
-Lorena Pantano ([@lpantano](https://github.com/lpantano)) from MIT updated the pipeline to Nextflow DSL2.
+- [@atrigila] Anabella Trigila
+- [@nschcolnicov] Nicolás Alejandro Schcolnicov
+- [@christopher-mohr] Christopher Mohr
+- [@grst] Gregor Sturm
## Contributions and Support
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
index 9264f6fa..55973984 100644
--- a/assets/multiqc_config.yml
+++ b/assets/multiqc_config.yml
@@ -1,8 +1,7 @@
report_comment: >
- This report has been generated by the nf-core/smrnaseq
+ This report has been generated by the nf-core/smrnaseq
analysis pipeline. For information about how to interpret these results, please see the
- documentation.
-
+ documentation.
report_section_order:
"nf-core-smrnaseq-methods-description":
order: -1000
@@ -31,3 +30,6 @@ module_order:
info: "This section of the report shows FastQC results after UMI-based deduplication."
path_filters:
- "**/*.deduplicated_fastqc.zip"
+sp:
+ mirtop:
+ fn: mirtop_stats.log
diff --git a/assets/schema_input.json b/assets/schema_input.json
index 892b1996..b5face8a 100644
--- a/assets/schema_input.json
+++ b/assets/schema_input.json
@@ -1,5 +1,5 @@
{
- "$schema": "http://json-schema.org/draft-07/schema",
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://raw.githubusercontent.com/nf-core/smrnaseq/master/assets/schema_input.json",
"title": "nf-core/smrnaseq pipeline - params.input schema",
"description": "Schema for the file provided with params.input",
diff --git a/bin/collapse_mirtop.r b/bin/collapse_mirtop.r
index 6b5c77f1..49c95832 100755
--- a/bin/collapse_mirtop.r
+++ b/bin/collapse_mirtop.r
@@ -1,4 +1,7 @@
#!/usr/bin/env Rscript
+
+# Written by Lorena Pantano and released under the MIT license. See LICENSE https://github.com/nf-core/smrnaseq/blob/master/LICENSE for details.
+
library(data.table)
# Command line arguments
args = commandArgs(trailingOnly=TRUE)
diff --git a/bin/edgeR_miRBase.r b/bin/edgeR_miRBase.r
index 5be691fc..5c561437 100755
--- a/bin/edgeR_miRBase.r
+++ b/bin/edgeR_miRBase.r
@@ -1,5 +1,8 @@
#!/usr/bin/env Rscript
+# Originally written by Phil Ewels and Chuan Wang and released under the MIT license.
+# Contributions by Alexander Peltzer, Anabella Trigila, James Fellows Yates, Sarah Djebali, Kevin Menden, Konrad Stawinski and Lorena Pantano also released under the MIT license. See LICENSE https://github.com/nf-core/smrnaseq/blob/master/LICENSE for details.
+
# Command line arguments
args = commandArgs(trailingOnly=TRUE)
@@ -79,7 +82,7 @@ for (i in 1:2) {
}
# Make MDS plot (only perform with 3 or more samples)
- if (length(filelist[[1]]) > 2){
+ if (ncol(dataNorm$counts) > 2){
pdf(paste(header,"_edgeR_MDS_plot.pdf",sep=""))
MDSdata <- plotMDS(dataNorm)
dev.off()
@@ -111,6 +114,8 @@ for (i in 1:2) {
# Write clustered distance values to file
write.table(hmap$carpet, paste(header,"_log2CPM_sample_distances.txt",sep=""), quote=FALSE, sep="\t")
+ } else {
+ warning("Not enough samples to create an MDS plot. At least 3 samples are required.")
}
}
diff --git a/conf/base.config b/conf/base.config
index 544ed42d..7d3a72eb 100644
--- a/conf/base.config
+++ b/conf/base.config
@@ -10,9 +10,9 @@
process {
- cpus = { check_max( 1 * task.attempt, 'cpus' ) }
- memory = { check_max( 6.GB * task.attempt, 'memory' ) }
- time = { check_max( 4.h * task.attempt, 'time' ) }
+ cpus = { 1 * task.attempt }
+ memory = { 6.GB * task.attempt }
+ time = { 4.h * task.attempt }
errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 1
@@ -24,30 +24,30 @@ process {
// adding in your processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_single {
- cpus = { check_max( 1 , 'cpus' ) }
- memory = { check_max( 6.GB * task.attempt, 'memory' ) }
- time = { check_max( 4.h * task.attempt, 'time' ) }
+ cpus = { 1 }
+ memory = { 6.GB * task.attempt }
+ time = { 4.h * task.attempt }
}
withLabel:process_low {
- cpus = { check_max( 2 * task.attempt, 'cpus' ) }
- memory = { check_max( 12.GB * task.attempt, 'memory' ) }
- time = { check_max( 6.h * task.attempt, 'time' ) }
+ cpus = { 2 * task.attempt }
+ memory = { 12.GB * task.attempt }
+ time = { 4.h * task.attempt }
}
withLabel:process_medium {
- cpus = { check_max( 6 * task.attempt, 'cpus' ) }
- memory = { check_max( 36.GB * task.attempt, 'memory' ) }
- time = { check_max( 8.h * task.attempt, 'time' ) }
+ cpus = { 6 * task.attempt }
+ memory = { 36.GB * task.attempt }
+ time = { 8.h * task.attempt }
}
withLabel:process_high {
- cpus = { check_max( 12 * task.attempt, 'cpus' ) }
- memory = { check_max( 72.GB * task.attempt, 'memory' ) }
- time = { check_max( 10.h * task.attempt, 'time' ) }
+ cpus = { 12 * task.attempt }
+ memory = { 72.GB * task.attempt }
+ time = { 16.h * task.attempt }
}
withLabel:process_long {
- time = { check_max( 20.h * task.attempt, 'time' ) }
+ time = { 20.h * task.attempt }
}
withLabel:process_high_memory {
- memory = { check_max( 200.GB * task.attempt, 'memory' ) }
+ memory = { 200.GB * task.attempt }
}
withLabel:error_ignore {
errorStrategy = 'ignore'
@@ -56,7 +56,4 @@ process {
errorStrategy = 'retry'
maxRetries = 2
}
- withName:CUSTOM_DUMPSOFTWAREVERSIONS {
- cache = false
- }
}
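The `check_max()` wrappers are dropped because, as we understand Nextflow's documented behaviour, the `process.resourceLimits` setting (used in `conf/test.config`, available since Nextflow 24.04, in line with the raised minimum version) now caps oversized requests. The Python sketch below restates the new `conf/base.config` logic for illustration only: each label's base request is multiplied by `task.attempt`, optionally capped at the limits, and the `errorStrategy` closure retries on exit codes 130 to 145 or 104. With `maxRetries = 1`, `task.attempt` is 1 or 2. The `"default"` key and the function names are hypothetical; memory is in GB and time in hours.

```python
from typing import Dict, Optional

# Base requests per label from conf/base.config: (cpus, memory_gb, time_h, scale_cpus).
# "default" is a hypothetical key for processes without a label.
LABELS = {
    "default": (1, 6, 4, True),
    "process_single": (1, 6, 4, False),  # cpus = { 1 } is not multiplied by task.attempt
    "process_low": (2, 12, 4, True),
    "process_medium": (6, 36, 8, True),
    "process_high": (12, 72, 16, True),
}

# (130..145) + 104 in Groovy; Groovy ranges include both ends
RETRY_EXIT_CODES = set(range(130, 146)) | {104}


def error_strategy(exit_status: int) -> str:
    """Same decision as the errorStrategy closure: retry listed codes, otherwise finish."""
    return "retry" if exit_status in RETRY_EXIT_CODES else "finish"


def requested_resources(label: str, attempt: int,
                        limits: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Scale the base request by the attempt number, then cap it at `limits` if given."""
    cpus, memory_gb, time_h, scale_cpus = LABELS[label]
    request = {
        "cpus": cpus * attempt if scale_cpus else cpus,
        "memory": memory_gb * attempt,
        "time": time_h * attempt,
    }
    if limits is not None:
        request = {key: min(value, limits[key]) for key, value in request.items()}
    return request


# Mirrors resourceLimits in conf/test.config (memory in GB, time in hours)
TEST_LIMITS = {"cpus": 4, "memory": 15, "time": 1}

print(requested_resources("process_high", 2))               # {'cpus': 24, 'memory': 144, 'time': 32}
print(requested_resources("process_high", 2, TEST_LIMITS))  # {'cpus': 4, 'memory': 15, 'time': 1}
```

The cap is applied after scaling, so a retried `process_high` task on the test profile still runs with at most 4 CPUs, 15 GB and 1 hour.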
diff --git a/conf/ci.config b/conf/ci.config
new file mode 100644
index 00000000..0c63ba51
--- /dev/null
+++ b/conf/ci.config
@@ -0,0 +1,13 @@
+// CI max resource settings
+process {
+ withLabel:'.*' {
+ cpus = 2
+ memory = 6.GB
+ time = 6.h
+ }
+ withLabel:process_single {
+ cpus = 2
+ memory = 6.GB
+ time = 6.h
+ }
+}
diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config
new file mode 100644
index 00000000..b4034d82
--- /dev/null
+++ b/conf/igenomes_ignored.config
@@ -0,0 +1,9 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for iGenomes paths
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Empty genomes dictionary to use when igenomes is ignored.
+----------------------------------------------------------------------------------------
+*/
+
+params.genomes = [:]
diff --git a/conf/modules.config b/conf/modules.config
index e67745fe..2abe9be6 100644
--- a/conf/modules.config
+++ b/conf/modules.config
@@ -42,6 +42,14 @@ process {
]
}
+ withName: '.*:PREPARE_GENOME:UNTAR_BOWTIE_INDEX' {
+ publishDir = [
+ mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+ ]
+ }
+
//
// FASTQ_FASTQC_UMITOOLS_FASTP
//
@@ -49,7 +57,6 @@ process {
ext.args = [ "",
params.trim_fastq ? "" : "--disable_adapter_trimming",
params.clip_r1 > 0 ? "--trim_front1 ${params.clip_r1}" : "", // Remove bp from the 5' end of read 1.
- params.three_prime_clip_r1 > 0 ? "--trim_tail1 ${params.three_prime_clip_r1}" : "", // Remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.
params.fastp_min_length > 0 ? "-l ${params.fastp_min_length}" : "",
params.fastp_max_length > 0 ? "--max_len1 ${params.fastp_max_length}" : "",
params.three_prime_adapter == "auto-detect" ? "" : "--adapter_sequence ${params.three_prime_adapter}"
@@ -70,6 +77,37 @@ process {
mode: params.publish_dir_mode,
pattern: "*.fail.fastq.gz",
enabled: params.save_trimmed_fail
+ ],
+ [
+ path: { "${params.outdir}/fastp/fastq" },
+ mode: params.publish_dir_mode,
+ pattern: "*.fastp.fastq.gz",
+ enabled: params.save_merged
+ ]
+ ]
+ }
+ //
+ // FASTQ_FASTQC_UMITOOLS_FASTP
+ //
+ withName: '.*:FASTP3' {
+ ext.prefix = { "${meta.id}.fastp3" }
+ ext.args = [ "",
+ "--disable_adapter_trimming",
+ "--disable_quality_filtering",
+ params.three_prime_clip_r1 > 0 ? "--trim_tail1 ${params.three_prime_clip_r1}" : "", // Remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.
+ params.fastp_min_length > 0 ? "-l ${params.fastp_min_length}" : "",
+ params.fastp_max_length > 0 ? "--max_len1 ${params.fastp_max_length}" : "",
+ ].join(" ").trim()
+ publishDir = [
+ [
+ path: { "${params.outdir}/fastp/on_raw" },
+ mode: params.publish_dir_mode,
+ pattern: "*.{json,html}"
+ ],
+ [
+ path: { "${params.outdir}/fastp/on_raw/log" },
+ mode: params.publish_dir_mode,
+ pattern: "*.log"
]
]
}
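The `ext.args` blocks for `FASTP` and `FASTP3` build the command line from a list of conditional strings that are joined with spaces and trimmed. A minimal Python restatement of the `FASTP3` list is shown below; the function names are hypothetical and the parameter values in the example (4, 17, 100) are illustrative, not pipeline defaults. Note that `trim()` only removes whitespace at the ends, so a disabled option in the middle leaves a double space, which is harmless once the shell splits the arguments.

```python
from typing import List


def join_args(parts: List[str]) -> str:
    """Equivalent of Groovy's [...].join(" ").trim()."""
    return " ".join(parts).strip()


def fastp3_args(three_prime_clip_r1: int, fastp_min_length: int, fastp_max_length: int) -> str:
    """Sketch of the FASTP3 ext.args: a second fastp pass that only clips the 3' end."""
    return join_args([
        "",
        "--disable_adapter_trimming",
        "--disable_quality_filtering",
        f"--trim_tail1 {three_prime_clip_r1}" if three_prime_clip_r1 > 0 else "",
        f"-l {fastp_min_length}" if fastp_min_length > 0 else "",
        f"--max_len1 {fastp_max_length}" if fastp_max_length > 0 else "",
    ])


print(fastp3_args(4, 17, 100))
# --disable_adapter_trimming --disable_quality_filtering --trim_tail1 4 -l 17 --max_len1 100
print(fastp3_args(0, 17, 0).split())
# ['--disable_adapter_trimming', '--disable_quality_filtering', '-l', '17']
```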
@@ -80,6 +118,7 @@ process {
publishDir = [
path: { "${params.outdir}/fastqc/raw" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -89,6 +128,7 @@ process {
publishDir = [
path: { "${params.outdir}/fastqc/trimmed" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -131,6 +171,18 @@ process {
publishDir = [
path: { "${params.outdir}/bowtie_index/genome" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+ ]
+ }
+
+ withName: 'CLEAN_FASTA' {
+ ext.args = "-c fastx '{gsub(/[^ATGCatgc]/, \"N\", \$seq); sub(/ .*/, \"\", \$name); print \">\"\$name\"\\n\"\$seq}'"
+ ext.prefix = {"${meta.id}_clean.fa"}
+ publishDir = [
+ path: { "${params.outdir}/bowtie_index/genome" },
+ mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
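The `CLEAN_FASTA` step passes a bioawk program through `ext.args`: every character other than `A`, `T`, `G`, `C` (either case) in a sequence becomes `N`, anything after the first space in a header is dropped, and each record is written with its sequence on one line. The Python sketch below re-states that transformation on an in-memory FASTA string for illustration only; `clean_fasta` is a hypothetical helper, and the pipeline itself runs bioawk.

```python
import re
from typing import List, Tuple


def clean_fasta(text: str) -> str:
    """Mimic the CLEAN_FASTA bioawk program on a FASTA string."""
    records: List[Tuple[str, str]] = []
    name, seq_lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq_lines)))
            # sub(/ .*/, "", $name): keep the header up to the first space
            name, seq_lines = line[1:].split(" ", 1)[0], []
        elif line.strip():
            seq_lines.append(line.strip())
    if name is not None:
        records.append((name, "".join(seq_lines)))

    out = []
    for record_name, seq in records:
        # gsub(/[^ATGCatgc]/, "N", $seq): IUPAC codes, U and lowercase n all become N
        out.append(f">{record_name}\n{re.sub(r'[^ATGCatgc]', 'N', seq)}\n")
    return "".join(out)


print(clean_fasta(">chr1 some description\nACGTu\nnRYa\n>chr2\nGGCC\n"), end="")
# >chr1
# ACGTNNNNa
# >chr2
# GGCC
```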
@@ -140,7 +192,7 @@ process {
//
withName: '.*:UMICOLLAPSE_FASTQ' {
- ext.args = { meta.single_end ? "--algo ${params.umitools_method} --two-pass" : "--method ${params.umitools_method} --two-pass --paired --remove-unpaired --remove-chimeric" }
+ ext.args = { meta.single_end ? "--algo ${params.umitools_method} --two-pass" : "--algo ${params.umitools_method} --two-pass --paired --remove-unpaired --remove-chimeric" }
ext.prefix = { "${meta.id}.umi_dedup.sorted" }
publishDir = [
path: { "${params.outdir}/umi_dedup/bam_deduplicated" },
@@ -175,10 +227,9 @@ process {
//
// MIRTRACE QC
//
- withName: 'MIRTRACE_RUN' {
+ withName: 'MIRTRACE_QC' {
publishDir = [
- //"mirtrace" already part of the published folder
- path: { "${params.outdir}" },
+ path: { "${params.outdir}/mirtrace/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
@@ -191,6 +242,81 @@ process {
publishDir = [
path: { "${params.outdir}/contaminant_filter/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
+ enabled: false,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+ ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:BLAT.*' {
+ ext.args = '-out=blast8'
+ ext.prefix = {"${meta.id}_${meta2.id}"}
+ tag = {"${meta.id} ${meta2.id}"}
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:GAWK.*' {
+ ext.prefix = {"significant_hits_${meta.id}"}
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:SEQKIT_GREP.*' {
+ ext.prefix = {"filtered_${meta.id}"}
+ ext.args = '-v'
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:BOWTIE2_ALIGN.*' {
+ ext.args = '--very-sensitive-local -k 1'
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_RRNA' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "rRNA" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_TRNA' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "tRNA" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_CDNA' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "cDNA" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_NCRNA' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "ncRNA" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_PIRNA' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "piRNA" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:STATS_GAWK_OTHER' {
+ ext.prefix = {"${meta.contaminant}_${meta.id}"}
+ ext.suffix = "stats"
+ ext.args2 = '\'BEGIN {tot=0} {if(NR==4 || NR==5){tot+=\$1}} END {print "\\"' + "other" + '\\": " tot}\''
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:CONTAMINANT_FILTER:FILTER_STATS' {
+ publishDir = [
+ path: { "${params.outdir}/contaminant_filter/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+ mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
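Two expressions in the contaminant filter block are easier to follow when spelled out. The `publishDir` path takes the last `:`-separated token of the process name and then its first `_`-separated token, lower-cased, so `FILTER_STATS` publishes into `contaminant_filter/filter`. Each `STATS_GAWK_*` program sums the first field of lines 4 and 5 of its input and prints `"<label>": <total>`. The sketch below assumes the input is a bowtie2 single-end alignment summary, where those lines hold the "aligned exactly 1 time" and "aligned >1 times" counts; that input format is our assumption, and the helper names and numbers are hypothetical.

```python
from typing import Iterable


def publish_subdir(process_name: str) -> str:
    """Python version of task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()."""
    # Groovy's tokenize() discards empty tokens, so filter them out as well
    last = [tok for tok in process_name.split(":") if tok][-1]
    return [tok for tok in last.split("_") if tok][0].lower()


def gawk_stats(lines: Iterable[str], label: str) -> str:
    """Sum the first field of lines 4 and 5 and format it like the STATS_GAWK_* programs."""
    total = 0
    for line_number, line in enumerate(lines, start=1):  # awk's NR is 1-based
        if line_number in (4, 5):
            total += int(line.split()[0])  # awk's $1 skips leading whitespace
    return f'"{label}": {total}'


# Hypothetical bowtie2 single-end summary (assumed input format, invented numbers)
SUMMARY = [
    "10000 reads; of these:",
    "  10000 (100.00%) were unpaired; of these:",
    "    596 (5.96%) aligned 0 times",
    "    9284 (92.84%) aligned exactly 1 time",
    "    120 (1.20%) aligned >1 times",
    "94.04% overall alignment rate",
]

print(publish_subdir("NFCORE_SMRNASEQ:CONTAMINANT_FILTER:FILTER_STATS"))  # filter
print(gawk_stats(SUMMARY, "rRNA"))  # "rRNA": 9404
```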
@@ -202,6 +328,7 @@ process {
publishDir = [
path: { "${params.outdir}/mirna_quant/reference" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -209,6 +336,7 @@ process {
publishDir = [
path: { "${params.outdir}/mirna_quant/reference" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -216,6 +344,7 @@ process {
publishDir = [
path: { "${params.outdir}/bowtie_index/mirna_mature" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -223,15 +352,24 @@ process {
publishDir = [
path: { "${params.outdir}/bowtie_index/mirna_hairpin" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BOWTIE_MAP_MATURE' {
+ ext.args = [ "",
+ "-t",
+ "-k 50",
+ "--best",
+ "--strata",
+ "-e 99999",
+ "--chunkmbs 2048",
+ ].join(" ").trim()
publishDir = [
path: { "${params.outdir}/mirna_quant/bam/mature" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
- enabled: params.save_aligned_mirna_quant
+ enabled: params.save_intermediates
]
}
withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MATURE:.*' {
@@ -239,15 +377,24 @@ process {
publishDir = [
path: { "${params.outdir}/mirna_quant/bam/mature" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BOWTIE_MAP_HAIRPIN' {
+ ext.args = [ "",
+ "-t",
+ "-k 50",
+ "--best",
+ "--strata",
+ "-e 99999",
+ "--chunkmbs 2048",
+ ].join(" ").trim()
publishDir = [
path: { "${params.outdir}/mirna_quant/bam/hairpin" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
- enabled: params.save_aligned_mirna_quant
+ enabled: params.save_intermediates
]
}
withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_HAIRPIN:.*' {
@@ -255,6 +402,7 @@ process {
publishDir = [
path: { "${params.outdir}/mirna_quant/bam/hairpin" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
@@ -265,30 +413,73 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
- withName: 'SEQCLUSTER_SEQUENCES' {
+ withName: 'SEQCLUSTER_COLLAPSE' {
publishDir = [
path: { "${params.outdir}/mirna_quant/seqcluster" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
+ ext.args = "-m 1 --min_size 15"
+ ext.prefix = {"${meta.id}_seqcluster"}
}
withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BOWTIE_MAP_SEQCLUSTER' {
+ ext.args = [ "",
+ "-t",
+ "-k 50",
+ "--best",
+ "--strata",
+ "-e 99999",
+ "--chunkmbs 2048",
+ ].join(" ").trim()
publishDir = [
path: { "${params.outdir}/mirna_quant/bam/seqcluster" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
- enabled: params.save_aligned_mirna_quant
+ enabled: params.save_intermediates
]
}
- withName: 'MIRTOP_QUANT' {
+
+
+ // Mirtop
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:.*' {
publishDir = [
- //mirtop already part of the output folder
- path: { "${params.outdir}/mirna_quant/" },
+ path: { "${params.outdir}/mirna_quant/mirtop" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
- withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:TABLE_MERGE' {
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_COUNTS' {
+ ext.args = '--add-extra'
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS' {
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_GFF' {
+ publishDir = [
+ path: { "${params.outdir}/mirna_quant/mirtop/gff" },
+ mode: params.publish_dir_mode,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+ ]
+ }
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:CSVTK_JOIN' {
+ ext.args = "--fields 'UID,Read,miRNA,Variant,iso_5p,iso_3p,iso_add3p,iso_snp,iso_5p_nt,iso_3p_nt,iso_add3p_nt,iso_snp_nt' --tabs --outer-join --na \"0\" --out-delimiter \"\t\""
+ ext.prefix = "joined_samples_mirtop"
+ publishDir = [
+ path: { "${params.outdir}/mirna_quant/mirtop" },
+ mode: params.publish_dir_mode,
+ saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+ ]
+ }
+
+
+ withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:DATATABLE_MERGE' {
publishDir = [
path: { "${params.outdir}/mirna_quant/mirtop" },
mode: params.publish_dir_mode,
@@ -297,6 +488,7 @@ process {
}
+
//
// GENOME_QUANT
//
@@ -305,9 +497,18 @@ process {
publishDir = [
path: { "${params.outdir}/genome_quant/bam" },
mode: params.publish_dir_mode,
+ enabled: params.save_intermediates,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
+
+ withName: 'SAMTOOLS_INDEX' {
+ ext.args = '-c'
+ publishDir = [
+ enabled: params.save_intermediates,
+ ]
+ }
+
withName: 'NFCORE_SMRNASEQ:GENOME_QUANT:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.sorted" }
publishDir = [
@@ -317,11 +518,19 @@ process {
]
}
withName: 'NFCORE_SMRNASEQ:GENOME_QUANT:BOWTIE_MAP_GENOME' {
+ ext.args = [ "",
+ "-t",
+ "-k 50",
+ "--best",
+ "--strata",
+ "-e 99999",
+ "--chunkmbs 2048",
+ ].join(" ").trim()
publishDir = [
path: { "${params.outdir}/genome_quant/bam" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
- enabled: params.save_aligned
+ enabled: params.save_intermediates
]
}
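`SAMTOOLS_INDEX` now runs with `ext.args = '-c'`, producing a CSI index instead of BAI (see the CHANGELOG entry about large chromosomes). According to the samtools-index documentation, BAI can only handle individual reference sequences up to 2^29 bases (about 512 Mbp). The pipeline passes `-c` unconditionally; the hypothetical check below only illustrates which references would exceed the BAI limit. The 600 Mbp sequence in the example is invented.

```python
from typing import Dict

# Per the samtools-index documentation: BAI handles references up to 2**29 bases (~512 Mbp)
BAI_MAX_LENGTH = 2 ** 29


def needs_csi(reference_lengths: Dict[str, int]) -> bool:
    """True if any reference sequence is too long to be indexed with BAI."""
    return max(reference_lengths.values()) > BAI_MAX_LENGTH


print(needs_csi({"chr1": 248_956_422}))                                   # False
print(needs_csi({"chr1": 248_956_422, "chrHypothetical": 600_000_000}))   # True
```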
@@ -329,31 +538,30 @@ process {
//
// MIRDEEP
//
- withName: 'NFCORE_SMRNASEQ:MIRDEEP2:MIRDEEP2_MAPPER' {
- publishDir = [
- path: { "${params.outdir}/mirdeep2/mapper" },
- mode: params.publish_dir_mode,
- saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
- ]
+
+ withName: 'MIRDEEP2_MAPPER' {
+ ext.args = "-c -j -m -v"
+ publishDir = [ enabled: false ]
}
- withName: 'NFCORE_SMRNASEQ:MIRDEEP2:MIRDEEP2_RUN' {
- publishDir = [
- path: { "${params.outdir}/mirdeep2/run" },
- mode: params.publish_dir_mode,
- saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
- ]
+
+ withName: 'SEQKIT_REPLACE' {
+ ext.args = '-p "\\s+|\\." -w 0'
+ ext.suffix = "fasta"
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'SEQKIT_FQ2FA' {
+ publishDir = [ enabled: false ]
+ }
+
+ withName: 'MIRDEEP2_MIRDEEP2' {
+ errorStrategy = { task.exitStatus in (255) ? 'ignore' : '' }
}
//
// reports
//
- withName: 'CUSTOM_DUMPSOFTWAREVERSIONS' {
- publishDir = [
- path: { "${params.outdir}/pipeline_info" },
- mode: params.publish_dir_mode,
- pattern: '*_versions.yml'
- ]
- }
+
withName: 'MULTIQC' {
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
publishDir = [
diff --git a/conf/protocol_cats.config b/conf/protocol_cats.config
new file mode 100644
index 00000000..c7e38014
--- /dev/null
+++ b/conf/protocol_cats.config
@@ -0,0 +1,6 @@
+//This profile handles CATs miRNA defaults. Include it as an additional profile to set certain pipeline parameters appropriately.
+params{
+ clip_r1 = 3
+ three_prime_clip_r1 = 0
+ three_prime_adapter = "AAAAAAAA"
+}
diff --git a/conf/protocol_illumina.config b/conf/protocol_illumina.config
new file mode 100644
index 00000000..d86e4e3f
--- /dev/null
+++ b/conf/protocol_illumina.config
@@ -0,0 +1,6 @@
+//This profile handles Illumina miRNA defaults. Include it as an additional profile to set certain pipeline parameters appropriately.
+params{
+ clip_r1 = 0
+ three_prime_clip_r1 = 0
+ three_prime_adapter = "TGGAATTCTCGGGTGCCAAGG"
+}
diff --git a/conf/protocol_nextflex.config b/conf/protocol_nextflex.config
new file mode 100644
index 00000000..7992a38f
--- /dev/null
+++ b/conf/protocol_nextflex.config
@@ -0,0 +1,6 @@
+//This profile handles Nextflex miRNA defaults. Include it as an additional profile to set certain pipeline parameters appropriately.
+params{
+ clip_r1 = 4
+ three_prime_clip_r1 = 4
+ three_prime_adapter = "TGGAATTCTCGGGTGCCAAGG"
+}
diff --git a/conf/protocol_qiaseq.config b/conf/protocol_qiaseq.config
new file mode 100644
index 00000000..da59ac1a
--- /dev/null
+++ b/conf/protocol_qiaseq.config
@@ -0,0 +1,6 @@
+//This profile handles QIASEQ miRNA defaults. Include it as an additional profile to set certain pipeline parameters appropriately.
+params{
+ clip_r1 = 0
+ three_prime_clip_r1 = 0
+ three_prime_adapter = "AACTGTAGGCACCATCAAT"
+}
diff --git a/conf/test.config b/conf/test.config
index a56b2e96..b3952ad0 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -10,25 +10,30 @@
----------------------------------------------------------------------------------------
*/
+process {
+ resourceLimits = [
+ cpus: 4,
+ memory: '15.GB',
+ time: '1.h'
+ ]
+}
+
params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
- // Limit resources so that this can run on GitHub Actions
- max_cpus = 2
- max_memory = '6.GB'
- max_time = '6.h'
-
// Input data
input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv'
fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
+ bowtie_index = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/bowtie_index.tar.gz'
mirtrace_species = 'hsa'
- protocol = 'illumina'
skip_mirdeep = true
save_merged = false
- save_aligned_mirna_quant = false
- cleanup = true //Otherwise tests dont run through properly.
}
+
+// Include illumina config to run test without additional profiles
+
+includeConfig 'protocol_illumina.config'
diff --git a/conf/test_contamination.config b/conf/test_contamination.config
new file mode 100644
index 00000000..266c288c
--- /dev/null
+++ b/conf/test_contamination.config
@@ -0,0 +1,35 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Defines input files and everything required to run a fast and simple pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_contamination, --outdir
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+ config_profile_name = 'Test profile'
+ config_profile_description = 'Minimal test dataset to check pipeline function with contamination filter'
+
+ // Input data
+
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv'
+ fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
+
+ mirtrace_species = 'hsa'
+ skip_mirdeep = true
+ save_merged = false
+
+
+ filter_contamination = true
+ cdna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.cdna.all.fa"
+ ncrna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.ncrna.fa"
+ trna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/hg19-tRNAs.fa"
+}
+
+// Include illumina config to run test without additional profiles
+
+includeConfig 'protocol_illumina.config'
diff --git a/conf/test_contamination_tech_reps.config b/conf/test_contamination_tech_reps.config
new file mode 100644
index 00000000..86f1dbfa
--- /dev/null
+++ b/conf/test_contamination_tech_reps.config
@@ -0,0 +1,36 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Defines input files and everything required to run a fast and simple pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_contamination_tech_reps, --outdir
+
+----------------------------------------------------------------------------------------
+*/
+// Test covers technical repeats, skip_fastqc, filter_contamination and running without a genome.
+
+params {
+ config_profile_name = 'Test technical repeats profile'
+ config_profile_description = 'Minimal test dataset to check pipeline function'
+
+ // Input data
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_technical_repeats_short.csv'
+
+ mirtrace_species = 'hsa'
+ save_intermediates = true
+
+ skip_multiqc = true
+ skip_mirdeep = true
+ skip_fastqc = true
+
+ filter_contamination = true
+ cdna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.cdna.all.fa"
+ ncrna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.ncrna.fa"
+ trna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/hg19-tRNAs.fa"
+}
+
+// Include illumina config to run test without additional profiles
+
+includeConfig 'protocol_illumina.config'
diff --git a/conf/test_full.config b/conf/test_full.config
index 964dc5b2..cc5ecd92 100644
--- a/conf/test_full.config
+++ b/conf/test_full.config
@@ -18,7 +18,6 @@ params {
input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet-full.csv'
genome = 'GRCh37'
mirtrace_species = 'hsa'
- protocol = 'illumina'
}
diff --git a/conf/test_full_filter_contamination.config b/conf/test_full_filter_contamination.config
new file mode 100644
index 00000000..7d8da991
--- /dev/null
+++ b/conf/test_full_filter_contamination.config
@@ -0,0 +1,30 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for running full-size tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Defines input files and everything required to run a full size pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_full_filter_contamination, --outdir
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+ config_profile_name = 'Full test profile'
+ config_profile_description = 'Full test dataset to check pipeline function with filter contamination feature'
+
+ // Input data for full size test
+ genome = 'GRCh37'
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet-full.csv'
+ mirtrace_species = 'hsa'
+ three_prime_adapter = 'auto-detect'
+ filter_contamination = true
+ cdna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.cdna.all.fa"
+ ncrna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/Homo_sapiens.GRCh37.ncrna.fa"
+ trna = "https://huggingface.co/datasets/nf-core/smrnaseq/resolve/main/GRCh37/hg19-tRNAs.fa"
+}
+
+includeConfig 'protocol_qiaseq.config'
+
+
diff --git a/conf/test_mirgenedb.config b/conf/test_mirgenedb.config
new file mode 100644
index 00000000..097c73fe
--- /dev/null
+++ b/conf/test_mirgenedb.config
@@ -0,0 +1,35 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Defines input files and everything required to run a fast and simple pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_mirgenedb, --outdir
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+ config_profile_name = 'Test profile with mirgeneDB inputs and run mirdeep2'
+ config_profile_description = 'Minimal test dataset to check pipeline function with mirgeneDB inputs and run mirdeep2'
+
+ // Input data
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_test_short.csv'
+ fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
+
+ mirgenedb = true
+
+ mirgenedb_mature = "https://github.com/nf-core/test-datasets/raw/smrnaseq/MirGeneDB/mirgenedb_hsa_mature.fa"
+ mirgenedb_hairpin = "https://github.com/nf-core/test-datasets/raw/smrnaseq/MirGeneDB/mirgenedb_hsa_hairpin.fa"
+ mirgenedb_gff = "https://github.com/nf-core/test-datasets/raw/smrnaseq/MirGeneDB/mirgenedb_hsa.gff"
+ mirgenedb_species = "Hsa"
+
+ skip_mirdeep = false
+ save_intermediates = true
+
+}
+
+// Include illumina config to run test without additional profiles
+
+includeConfig 'protocol_illumina.config'
diff --git a/conf/test_nextflex.config b/conf/test_nextflex.config
new file mode 100644
index 00000000..93d817e2
--- /dev/null
+++ b/conf/test_nextflex.config
@@ -0,0 +1,33 @@
+/*
+========================================================================================
+ Nextflow config file for running minimal tests
+========================================================================================
+ Defines input files and everything required to run a fast and simple pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_nextflex,
+
+----------------------------------------------------------------------------------------
+*/
+// This test profile tests the Nextflex protocol without a genome, as well as paired-end sample handling
+
+params {
+ config_profile_name = 'Nextflex Test profile'
+ config_profile_description = 'Minimal test dataset to check pipeline function'
+
+ // Input data
+ input = 'https://raw.githubusercontent.com/nf-core/test-datasets/smrnaseq/samplesheet/v2.0/samplesheet_test_nextflex.csv'
+ mature = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/mature.fa'
+ hairpin = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hairpin.fa'
+ mirna_gtf = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hsa.gff3'
+ mirtrace_species = 'hsa'
+
+ skip_mirdeep = true
+ save_intermediates = true
+ //skip_fastp // this profile should not be used with skip_fastp, to allow for testing paired-end sample handling
+
+}
+
+// Include nextflex config to run test without additional profiles
+
+includeConfig 'protocol_nextflex.config'
diff --git a/conf/test_no_genome.config b/conf/test_no_genome.config
deleted file mode 100644
index aae8ce91..00000000
--- a/conf/test_no_genome.config
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
-========================================================================================
- Nextflow config file for running minimal tests
-========================================================================================
- Defines input files and everything required to run a fast and simple pipeline test.
-
- Use as follows:
- nextflow run nf-core/smrnaseq -profile test,
-
-----------------------------------------------------------------------------------------
-*/
-
-params {
- config_profile_name = 'Test profile'
- config_profile_description = 'Minimal test dataset to check pipeline function'
-
- // Limit resources so that this can run on GitHub Actions
- max_cpus = 2
- max_memory = '6.GB'
- max_time = '6.h'
-
- // Input data
- input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv'
- mature = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/mature.fa'
- hairpin = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hairpin.fa'
- mirna_gtf = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hsa.gff3'
- mirtrace_species = 'hsa'
- skip_mirdeep = true
- protocol = 'illumina'
-
-}
diff --git a/conf/test_index.config b/conf/test_skipfastp.config
similarity index 60%
rename from conf/test_index.config
rename to conf/test_skipfastp.config
index bb9f4707..de786332 100644
--- a/conf/test_index.config
+++ b/conf/test_skipfastp.config
@@ -5,31 +5,24 @@
Defines input files and everything required to run a fast and simple pipeline test.
Use as follows:
- nextflow run nf-core/smrnaseq -profile test_index, --outdir
+ nextflow run nf-core/smrnaseq -profile test_skipfastp, --outdir
----------------------------------------------------------------------------------------
*/
+// Test covers running with a genome and a prebuilt Bowtie index, and with skip_fastp
params {
- config_profile_name = 'Test index profile'
- config_profile_description = 'Minimal test dataset to check pipeline function with bowtie index'
-
- // Limit resources so that this can run on GitHub Actions
- max_cpus = 2
- max_memory = '6.GB'
- max_time = '6.h'
+ config_profile_name = 'Test profile'
+ config_profile_description = 'Minimal test dataset to check pipeline function skipping trimming'
// Input data
- input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv'
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_skipfastp.csv'
fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
bowtie_index = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/bowtie_index.tar.gz'
mirtrace_species = 'hsa'
- protocol = 'illumina'
skip_mirdeep = true
- save_merged = false
- save_aligned_mirna_quant = false
-
- cleanup = true //Otherwise tests dont run through properly.
+ skip_fastp = true
+ save_intermediates = true
}
diff --git a/conf/test_technical_repeats.config b/conf/test_technical_repeats.config
new file mode 100644
index 00000000..2c969ddd
--- /dev/null
+++ b/conf/test_technical_repeats.config
@@ -0,0 +1,28 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Defines input files and everything required to run a fast and simple pipeline test.
+
+ Use as follows:
+ nextflow run nf-core/smrnaseq -profile test_technical_repeats, --outdir
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+ config_profile_name = 'Test technical repeats profile'
+ config_profile_description = 'Minimal test dataset to check pipeline function'
+
+ // Input data
+
+ input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_technical_repeats.csv'
+ fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
+
+ mirtrace_species = 'hsa'
+ skip_mirdeep = true
+ save_intermediates = true
+
+ skip_fastqc = true
+ skip_multiqc = true
+}
diff --git a/conf/test_umi.config b/conf/test_umi.config
index c7d0db15..5945efcf 100644
--- a/conf/test_umi.config
+++ b/conf/test_umi.config
@@ -14,23 +14,23 @@ params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
- // Limit resources so that this can run on GitHub Actions
- max_cpus = 2
- max_memory = '6.GB'
- max_time = '6.h'
-
// Input data
input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_umi.csv'
fasta = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa'
+ bowtie_index = 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/bowtie_index.tar.gz'
mirtrace_species = 'hsa'
- protocol = 'illumina'
skip_mirdeep = true
//UMI Specific testcase
with_umi = true
- umitools_extract_method = 'regex'
- umitools_bc_pattern = '.+(?PAACTGTAGGCACCATCAAT){s<=2}(?P.{12})(?P.*)'
- save_umi_intermeds = true
+ umitools_extract_method = 'regex'
+ umitools_bc_pattern = '.+(?PAACTGTAGGCACCATCAAT){s<=2}(?P.{12})(?P.*)'
+ save_umi_intermeds = true
+ save_intermediates = true
}
+
+// Include illumina config to run test without additional profiles
+
+includeConfig 'protocol_illumina.config'
diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png
deleted file mode 100755
index 361d0e47..00000000
Binary files a/docs/images/mqc_fastqc_adapter.png and /dev/null differ
diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png
deleted file mode 100755
index cb39ebb8..00000000
Binary files a/docs/images/mqc_fastqc_counts.png and /dev/null differ
diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png
deleted file mode 100755
index a4b89bf5..00000000
Binary files a/docs/images/mqc_fastqc_quality.png and /dev/null differ
diff --git a/docs/output.md b/docs/output.md
index 10d3e677..39ee17a3 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -6,12 +6,13 @@
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
-The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
+The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level `/results` directory.
## Pipeline overview
The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
+- [Preprocessing](#preprocessing) - Preprocessing of reference files
- [FastQC](#fastqc) - read quality control
- [UMI-tools extract](#umi-tools-extract) - UMI barcode extraction
- [UMI-collapse deduplicate](#umicollapse-deduplicate) - read deduplication
@@ -27,6 +28,20 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+If `--save_intermediates` is specified, intermediate files generated by each process will be saved in the output directory.
+
+## Preprocessing
+
+
+Output files
+
+- `bowtie_index/genome`: Cleaned `genome.fa` FASTA file.
+- `untar/bowtie_index`: Uncompressed Bowtie index files.
+
+
+
+Reference files are preprocessed into the format the workflow expects before they are used. This step uses [`untar`](https://www.gnu.org/software/tar/manual/) and [`bioawk`](https://github.com/lh3/bioawk): if the provided `bowtie_index` file is a gzip-compressed archive, it is extracted with `untar`, and the provided FASTA file is cleaned with `bioawk`.
+
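The index-handling decision described above can be sketched in Python. This is a minimal sketch, assuming the index is supplied either as a directory or as a gzip-compressed tar archive, and that gzip is detected from the file's magic bytes (`1f 8b`); the pipeline itself runs `untar` as a Nextflow process and may decide based on the file extension instead. `is_gzip` and `prepare_bowtie_index` are hypothetical helpers.

```python
import tarfile
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def prepare_bowtie_index(index_path: Path, dest: Path) -> Path:
    """Return a directory holding the Bowtie index files (hypothetical helper)."""
    if index_path.is_dir():
        return index_path  # already unpacked, use as-is
    if not is_gzip(index_path):
        raise ValueError(f"{index_path} is neither a directory nor a gzip-compressed archive")
    # Use the safer 'data' extraction filter where this Python version provides it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(index_path, "r:gz") as archive:
        archive.extractall(dest, **extract_kwargs)
    return dest
```

Checking the content rather than the name means a misnamed archive is still unpacked, at the cost of reading two bytes from every input.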
### FastQC
@@ -47,7 +62,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
Output files
-- `umitools/`
+- `umi_dedup/fastq_extracted_umi/`
- `*.fastq.gz`: If `--save_umi_intermeds` is specified, FastQ files **after** UMI extraction will be placed in this directory.
- `*.log`: Log file generated by the UMI-tools `extract` command.
@@ -59,13 +74,13 @@ To facilitate processing of input data which has the UMI barcode already embedde
## FastP
-[FastP](https://github.com/OpenGene/fastp) is used for removal of adapter contamination and trimming of low quality regions.
+[FastP](https://github.com/OpenGene/fastp) is used for removal of adapter contamination and trimming of low-quality regions.
MultiQC reports the percentage of bases removed by FastP in the _General Statistics_ table, along some further information on the results.
**Output directory: `results/fastp`**
-Contains FastQ files with quality and adapter trimmed reads for each sample, along with a log file describing the trimming.
+Contains FastQ files with quality and adapter-trimmed reads for each sample, along with a log file describing the trimming.
- `sample_fastp.json` - JSON report file with information on parameters and trimming metrics
- `sample_fastp.html` - HTML report with some visualizations of trimming metrics
@@ -77,8 +92,7 @@ FastP can automatically detect adapter sequences when not specified directly by
Output files
-- `umi_dedup/`
- - `*.log`: Results statistics files detailing the UMI deduplication results.
+- `umi_dedup/bam_deduplicated`
- `*.fastq.gz`: If `--save_umi_intermeds` is specified, the deduplicated fastq.gz files **after** UMI deduplication will be placed in this directory.
@@ -86,29 +100,38 @@ FastP can automatically detect adapter sequences when not specified directly by
## Bowtie2
-[Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is used to align the reads to user-defined databases of contaminants.
+[Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is used when `--filter_contamination` is enabled: it builds indexes for the user-supplied contaminant databases and aligns the reads against them.
MultiQC reports the number of reads that were removed by each of the contaminant databases.
## Bowtie
-[Bowtie](http://bowtie-bio.sourceforge.net/index.shtml) is used for mapping adapter trimmed reads against the mature miRNAs and miRNA precursors (hairpins) of the chosen database [miRBase](http://www.mirbase.org/) or [MirGeneDB](https://mirgenedb.org/).
+[Bowtie](http://bowtie-bio.sourceforge.net/index.shtml) is used to build the index for the genome FASTA file if no `bowtie_index` is provided. It is also used to map adapter-trimmed reads against the mature miRNAs and miRNA precursors (hairpins) of the chosen database, [miRBase](http://www.mirbase.org/) or [MirGeneDB](https://mirgenedb.org/).
+
+**Output directory: `results/`**
-**Output directory: `results/samtools`**
+- `bowtie_index/`
+ - `mirna_hairpin/bowtie`: Bowtie index files built from `hairpin.fa`.
+ - `mirna_mature/bowtie`: Bowtie index files built from `mature.fa`.
+- `genome_quant/`
+ - `genome_quant/bam/*.bam`: BAM files with the reads aligned to the genome.
+ - `genome_quant/bam/*unmapped.fastq.gz`: Reads that did not align to the genome.
+- `mirna_quant/`
-- `sample_mature.bam`: The aligned BAM file of alignment against mature miRNAs
-- `sample_mature_unmapped.fq.gz`: Unmapped reads against mature miRNAs _This file will be used as input for the alignment against miRNA precursors (hairpins)_
-- `sample_mature_hairpin.bam`: The aligned BAM file of alignment against miRNA precursors (hairpins) that didn't map to the mature
-- `sample_mature_hairpin_unmapped.fq.gz`: Unmapped reads against miRNA precursors (hairpins)
-- `sample_mature_hairpin_genome.bam`: The aligned BAM file of alignment against that didn't map to the precursor.
+ - `mirna_quant/bam/{hairpin,mature,seqcluster}/*.bam`: BAM files with the reads aligned against the hairpin, mature or seqcluster references.
+ - `mirna_quant/bam/{hairpin,mature,seqcluster}/*unmapped.fastq.gz`: Reads that did not align against the hairpin, mature or seqcluster references.
+
+These files are only saved if `--save_intermediates` is specified.
## SAMtools
[SAMtools](http://samtools.sourceforge.net/) is used for sorting and indexing the output BAM files from Bowtie. In addition, the numbers of features are counted with the `idxstats` option.
-**Output directory: `results/samtools/samtools_stats`**
+**Output directory: `results/{genome_quant,mirna_quant}/bam`**
+
+These files are saved to this directory only if `--save_intermediates` is specified; the statistics are always included in the MultiQC report.
-- `stats|idxstats|flagstat`: BAM stats for each of the files listed above.
+- `.*stats|.*idxstats|.*flagstat`: BAM stats for each of the files listed above.
![samtools](images/samtools_alignment_plot.png)
@@ -116,7 +139,7 @@ MultiQC reports the number of reads that were removed by each of the contaminant
[edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) is an R package used for differential expression analysis of RNA-seq expression profiles.
-**Output directory: `results/edgeR`**
+**Output directory: `results/mirna_quant/edger_qc`**
- `[mature/hairpin]_normalized_CPM.txt` TMM normalized counts of reads aligned to mature miRNAs/miRNA precursors (hairpins)
- `[mature/hairpin]_edgeR_MDS_plot.pdf` Multidimensional scaling plot of all samples based on the expression profile of mature miRNAs/miRNA precursors (hairpins)
@@ -134,11 +157,11 @@ MultiQC reports the number of reads that were removed by each of the contaminant
[mirtop](https://github.com/miRTop/mirtop) is used to parse the BAM files from `bowtie` alignment, and produce a [mirgff3](https://github.com/miRTop/mirGFF3) file with information about miRNAs and isomirs.
-**Output directory: `results/mirtop`**
+**Output directory: `results/mirna_quant/mirtop`**
-- `mirtop.gff`: [mirgff3](https://github.com/miRTop/mirGFF3) file
-- `mirtop.tsv`: tabular file of the previous file for easy integration with downstream analysis.
-- `mirtop_rawData.tsv`: File compatible with [isomiRs](http://lpantano.github.io/isomiRs/reference/IsomirDataSeqFromMirtop.html) Bioconductor package to perform isomiRs analysis.
+- `gff/{sample.id}.gff`: [mirgff3](https://github.com/miRTop/mirGFF3) file
+- `joined_samples_mirtop.tsv`: a tabular version of the previous file for easy integration with downstream analysis.
+- `export/{sample.id}_mirtop_rawData.tsv`: File compatible with [isomiRs](http://lpantano.github.io/isomiRs/reference/IsomirDataSeqFromMirtop.html) Bioconductor package to perform isomiRs analysis.
- `mirna.tsv`: tabular file with miRNA counts after summarizing unique isomiRs for each miRNA
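Summarizing isomiRs per miRNA amounts to grouping isomiR-level counts by their parent miRNA. The following is a minimal sketch, assuming a simplified single-sample input of `(miRNA, isomiR sequence, count)` tuples; it does not parse mirtop's GFF3 or TSV output, mirtop's own aggregation may differ in detail, and `summarize_isomirs` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize_isomirs(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each miRNA to (total read count, number of unique isomiR sequences)."""
    counts: Dict[str, int] = defaultdict(int)
    isomirs: Dict[str, Set[str]] = defaultdict(set)
    for mirna, sequence, count in rows:
        counts[mirna] += count          # reads from every isomiR add to the parent miRNA
        isomirs[mirna].add(sequence)    # track distinct isomiR sequences
    return {mirna: (counts[mirna], len(isomirs[mirna])) for mirna in counts}
```

For example, two `hsa-let-7a-5p` isomiRs with 12 and 3 reads collapse into one entry with 15 reads and 2 unique isomiRs.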
## miRDeep2
@@ -147,15 +170,15 @@ MultiQC reports the number of reads that were removed by each of the contaminant
**Output directory: `results/mirdeep2`**
-- `mirdeep/timestamp_sample.bed` File with the known and novel miRNAs in bed format.
-- `mirdeep/timestamp_sample.csv` File with an overview of all detected miRNAs (known and novel) in csv format.
-- `mirdeep/timestamp_sample.html` A HTML report with an overview of all detected miRNAs (known and novel) in html format.
+- `mirdeep2/result_{sample.id}.bed` File with the known and novel miRNAs in bed format.
+- `mirdeep2/result_{sample.id}.csv` File with an overview of all detected miRNAs (known and novel) in csv format.
+- `mirdeep2/result_{sample.id}.html` An HTML report with an overview of all detected miRNAs (known and novel).
## miRTrace
-[miRTrace](https://github.com/friedlanderlab/mirtrace) is a quality control specifically for small RNA sequencing data (smRNA-Seq). Each sample is characterized by profiling sequencing quality, read length, sequencing depth and miRNA complexity and also the amounts of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). By default, the pipeline sets the PHRED-offset to the most common +33, so if you need to adjust this, use the `params.phred_offset` option to include this accordingly for your FASTQ files.
+[miRTrace](https://github.com/friedlanderlab/mirtrace) is a quality control tool designed specifically for small RNA sequencing data (smRNA-Seq). Each sample is profiled for sequencing quality, read length, sequencing depth, miRNA complexity, and the amount of miRNAs versus undesirable sequences (derived from tRNAs, rRNAs and sequencing artifacts). By default, the pipeline sets the PHRED offset to the most common value of +33; if your FASTQ files use a different encoding, set `params.phred_offset` accordingly.
-**Output directory: `results/mirtrace`**
+**Output directory: `results/mirtrace/${sample.id}`**
- `mirtrace-report.html` An interactive HTML report summarizing all output statistics from miRTrace
- `mirtrace-results.json` A JSON file with all output statistics from miRTrace
@@ -163,7 +186,7 @@ MultiQC reports the number of reads that were removed by each of the contaminant
- `qc_passed_reads.all.collapsed` FASTA file per sample with sequence reads that passed QC in miRTrace
- `qc_passed_reads.rnatype_unknown.collapsed` FASTA file per sample with unknown reads in the RNA type analysis
-Refer to the [tool manual](https://github.com/friedlanderlab/mirtrace/blob/master/release-bundle-includes/manual.pdf) for detailed specifications about output files. Here is an example of the RNA types plot that you will see:
+The results for all samples are also combined into a single plot in the MultiQC report. Refer to the [tool manual](https://github.com/friedlanderlab/mirtrace/blob/master/release-bundle-includes/manual.pdf) for detailed specifications about output files. Here is an example of the RNA types plot that you will see:
![mirtrace](images/mirtrace_plot.png)
@@ -171,9 +194,8 @@ Refer to the [tool manual](https://github.com/friedlanderlab/mirtrace/blob/maste
![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
-:::note
-The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
-:::
+> [!NOTE]
+> The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequences and regions of low quality.
### MultiQC
@@ -191,6 +213,9 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see .
+> [!NOTE]
+> The read counts displayed in MultiQC may differ between the original FASTQ files and the BAM files. This is because the aligner also reports secondary alignments, which inflate the total read count in the BAM files. [More info about this behavior can be found here](https://github.com/nf-core/smrnaseq/issues/94).
+
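The discrepancy comes from how alignment records are counted. Every BAM record carries a SAM FLAG field, and records with the secondary (`0x100`) or supplementary (`0x800`) bit describe extra alignments of a read that already has a primary record. A minimal sketch, assuming the FLAG values are already available as a plain list of integers (in practice they would come from a BAM reader); `count_records` is a hypothetical helper.

```python
from typing import Iterable, Tuple

# Bit values defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def count_records(flags: Iterable[int]) -> Tuple[int, int]:
    """Return (all records, primary mapped records) for a list of SAM FLAG values."""
    flags = list(flags)
    skip = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY  # 0x904
    primary_mapped = sum(1 for flag in flags if not flag & skip)
    return len(flags), primary_mapped
```

With FLAG values `[0, 16, 256, 4, 2048, 272]` there are six records but only two primary mapped reads. On the command line, `samtools view -c -F 0x904 sample.bam` applies the same filter (`sample.bam` is a placeholder).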
### Pipeline information
@@ -198,7 +223,7 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ
- `pipeline_info/`
- Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
- - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
+ - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
- Parameters used by the pipeline run: `params.json`.
diff --git a/docs/usage.md b/docs/usage.md
index 881bb2ff..46fbe6eb 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -10,38 +10,48 @@
This option indicates the experimental protocol used for the sample preparation. Currently supporting:
-- 'illumina': adapter (`TGGAATTCTCGGGTGCCAAGG`)
-- 'nextflex': adapter (`TGGAATTCTCGGGTGCCAAGG`), clip_r1 (`4`), three_prime_clip_r1 (`4`)
-- 'qiaseq': adapter (`AACTGTAGGCACCATCAAT`)
-- 'cats': adapter (`GATCGGAAGAGCACACGTCTG`), clip_r1(`3)
-- 'custom' (where the user can indicate the `three_prime_adapter`, `clip_r1` and `three_prime_clip_r1` manually)
+- 'illumina': three_prime_adapter (`TGGAATTCTCGGGTGCCAAGG`), clip_r1 (`0`), three_prime_clip_r1 (`0`)
+- 'nextflex': three_prime_adapter (`TGGAATTCTCGGGTGCCAAGG`), clip_r1 (`4`), three_prime_clip_r1 (`4`)
+- 'qiaseq': three_prime_adapter (`AACTGTAGGCACCATCAAT`), clip_r1 (`0`), three_prime_clip_r1 (`0`)
+- 'cats': three_prime_adapter (`AAAAAAAA`), clip_r1 (`3`), three_prime_clip_r1 (`0`)
+
+The protocol is not selected with a parameter but with an additional profile, which sets the corresponding `three_prime_adapter`, `clip_r1` and `three_prime_clip_r1` parameters. For example, add the `illumina` profile to apply the defaults listed above:
+
+```bash
+-profile your_other_profiles,illumina
+```
+
+If you use a custom protocol, supply `three_prime_adapter`, `clip_r1` and `three_prime_clip_r1` manually instead.
The parameter `--three_prime_adapter` is set to the Illumina TruSeq single index adapter sequence `AGATCGGAAGAGCACACGTCTGAACTCCAGTCA`. This is also to ensure, that the auto-detect functionality of `FASTP` is disabled. Please make sure to adapt this adapter sequence accordingly for your run.
-:warning: At least the `custom` protocol has to be specified, otherwise the pipeline won't run. In case you specify the `custom` protocol, ensure that the parameters above are set accordingly or the defaults will be applied. If you want to auto-detect the adapters using `fastp`, please set `--three_prime_adapter` to `auto-detect`.
+:warning: If you do not choose a profile that sets the `three_prime_adapter`, `clip_r1` and `three_prime_clip_r1` options, the pipeline won't run. If you want to auto-detect the adapters using `fastp`, please set `--three_prime_adapter` to `auto-detect`.
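To make the precedence explicit, here is a minimal Python sketch of what selecting a protocol profile amounts to. It is not pipeline code, because the pipeline does this with Nextflow config profiles. The names `PROTOCOL_PRESETS` and `resolve_trimming` are hypothetical, and only the preset values are taken from the list above. The sketch assumes that explicitly supplied parameters override the profile, which is how command-line parameters behave in Nextflow.

```python
# Hypothetical sketch: resolve the trimming parameters a protocol profile would set.
# Preset values are copied from the protocol list above; the pipeline itself
# implements this with Nextflow config profiles, not Python.
from typing import Optional

PROTOCOL_PRESETS = {
    "illumina": {"three_prime_adapter": "TGGAATTCTCGGGTGCCAAGG", "clip_r1": 0, "three_prime_clip_r1": 0},
    "nextflex": {"three_prime_adapter": "TGGAATTCTCGGGTGCCAAGG", "clip_r1": 4, "three_prime_clip_r1": 4},
    "qiaseq": {"three_prime_adapter": "AACTGTAGGCACCATCAAT", "clip_r1": 0, "three_prime_clip_r1": 0},
    "cats": {"three_prime_adapter": "AAAAAAAA", "clip_r1": 3, "three_prime_clip_r1": 0},
}

REQUIRED = ("three_prime_adapter", "clip_r1", "three_prime_clip_r1")


def resolve_trimming(profile: Optional[str] = None, **overrides) -> dict:
    """Merge a protocol preset with explicitly supplied parameters.

    Explicit values win over the preset, mirroring how command-line
    parameters override profile defaults in Nextflow.
    """
    params = dict(PROTOCOL_PRESETS[profile]) if profile else {}
    params.update({key: value for key, value in overrides.items() if value is not None})
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        # Corresponds to the warning above: without a profile, all three must be given.
        raise ValueError(f"missing trimming parameters: {', '.join(missing)}")
    return params
```

For example, `resolve_trimming("nextflex")` yields the NEXTflex values. A custom protocol passes all three values explicitly (for example `three_prime_adapter="auto-detect"`), and omitting any of them raises an error.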
### `mirtrace_species` or `mirgenedb_species`
-It should point to the 3-letter species name used by [miRBase](https://www.mirbase.org/help/genome_summary.shtml) or [MirGeneDB](https://www.mirgenedb.org/browse). Note the difference in case for the two databases.
+It should point to the 3-letter species name used by [miRBase](https://www.mirbase.org/browse) or [MirGeneDB](https://www.mirgenedb.org/browse). Note the difference in case for the two databases.
### miRNA related files
Different parameters can be set for the two supported databases. By default `miRBase` will be used with the parameters below.
- `mirna_gtf`: If not supplied by the user, then `mirna_gtf` will point to the latest GFF3 file in miRbase: `https://mirbase.org/download/CURRENT/genomes/${params.mirtrace_species}.gff3`
-- `mature`: points to the FASTA file of mature miRNA sequences. `https://mirbase.org/download/mature.fa`
-- `hairpin`: points to the FASTA file of precursor miRNA sequences. `https://mirbase.org/download/hairpin.fa`
+- `mature`: points to the FASTA file of mature miRNA sequences. Default: `https://mirbase.org/download/mature.fa`
+- `hairpin`: points to the FASTA file of precursor miRNA sequences. Default: `https://mirbase.org/download/hairpin.fa`
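The defaults above can be expressed as a small helper. This is only a sketch, and the name `mirbase_defaults` is hypothetical. Only the `mirna_gtf` location depends on the species code, because `mature.fa` and `hairpin.fa` are miRBase's all-species files.

```python
# Hypothetical helper composing the default miRBase locations listed above.
MIRBASE_DOWNLOAD = "https://mirbase.org/download"


def mirbase_defaults(mirtrace_species: str) -> dict:
    """Return the default mirna_gtf, mature and hairpin locations for a species code."""
    return {
        # Species-specific GFF3, named after the 3-letter miRBase code (e.g. "hsa").
        "mirna_gtf": f"{MIRBASE_DOWNLOAD}/CURRENT/genomes/{mirtrace_species}.gff3",
        # These two files cover all species.
        "mature": f"{MIRBASE_DOWNLOAD}/mature.fa",
        "hairpin": f"{MIRBASE_DOWNLOAD}/hairpin.fa",
    }
```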
If MirGeneDB should be used instead it needs to be specified using `--mirgenedb` and use the parameters below.
-- `mirgenedb_gff`: The data can not be downloaded automatically (URLs are created with short term tokens in it), thus the user needs to supply the gff file for either his species, or all species downloaded from `https://mirgenedb.org/download`. The total set will automatically be subsetted to the species specified with `--mirgenedb_species`.
-- `mirgenedb_mature`: points to the FASTA file of mature miRNA sequences. Download from `https://mirgenedb.org/download`.
-- `mirgenedb_hairpin`: points to the FASTA file of precursor miRNA sequences. Download from `https://mirgenedb.org/download`. Note that MirGeneDB does not have a dedicated `hairpin` file, but the `Precursor sequences` are to be used.
+- `mirgenedb_gff`: The GFF file cannot be downloaded automatically due to the presence of short-term tokens in the URLs. Therefore, the user must manually provide the GFF file, either for their species of interest or for all species, by downloading it from [MirGeneDB](https://mirgenedb.org/download). The provided dataset will be automatically filtered based on the species specified with the `--mirgenedb_species` parameter.
+- `mirgenedb_mature`: This parameter should point to the FASTA file containing mature miRNA sequences. The file can be manually downloaded from [MirGeneDB](https://mirgenedb.org/download).
+- `mirgenedb_hairpin`: This parameter should point to the FASTA file containing precursor miRNA sequences. Note that MirGeneDB does not offer a dedicated hairpin file, but the precursor sequences can be downloaded from [MirGeneDB](https://mirgenedb.org/download) and used instead.
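The text does not say how the species subsetting is done. The sketch below shows one plausible approach. It assumes that MirGeneDB feature IDs start with the capitalised species code (e.g. `Hsa-`) and keeps only the GFF rows whose `ID` attribute carries that prefix. `subset_gff` is a hypothetical helper, not the pipeline's implementation.

```python
# Hypothetical sketch: subset an all-species MirGeneDB GFF to one species.
# Assumes feature IDs look like "ID=Hsa-..." (species code, then "-");
# the pipeline's actual filter may work differently.


def subset_gff(lines, species):
    """Keep comment lines plus features whose ID starts with '<species>-' (case-sensitive)."""
    prefix = f"ID={species}-"
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep header/comment lines such as ##gff-version
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed rows
        attributes = fields[8].split(";")
        if any(attr.strip().startswith(prefix) for attr in attributes):
            kept.append(line)
    return kept
```

Because the match is case-sensitive, `subset_gff(lines, "Hsa")` and `subset_gff(lines, "hsa")` behave differently, which reflects the note about case differences between the two databases.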
### Genome
- `fasta`: the reference genome FASTA file
-- `bt_indices`: points to the folder containing the `bowtie2` indices for the genome reference specified by `fasta`. **Note:** if the FASTA file in `fasta` is not the same file used to generate the `bowtie2` indices, then the pipeline will fail.
+- `bowtie_index`: points to the folder containing the `bowtie` indices for the genome reference specified by `fasta`.
+
+> [!NOTE]
+> If the FASTA file in `fasta` is not the same file that was used to build the `bowtie` indices, the pipeline will fail.
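Because the index and `fasta` must match, it can help to check what `bowtie_index` actually contains before launching. The sketch below locates the bowtie (v1) index basename in the same way as the removed local module in this diff (`find -name "*.3.ebwt"`, then strip the suffix). `bowtie_index_prefix` is a hypothetical helper, and as written it ignores large-index `.ebwtl` files.

```python
# Sketch: find the bowtie (v1) index prefix in a directory, mirroring
# `find -L ./ -name "*.3.ebwt" | sed 's/.3.ebwt//'` from the removed module.
# Hypothetical helper; large indices (*.ebwtl) are not handled.
from pathlib import Path

SUFFIX = ".3.ebwt"  # reverse-index files are *.rev.1.ebwt / *.rev.2.ebwt, so this is unique


def bowtie_index_prefix(index_dir):
    matches = sorted(Path(index_dir).rglob(f"*{SUFFIX}"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one *{SUFFIX} file in {index_dir}, found {len(matches)}"
        )
    path = matches[0]
    # Drop the ".3.ebwt" suffix to get the basename bowtie expects after -x.
    return str(path.with_name(path.name[: -len(SUFFIX)]))
```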
### Contamination filtering
@@ -56,6 +66,12 @@ Contamination filtering of the sequencing reads is optional and can be invoked u
- `pirna`: Used to supply a FASTA file containing piRNA contamination sequence. e.g. The FASTA file is first compared to the available miRNA sequences and overlaps are removed.
- `other_contamination`: Used to supply an additional filtering set. The FASTA file is first compared to the available miRNA sequences and overlaps are removed.
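The overlap removal can be sketched after the removed `BLAT_MIRNA` module shown later in this diff. In that module, contaminant sequences are aligned with BLAT against the miRNA set in `blast8` format, and any contaminant with a hit whose e-value (column 11) is below `1e-5` is dropped. The Python below reproduces only that filtering logic, and the helper names are hypothetical. The module also pre-filtered `cdna`/`ncrna` by transcript biotype, which is omitted here, and the nf-core modules that replace it may behave differently.

```python
# Sketch of the overlap-removal logic of the removed BLAT_MIRNA module.
# blast8 columns: query, subject, %id, length, mismatches, gaps,
# qstart, qend, sstart, send, evalue, bitscore (12 tab-separated fields).


def mirna_hit_ids(blast8_lines, max_evalue=1e-5):
    """Return query (contaminant) IDs with at least one hit below max_evalue."""
    hits = set()
    for row in blast8_lines:
        fields = row.rstrip("\n").split("\t")
        if len(fields) >= 12 and float(fields[10]) < max_evalue:
            hits.add(fields[0])
    return hits


def drop_mirna_overlaps(fasta_lines, hits):
    """Remove FASTA records whose header ID (first word after '>') is in hits."""
    kept, keep = [], True
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            keep = not parts or parts[0] not in hits
        if keep:
            kept.append(line)
    return kept
```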
+### miRDeep2
+
+If miRDeep2 exits with status 255, the error is ignored and the pipeline continues to completion. In that case, the pipeline logs a note with the path of the work directory where the failure occurred, so you can inspect the input data there and troubleshoot the issue.
+
+Exit status 255 typically means that the core miRDeep2 algorithm produced empty output files. This often happens when the processed reads do not correspond to putative mature miRNA sequences, or when the provided precursors do not meet the criteria for valid miRNA precursors; both situations may stem from the input reads. A common cause is running the pipeline on a small subset of the input reads.
+
### UMI handling
The pipeline handles UMIs with two tools. Umicollapse to deduplicate on entire read sequence after 3'adapter removal. Followed by Umitools-extract to extract the miRNA adapter and UMI. This can be achieved by using the parameters for UMI handling as follows (in this case for QIAseq miRNA Library Kit):
@@ -64,9 +80,8 @@ The pipeline handles UMIs with two tools. Umicollapse to deduplicate on entire r
--with_umi --umitools_extract_method regex --umitools_bc_pattern = '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)'
```
-:::note
-You will have to specify custom umitools_bc_pattern patterns if your UMI read structure is different. Please check the required capability in your UMI handling manual. It should be set in a way, that only the insert sequence of the RNA molecule is left after extraction. Please refer to the manual of the used kit for the expected read structure.
-:::
+> [!NOTE]
+> If your UMI read structure differs, you'll need to specify custom `umitools_bc_pattern` patterns. Ensure that the pattern is set so that only the insert sequence of the RNA molecule remains after extraction. For details, refer to the UMI handling manual or the documentation of the kit you're using for the expected read structure.
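Here is a minimal sketch of what the QIAseq pattern extracts, using Python's built-in `re`. It assumes the named groups `discard_1`, `umi_1` and `discard_2` used in umi_tools' regex mode. It drops the fuzzy `{s<=2}` quantifier because `re` does not support it (to our understanding, umi_tools handles that syntax via the third-party `regex` package). The function name is hypothetical.

```python
# Illustration only: the QIAseq barcode pattern with Python's built-in `re`.
# The fuzzy suffix {s<=2} (up to two substitutions in the adapter) is dropped,
# so this sketch only recognises the exact adapter sequence.
import re

QIASEQ_PATTERN = re.compile(
    r".+(?P<discard_1>AACTGTAGGCACCATCAAT)(?P<umi_1>.{12})(?P<discard_2>.*)"
)


def split_qiaseq_read(sequence):
    """Return (insert, umi), or None if the adapter/UMI layout is not found.

    Bases captured by the named groups are removed; the unnamed leading `.+`
    (the RNA insert) is what remains of the read after extraction.
    """
    match = QIASEQ_PATTERN.match(sequence)
    if match is None:
        return None
    insert = sequence[: match.start("discard_1")]
    return insert, match.group("umi_1")
```

When the exact adapter is present, the insert and the 12-nt UMI are recovered. A read without the adapter returns `None`, which roughly corresponds to a read that umi_tools would not keep.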
## Samplesheet input
@@ -91,9 +106,12 @@ CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz
### Full samplesheet
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire. However, there is a strict requirement for the first 3 columns to match those defined in the table below.
+The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet must have at least 2 columns (`sample` and `fastq_1`); a third column (`fastq_2`) can be added for paired-end samples.
+
+> [!NOTE]
+> Most of the tools used by the pipeline cannot handle paired-end reads, so whenever paired-end samples are provided, only the R1 files are used.
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
+A final samplesheet file consisting of single-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
```console
sample,fastq_1
@@ -106,10 +124,11 @@ TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz
```
-| Column | Description |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
+| Column | Description | Requirement |
+| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
+| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | Mandatory |
+| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | Mandatory |
+| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | Optional |
An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
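The column rules in the table can be illustrated with a small reader. This is a hypothetical sketch, not the pipeline's own validation code. It checks that `sample` and `fastq_1` are present and that FastQ paths end in `.fastq.gz` or `.fq.gz`, converts spaces in sample names to underscores, groups rows that share a sample name (as with `TREATMENT_REP3` above), and keeps only R1, as described in the note on paired-end input.

```python
# Hypothetical samplesheet reader illustrating the column rules above;
# this is not the pipeline's validation code.
import csv
from collections import defaultdict

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def read_samplesheet(handle):
    """Map each sample name to its list of R1 FastQ files (one entry per run)."""
    reader = csv.DictReader(handle)
    missing = {"sample", "fastq_1"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing mandatory column(s): {', '.join(sorted(missing))}")

    runs = defaultdict(list)
    for row in reader:
        # Spaces in sample names are converted to underscores.
        sample = (row.get("sample") or "").strip().replace(" ", "_")
        fastq_1 = (row.get("fastq_1") or "").strip()
        fastq_2 = (row.get("fastq_2") or "").strip()  # optional R2, validated but not used
        if not sample or not fastq_1:
            raise ValueError(f"row {reader.line_num}: 'sample' and 'fastq_1' are required")
        for path in filter(None, (fastq_1, fastq_2)):
            if not path.endswith(FASTQ_SUFFIXES):
                raise ValueError(f"row {reader.line_num}: {path} must end in .fastq.gz or .fq.gz")
        # Rows sharing a sample name are runs of the same sample; only R1 is kept.
        runs[sample].append(fastq_1)
    return dict(runs)
```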
@@ -136,9 +155,8 @@ If you wish to repeatedly use the same parameters for multiple runs, rather than
Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.
-:::warning
-Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
-:::
+> [!WARNING]
+> Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
The above pipeline run specified with a params file in yaml format:
@@ -146,9 +164,9 @@ The above pipeline run specified with a params file in yaml format:
nextflow run nf-core/smrnaseq -profile docker -params-file params.yaml
```
-with `params.yaml` containing:
+with:
-```yaml
+```yaml title="params.yaml"
input: './samplesheet.csv'
outdir: './results/'
genome: 'GRCh37'
@@ -157,6 +175,10 @@ genome: 'GRCh37'
You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
+## Optional parameters
+
+If `--save_intermediates` is specified, the intermediate files generated in the pipeline will be saved in the output directory.
+
### Updating the pipeline
When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
@@ -182,15 +204,13 @@ The `bin` directory contains some scripts used by the pipeline which may also be
To further assist in reproducbility, you can use share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.
-:::tip
-If you wish to share such profile (such as upload as supplementary material for academic publications), make sure to NOT include cluster specific paths to files, nor institutional specific profiles.
-:::
+> [!TIP]
+> If you wish to share such a profile (such as uploading it as supplementary material for academic publications), make sure not to include cluster-specific paths to files, nor institution-specific profiles.
## Core Nextflow arguments
-:::note
-These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
-:::
+> [!NOTE]
+> These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
### `-profile`
@@ -198,9 +218,8 @@ Use this parameter to choose a configuration profile. Profiles can give configur
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.
-:::info
-We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
-:::
+> [!TIP]
+> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility; however, when this is not possible, Conda is also supported.
The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
@@ -224,6 +243,8 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
- `apptainer`
- A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
+- `wave`
+ - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
- `conda`
- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
@@ -265,14 +286,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config
If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).
-## Azure Resource Requests
-
-To be used with the `azurebatch` profile by specifying the `-profile azurebatch`.
-We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required.
-
-Note that the choice of VM size depends on your quota and the overall workload during the analysis.
-For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes).
-
## Running in the background
Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
diff --git a/main.nf b/main.nf
index cd13268a..60791ecb 100644
--- a/main.nf
+++ b/main.nf
@@ -9,8 +9,6 @@
----------------------------------------------------------------------------------------
*/
-nextflow.enable.dsl = 2
-
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS
@@ -18,6 +16,7 @@ nextflow.enable.dsl = 2
*/
include { NFCORE_SMRNASEQ } from './workflows/smrnaseq'
+include { PREPARE_GENOME } from './subworkflows/local/prepare_genome'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_smrnaseq_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_smrnaseq_pipeline'
include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_smrnaseq_pipeline'
@@ -28,9 +27,17 @@ include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_smrn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
-params.fasta = getGenomeAttribute('fasta')
-params.mirtrace_species = getGenomeAttribute('mirtrace_species')
-params.bowtie_index = getGenomeAttribute('bowtie')
+params.fasta = getGenomeAttribute('fasta')
+params.mirtrace_species = getGenomeAttribute('mirtrace_species')
+params.bowtie_index = getGenomeAttribute('bowtie')
+params.mirna_gtf = getGenomeAttribute('mirna_gtf') //not in igenomes yet
+params.rrna = getGenomeAttribute('rrna') //not in igenomes yet
+params.trna = getGenomeAttribute('trna') //not in igenomes yet
+params.cdna = getGenomeAttribute('cdna') //not in igenomes yet
+params.ncrna = getGenomeAttribute('ncrna') //not in igenomes yet
+params.pirna = getGenomeAttribute('pirna') //not in igenomes yet
+params.other_contamination = getGenomeAttribute('other_contamination') //not in igenomes yet
+
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -42,28 +49,61 @@ workflow {
main:
ch_versions = Channel.empty()
+ //
+ // SUBWORKFLOW : Prepare reference genome files
+ //
+ PREPARE_GENOME (
+ params.fasta,
+ params.bowtie_index,
+ params.mirtrace_species,
+ params.rrna,
+ params.trna,
+ params.cdna,
+ params.ncrna,
+ params.pirna,
+ params.other_contamination,
+ params.fastp_known_mirna_adapters,
+ params.mirna_gtf
+ )
+
//
// SUBWORKFLOW: Run initialisation tasks
//
PIPELINE_INITIALISATION (
params.version,
- params.help,
params.validate_params,
params.monochrome_logs,
args,
params.outdir,
- params.input
+ params.input,
+ params.three_prime_adapter,
+ params.phred_offset
)
//
// WORKFLOW: Run main workflow
//
NFCORE_SMRNASEQ (
- Channel.of(file(params.input, checkIfExists: true)),
+ PREPARE_GENOME.out.has_fasta,
+ PREPARE_GENOME.out.has_mirtrace_species,
+ PREPARE_GENOME.out.mirna_adapters,
+ PREPARE_GENOME.out.mirtrace_species,
+ PREPARE_GENOME.out.reference_mature,
+ PREPARE_GENOME.out.reference_hairpin,
+ PREPARE_GENOME.out.mirna_gtf,
+ PREPARE_GENOME.out.fasta,
+ PREPARE_GENOME.out.bowtie_index,
+ PREPARE_GENOME.out.rrna,
+ PREPARE_GENOME.out.trna,
+ PREPARE_GENOME.out.cdna,
+ PREPARE_GENOME.out.ncrna,
+ PREPARE_GENOME.out.pirna,
+ PREPARE_GENOME.out.other_contamination,
+ ch_versions,
PIPELINE_INITIALISATION.out.samplesheet,
- ch_versions
+ PIPELINE_INITIALISATION.out.three_prime_adapter,
+ PIPELINE_INITIALISATION.out.phred_offset
)
-
//
// SUBWORKFLOW: Run completion tasks
//
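The new `params` defaults in `main.nf` are looked up with `getGenomeAttribute`, which is imported from `subworkflows/local/utils_nfcore_smrnaseq_pipeline`. Assuming it follows the usual nf-core template logic (return `params.genomes[params.genome][attribute]` when present, otherwise `null`), a rough Python analogue looks like this. The name `get_genome_attribute` is hypothetical.

```python
# Rough Python analogue of getGenomeAttribute, assuming the standard nf-core
# template logic; the real function is Groovy in the local utils subworkflow.
from typing import Any, Mapping, Optional


def get_genome_attribute(params: Mapping[str, Any], attribute: str) -> Optional[Any]:
    """Return params['genomes'][params['genome']][attribute], or None if any level is missing."""
    genomes = params.get("genomes") or {}
    genome = params.get("genome")
    if genome and genome in genomes:
        return genomes[genome].get(attribute)
    return None
```

Under that assumption, keys commented `//not in igenomes yet` remain unset (`None`/`null`) for a standard iGenomes reference unless the user supplies them directly or a custom `genomes` entry defines them.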
diff --git a/modules.json b/modules.json
index 109997b3..6f19df08 100644
--- a/modules.json
+++ b/modules.json
@@ -5,69 +5,159 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
- "cat/cat": {
+ "bioawk": {
"branch": "master",
- "git_sha": "9437e6053dccf4aafa022bfd6e7e9de67e625af8",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "blat": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "bowtie/align": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "bowtie/build": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "bowtie2/align": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "bowtie2/build": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"cat/fastq": {
"branch": "master",
- "git_sha": "0997b47c93c06b49aa7b3fefda87e728312cf2ca",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "csvtk/join": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"fastp": {
"branch": "master",
- "git_sha": "95cf5fe0194c7bf5cb0e3027a2eb7e7c89385080",
- "installed_by": ["fastq_fastqc_umitools_fastp", "modules"]
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["fastq_fastqc_umitools_fastp"]
},
"fastqc": {
"branch": "master",
- "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fastq_fastqc_umitools_fastp"]
},
+ "gawk": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "mirdeep2/mapper": {
+ "branch": "master",
+ "git_sha": "26757a6a54d05c3133c01c564c192ff617c5ea33",
+ "installed_by": ["fastq_find_mirna_mirdeep2"]
+ },
+ "mirdeep2/mirdeep2": {
+ "branch": "master",
+ "git_sha": "757f60e5656283122cd6ec37d4679483bebb7312",
+ "installed_by": ["fastq_find_mirna_mirdeep2"]
+ },
+ "mirtop/counts": {
+ "branch": "master",
+ "git_sha": "196062335bb9ec979075bf2212f64a369b927b0d",
+ "installed_by": ["bam_stats_mirna_mirtop"]
+ },
+ "mirtop/export": {
+ "branch": "master",
+ "git_sha": "196062335bb9ec979075bf2212f64a369b927b0d",
+ "installed_by": ["bam_stats_mirna_mirtop"]
+ },
+ "mirtop/gff": {
+ "branch": "master",
+ "git_sha": "196062335bb9ec979075bf2212f64a369b927b0d",
+ "installed_by": ["bam_stats_mirna_mirtop"]
+ },
+ "mirtop/stats": {
+ "branch": "master",
+ "git_sha": "196062335bb9ec979075bf2212f64a369b927b0d",
+ "installed_by": ["bam_stats_mirna_mirtop"]
+ },
+ "mirtrace/qc": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
"multiqc": {
"branch": "master",
- "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
+ "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
"installed_by": ["modules"]
},
"samtools/flagstat": {
"branch": "master",
- "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
- "installed_by": ["bam_stats_samtools", "modules"]
+ "git_sha": "b13f07be4c508d6ff6312d354d09f2493243e208",
+ "installed_by": ["bam_stats_samtools"]
},
"samtools/idxstats": {
"branch": "master",
- "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
- "installed_by": ["bam_stats_samtools", "modules"]
+ "git_sha": "b13f07be4c508d6ff6312d354d09f2493243e208",
+ "installed_by": ["bam_stats_samtools"]
},
"samtools/index": {
"branch": "master",
- "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
- "installed_by": ["bam_sort_stats_samtools", "modules"]
+ "git_sha": "b13f07be4c508d6ff6312d354d09f2493243e208",
+ "installed_by": ["bam_sort_stats_samtools"]
},
"samtools/sort": {
"branch": "master",
- "git_sha": "4352dbdb09ec40db71e9b172b97a01dcf5622c26",
- "installed_by": ["bam_sort_stats_samtools", "modules"]
+ "git_sha": "b7800db9b069ed505db3f9d91b8c72faea9be17b",
+ "installed_by": ["bam_sort_stats_samtools"]
},
"samtools/stats": {
"branch": "master",
- "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
- "installed_by": ["bam_stats_samtools", "modules"]
+ "git_sha": "b13f07be4c508d6ff6312d354d09f2493243e208",
+ "installed_by": ["bam_stats_samtools"]
+ },
+ "seqcluster/collapse": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "seqkit/fq2fa": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["fastq_find_mirna_mirdeep2"]
+ },
+ "seqkit/grep": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["modules"]
+ },
+ "seqkit/replace": {
+ "branch": "master",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["fastq_find_mirna_mirdeep2"]
},
"umicollapse": {
"branch": "master",
- "git_sha": "b97197968ac12dde2463fa54541f6350c46f2035",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"umitools/extract": {
"branch": "master",
- "git_sha": "d2c5e76f291379f3dd403e48e46ed7e6ba5da744",
- "installed_by": ["fastq_fastqc_umitools_fastp", "modules"]
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+ "installed_by": ["fastq_fastqc_umitools_fastp"]
},
- "untarfiles": {
+ "untar": {
"branch": "master",
- "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
+ "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
}
}
@@ -76,32 +166,42 @@
"nf-core": {
"bam_sort_stats_samtools": {
"branch": "master",
- "git_sha": "4352dbdb09ec40db71e9b172b97a01dcf5622c26",
+ "git_sha": "763d4b5c05ffda3ac1ac969dc67f7458cfb2eb1d",
+ "installed_by": ["subworkflows"]
+ },
+ "bam_stats_mirna_mirtop": {
+ "branch": "master",
+ "git_sha": "196062335bb9ec979075bf2212f64a369b927b0d",
"installed_by": ["subworkflows"]
},
"bam_stats_samtools": {
"branch": "master",
- "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
+ "git_sha": "763d4b5c05ffda3ac1ac969dc67f7458cfb2eb1d",
"installed_by": ["bam_sort_stats_samtools"]
},
"fastq_fastqc_umitools_fastp": {
"branch": "master",
- "git_sha": "cabcc0dadf8366aa7a9930066a7b3dd90d9825d5",
+ "git_sha": "46eca555142d6e597729fcb682adcc791796f514",
+ "installed_by": ["subworkflows"]
+ },
+ "fastq_find_mirna_mirdeep2": {
+ "branch": "master",
+ "git_sha": "757f60e5656283122cd6ec37d4679483bebb7312",
"installed_by": ["subworkflows"]
},
"utils_nextflow_pipeline": {
"branch": "master",
- "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
+ "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082",
"installed_by": ["subworkflows"]
},
"utils_nfcore_pipeline": {
"branch": "master",
- "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
+ "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba",
"installed_by": ["subworkflows"]
},
- "utils_nfvalidation_plugin": {
+ "utils_nfschema_plugin": {
"branch": "master",
- "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
+ "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c",
"installed_by": ["subworkflows"]
}
}
diff --git a/modules/local/blat_mirna.nf b/modules/local/blat_mirna.nf
deleted file mode 100644
index aa0d3d51..00000000
--- a/modules/local/blat_mirna.nf
+++ /dev/null
@@ -1,62 +0,0 @@
-process BLAT_MIRNA {
- tag "$fasta"
- label 'process_medium'
-
- conda 'bioconda::blat=36'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/blat:36--0' :
- 'biocontainers/blat:36--0' }"
-
- input:
- val db_type
- path mirna
- path contaminants
-
-
- output:
- path 'filtered.fa' , emit: filtered_set
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- if ( db_type == "cdna" )
- """
- echo $db_type
- awk '/^>/ { x=index(\$6, "transcript_biotype:miRNA") } { if(!x) print }' $contaminants > subset.fa
- blat -out=blast8 $mirna subset.fa /dev/stdout | awk 'BEGIN{FS="\t"}{if(\$11 < 1e-5)print \$1;}' | uniq > mirnahit.txt
- awk 'BEGIN { while((getline<"mirnahit.txt")>0) l[">"\$1]=1 } /^>/ {x = l[\$1]} {if(!x) print }' subset.fa > filtered.fa
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- blat: \$(echo \$(blat) | grep Standalone | awk '{ if (match(\$0,/[0-9]*[0-9]/,m)) print m[0] }')
- END_VERSIONS
- """
-
- else if ( db_type == "ncrna" )
- """
- echo $db_type
- awk '/^>/ { x=(index(\$6, "transcript_biotype:rRNA") || index(\$6, "transcript_biotype:miRNA")) } { if(!x) print }' $contaminants > subset.fa
- blat -out=blast8 $mirna subset.fa /dev/stdout | awk 'BEGIN{FS="\t"}{if(\$11 < 1e-5)print \$1;}' | uniq > mirnahit.txt
- awk 'BEGIN { while((getline<"mirnahit.txt")>0) l[">"\$1]=1 } /^>/ {x = l[\$1]} {if(!x) print }' subset.fa > filtered.fa
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- blat: \$(echo \$(blat) | grep Standalone | awk '{ if (match(\$0,/[0-9]*[0-9]/,m)) print m[0] }')
- END_VERSIONS
- """
-
- else
- """
- echo $db_type
- blat -out=blast8 $mirna $contaminants /dev/stdout | awk 'BEGIN{FS="\t"}{if(\$11 < 1e-5)print \$1;}' | uniq > mirnahit.txt
- awk 'BEGIN { while((getline<"mirnahit.txt")>0) l[">"\$1]=1 } /^>/ {x = l[\$1]} {if(!x) print }' $contaminants > filtered.fa
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- blat: \$(echo \$(blat) | grep Standalone | awk '{ if (match(\$0,/[0-9]*[0-9]/,m)) print m[0] }')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/bowtie_contaminants.nf b/modules/local/bowtie_contaminants.nf
deleted file mode 100644
index cf02de31..00000000
--- a/modules/local/bowtie_contaminants.nf
+++ /dev/null
@@ -1,29 +0,0 @@
-process INDEX_CONTAMINANTS {
- label 'process_medium'
-
- conda 'bowtie2=2.4.5'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.5--py39hd2f7db1_2' :
- 'biocontainers/bowtie2:2.4.5--py39hd2f7db1_2'}"
-
- input:
- path fasta
-
- output:
- path 'fasta_bidx*' , emit: index
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- bowtie2-build ${fasta} fasta_bidx --threads ${task.cpus}
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/bowtie_genome.nf b/modules/local/bowtie_genome.nf
deleted file mode 100644
index 17ea9253..00000000
--- a/modules/local/bowtie_genome.nf
+++ /dev/null
@@ -1,36 +0,0 @@
-process INDEX_GENOME {
- tag "$fasta"
- label 'process_medium'
-
- conda 'bioconda::bowtie=1.3.1'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bowtie:1.3.1--py310h7b97f60_6' :
- 'biocontainers/bowtie:1.3.1--py310h7b97f60_6' }"
-
- input:
- tuple val(meta2), path(fasta)
-
- output:
- path 'genome*ebwt' , emit: index
- path 'genome.edited.fa', emit: fasta
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- # Remove any special base characters from reference genome FASTA file
- sed '/^[^>]/s/[^ATGCatgc]/N/g' $fasta > genome.edited.fa
- sed -i 's/ .*//' genome.edited.fa
-
- # Build bowtie index
- bowtie-build genome.edited.fa genome --threads ${task.cpus}
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/bowtie_map_contaminants.nf b/modules/local/bowtie_map_contaminants.nf
deleted file mode 100644
index c9863ab3..00000000
--- a/modules/local/bowtie_map_contaminants.nf
+++ /dev/null
@@ -1,48 +0,0 @@
-process BOWTIE_MAP_CONTAMINANTS {
- label 'process_medium'
-
- conda 'bowtie2=2.4.5'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.5--py39hd2f7db1_2' :
- 'biocontainers/bowtie2:2.4.5--py39hd2f7db1_2' }"
-
- input:
- tuple val(meta), path(reads)
- path index
- val contaminant_type
-
- output:
- tuple val(meta), path("*sam") , emit: bam
- tuple val(meta), path('*.filter.unmapped.contaminant.fastq'), emit: unmapped
- path "versions.yml" , emit: versions
- path "filtered.*.stats" , emit: stats
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def args = task.ext.args ?: ""
-
- """
- INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
- bowtie2 \\
- -x \$INDEX \\
- -U ${reads} \\
- --threads ${task.cpus} \\
- --un ${meta.id}.${contaminant_type}.filter.unmapped.contaminant.fastq \\
- --very-sensitive-local \\
- -k 1 \\
- -S ${meta.id}.filter.contaminant.sam \\
- ${args} \\
- > ${meta.id}.contaminant_bowtie.log 2>&1
-
- # extracting number of reads from bowtie logs
- awk -v type=${contaminant_type} 'BEGIN{tot=0} {if(NR==4 || NR == 5){tot += \$1}} END {print "\\""type"\\": "tot }' ${meta.id}.contaminant_bowtie.log | tr -d , > filtered.${meta.id}_${contaminant_type}.stats
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//' | tr -d '\0')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/bowtie_map_mirna.nf b/modules/local/bowtie_map_mirna.nf
deleted file mode 100644
index d6b0ea8f..00000000
--- a/modules/local/bowtie_map_mirna.nf
+++ /dev/null
@@ -1,54 +0,0 @@
-process BOWTIE_MAP_SEQ {
- tag "$meta.id"
- label 'process_medium'
-
- conda 'bowtie=1.3.0 bioconda::samtools=1.13'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:40128b496751b037e2bd85f6789e83d4ff8a4837-0' :
- 'biocontainers/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:40128b496751b037e2bd85f6789e83d4ff8a4837-0' }"
-
- input:
- tuple val(meta), path(reads)
- path index
-
- output:
- tuple val(meta), path("*bam") , emit: bam
- tuple val(meta), path('unmapped/*fq.gz'), emit: unmapped
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- INDEX=`find -L ./ -name "*.3.ebwt" | sed 's/.3.ebwt//'`
- bowtie \\
- -x \$INDEX \\
- -q <(zcat $reads) \\
- -p ${task.cpus} \\
- -t \\
- -k 50 \\
- --best \\
- --strata \\
- -e 99999 \\
- --chunkmbs 2048 \\
- --un ${meta.id}_unmapped.fq -S > ${meta.id}.sam
-
- samtools view -bS ${meta.id}.sam > ${meta.id}.bam
-
- if [ ! -f "${meta.id}_unmapped.fq" ]
- then
- touch ${meta.id}_unmapped.fq
- fi
- gzip ${meta.id}_unmapped.fq
- mkdir unmapped
- mv ${meta.id}_unmapped.fq.gz unmapped/.
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
- samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/bowtie_mirna.nf b/modules/local/bowtie_mirna.nf
deleted file mode 100644
index 733d816e..00000000
--- a/modules/local/bowtie_mirna.nf
+++ /dev/null
@@ -1,29 +0,0 @@
-process INDEX_MIRNA {
- label 'process_medium'
-
- conda 'bioconda::bowtie=1.3.1'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bowtie:1.3.1--py310h7b97f60_6' :
- 'biocontainers/bowtie:1.3.1--py310h7b97f60_6' }"
-
- input:
- tuple val(meta2), path(fasta)
-
- output:
- path 'fasta_bidx*' , emit: index
- path "versions.yml", emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- bowtie-build ${fasta} fasta_bidx --threads ${task.cpus}
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/datatable_merge.nf b/modules/local/datatable_merge/main.nf
similarity index 86%
rename from modules/local/datatable_merge.nf
rename to modules/local/datatable_merge/main.nf
index c71b9c4d..e231a738 100644
--- a/modules/local/datatable_merge.nf
+++ b/modules/local/datatable_merge/main.nf
@@ -1,13 +1,13 @@
-process TABLE_MERGE {
+process DATATABLE_MERGE {
label 'process_medium'
- conda 'conda-base::r-data.table=1.12.2'
+ conda 'conda-forge::r-data.table=1.12.2'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/r-data.table:1.12.2' :
'biocontainers/r-data.table:1.12.2' }"
input:
- path mirtop
+ tuple val(meta), path(mirtop)
output:
path "mirna.tsv" , emit: mirna_tsv
diff --git a/modules/local/datatable_merge/tests/datatable_merge.nf.test b/modules/local/datatable_merge/tests/datatable_merge.nf.test
new file mode 100644
index 00000000..c7485af8
--- /dev/null
+++ b/modules/local/datatable_merge/tests/datatable_merge.nf.test
@@ -0,0 +1,71 @@
+nextflow_process {
+
+ name "Test Process DATATABLE_MERGE"
+ script "../main.nf"
+ process "DATATABLE_MERGE"
+ tag "modules"
+ tag "modules_local"
+ tag "datatable_merge"
+
+ test("Contains hsa-miR-365b-3p, hsa-miR-7-5p, hsa-miR-103a-3p") {
+
+ when {
+ params {
+ outdir = "${outputDir}"
+ }
+ process {
+ """
+ input[0] = [[],file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/datatable_merge/small_mirtop_dataset.txt", checkIfExists: true)]
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(process.out).match()
+
+ with(process.out.mirna_tsv) {
+ with(get(0)) {
+ assert get(0).endsWith(".tsv")
+
+ // Check for specific miRNAs
+ def lines = path(get(0)).readLines()
+ assert lines.any { it.contains("hsa-miR-365b-3p") }
+ assert lines.any { it.contains("hsa-miR-7-5p") }
+ assert lines.any { it.contains("hsa-miR-103a-3p") }
+ }
+ }
+ }
+ }
+
+ test("Does not contain hsa-miR-107, hsa-miR-365a-3p") {
+
+ when {
+ params {
+ outdir = "${outputDir}"
+ }
+ process {
+ """
+ input[0] = [[],file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/datatable_merge/small_mirtop_dataset.txt", checkIfExists: true)]
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(process.out).match()
+
+ with(process.out.mirna_tsv) {
+ with(get(0)) {
+ assert get(0).endsWith(".tsv")
+
+ // Check for the absence of specific miRNAs
+ def lines = path(get(0)).readLines()
+ assert !lines.any { it.contains("hsa-miR-107") }
+ assert !lines.any { it.contains("hsa-miR-365a-3p") }
+ }
+ }
+ }
+ }
+
+}
diff --git a/modules/local/datatable_merge/tests/datatable_merge.nf.test.snap b/modules/local/datatable_merge/tests/datatable_merge.nf.test.snap
new file mode 100644
index 00000000..7fce7ed9
--- /dev/null
+++ b/modules/local/datatable_merge/tests/datatable_merge.nf.test.snap
@@ -0,0 +1,48 @@
+{
+ "Contains hsa-miR-365b-3p, hsa-miR-7-5p, hsa-miR-103a-3p": {
+ "content": [
+ {
+ "0": [
+ "mirna.tsv:md5,f59a6aeb15588c43c2977950a1b0a080"
+ ],
+ "1": [
+ "versions.yml:md5,13bf3c8bbf1285dfc0ef547dcbb692b2"
+ ],
+ "mirna_tsv": [
+ "mirna.tsv:md5,f59a6aeb15588c43c2977950a1b0a080"
+ ],
+ "versions": [
+ "versions.yml:md5,13bf3c8bbf1285dfc0ef547dcbb692b2"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-30T12:57:47.129770995"
+ },
+ "Does not contain hsa-miR-107, hsa-miR-365a-3p": {
+ "content": [
+ {
+ "0": [
+ "mirna.tsv:md5,f59a6aeb15588c43c2977950a1b0a080"
+ ],
+ "1": [
+ "versions.yml:md5,13bf3c8bbf1285dfc0ef547dcbb692b2"
+ ],
+ "mirna_tsv": [
+ "mirna.tsv:md5,f59a6aeb15588c43c2977950a1b0a080"
+ ],
+ "versions": [
+ "versions.yml:md5,13bf3c8bbf1285dfc0ef547dcbb692b2"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-30T12:57:56.990602055"
+ }
+}
\ No newline at end of file
diff --git a/modules/local/edger_qc.nf b/modules/local/edger_qc/main.nf
similarity index 93%
rename from modules/local/edger_qc.nf
rename to modules/local/edger_qc/main.nf
index 8c311457..2773df80 100644
--- a/modules/local/edger_qc.nf
+++ b/modules/local/edger_qc/main.nf
@@ -1,7 +1,7 @@
process EDGER_QC {
label 'process_medium'
- conda 'bioconda::bioconductor-limma=3.58.1 bioconda::bioconductor-edger=4.0.2 conda-forge::r-data.table=1.14.10 conda-forge::r-gplots=3.1.3 conda-forge::r-statmod=1.5.0'
+ conda 'bioconda::bioconductor-limma=3.58.1 bioconda::bioconductor-edger=4.0.16 conda-forge::r-data.table=1.14.10 conda-forge::r-gplots=3.1.3 conda-forge::r-statmod=1.5.0'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-419bd7f10b2b902489ac63bbaafc7db76f8e0ae1:f5ff7de321749bc7ae12f7e79a4b581497f4c8ce-0' :
'biocontainers/mulled-v2-419bd7f10b2b902489ac63bbaafc7db76f8e0ae1:f5ff7de321749bc7ae12f7e79a4b581497f4c8ce-0' }"
diff --git a/modules/local/edger_qc/tests/edger_qc.nf.test b/modules/local/edger_qc/tests/edger_qc.nf.test
new file mode 100644
index 00000000..d33f9c3b
--- /dev/null
+++ b/modules/local/edger_qc/tests/edger_qc.nf.test
@@ -0,0 +1,73 @@
+nextflow_process {
+
+ name "Test Process EDGER_QC"
+ script "../main.nf"
+ process "EDGER_QC"
+
+ test("Should not produce MDS plot") {
+
+ when {
+ params {
+ outdir = "${outputDir}"
+ }
+ process {
+ """
+ input[0] = [file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N1_mature.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N1_mature_hairpin.sorted.idxstats")
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ // Snapshot only the stable files (.txt, .csv) and exclude PDFs
+ process.out.edger_files.get(0).findAll { !it.endsWith('pdf')},
+ process.out.versions
+ )
+ .match() }
+ )
+ }
+
+ }
+
+ test("Should produce MDS plot") {
+
+ when {
+ params {
+ outdir = "${outputDir}"
+ }
+ process {
+ """
+ input[0] = [
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N1_mature.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N1_mature_hairpin.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N3_mature.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Clone1_N3_mature_hairpin.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Control_N1_mature.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Control_N1_mature_hairpin.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Control_N2_mature.sorted.idxstats"),
+ file("https://github.com/nf-core/test-datasets/raw/smrnaseq/nf-test_data/edger_qc/Control_N2_mature_hairpin.sorted.idxstats"),
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ // Snapshot only the stable files (.txt, .csv) and exclude PDFs
+ process.out.edger_files.get(0).findAll { !it.endsWith('pdf')},
+ process.out.versions,
+ // Check MDS plot exists
+ file(process.out.edger_files[0].find { file(it).name == "hairpin_edgeR_MDS_plot.pdf" }).exists()
+ )
+ .match() }
+ )
+ }
+ }
+
+}
diff --git a/modules/local/edger_qc/tests/edger_qc.nf.test.snap b/modules/local/edger_qc/tests/edger_qc.nf.test.snap
new file mode 100644
index 00000000..4e1b8a65
--- /dev/null
+++ b/modules/local/edger_qc/tests/edger_qc.nf.test.snap
@@ -0,0 +1,57 @@
+{
+ "Should not produce MDS plot": {
+ "content": [
+ [
+ "hairpin_counts.csv:md5,9a2c4c71862349eee5071cf08a81df52",
+ "hairpin_logtpm.csv:md5,590516d1c7447023933f055446d34552",
+ "hairpin_logtpm.txt:md5,5cbb1258c290d958910db677490596c0",
+ "hairpin_normalized_CPM.txt:md5,2f6685750d4c0aa1dc8150276f8a5a2d",
+ "hairpin_unmapped_read_counts.txt:md5,b3ca3b9f01dbdab1bdbd989769121794",
+ "mature_counts.csv:md5,17b953ef2fb4e58d83acc263f68755fd",
+ "mature_logtpm.csv:md5,b4654e4ec264243156b1ceab73503017",
+ "mature_logtpm.txt:md5,9cba6dd8336de7fe79be641285e92a73",
+ "mature_normalized_CPM.txt:md5,43db2854ec00e6afca25883b64ad67bd",
+ "mature_unmapped_read_counts.txt:md5,0e129ffe42aa32f96250a5071d3a7649"
+ ],
+ [
+ "versions.yml:md5,2e5b1dd3ed5befd1d4c9812a3fcb768a"
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-30T19:57:28.863043452"
+ },
+ "Should produce MDS plot": {
+ "content": [
+ [
+ "hairpin_counts.csv:md5,4b0fa0e52a7b8b40bdc5930378430136",
+ "hairpin_edgeR_MDS_distance_matrix.txt:md5,f0eb20be2b7bae7775ef65e03139f5a9",
+ "hairpin_edgeR_MDS_plot_coordinates.txt:md5,2f1f865b11c4ee5253f80ebe9a1914ee",
+ "hairpin_log2CPM_sample_distances.txt:md5,20592bfa42e23827dfac02eab1e033ff",
+ "hairpin_logtpm.csv:md5,35a5449d3468995e8010907105922898",
+ "hairpin_logtpm.txt:md5,1de707003b6ed2c38372670d69eaf5fb",
+ "hairpin_normalized_CPM.txt:md5,d42e8eb89175107c5dfbfb2c7da98d37",
+ "hairpin_unmapped_read_counts.txt:md5,c587147fb1a5b6681c17eff2d4859022",
+ "mature_counts.csv:md5,f961a9d6749dbf0c84dfb8976e0b6516",
+ "mature_edgeR_MDS_distance_matrix.txt:md5,bfbf327feedbc2e7bbbd57020ae0594c",
+ "mature_edgeR_MDS_plot_coordinates.txt:md5,b89854153c61a348929ea3901a61bd56",
+ "mature_log2CPM_sample_distances.txt:md5,b4ed17084de4711e7fd4a12d221d65ec",
+ "mature_logtpm.csv:md5,850a8ed0e4559d338578f81dc849acf5",
+ "mature_logtpm.txt:md5,9087155e2f4bc7f85ced8ab8c02c77e6",
+ "mature_normalized_CPM.txt:md5,3bc348a1248f9597dfc9e8e465c3c8a8",
+ "mature_unmapped_read_counts.txt:md5,138cf290420edbf9721b9db861204c9c"
+ ],
+ [
+ "versions.yml:md5,2e5b1dd3ed5befd1d4c9812a3fcb768a"
+ ],
+ true
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-30T19:59:15.428541578"
+ }
+}
\ No newline at end of file
diff --git a/modules/local/filter_stats.nf b/modules/local/filter_stats.nf
index 4c46f51d..2c51c35e 100644
--- a/modules/local/filter_stats.nf
+++ b/modules/local/filter_stats.nf
@@ -1,5 +1,6 @@
process FILTER_STATS {
label 'process_medium'
+ tag "$meta.id"
conda 'bowtie2=2.4.5'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
@@ -7,12 +8,11 @@ process FILTER_STATS {
'biocontainers/bowtie2:2.4.5--py39hd2f7db1_2' }"
input:
- tuple val(meta), path(reads)
- path stats_files
+ tuple val(meta), path(reads), path(stats_files)
output:
path "*_mqc.yaml" , emit: stats
- tuple val(meta), path('*.filtered.fastq.gz'), emit: reads
+ tuple val(meta), path('*.filtered.fastq.gz'), emit: reads, optional: true
path "versions.yml" , emit: versions
when:
@@ -20,17 +20,26 @@ process FILTER_STATS {
script:
"""
- readnumber=\$(wc -l ${reads} | awk '{ print \$1/4 }')
- cat ./filtered.${meta.id}_*.stats | \\
- tr '\n' ', ' | \\
+
+ if [[ ${reads} == *.gz ]]; then
+ readnumber=\$(zcat ${reads} | wc -l | awk '{ print \$1/4 }')
+ else
+ readnumber=\$(wc -l ${reads} | awk '{ print \$1/4 }')
+ fi
+
+ cat ./*${meta.id}*.stats | \\
+ tr '\\n' ', ' | \\
awk -v sample=${meta.id} -v readnumber=\$readnumber '{ print "id: \\"my_pca_section\\"\\nsection_name: \\"Contamination Filtering\\"\\ndescription: \\"This plot shows the amount of reads filtered by contaminant type.\\"\\nplot_type: \\"bargraph\\"\\npconfig:\\n id: \\"contamination_filter_plot\\"\\n title: \\"Contamination Plot\\"\\n ylab: \\"Number of reads\\"\\ndata:\\n "sample": {"\$0"\\"remaining reads\\": "readnumber"}" }' > ${meta.id}.contamination_mqc.yaml
- gzip -c ${reads} > ${meta.id}.filtered.fastq.gz
+
+ if [[ ${reads} == *.gz ]]; then
+ cp ${reads} ${meta.id}.filtered.fastq.gz
+ else
+ gzip -c ${reads} > ${meta.id}.filtered.fastq.gz
+ fi
cat <<-END_VERSIONS > versions.yml
"${task.process}":
- cat: \$(cat --version | grep 'cat ' |sed 's/cat (GNU coreutils) //')
- gzip: \$(gzip --version | grep "gzip" | sed 's/gzip //')
- tr: \$(tr --version | grep 'tr ' |sed 's/tr (GNU coreutils) //')
+ BusyBox: \$(busybox | sed -n -E 's/.*v([[:digit:].]+)\\s\\(.*/\\1/p')
END_VERSIONS
"""
}
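The updated FILTER_STATS script now accepts either gzipped or plain FASTQ. A standalone bash sketch of its two branches follows. The function names `count_reads` and `ensure_gzipped` are hypothetical, and `gzip -dc` stands in for `zcat` (equivalent with GNU gzip). It also shows why the new `cp` branch matters: the previous unconditional `gzip -c ${reads}` would compress already-gzipped input a second time.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count FASTQ records (four lines per read), decompressing .gz input on the fly.
count_reads() {
    local reads="$1"
    if [[ "$reads" == *.gz ]]; then
        gzip -dc "$reads" | wc -l | awk '{ print $1/4 }'
    else
        wc -l < "$reads" | awk '{ print $1/4 }'
    fi
}

# Write <prefix>.filtered.fastq.gz: copy input that is already gzipped,
# compress it otherwise, so .gz input is never compressed twice.
ensure_gzipped() {
    local reads="$1" prefix="$2"
    if [[ "$reads" == *.gz ]]; then
        cp "$reads" "${prefix}.filtered.fastq.gz"
    else
        gzip -c "$reads" > "${prefix}.filtered.fastq.gz"
    fi
}

# Usage with two hypothetical reads, stored once plain and once gzipped.
cd "$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGCA\n+\nIIIII\n' > sample.fastq
gzip -c sample.fastq > sample.fastq.gz

count_reads sample.fastq      # prints: 2
count_reads sample.fastq.gz   # prints: 2
ensure_gzipped sample.fastq.gz sample
```

Both branches report the same read count, which is then written into the MultiQC `_mqc.yaml` as "remaining reads".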
diff --git a/modules/local/mirdeep2_mapper.nf b/modules/local/mirdeep2_mapper.nf
deleted file mode 100644
index 19a9c5dc..00000000
--- a/modules/local/mirdeep2_mapper.nf
+++ /dev/null
@@ -1,43 +0,0 @@
-def VERSION = '2.0.1'
-
-process MIRDEEP2_MAPPER {
- label 'process_medium'
- tag "$meta.id"
-
- conda 'bioconda::mirdeep2=2.0.1.3'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mirdeep2:2.0.1.3--hdfd78af_1' :
- 'biocontainers/mirdeep2:2.0.1.3--hdfd78af_1' }"
-
- input:
- tuple val(meta), path(reads)
- path index
-
- output:
- tuple path('*_collapsed.fa'), path('*reads_vs_refdb.arf'), emit: mirdeep2_inputs
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def index_base = index.toString().tokenize(' ')[0].tokenize('.')[0]
- """
- mapper.pl \\
- $reads \\
- -e \\
- -h \\
- -i \\
- -j \\
- -m \\
- -p $index_base \\
- -s ${meta.id}_collapsed.fa \\
- -t ${meta.id}_reads_vs_refdb.arf \\
- -o 4
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- mapper: \$(echo "$VERSION")
- END_VERSIONS
- """
-}
diff --git a/modules/local/mirdeep2_prepare.nf b/modules/local/mirdeep2_prepare.nf
deleted file mode 100644
index ce66b9f1..00000000
--- a/modules/local/mirdeep2_prepare.nf
+++ /dev/null
@@ -1,31 +0,0 @@
-process MIRDEEP2_PIGZ {
- label 'process_low'
- tag "$meta.id"
-
- // TODO maybe create a mulled container and uncompress within mirdeep2_mapper?
- conda 'bioconda::bioconvert=1.1.1'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bioconvert:1.1.1--pyhdfd78af_0' :
- 'biocontainers/bioconvert:1.1.1--pyhdfd78af_0' }"
-
- input:
- tuple val(meta), path(reads)
-
- output:
- tuple val(meta), path("*.{fastq,fq}"), emit: reads
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- pigz -f -d -p $task.cpus $reads
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/mirdeep2_run.nf b/modules/local/mirdeep2_run.nf
deleted file mode 100644
index ba37a4ac..00000000
--- a/modules/local/mirdeep2_run.nf
+++ /dev/null
@@ -1,42 +0,0 @@
-def VERSION = '2.0.1'
-
-process MIRDEEP2_RUN {
- label 'process_medium'
- errorStrategy 'ignore'
-
- conda 'bioconda::mirdeep2=2.0.1.3'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mirdeep2:2.0.1.3--hdfd78af_1' :
- 'biocontainers/mirdeep2:2.0.1.3--hdfd78af_1' }"
-
- input:
- path(fasta)
- tuple path(reads), path(arf)
- path(hairpin)
- path(mature)
-
- output:
- path 'result*.{bed,csv,html}', emit: result
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- miRDeep2.pl \\
- $reads \\
- $fasta \\
- $arf \\
- $mature \\
- none \\
- $hairpin \\
- -d \\
- -z _${reads.simpleName}
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- mirdeep2: \$(echo "$VERSION")
- END_VERSIONS
- """
-}
diff --git a/modules/local/mirtop_quant.nf b/modules/local/mirtop_quant.nf
deleted file mode 100644
index ab38c93d..00000000
--- a/modules/local/mirtop_quant.nf
+++ /dev/null
@@ -1,42 +0,0 @@
-process MIRTOP_QUANT {
- label 'process_medium'
-
- conda 'mirtop=0.4.25 bioconda::samtools=1.15.1 conda-base::r-base=4.1.1 conda-base::r-data.table=1.14.2'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mulled-v2-0c13ef770dd7cc5c76c2ce23ba6669234cf03385:63be019f50581cc5dfe4fc0f73ae50f2d4d661f7-0' :
- 'biocontainers/mulled-v2-0c13ef770dd7cc5c76c2ce23ba6669234cf03385:63be019f50581cc5dfe4fc0f73ae50f2d4d661f7-0' }"
-
- input:
- path ("bams/*")
- path hairpin
- path gtf
-
- output:
- path "mirtop/mirtop.gff" , emit: mirtop_gff
- path "mirtop/mirtop.tsv" , emit: mirtop_table
- path "mirtop/mirtop_rawData.tsv", emit: mirtop_rawdata
- path "mirtop/stats/*" , emit: logs
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def filter_species = params.mirgenedb ? params.mirgenedb_species : params.mirtrace_species
- """
- #Cleanup the GTF if mirbase html form is broken
- GTF="$gtf"
- sed 's/&gt;/>/g' \$GTF | sed 's#<br>#\\n#g' | sed 's#</p>##g' | sed 's#<p>##g' | sed -e :a -e '/^\\n*\$/{\$d;N;};/\\n\$/ba' > \${GTF}_html_cleaned.gtf
- mirtop gff --hairpin $hairpin --gtf \${GTF}_html_cleaned.gtf -o mirtop --sps $filter_species ./bams/*
- mirtop counts --hairpin $hairpin --gtf \${GTF}_html_cleaned.gtf -o mirtop --sps $filter_species --add-extra --gff mirtop/mirtop.gff
- mirtop export --format isomir --hairpin $hairpin --gtf \${GTF}_html_cleaned.gtf --sps $filter_species -o mirtop mirtop/mirtop.gff
- mirtop stats mirtop/mirtop.gff --out mirtop/stats
- mv mirtop/stats/mirtop_stats.log mirtop/stats/full_mirtop_stats.log
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- mirtop: \$(echo \$(mirtop --version 2>&1) | sed 's/^.*mirtop //')
- END_VERSIONS
- """
-
-}
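The deleted MIRTOP_QUANT script cleaned miRBase GTF files whose HTML rendering had leaked into the download. Below is a minimal GNU sed sketch of the same pipeline. It assumes the residue consists of `&gt;` entities plus `<br>`, `<p>` and `</p>` tags, and its input line is made up. The last expression is the standard sed idiom for deleting blank lines at the end of a file.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"

# Hypothetical GTF fragment with HTML residue and two trailing blank lines.
printf 'a&gt;b<br>c<p>d</p>\n\n\n' > broken.gtf

# 1. decode &gt;   2. turn <br> into a real newline (GNU sed)
# 3-4. drop </p> and <p>   5. delete blank lines at the end of the file
sed 's/&gt;/>/g' broken.gtf \
    | sed 's#<br>#\n#g' \
    | sed 's#</p>##g' \
    | sed 's#<p>##g' \
    | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' \
    > broken.gtf_html_cleaned.gtf

cat broken.gtf_html_cleaned.gtf   # prints two lines: a>b and cd
```

The steps run as separate sed processes, so the newline introduced for `<br>` becomes a real line break before the trailing-blank-line pass runs.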
diff --git a/modules/local/mirtrace.nf b/modules/local/mirtrace.nf
deleted file mode 100644
index 87526016..00000000
--- a/modules/local/mirtrace.nf
+++ /dev/null
@@ -1,46 +0,0 @@
-process MIRTRACE_RUN {
- label 'process_medium'
-
- conda 'bioconda::mirtrace=1.0.1'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/mirtrace:1.0.1--hdfd78af_1' :
- 'biocontainers/mirtrace:1.0.1--hdfd78af_1' }"
-
- input:
- tuple val(adapter), val(ids), path(reads)
- path(mirtrace_config)
-
- output:
- path "mirtrace/*" , emit: mirtrace
- path "versions.yml", emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- // mirtrace protocol defaults to 'params.protocol' if not set
- def protocol = params.protocol == 'custom' ? '' : "--protocol $params.protocol"
- def java_mem = ''
- if(task.memory){
- tmem = task.memory.toBytes()
- java_mem = "-Xms${tmem} -Xmx${tmem}"
- }
-
- """
- export mirtracejar=\$(dirname \$(which mirtrace))
-
- java $java_mem -jar \$mirtracejar/mirtrace.jar --mirtrace-wrapper-name mirtrace qc \\
- --species $params.mirtrace_species \\
- $protocol \\
- --config $mirtrace_config \\
- --write-fasta \\
- --output-dir mirtrace \\
- --force
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- mirtrace: \$(echo \$(mirtrace -v))
- END_VERSIONS
- """
-
-}
diff --git a/modules/local/parse_fasta_mirna.nf b/modules/local/parse_fasta_mirna.nf
index 60665251..b474e1c7 100644
--- a/modules/local/parse_fasta_mirna.nf
+++ b/modules/local/parse_fasta_mirna.nf
@@ -1,13 +1,14 @@
process PARSE_FASTA_MIRNA {
label 'process_medium'
- conda 'bioconda::seqkit=2.6.1'
+ conda 'bioconda::seqkit=2.8.2'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/seqkit:2.6.1--h9ee0642_0' :
'biocontainers/seqkit:2.6.1--h9ee0642_0' }"
input:
tuple val(meta2), path(fasta)
+ val filter_species
output:
tuple val(meta2), path('*_igenome.fa'), emit: parsed_fasta
@@ -17,7 +18,6 @@ process PARSE_FASTA_MIRNA {
task.ext.when == null || task.ext.when
script:
- def filter_species = params.mirgenedb ? params.mirgenedb_species : params.mirtrace_species
"""
# Uncompress FASTA reference files if necessary
FASTA="$fasta"
diff --git a/modules/local/seqcluster_collapse.nf b/modules/local/seqcluster_collapse.nf
deleted file mode 100644
index 4379654c..00000000
--- a/modules/local/seqcluster_collapse.nf
+++ /dev/null
@@ -1,33 +0,0 @@
-process SEQCLUSTER_SEQUENCES {
- label 'process_medium'
- tag "$meta.id"
-
- conda 'bioconda::seqcluster=1.2.9'
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/seqcluster:1.2.9--pyh5e36f6f_0' :
- 'biocontainers/seqcluster:1.2.9--pyh5e36f6f_0' }"
-
- input:
- tuple val(meta), path(reads)
-
- output:
- tuple val(meta), path("final/*.fastq.gz"), emit: collapsed
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- """
- seqcluster collapse -f $reads -m 1 --min_size 15 -o collapsed
- gzip collapsed/*_trimmed.fastq
- mkdir final
- mv collapsed/*.fastq.gz final/.
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- seqcluster: \$(echo \$(seqcluster --version 2>&1) | sed 's/^.*seqcluster //')
- END_VERSIONS
- """
-
-}
diff --git a/modules/nf-core/bioawk/bioawk.diff b/modules/nf-core/bioawk/bioawk.diff
new file mode 100644
index 00000000..bd9ed322
--- /dev/null
+++ b/modules/nf-core/bioawk/bioawk.diff
@@ -0,0 +1,25 @@
+Changes in module 'nf-core/bioawk'
+--- modules/nf-core/bioawk/main.nf
++++ modules/nf-core/bioawk/main.nf
+@@ -11,7 +11,7 @@
+ tuple val(meta), path(input)
+
+ output:
+- tuple val(meta), path("*.gz"), emit: output
++ tuple val(meta), path("*.fasta"), emit: output
+ path "versions.yml" , emit: versions
+
+ when:
+@@ -26,9 +26,7 @@
+ bioawk \\
+ $args \\
+ $input \\
+- > ${prefix}
+-
+- gzip ${prefix}
++ > ${prefix}.fasta
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+
+************************************************************
diff --git a/modules/nf-core/bioawk/environment.yml b/modules/nf-core/bioawk/environment.yml
new file mode 100644
index 00000000..527f6cd4
--- /dev/null
+++ b/modules/nf-core/bioawk/environment.yml
@@ -0,0 +1,5 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::bioawk=1.0
diff --git a/modules/nf-core/bioawk/main.nf b/modules/nf-core/bioawk/main.nf
new file mode 100644
index 00000000..3ae62108
--- /dev/null
+++ b/modules/nf-core/bioawk/main.nf
@@ -0,0 +1,36 @@
+process BIOAWK {
+ tag "$meta.id"
+ label 'process_single'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/bioawk:1.0--h5bf99c6_6':
+ 'biocontainers/bioawk:1.0--h5bf99c6_6' }"
+
+ input:
+ tuple val(meta), path(input)
+
+ output:
+ tuple val(meta), path("*.fasta"), emit: output
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: '' // args is used for the main arguments of the tool
+ prefix = task.ext.prefix ?: "${meta.id}"
+ if ("${input}" == "${prefix}") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!"
+ def VERSION = '1.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
+ """
+ bioawk \\
+ $args \\
+ $input \\
+ > ${prefix}.fasta
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bioawk: $VERSION
+ END_VERSIONS
+ """
+}
diff --git a/modules/nf-core/bioawk/meta.yml b/modules/nf-core/bioawk/meta.yml
new file mode 100644
index 00000000..c691ac0c
--- /dev/null
+++ b/modules/nf-core/bioawk/meta.yml
@@ -0,0 +1,51 @@
+name: "bioawk"
+description: Bioawk is an extension to Brian Kernighan's awk, adding the support of
+ several common biological data formats.
+keywords:
+ - bioawk
+ - fastq
+ - fasta
+ - sam
+ - file manipulation
+ - awk
+tools:
+ - "bioawk":
+ description: "BWK awk modified for biological data"
+ homepage: "https://github.com/lh3/bioawk"
+ documentation: "https://github.com/lh3/bioawk"
+ tool_dev_url: "https://github.com/lh3/bioawk"
+ licence: ["Free software license (https://github.com/lh3/bioawk/blob/master/README.awk#L1)"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - input:
+ type: file
+ description: Input biological sequence file (optionally gzipped) to
+ be manipulated via program specified in `$args`.
+ pattern: "*.{bed,gff,sam,vcf,fastq,fasta,tab,bed.gz,gff.gz,sam.gz,vcf.gz,fastq.gz,fasta.gz,tab.gz}"
+output:
+ - output:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.gz":
+ type: file
+ description: |
+ Manipulated and gzipped version of input sequence file following program specified in `$args`.
+ File name will be what is specified in `$prefix`. Do not include `.gz` suffix in `$prefix`! Output files will be gzipped for you!
+ pattern: "*.gz"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@jfy133"
+maintainers:
+ - "@jfy133"
diff --git a/modules/nf-core/bioawk/tests/main.nf.test b/modules/nf-core/bioawk/tests/main.nf.test
new file mode 100644
index 00000000..270ff1ef
--- /dev/null
+++ b/modules/nf-core/bioawk/tests/main.nf.test
@@ -0,0 +1,35 @@
+
+nextflow_process {
+
+ name "Test Process BIOAWK"
+ script "../main.nf"
+ process "BIOAWK"
+ config "./nextflow.config"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "bioawk"
+
+ test("test-bioawk") {
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+}
diff --git a/modules/nf-core/bioawk/tests/main.nf.test.snap b/modules/nf-core/bioawk/tests/main.nf.test.snap
new file mode 100644
index 00000000..fa9b5930
--- /dev/null
+++ b/modules/nf-core/bioawk/tests/main.nf.test.snap
@@ -0,0 +1,37 @@
+{
+ "test-bioawk": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "sample_1.fa.gz:md5,b558dd15d8940373a032a827d490e693"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,5fe88e58a71f10551df56518c35ba91a"
+ ],
+ "output": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "sample_1.fa.gz:md5,b558dd15d8940373a032a827d490e693"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,5fe88e58a71f10551df56518c35ba91a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-08-28T10:24:46.397249"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bioawk/tests/nextflow.config b/modules/nf-core/bioawk/tests/nextflow.config
new file mode 100644
index 00000000..5ef017d9
--- /dev/null
+++ b/modules/nf-core/bioawk/tests/nextflow.config
@@ -0,0 +1,6 @@
+process {
+ withName: BIOAWK {
+ ext.args = "-c fastx \'{print \">\" \$name ORS length(\$seq)}\'"
+ ext.prefix = "sample_1.fa"
+ }
+}
diff --git a/modules/nf-core/blat/environment.yml b/modules/nf-core/blat/environment.yml
new file mode 100644
index 00000000..2a85c078
--- /dev/null
+++ b/modules/nf-core/blat/environment.yml
@@ -0,0 +1,5 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::blat=36
diff --git a/modules/nf-core/blat/main.nf b/modules/nf-core/blat/main.nf
new file mode 100644
index 00000000..ad7b7207
--- /dev/null
+++ b/modules/nf-core/blat/main.nf
@@ -0,0 +1,62 @@
+process BLAT {
+ tag "$meta.id"
+ label 'process_single'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/blat:36--0':
+ 'biocontainers/blat:36--0' }"
+
+ input:
+ tuple val(meta) , path(query)
+ tuple val(meta2), path(subject)
+
+ output:
+ tuple val(meta), path("*.psl"), emit: psl
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def unzip = query.toString().endsWith(".gz")
+
+ """
+ in=$query
+ if $unzip
+ then
+ gunzip -cdf $query > ${prefix}.fasta
+ in=${prefix}.fasta
+ fi
+
+ blat \\
+ $args \\
+ $subject \\
+ \$in \\
+ ${prefix}.psl
+
+ if $unzip
+ then
+ rm ${prefix}.fasta
+ fi
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ blat: \$(echo \$(blat 2>&1) | sed 's/^.*BLAT v. //; s/ fast.*\$//')
+ END_VERSIONS
+ """
+
+ stub:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ """
+ touch ${prefix}.psl
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ blat: \$(echo \$(blat 2>&1) | sed 's/^.*BLAT v. //; s/ fast.*\$//')
+ END_VERSIONS
+ """
+}
diff --git a/modules/nf-core/blat/meta.yml b/modules/nf-core/blat/meta.yml
new file mode 100644
index 00000000..70a92c9b
--- /dev/null
+++ b/modules/nf-core/blat/meta.yml
@@ -0,0 +1,55 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/yaml-schema.json
+name: "blat"
+description: Queries a sequence subject
+keywords:
+ - blat
+ - sequence
+ - search
+tools:
+ - "blat":
+ description: "BLAT is a bioinformatics software tool which performs rapid mRNA/DNA
+ and cross-species protein alignments."
+ homepage: "https://kentinformatics.com/"
+ documentation: "https://kentinformatics.com/documentation"
+ doi: "10.1101/gr.229202"
+ licence: ["Free for academic, nonprofit and personal use"]
+ identifier: biotools:blat
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'test', single_end:false ]`
+ - query:
+ type: file
+ description: Sequence file
+ pattern: "*.{fasta,fasta.gz,fa,fa.gz,nib,2bit}"
+ - - meta2:
+ type: map
+ description: |
+ Groovy Map containing subject information
+ e.g. `[ id:'test', single_end:false ]`
+ - subject:
+ type: file
+ description: Subject (database) sequence file
+ pattern: "*.{fa,fasta,nib,2bit}"
+output:
+ - psl:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'test', single_end:false ]`
+ - "*.psl":
+ type: file
+ description: Search results
+ pattern: "*.{psl}"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@d-jch"
+maintainers:
+ - "@d-jch"
diff --git a/modules/nf-core/blat/tests/main.nf.test b/modules/nf-core/blat/tests/main.nf.test
new file mode 100644
index 00000000..8b07e5cf
--- /dev/null
+++ b/modules/nf-core/blat/tests/main.nf.test
@@ -0,0 +1,75 @@
+
+nextflow_process {
+
+ name "Test Process BLAT"
+ script "../main.nf"
+ process "BLAT"
+ config "./nextflow.config"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "blat"
+ tag "seqtk/seq"
+
+ setup {
+ run("SEQTK_SEQ") {
+ script "../../seqtk/seq/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+
+ """
+ }
+ }
+ }
+
+ test("test-blat") {
+
+ when {
+ process {
+ """
+ input[0] = SEQTK_SEQ.out.fastx
+ input[1] = [
+ [ id:'sarscov2' ],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test-blat-stub") {
+ options '-stub'
+ when {
+ process {
+ """
+ input[0] = SEQTK_SEQ.out.fastx
+ input[1] = [
+ [ id:'sarscov2' ],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+}
diff --git a/modules/nf-core/blat/tests/main.nf.test.snap b/modules/nf-core/blat/tests/main.nf.test.snap
new file mode 100644
index 00000000..d46a3320
--- /dev/null
+++ b/modules/nf-core/blat/tests/main.nf.test.snap
@@ -0,0 +1,72 @@
+{
+ "test-blat": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.psl:md5,6e2e5b3be48c84877f3c54b32bb9ec33"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d9cde833b3f9cf6d359ef0f8a119380a"
+ ],
+ "psl": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.psl:md5,6e2e5b3be48c84877f3c54b32bb9ec33"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d9cde833b3f9cf6d359ef0f8a119380a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-06T20:38:03.56409"
+ },
+ "test-blat-stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.psl:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d9cde833b3f9cf6d359ef0f8a119380a"
+ ],
+ "psl": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.psl:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d9cde833b3f9cf6d359ef0f8a119380a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-06T20:38:09.736595"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/blat/tests/nextflow.config b/modules/nf-core/blat/tests/nextflow.config
new file mode 100644
index 00000000..58bc3f25
--- /dev/null
+++ b/modules/nf-core/blat/tests/nextflow.config
@@ -0,0 +1,5 @@
+process {
+ withName: SEQTK_SEQ {
+ ext.args = '-A'
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bowtie/align/environment.yml b/modules/nf-core/bowtie/align/environment.yml
new file mode 100644
index 00000000..4434c7e7
--- /dev/null
+++ b/modules/nf-core/bowtie/align/environment.yml
@@ -0,0 +1,6 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::bowtie=1.3.0
+ - bioconda::samtools=1.16.1
diff --git a/modules/nf-core/bowtie/align/main.nf b/modules/nf-core/bowtie/align/main.nf
new file mode 100644
index 00000000..5e72b02a
--- /dev/null
+++ b/modules/nf-core/bowtie/align/main.nf
@@ -0,0 +1,77 @@
+process BOWTIE_ALIGN {
+ tag "$meta.id"
+ label 'process_high'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' :
+ 'biocontainers/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' }"
+
+ input:
+ tuple val(meta), path(reads)
+ tuple val(meta2), path(index)
+ val (save_unaligned)
+
+ output:
+ tuple val(meta), path('*.bam') , emit: bam
+ tuple val(meta), path('*.out') , emit: log
+ tuple val(meta), path('*fastq.gz') , emit: fastq, optional : true
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def args2 = task.ext.args2 ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def unaligned = save_unaligned ? "--un ${prefix}.unmapped.fastq" : ''
+ def endedness = meta.single_end ? "$reads" : "-1 ${reads[0]} -2 ${reads[1]}"
+ """
+ INDEX=\$(find -L ./ -name "*.3.ebwt" | sed 's/\\.3.ebwt\$//')
+ bowtie \\
+ --threads $task.cpus \\
+ --sam \\
+ -x \$INDEX \\
+ -q \\
+ $unaligned \\
+ $args \\
+ $endedness \\
+ 2> >(tee ${prefix}.out >&2) \\
+ | samtools view $args2 -@ $task.cpus -bS -o ${prefix}.bam -
+
+ if [ -f ${prefix}.unmapped.fastq ]; then
+ gzip ${prefix}.unmapped.fastq
+ fi
+ if [ -f ${prefix}.unmapped_1.fastq ]; then
+ gzip ${prefix}.unmapped_1.fastq
+ gzip ${prefix}.unmapped_2.fastq
+ fi
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
+ samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+ END_VERSIONS
+ """
+
+ stub:
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def unaligned = save_unaligned ?
+ meta.single_end ? "echo '' | gzip > ${prefix}.unmapped.fastq.gz" :
+ "echo '' | gzip > ${prefix}.unmapped_1.fastq.gz; echo '' | gzip > ${prefix}.unmapped_2.fastq.gz"
+ : ''
+ """
+ touch ${prefix}.bam
+ touch ${prefix}.out
+ $unaligned
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
+ samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+ END_VERSIONS
+ """
+
+
+}
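BOWTIE_ALIGN is not given the index basename directly. Instead it searches the staged index directory for a `*.3.ebwt` file and strips that suffix to get the prefix it passes to `bowtie -x`.

The Python sketch below mirrors that lookup and could help when debugging staged index folders locally. `bowtie_index_prefix` is a hypothetical helper, not part of the module. One difference from the shell pipeline: if several indices are staged, the shell would put every match into `INDEX`, whereas this sketch returns only the lexicographically first prefix.

```python
import os
import tempfile

# Suffix searched for by BOWTIE_ALIGN: find -L ./ -name "*.3.ebwt"
SUFFIX = ".3.ebwt"


def bowtie_index_prefix(search_dir):
    """Hypothetical helper mirroring the module's index discovery.

    Walks search_dir (following symlinked directories, like `find -L`),
    collects every file whose name ends in '.3.ebwt' and returns the first
    path with that suffix stripped, or None if no bowtie 1 index is present.
    """
    prefixes = []
    for root, _dirs, files in os.walk(search_dir, followlinks=True):
        for name in files:
            if name.endswith(SUFFIX):
                prefixes.append(os.path.join(root, name[: -len(SUFFIX)]))
    return sorted(prefixes)[0] if prefixes else None


if __name__ == "__main__":
    # Recreate the layout written by BOWTIE_BUILD's stub for id 'sarscov2'.
    with tempfile.TemporaryDirectory() as workdir:
        index_dir = os.path.join(workdir, "bowtie")
        os.makedirs(index_dir)
        for ext in (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"):
            open(os.path.join(index_dir, "sarscov2" + ext), "w").close()
        print(bowtie_index_prefix(workdir))
```

The `.rev.*.ebwt` files never end in `.3.ebwt`, so they cannot be picked up by mistake.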
diff --git a/modules/nf-core/bowtie/align/meta.yml b/modules/nf-core/bowtie/align/meta.yml
new file mode 100644
index 00000000..7b346802
--- /dev/null
+++ b/modules/nf-core/bowtie/align/meta.yml
@@ -0,0 +1,80 @@
+name: bowtie_align
+description: Align reads to a reference genome using bowtie
+keywords:
+ - align
+ - map
+ - fastq
+ - fasta
+ - genome
+ - reference
+tools:
+ - bowtie:
+ description: |
+ bowtie is a software package for mapping DNA sequences against
+ a large reference genome, such as the human genome.
+ homepage: http://bowtie-bio.sourceforge.net/index.shtml
+ documentation: http://bowtie-bio.sourceforge.net/manual.shtml
+ arxiv: arXiv:1303.3997
+ licence: ["Artistic-2.0"]
+ identifier: biotools:bowtie
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - reads:
+ type: file
+ description: |
+ List of input FastQ files of size 1 and 2 for single-end and paired-end data,
+ respectively.
+ - - meta2:
+ type: map
+ description: |
+ Groovy Map containing genome information
+ e.g. [ id:'sarscov2' ]
+ - index:
+ type: file
+ description: Bowtie genome index files
+ pattern: "*.ebwt"
+ - - save_unaligned:
+ type: boolean
+ description: Whether to save fastq files containing the reads which did not
+ align.
+output:
+ - bam:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.bam":
+ type: file
+ description: Output BAM file containing read alignments
+ pattern: "*.{bam}"
+ - log:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.out":
+ type: file
+ description: Log file
+ pattern: "*.out"
+ - fastq:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*fastq.gz":
+ type: file
+ description: Unaligned FastQ files
+ pattern: "*.fastq.gz"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@kevinmenden"
+maintainers:
+ - "@kevinmenden"
diff --git a/modules/nf-core/bowtie/align/tests/main.nf.test b/modules/nf-core/bowtie/align/tests/main.nf.test
new file mode 100644
index 00000000..3403ae22
--- /dev/null
+++ b/modules/nf-core/bowtie/align/tests/main.nf.test
@@ -0,0 +1,129 @@
+nextflow_process {
+
+ name "Test Process BOWTIE_ALIGN"
+ script "../main.nf"
+ process "BOWTIE_ALIGN"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "bowtie"
+ tag "bowtie/align"
+ tag "bowtie/build"
+
+
+ setup {
+ run("BOWTIE_BUILD") {
+ script "../../../bowtie/build/main.nf"
+ process {
+ """
+ input[0] = [[ id:'sarscov2' ],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ]
+ """
+ }
+ }
+ }
+
+ test("sarscov2 - single_end") {
+
+ when {
+ process {
+ """
+ input[0] = [ [id:"test", single_end:true],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ input[2] = true
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out.versions,
+ process.out.bam.collect { bam(it[1]).getReadsMD5() },
+ process.out.fastq,
+ process.out.log
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - single_end - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [ [id:"test", single_end:true],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ input[2] = true
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - paired_end") {
+
+ when {
+ process {
+ """
+ input[0] = [ [id:"test", single_end:false],
+ [file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)]
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ input[2] = false
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out.versions,
+ process.out.bam.collect { bam(it[1]).getReads(2) },
+ process.out.log
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - paired_end - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = [ [id:"test", single_end:false],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ input[2] = false
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/bowtie/align/tests/main.nf.test.snap b/modules/nf-core/bowtie/align/tests/main.nf.test.snap
new file mode 100644
index 00000000..de95bb81
--- /dev/null
+++ b/modules/nf-core/bowtie/align/tests/main.nf.test.snap
@@ -0,0 +1,192 @@
+{
+ "sarscov2 - single_end": {
+ "content": [
+ [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ],
+ [
+ "7bdcfc6f54ae6e8f4570395cc85db9a3"
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.unmapped.fastq.gz:md5,5729a694abd09657da3b9101861090c4"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.out:md5,4b9140ceadb8a18ae9330885370f8a0b"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-26T09:25:24.60746041"
+ },
+ "sarscov2 - single_end - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.out:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.unmapped.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "3": [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ],
+ "bam": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "fastq": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.unmapped.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.out:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-25T10:00:28.666281812"
+ },
+ "sarscov2 - paired_end": {
+ "content": [
+ [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ],
+ [
+ [
+ "ATGTGTACATTGGCGACCCTGCTCAATTACCTGCACCACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTG",
+ "ACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGA"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.out:md5,5e13272d112cef8faeedcdbd7c602de0"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-26T11:57:56.604464368"
+ },
+ "sarscov2 - paired_end - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.out:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+
+ ],
+ "3": [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ],
+ "bam": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "fastq": [
+
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.out:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,96e36b0b99c80da0be8239d03db30ecc"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-25T10:01:02.043164876"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bowtie/align/tests/tags.yml b/modules/nf-core/bowtie/align/tests/tags.yml
new file mode 100644
index 00000000..a5753d58
--- /dev/null
+++ b/modules/nf-core/bowtie/align/tests/tags.yml
@@ -0,0 +1,2 @@
+bowtie/align:
+ - "modules/nf-core/bowtie/align/**"
diff --git a/modules/nf-core/bowtie/build/environment.yml b/modules/nf-core/bowtie/build/environment.yml
new file mode 100644
index 00000000..ab5a8422
--- /dev/null
+++ b/modules/nf-core/bowtie/build/environment.yml
@@ -0,0 +1,5 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::bowtie=1.3.0
diff --git a/modules/nf-core/bowtie/build/main.nf b/modules/nf-core/bowtie/build/main.nf
new file mode 100644
index 00000000..d5b4c690
--- /dev/null
+++ b/modules/nf-core/bowtie/build/main.nf
@@ -0,0 +1,50 @@
+process BOWTIE_BUILD {
+ tag "${meta.id}"
+ label 'process_high'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/bowtie:1.3.0--py38hed8969a_1' :
+ 'biocontainers/bowtie:1.3.0--py38hed8969a_1' }"
+
+ input:
+ tuple val(meta), path(fasta)
+
+ output:
+ tuple val(meta), path('bowtie') , emit: index
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ """
+ mkdir -p bowtie
+ bowtie-build --threads $task.cpus $args $fasta bowtie/${prefix}
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
+ END_VERSIONS
+ """
+
+ stub:
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ """
+ mkdir -p bowtie
+ touch bowtie/${prefix}.1.ebwt
+ touch bowtie/${prefix}.2.ebwt
+ touch bowtie/${prefix}.3.ebwt
+ touch bowtie/${prefix}.4.ebwt
+ touch bowtie/${prefix}.rev.1.ebwt
+ touch bowtie/${prefix}.rev.2.ebwt
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
+ END_VERSIONS
+ """
+
+}
diff --git a/modules/nf-core/bowtie/build/meta.yml b/modules/nf-core/bowtie/build/meta.yml
new file mode 100644
index 00000000..a878a5b7
--- /dev/null
+++ b/modules/nf-core/bowtie/build/meta.yml
@@ -0,0 +1,48 @@
+name: bowtie_build
+description: Create bowtie index for reference genome
+keywords:
+ - index
+ - fasta
+ - genome
+ - reference
+tools:
+ - bowtie:
+ description: |
+ bowtie is a software package for mapping DNA sequences against
+ a large reference genome, such as the human genome.
+ homepage: http://bowtie-bio.sourceforge.net/index.shtml
+ documentation: http://bowtie-bio.sourceforge.net/manual.shtml
+ arxiv: arXiv:1303.3997
+ licence: ["Artistic-2.0"]
+ identifier: biotools:bowtie
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing information about the genome fasta
+ e.g. [ id:'test' ]
+ - fasta:
+ type: file
+ description: Input genome fasta file
+output:
+ - index:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing information about the genome fasta
+ e.g. [ id:'test' ]
+ - bowtie:
+ type: file
+ description: Folder containing bowtie genome index files
+ pattern: "*.ebwt"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@kevinmenden"
+ - "@drpatelh"
+maintainers:
+ - "@kevinmenden"
+ - "@drpatelh"
diff --git a/modules/nf-core/bowtie/build/tests/main.nf.test b/modules/nf-core/bowtie/build/tests/main.nf.test
new file mode 100644
index 00000000..25fb3dad
--- /dev/null
+++ b/modules/nf-core/bowtie/build/tests/main.nf.test
@@ -0,0 +1,57 @@
+nextflow_process {
+
+ name "Test Process BOWTIE_BUILD"
+ script "../main.nf"
+ process "BOWTIE_BUILD"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "bowtie"
+ tag "bowtie/build"
+
+ test("sarscov2 - fasta") {
+
+ when {
+ process {
+ """
+ input[0] = [
+ [id: 'sarscov2'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fasta - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [[id: 'sarscov2'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/bowtie/build/tests/main.nf.test.snap b/modules/nf-core/bowtie/build/tests/main.nf.test.snap
new file mode 100644
index 00000000..e8061756
--- /dev/null
+++ b/modules/nf-core/bowtie/build/tests/main.nf.test.snap
@@ -0,0 +1,96 @@
+{
+ "sarscov2 - fasta - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "sarscov2"
+ },
+ [
+ "sarscov2.1.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.2.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.3.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.4.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.rev.1.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.rev.2.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,afbd066e1dd5ae4a30b21c49149ea09a"
+ ],
+ "index": [
+ [
+ {
+ "id": "sarscov2"
+ },
+ [
+ "sarscov2.1.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.2.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.3.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.4.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.rev.1.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "sarscov2.rev.2.ebwt:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,afbd066e1dd5ae4a30b21c49149ea09a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-18T08:38:14.852528155"
+ },
+ "sarscov2 - fasta": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "sarscov2"
+ },
+ [
+ "sarscov2.1.ebwt:md5,d9b76ecf9fd0413240173273b38d8199",
+ "sarscov2.2.ebwt:md5,02b44af9f94c62ecd3c583048e25d4cf",
+ "sarscov2.3.ebwt:md5,4ed93abba181d8dfab2e303e33114777",
+ "sarscov2.4.ebwt:md5,c25be5f8b0378abf7a58c8a880b87626",
+ "sarscov2.rev.1.ebwt:md5,b37aaf11853e65a3b13561f27a912b06",
+ "sarscov2.rev.2.ebwt:md5,9e6b0c4c1ddb99ae71ff8a4fe5ec6459"
+ ]
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,afbd066e1dd5ae4a30b21c49149ea09a"
+ ],
+ "index": [
+ [
+ {
+ "id": "sarscov2"
+ },
+ [
+ "sarscov2.1.ebwt:md5,d9b76ecf9fd0413240173273b38d8199",
+ "sarscov2.2.ebwt:md5,02b44af9f94c62ecd3c583048e25d4cf",
+ "sarscov2.3.ebwt:md5,4ed93abba181d8dfab2e303e33114777",
+ "sarscov2.4.ebwt:md5,c25be5f8b0378abf7a58c8a880b87626",
+ "sarscov2.rev.1.ebwt:md5,b37aaf11853e65a3b13561f27a912b06",
+ "sarscov2.rev.2.ebwt:md5,9e6b0c4c1ddb99ae71ff8a4fe5ec6459"
+ ]
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,afbd066e1dd5ae4a30b21c49149ea09a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-06-18T08:37:53.65689025"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bowtie/build/tests/tags.yml b/modules/nf-core/bowtie/build/tests/tags.yml
new file mode 100644
index 00000000..1ccfa30c
--- /dev/null
+++ b/modules/nf-core/bowtie/build/tests/tags.yml
@@ -0,0 +1,2 @@
+bowtie/build:
+ - "modules/nf-core/bowtie/build/**"
diff --git a/modules/nf-core/bowtie2/align/environment.yml b/modules/nf-core/bowtie2/align/environment.yml
new file mode 100644
index 00000000..9090f218
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/environment.yml
@@ -0,0 +1,7 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::bowtie2=2.5.2
+ - bioconda::samtools=1.18
+ - conda-forge::pigz=2.6
diff --git a/modules/nf-core/bowtie2/align/main.nf b/modules/nf-core/bowtie2/align/main.nf
new file mode 100644
index 00000000..809525ad
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/main.nf
@@ -0,0 +1,117 @@
+process BOWTIE2_ALIGN {
+ tag "$meta.id"
+ label 'process_high'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:f70b31a2db15c023d641c32f433fb02cd04df5a6-0' :
+ 'biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:f70b31a2db15c023d641c32f433fb02cd04df5a6-0' }"
+
+ input:
+ tuple val(meta) , path(reads)
+ tuple val(meta2), path(index)
+ tuple val(meta3), path(fasta)
+ val save_unaligned
+ val sort_bam
+
+ output:
+ tuple val(meta), path("*.sam") , emit: sam , optional:true
+ tuple val(meta), path("*.bam") , emit: bam , optional:true
+ tuple val(meta), path("*.cram") , emit: cram , optional:true
+ tuple val(meta), path("*.csi") , emit: csi , optional:true
+ tuple val(meta), path("*.crai") , emit: crai , optional:true
+ tuple val(meta), path("*.log") , emit: log
+ tuple val(meta), path("*fastq.gz") , emit: fastq , optional:true
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ""
+ def args2 = task.ext.args2 ?: ""
+ def prefix = task.ext.prefix ?: "${meta.id}"
+
+ def unaligned = ""
+ def reads_args = ""
+ if (meta.single_end) {
+ unaligned = save_unaligned ? "--un-gz ${prefix}.unmapped.fastq.gz" : ""
+ reads_args = "-U ${reads}"
+ } else {
+ unaligned = save_unaligned ? "--un-conc-gz ${prefix}.unmapped.fastq.gz" : ""
+ reads_args = "-1 ${reads[0]} -2 ${reads[1]}"
+ }
+
+ def samtools_command = sort_bam ? 'sort' : 'view'
+ def extension_pattern = /(--output-fmt|-O)+\s+(\S+)/
+ def extension_matcher = (args2 =~ extension_pattern)
+ def extension = extension_matcher.getCount() > 0 ? extension_matcher[0][2].toLowerCase() : "bam"
+ def reference = fasta && extension=="cram" ? "--reference ${fasta}" : ""
+ if (!fasta && extension=="cram") error "Fasta reference is required for CRAM output"
+
+ """
+ INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
+ [ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
+ [ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1
+
+ bowtie2 \\
+ -x \$INDEX \\
+ $reads_args \\
+ --threads $task.cpus \\
+ $unaligned \\
+ $args \\
+ 2> >(tee ${prefix}.bowtie2.log >&2) \\
+ | samtools $samtools_command $args2 --threads $task.cpus ${reference} -o ${prefix}.${extension} -
+
+ if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
+ mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
+ fi
+
+ if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
+ mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
+ fi
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
+ samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+ pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
+ END_VERSIONS
+ """
+
+ stub:
+ def args2 = task.ext.args2 ?: ""
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def extension_pattern = /(--output-fmt|-O)+\s+(\S+)/
+ def extension = (args2 =~ extension_pattern).find() ? (args2 =~ extension_pattern)[0][2].toLowerCase() : "bam"
+ def create_unmapped = ""
+ if (meta.single_end) {
+ create_unmapped = save_unaligned ? "touch ${prefix}.unmapped.fastq.gz" : ""
+ } else {
+ create_unmapped = save_unaligned ? "touch ${prefix}.unmapped_1.fastq.gz && touch ${prefix}.unmapped_2.fastq.gz" : ""
+ }
+ def reference = fasta && extension=="cram" ? "--reference ${fasta}" : ""
+ if (!fasta && extension=="cram") error "Fasta reference is required for CRAM output"
+
+ def create_index = ""
+ if (extension == "cram") {
+ create_index = "touch ${prefix}.crai"
+ } else if (extension == "bam") {
+ create_index = "touch ${prefix}.csi"
+ }
+
+ """
+ touch ${prefix}.${extension}
+ ${create_index}
+ touch ${prefix}.bowtie2.log
+ ${create_unmapped}
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
+ samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+ pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
+ END_VERSIONS
+ """
+
+}
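The script block works out the output format from `ext.args2` using `extension_pattern` and Groovy's find operator `=~`. Because `args2` usually carries other options as well (for example `--output-fmt cram --write-index` in `cram_crai.config`), this needs a substring search. A whole-string match, which is what Groovy's `==~` does, would fail.

The Python sketch below mirrors that logic with `re.search` and shows where a whole-string match breaks down. The name `output_extension` is hypothetical, used only for illustration.

```python
import re

# Same regular expression as `extension_pattern` in BOWTIE2_ALIGN.
EXTENSION_PATTERN = re.compile(r"(--output-fmt|-O)+\s+(\S+)")


def output_extension(args2):
    """Return the samtools output extension implied by ext.args2.

    re.search finds the pattern anywhere in the string (like Groovy's `=~`
    with a match-count check), so extra flags such as --write-index do not
    prevent detection. Falls back to 'bam' when no format is given.
    """
    match = EXTENSION_PATTERN.search(args2)
    return match.group(2).lower() if match else "bam"


if __name__ == "__main__":
    print(output_extension("--output-fmt cram --write-index"))
    print(output_extension("-O SAM"))
    print(output_extension(""))
    # A whole-string match (Groovy's `==~`) fails once other flags are present.
    print(EXTENSION_PATTERN.fullmatch("--output-fmt cram --write-index"))
```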
diff --git a/modules/nf-core/bowtie2/align/meta.yml b/modules/nf-core/bowtie2/align/meta.yml
new file mode 100644
index 00000000..f841f781
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/meta.yml
@@ -0,0 +1,132 @@
+name: bowtie2_align
+description: Align reads to a reference genome using bowtie2
+keywords:
+ - align
+ - map
+ - fasta
+ - fastq
+ - genome
+ - reference
+tools:
+ - bowtie2:
+ description: |
+ Bowtie 2 is an ultrafast and memory-efficient tool for aligning
+ sequencing reads to long reference sequences.
+ homepage: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
+ documentation: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
+ doi: 10.1038/nmeth.1923
+ licence: ["GPL-3.0-or-later"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - reads:
+ type: file
+ description: |
+ List of input FastQ files of size 1 and 2 for single-end and paired-end data,
+ respectively.
+ - - meta2:
+ type: map
+ description: |
+ Groovy Map containing reference information
+ e.g. [ id:'test', single_end:false ]
+ - index:
+ type: file
+ description: Bowtie2 genome index files
+ pattern: "*.{bt2,bt2l}"
+ - - meta3:
+ type: map
+ description: |
+ Groovy Map containing reference information
+ e.g. [ id:'test', single_end:false ]
+ - fasta:
+ type: file
+ description: Bowtie2 genome fasta file
+ pattern: "*.fasta"
+ - - save_unaligned:
+ type: boolean
+ description: |
+ Save reads that do not map to the reference (true) or discard them (false)
+ (default: false)
+ - - sort_bam:
+ type: boolean
+ description: Use samtools sort (true) or samtools view (false) to write the alignments
+ pattern: "true or false"
+output:
+ - sam:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.sam":
+ type: file
+ description: Output SAM file containing read alignments
+ pattern: "*.sam"
+ - bam:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.bam":
+ type: file
+ description: Output BAM file containing read alignments
+ pattern: "*.bam"
+ - cram:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.cram":
+ type: file
+ description: Output CRAM file containing read alignments
+ pattern: "*.cram"
+ - csi:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.csi":
+ type: file
+ description: Output SAM/BAM index for large inputs
+ pattern: "*.csi"
+ - crai:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.crai":
+ type: file
+ description: Output CRAM index
+ pattern: "*.crai"
+ - log:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*.log":
+ type: file
+ description: Alignment log
+ pattern: "*.log"
+ - fastq:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
+ - "*fastq.gz":
+ type: file
+ description: Unaligned FastQ files
+ pattern: "*.fastq.gz"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@joseespinosa"
+ - "@drpatelh"
+maintainers:
+ - "@joseespinosa"
+ - "@drpatelh"
diff --git a/modules/nf-core/bowtie2/align/tests/cram_crai.config b/modules/nf-core/bowtie2/align/tests/cram_crai.config
new file mode 100644
index 00000000..03f1d5e5
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/tests/cram_crai.config
@@ -0,0 +1,5 @@
+process {
+ withName: BOWTIE2_ALIGN {
+ ext.args2 = '--output-fmt cram --write-index'
+ }
+}
diff --git a/modules/nf-core/bowtie2/align/tests/large_index.config b/modules/nf-core/bowtie2/align/tests/large_index.config
new file mode 100644
index 00000000..fdc1c59d
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/tests/large_index.config
@@ -0,0 +1,5 @@
+process {
+ withName: BOWTIE2_BUILD {
+ ext.args = '--large-index'
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bowtie2/align/tests/main.nf.test b/modules/nf-core/bowtie2/align/tests/main.nf.test
new file mode 100644
index 00000000..0de5950f
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/tests/main.nf.test
@@ -0,0 +1,623 @@
+nextflow_process {
+
+ name "Test Process BOWTIE2_ALIGN"
+ script "../main.nf"
+ process "BOWTIE2_ALIGN"
+ tag "modules"
+ tag "modules_nfcore"
+ tag "bowtie2"
+ tag "bowtie2/build"
+ tag "bowtie2/align"
+
+ test("sarscov2 - fastq, index, fasta, false, false - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, index, fasta, false, false - sam") {
+
+ config "./sam.config"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.sam[0][1]).readLines()[0..4],
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, index, fasta, false, false - sam2") {
+
+ config "./sam2.config"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.sam[0][1]).readLines()[0..4],
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, index, fasta, false, true - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], index, fasta, false, false - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], index, fasta, false, true - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, large_index, fasta, false, false - bam") {
+
+ config "./large_index.config"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], large_index, fasta, false, false - bam") {
+
+ config "./large_index.config"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], index, fasta, true, false - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, index, fasta, true, false - bam") {
+
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ process.out.log,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], index, fasta, true, true - cram") {
+
+ config "./cram_crai.config"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = true //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.cram[0][1]).name,
+ file(process.out.crai[0][1]).name
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - [fastq1, fastq2], index, fasta, false, false - stub") {
+
+ options "-stub"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:false ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ file(process.out.csi[0][1]).name,
+ file(process.out.log[0][1]).name,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+ test("sarscov2 - fastq, index, fasta, true, false - stub") {
+
+ options "-stub"
+ setup {
+ run("BOWTIE2_BUILD") {
+ script "../../build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'test'],
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ ]
+ """
+ }
+ }
+ }
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ input[1] = BOWTIE2_BUILD.out.index
+ input[2] = [[ id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)]
+ input[3] = false //save_unaligned
+ input[4] = false //sort
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(
+ file(process.out.bam[0][1]).name,
+ file(process.out.csi[0][1]).name,
+ file(process.out.log[0][1]).name,
+ process.out.fastq,
+ process.out.versions
+ ).match() }
+ )
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/bowtie2/align/tests/main.nf.test.snap b/modules/nf-core/bowtie2/align/tests/main.nf.test.snap
new file mode 100644
index 00000000..028e7da6
--- /dev/null
+++ b/modules/nf-core/bowtie2/align/tests/main.nf.test.snap
@@ -0,0 +1,311 @@
+{
+ "sarscov2 - [fastq1, fastq2], large_index, fasta, false, false - bam": {
+ "content": [
+ "test.bam",
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.bowtie2.log:md5,bd89ce1b28c93bf822bae391ffcedd19"
+ ]
+ ],
+ [
+
+ ],
+ [
+ "versions.yml:md5,01d18ab035146ea790e9a0f70adb758f"
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "23.10.1"
+ },
+ "timestamp": "2024-03-18T13:19:25.337323"
+ },
+ "sarscov2 - fastq, index, fasta, false, false - sam2": {
+ "content": [
+ [
+ "ERR5069949.2151832\t16\tMT192765.1\t17453\t42\t150M\t*\t0\t0\tACGCACATTGCTAACTAAGGGCACACTAGAACCAGAATATTTCAATTCAGTGTGTAGACTTATGAAAACTATAGGTCCAGACATGTTCCTCGGAACTTGTCGGCGTTGTCCTGCTGAAATTGTTGACACTGTGAGTGCTTTGGTTTATGA\tAAAA versions.yml
+ "${task.process}":
+ bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
+ END_VERSIONS
+ """
+
+ stub:
+ """
+ mkdir bowtie2
+ touch bowtie2/${fasta.baseName}.{1..4}.bt2
+ touch bowtie2/${fasta.baseName}.rev.{1,2}.bt2
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
+ END_VERSIONS
+ """
+}
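The `stub:` block above fakes a Bowtie 2 index with shell brace expansion: `{1..4}` and `{1,2}` expand to six file names. The Python sketch below is for illustration only and is not part of the module. The helper name `stub_index_files` is hypothetical, and it assumes `Path.stem` behaves like Nextflow's `baseName` by dropping only the last extension. For `genome.fasta` it yields the same six names that are recorded in the `bowtie2/build` test snapshot further down.

```python
from pathlib import Path


def stub_index_files(fasta: str) -> list[str]:
    """Mirror the BOWTIE2_BUILD stub (hypothetical helper): the paths that
    `touch bowtie2/${fasta.baseName}.{1..4}.bt2` and
    `touch bowtie2/${fasta.baseName}.rev.{1,2}.bt2` would create."""
    # Path.stem drops only the last extension, assumed to match Nextflow's baseName
    base = Path(fasta).stem
    forward = [f"{base}.{i}.bt2" for i in range(1, 5)]   # {1..4}
    reverse = [f"{base}.rev.{i}.bt2" for i in (1, 2)]    # {1,2}
    return [f"bowtie2/{name}" for name in forward + reverse]


if __name__ == "__main__":
    for path in stub_index_files("genome.fasta"):
        print(path)
```

Because only the last extension is stripped, a compressed reference such as `ref.fa.gz` would give index names starting with `ref.fa`.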
diff --git a/modules/nf-core/bowtie2/build/meta.yml b/modules/nf-core/bowtie2/build/meta.yml
new file mode 100644
index 00000000..2729a92e
--- /dev/null
+++ b/modules/nf-core/bowtie2/build/meta.yml
@@ -0,0 +1,49 @@
+name: bowtie2_build
+description: Builds bowtie index for reference genome
+keywords:
+ - build
+ - index
+ - fasta
+ - genome
+ - reference
+tools:
+ - bowtie2:
+ description: |
+ Bowtie 2 is an ultrafast and memory-efficient tool for aligning
+ sequencing reads to long reference sequences.
+ homepage: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
+ documentation: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
+ doi: 10.1038/nmeth.1923
+ licence: ["GPL-3.0-or-later"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing reference information
+ e.g. [ id:'test', single_end:false ]
+ - fasta:
+ type: file
+ description: Input genome fasta file
+output:
+ - index:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing reference information
+ e.g. [ id:'test', single_end:false ]
+ - bowtie2:
+ type: file
+ description: Bowtie2 genome index files
+ pattern: "*.bt2"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@joseespinosa"
+ - "@drpatelh"
+maintainers:
+ - "@joseespinosa"
+ - "@drpatelh"
diff --git a/modules/nf-core/bowtie2/build/tests/main.nf.test b/modules/nf-core/bowtie2/build/tests/main.nf.test
new file mode 100644
index 00000000..16376025
--- /dev/null
+++ b/modules/nf-core/bowtie2/build/tests/main.nf.test
@@ -0,0 +1,31 @@
+nextflow_process {
+
+ name "Test Process BOWTIE2_BUILD"
+ script "modules/nf-core/bowtie2/build/main.nf"
+ process "BOWTIE2_BUILD"
+ tag "modules"
+ tag "modules_nfcore"
+ tag "bowtie2"
+ tag "bowtie2/build"
+
+ test("Should run without failures") {
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test' ],
+ file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(process.out).match()
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/bowtie2/build/tests/main.nf.test.snap b/modules/nf-core/bowtie2/build/tests/main.nf.test.snap
new file mode 100644
index 00000000..6875e021
--- /dev/null
+++ b/modules/nf-core/bowtie2/build/tests/main.nf.test.snap
@@ -0,0 +1,45 @@
+{
+ "Should run without failures": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ [
+ "genome.1.bt2:md5,cbe3d0bbea55bc57c99b4bfa25b5fbdf",
+ "genome.2.bt2:md5,47b153cd1319abc88dda532462651fcf",
+ "genome.3.bt2:md5,4ed93abba181d8dfab2e303e33114777",
+ "genome.4.bt2:md5,c25be5f8b0378abf7a58c8a880b87626",
+ "genome.rev.1.bt2:md5,52be6950579598a990570fbcf5372184",
+ "genome.rev.2.bt2:md5,e3b4ef343dea4dd571642010a7d09597"
+ ]
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,1df11e9b82891527271c889c880d3974"
+ ],
+ "index": [
+ [
+ {
+ "id": "test"
+ },
+ [
+ "genome.1.bt2:md5,cbe3d0bbea55bc57c99b4bfa25b5fbdf",
+ "genome.2.bt2:md5,47b153cd1319abc88dda532462651fcf",
+ "genome.3.bt2:md5,4ed93abba181d8dfab2e303e33114777",
+ "genome.4.bt2:md5,c25be5f8b0378abf7a58c8a880b87626",
+ "genome.rev.1.bt2:md5,52be6950579598a990570fbcf5372184",
+ "genome.rev.2.bt2:md5,e3b4ef343dea4dd571642010a7d09597"
+ ]
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,1df11e9b82891527271c889c880d3974"
+ ]
+ }
+ ],
+ "timestamp": "2023-11-23T11:51:01.107681997"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/bowtie2/build/tests/tags.yml b/modules/nf-core/bowtie2/build/tests/tags.yml
new file mode 100644
index 00000000..81aa61da
--- /dev/null
+++ b/modules/nf-core/bowtie2/build/tests/tags.yml
@@ -0,0 +1,2 @@
+bowtie2/build:
+ - modules/nf-core/bowtie2/build/**
diff --git a/modules/nf-core/cat/cat/environment.yml b/modules/nf-core/cat/cat/environment.yml
deleted file mode 100644
index 17a04ef2..00000000
--- a/modules/nf-core/cat/cat/environment.yml
+++ /dev/null
@@ -1,7 +0,0 @@
-name: cat_cat
-channels:
- - conda-forge
- - bioconda
- - defaults
-dependencies:
- - conda-forge::pigz=2.3.4
diff --git a/modules/nf-core/cat/cat/main.nf b/modules/nf-core/cat/cat/main.nf
deleted file mode 100644
index adbdbd7b..00000000
--- a/modules/nf-core/cat/cat/main.nf
+++ /dev/null
@@ -1,79 +0,0 @@
-process CAT_CAT {
- tag "$meta.id"
- label 'process_low'
-
- conda "${moduleDir}/environment.yml"
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/pigz:2.3.4' :
- 'biocontainers/pigz:2.3.4' }"
-
- input:
- tuple val(meta), path(files_in)
-
- output:
- tuple val(meta), path("${prefix}"), emit: file_out
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def args = task.ext.args ?: ''
- def args2 = task.ext.args2 ?: ''
- def file_list = files_in.collect { it.toString() }
-
- // choose appropriate concatenation tool depending on input and output format
-
- // | input | output | command1 | command2 |
- // |-----------|------------|----------|----------|
- // | gzipped | gzipped | cat | |
- // | ungzipped | ungzipped | cat | |
- // | gzipped | ungzipped | zcat | |
- // | ungzipped | gzipped | cat | pigz |
-
- // Use input file ending as default
- prefix = task.ext.prefix ?: "${meta.id}${getFileSuffix(file_list[0])}"
- out_zip = prefix.endsWith('.gz')
- in_zip = file_list[0].endsWith('.gz')
- command1 = (in_zip && !out_zip) ? 'zcat' : 'cat'
- command2 = (!in_zip && out_zip) ? "| pigz -c -p $task.cpus $args2" : ''
- if(file_list.contains(prefix.trim())) {
- error "The name of the input file can't be the same as for the output prefix in the " +
- "module CAT_CAT (currently `$prefix`). Please choose a different one."
- }
- """
- $command1 \\
- $args \\
- ${file_list.join(' ')} \\
- $command2 \\
- > ${prefix}
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
- END_VERSIONS
- """
-
- stub:
- def file_list = files_in.collect { it.toString() }
- prefix = task.ext.prefix ?: "${meta.id}${file_list[0].substring(file_list[0].lastIndexOf('.'))}"
- if(file_list.contains(prefix.trim())) {
- error "The name of the input file can't be the same as for the output prefix in the " +
- "module CAT_CAT (currently `$prefix`). Please choose a different one."
- }
- """
- touch $prefix
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
- END_VERSIONS
- """
-}
-
-// for .gz files also include the second to last extension if it is present. E.g., .fasta.gz
-def getFileSuffix(filename) {
- def match = filename =~ /^.*?((\.\w{1,5})?(\.\w{1,5}\.gz$))/
- return match ? match[0][1] : filename.substring(filename.lastIndexOf('.'))
-}
-
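The comment table in the deleted CAT_CAT module summarises how it chooses between `cat`, `zcat` and `cat | pigz`, and `getFileSuffix` derives the default output suffix. To make that logic easier to check, here is a hedged Python port. The function names are hypothetical, and `re.ASCII` is used on the assumption that it approximates Java's ASCII-only `\w`. Because `.*?` is lazy, the earliest possible match wins, so one or two short extensions in front of `.gz` are kept (for example `.gff3.gz`).

```python
import re

# Port of getFileSuffix(): for names ending in .gz, keep one or two short
# (1-5 word-character) extensions in front of ".gz".
_GZ_SUFFIX = re.compile(r"^.*?((\.\w{1,5})?(\.\w{1,5}\.gz$))", re.ASCII)


def get_file_suffix(filename: str) -> str:
    match = _GZ_SUFFIX.search(filename)
    if match:
        return match.group(1)
    # Fallback used by the module: everything from the last dot onward
    return filename[filename.rfind("."):]


def choose_commands(first_input: str, prefix: str, cpus: int = 1, args2: str = "") -> tuple[str, str]:
    """Reproduce the input/output compression table from CAT_CAT."""
    in_zip = first_input.endswith(".gz")
    out_zip = prefix.endswith(".gz")
    command1 = "zcat" if (in_zip and not out_zip) else "cat"
    command2 = f"| pigz -c -p {cpus} {args2}" if (not in_zip and out_zip) else ""
    return command1, command2


def resolve_prefix(meta_id: str, files: list[str], ext_prefix: str = "") -> str:
    """task.ext.prefix if set, else meta.id plus the suffix of the first input;
    reject a prefix that collides with an input file name."""
    prefix = ext_prefix or f"{meta_id}{get_file_suffix(files[0])}"
    if prefix.strip() in files:
        raise ValueError(
            "The name of the input file can't be the same as for the output "
            f"prefix (currently `{prefix}`)"
        )
    return prefix
```

For the `test_cat_unzipped_unzipped` inputs this gives `test.fasta`, the name recorded in the deleted snapshot. Note that the module's stub block takes only the last extension, so for `.gz` inputs a stub run could name its output differently from a real run.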
diff --git a/modules/nf-core/cat/cat/meta.yml b/modules/nf-core/cat/cat/meta.yml
deleted file mode 100644
index 00a8db0b..00000000
--- a/modules/nf-core/cat/cat/meta.yml
+++ /dev/null
@@ -1,36 +0,0 @@
-name: cat_cat
-description: A module for concatenation of gzipped or uncompressed files
-keywords:
- - concatenate
- - gzip
- - cat
-tools:
- - cat:
- description: Just concatenation
- documentation: https://man7.org/linux/man-pages/man1/cat.1.html
- licence: ["GPL-3.0-or-later"]
-input:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- - files_in:
- type: file
- description: List of compressed / uncompressed files
- pattern: "*"
-output:
- - versions:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
- - file_out:
- type: file
- description: Concatenated file. Will be gzipped if file_out ends with ".gz"
- pattern: "${file_out}"
-authors:
- - "@erikrikarddaniel"
- - "@FriederikeHanssen"
-maintainers:
- - "@erikrikarddaniel"
- - "@FriederikeHanssen"
diff --git a/modules/nf-core/cat/cat/tests/main.nf.test b/modules/nf-core/cat/cat/tests/main.nf.test
deleted file mode 100644
index fcee2d19..00000000
--- a/modules/nf-core/cat/cat/tests/main.nf.test
+++ /dev/null
@@ -1,178 +0,0 @@
-nextflow_process {
-
- name "Test Process CAT_CAT"
- script "../main.nf"
- process "CAT_CAT"
- tag "modules"
- tag "modules_nfcore"
- tag "cat"
- tag "cat/cat"
-
- test("test_cat_name_conflict") {
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'genome', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.sizes', checkIfExists: true)
- ]
- ]
- """
- }
- }
- then {
- assertAll(
- { assert !process.success },
- { assert process.stdout.toString().contains("The name of the input file can't be the same as for the output prefix") }
- )
- }
- }
-
- test("test_cat_unzipped_unzipped") {
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'test', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.sizes', checkIfExists: true)
- ]
- ]
- """
- }
- }
- then {
- assertAll(
- { assert process.success },
- { assert snapshot(process.out).match() }
- )
- }
- }
-
-
- test("test_cat_zipped_zipped") {
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'test', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.gff3.gz', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/alignment/last/contigs.genome.maf.gz', checkIfExists: true)
- ]
- ]
- """
- }
- }
- then {
- def lines = path(process.out.file_out.get(0).get(1)).linesGzip
- assertAll(
- { assert process.success },
- { assert snapshot(lines[0..5]).match("test_cat_zipped_zipped_lines") },
- { assert snapshot(lines.size()).match("test_cat_zipped_zipped_size")}
- )
- }
- }
-
- test("test_cat_zipped_unzipped") {
- config './nextflow_zipped_unzipped.config'
-
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'test', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.gff3.gz', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/alignment/last/contigs.genome.maf.gz', checkIfExists: true)
- ]
- ]
- """
- }
- }
-
- then {
- assertAll(
- { assert process.success },
- { assert snapshot(process.out).match() }
- )
- }
-
- }
-
- test("test_cat_unzipped_zipped") {
- config './nextflow_unzipped_zipped.config'
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'test', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.sizes', checkIfExists: true)
- ]
- ]
- """
- }
- }
- then {
- def lines = path(process.out.file_out.get(0).get(1)).linesGzip
- assertAll(
- { assert process.success },
- { assert snapshot(lines[0..5]).match("test_cat_unzipped_zipped_lines") },
- { assert snapshot(lines.size()).match("test_cat_unzipped_zipped_size")}
- )
- }
- }
-
- test("test_cat_one_file_unzipped_zipped") {
- config './nextflow_unzipped_zipped.config'
- when {
- params {
- outdir = "${outputDir}"
- }
- process {
- """
- input[0] =
- [
- [ id:'test', single_end:true ],
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
- ]
- ]
- """
- }
- }
- then {
- def lines = path(process.out.file_out.get(0).get(1)).linesGzip
- assertAll(
- { assert process.success },
- { assert snapshot(lines[0..5]).match("test_cat_one_file_unzipped_zipped_lines") },
- { assert snapshot(lines.size()).match("test_cat_one_file_unzipped_zipped_size")}
- )
- }
- }
-}
diff --git a/modules/nf-core/cat/cat/tests/main.nf.test.snap b/modules/nf-core/cat/cat/tests/main.nf.test.snap
deleted file mode 100644
index 423571ba..00000000
--- a/modules/nf-core/cat/cat/tests/main.nf.test.snap
+++ /dev/null
@@ -1,121 +0,0 @@
-{
- "test_cat_unzipped_zipped_size": {
- "content": [
- 375
- ],
- "timestamp": "2023-10-16T14:33:08.049445686"
- },
- "test_cat_unzipped_unzipped": {
- "content": [
- {
- "0": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "test.fasta:md5,f44b33a0e441ad58b2d3700270e2dbe2"
- ]
- ],
- "1": [
- "versions.yml:md5,115ed6177ebcff24eb99d503fa5ef894"
- ],
- "file_out": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "test.fasta:md5,f44b33a0e441ad58b2d3700270e2dbe2"
- ]
- ],
- "versions": [
- "versions.yml:md5,115ed6177ebcff24eb99d503fa5ef894"
- ]
- }
- ],
- "timestamp": "2023-10-16T14:32:18.500464399"
- },
- "test_cat_zipped_unzipped": {
- "content": [
- {
- "0": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "cat.txt:md5,c439d3b60e7bc03e8802a451a0d9a5d9"
- ]
- ],
- "1": [
- "versions.yml:md5,115ed6177ebcff24eb99d503fa5ef894"
- ],
- "file_out": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "cat.txt:md5,c439d3b60e7bc03e8802a451a0d9a5d9"
- ]
- ],
- "versions": [
- "versions.yml:md5,115ed6177ebcff24eb99d503fa5ef894"
- ]
- }
- ],
- "timestamp": "2023-10-16T14:32:49.642741302"
- },
- "test_cat_zipped_zipped_lines": {
- "content": [
- [
- "MT192765.1\tGenbank\ttranscript\t259\t29667\t.\t+\t.\tID=unknown_transcript_1;geneID=orf1ab;gene_name=orf1ab",
- "MT192765.1\tGenbank\tgene\t259\t21548\t.\t+\t.\tParent=unknown_transcript_1",
- "MT192765.1\tGenbank\tCDS\t259\t13461\t.\t+\t0\tParent=unknown_transcript_1;exception=\"ribosomal slippage\";gbkey=CDS;gene=orf1ab;note=\"pp1ab;translated=by -1 ribosomal frameshift\";product=\"orf1ab polyprotein\";protein_id=QIK50426.1",
- "MT192765.1\tGenbank\tCDS\t13461\t21548\t.\t+\t0\tParent=unknown_transcript_1;exception=\"ribosomal slippage\";gbkey=CDS;gene=orf1ab;note=\"pp1ab;translated=by -1 ribosomal frameshift\";product=\"orf1ab polyprotein\";protein_id=QIK50426.1",
- "MT192765.1\tGenbank\tCDS\t21556\t25377\t.\t+\t0\tParent=unknown_transcript_1;gbkey=CDS;gene=S;note=\"structural protein\";product=\"surface glycoprotein\";protein_id=QIK50427.1",
- "MT192765.1\tGenbank\tgene\t21556\t25377\t.\t+\t.\tParent=unknown_transcript_1"
- ]
- ],
- "timestamp": "2023-10-16T14:32:33.629048645"
- },
- "test_cat_unzipped_zipped_lines": {
- "content": [
- [
- ">MT192765.1 Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/PC00101P/2020, complete genome",
- "GTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGT",
- "GTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAG",
- "TAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTTGTCCGG",
- "GTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTT",
- "ACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAG"
- ]
- ],
- "timestamp": "2023-10-16T14:33:08.038830506"
- },
- "test_cat_one_file_unzipped_zipped_lines": {
- "content": [
- [
- ">MT192765.1 Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/PC00101P/2020, complete genome",
- "GTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGT",
- "GTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAG",
- "TAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTTGTCCGG",
- "GTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTT",
- "ACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAG"
- ]
- ],
- "timestamp": "2023-10-16T14:33:21.39642399"
- },
- "test_cat_zipped_zipped_size": {
- "content": [
- 78
- ],
- "timestamp": "2023-10-16T14:32:33.641869244"
- },
- "test_cat_one_file_unzipped_zipped_size": {
- "content": [
- 374
- ],
- "timestamp": "2023-10-16T14:33:21.4094373"
- }
-}
\ No newline at end of file
diff --git a/modules/nf-core/cat/cat/tests/nextflow_unzipped_zipped.config b/modules/nf-core/cat/cat/tests/nextflow_unzipped_zipped.config
deleted file mode 100644
index ec26b0fd..00000000
--- a/modules/nf-core/cat/cat/tests/nextflow_unzipped_zipped.config
+++ /dev/null
@@ -1,6 +0,0 @@
-
-process {
- withName: CAT_CAT {
- ext.prefix = 'cat.txt.gz'
- }
-}
diff --git a/modules/nf-core/cat/cat/tests/nextflow_zipped_unzipped.config b/modules/nf-core/cat/cat/tests/nextflow_zipped_unzipped.config
deleted file mode 100644
index fbc79783..00000000
--- a/modules/nf-core/cat/cat/tests/nextflow_zipped_unzipped.config
+++ /dev/null
@@ -1,8 +0,0 @@
-
-process {
-
- withName: CAT_CAT {
- ext.prefix = 'cat.txt'
- }
-
-}
diff --git a/modules/nf-core/cat/cat/tests/tags.yml b/modules/nf-core/cat/cat/tests/tags.yml
deleted file mode 100644
index 37b578f5..00000000
--- a/modules/nf-core/cat/cat/tests/tags.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-cat/cat:
- - modules/nf-core/cat/cat/**
diff --git a/modules/nf-core/cat/fastq/environment.yml b/modules/nf-core/cat/fastq/environment.yml
index 8c69b121..c7eb9bd1 100644
--- a/modules/nf-core/cat/fastq/environment.yml
+++ b/modules/nf-core/cat/fastq/environment.yml
@@ -1,7 +1,5 @@
-name: cat_fastq
channels:
- conda-forge
- bioconda
- - defaults
dependencies:
- conda-forge::coreutils=8.30
diff --git a/modules/nf-core/cat/fastq/main.nf b/modules/nf-core/cat/fastq/main.nf
index f132b2ad..b68e5f91 100644
--- a/modules/nf-core/cat/fastq/main.nf
+++ b/modules/nf-core/cat/fastq/main.nf
@@ -53,9 +53,9 @@ process CAT_FASTQ {
def prefix = task.ext.prefix ?: "${meta.id}"
def readList = reads instanceof List ? reads.collect{ it.toString() } : [reads.toString()]
if (meta.single_end) {
- if (readList.size > 1) {
+ if (readList.size >= 1) {
"""
- touch ${prefix}.merged.fastq.gz
+ echo '' | gzip > ${prefix}.merged.fastq.gz
cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -64,10 +64,10 @@ process CAT_FASTQ {
"""
}
} else {
- if (readList.size > 2) {
+ if (readList.size >= 2) {
"""
- touch ${prefix}_1.merged.fastq.gz
- touch ${prefix}_2.merged.fastq.gz
+ echo '' | gzip > ${prefix}_1.merged.fastq.gz
+ echo '' | gzip > ${prefix}_2.merged.fastq.gz
cat <<-END_VERSIONS > versions.yml
"${task.process}":
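A minimal Python sketch of the stub naming rule after this change. It assumes the read list and the `single_end` flag are the only inputs that matter, and `cat_fastq_stub_outputs` is a hypothetical helper, not part of the module. Under the old `> 1` / `> 2` thresholds, a single-end sample with exactly one FASTQ fell outside this branch. The new `>= 1` / `>= 2` thresholds cover that case, and the added `test_cat_fastq_single_end_single_file - stub` test exercises it. The placeholders are also now written with `echo '' | gzip`, so they are valid gzip streams rather than empty `touch` files.

```python
def cat_fastq_stub_outputs(prefix: str, single_end: bool, reads: list) -> list:
    """Return the placeholder names the CAT_FASTQ stub branch above creates.

    Hypothetical helper mirroring the Groovy conditions: single-end input
    needs at least one file, paired-end input at least two. An empty list
    means the stub branch shown in this hunk is not taken.
    """
    if single_end:
        if len(reads) >= 1:
            return [f"{prefix}.merged.fastq.gz"]
    elif len(reads) >= 2:
        return [f"{prefix}_1.merged.fastq.gz", f"{prefix}_2.merged.fastq.gz"]
    return []


# A single single-end file now yields a merged placeholder.
print(cat_fastq_stub_outputs("test", True, ["test_1.fastq.gz"]))
```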
diff --git a/modules/nf-core/cat/fastq/meta.yml b/modules/nf-core/cat/fastq/meta.yml
index db4ac3c7..91ff2fb5 100644
--- a/modules/nf-core/cat/fastq/meta.yml
+++ b/modules/nf-core/cat/fastq/meta.yml
@@ -10,30 +10,33 @@ tools:
The cat utility reads files sequentially, writing them to the standard output.
documentation: https://www.gnu.org/software/coreutils/manual/html_node/cat-invocation.html
licence: ["GPL-3.0-or-later"]
+ identifier: ""
input:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- - reads:
- type: file
- description: |
- List of input FastQ files to be concatenated.
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - reads:
+ type: file
+ description: |
+ List of input FastQ files to be concatenated.
output:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- reads:
- type: file
- description: Merged fastq file
- pattern: "*.{merged.fastq.gz}"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.merged.fastq.gz":
+ type: file
+ description: Merged fastq file
+ pattern: "*.{merged.fastq.gz}"
- versions:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
authors:
- "@joseespinosa"
- "@drpatelh"
diff --git a/modules/nf-core/cat/fastq/tests/main.nf.test b/modules/nf-core/cat/fastq/tests/main.nf.test
index dab2e14c..f88a78b6 100644
--- a/modules/nf-core/cat/fastq/tests/main.nf.test
+++ b/modules/nf-core/cat/fastq/tests/main.nf.test
@@ -1,3 +1,5 @@
+// NOTE The version snaps may not be consistent
+// https://github.com/nf-core/modules/pull/4087#issuecomment-1767948035
nextflow_process {
name "Test Process CAT_FASTQ"
@@ -11,9 +13,6 @@ nextflow_process {
test("test_cat_fastq_single_end") {
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
input[0] = Channel.of([
@@ -36,9 +35,6 @@ nextflow_process {
test("test_cat_fastq_paired_end") {
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
input[0] = Channel.of([
@@ -63,9 +59,6 @@ nextflow_process {
test("test_cat_fastq_single_end_same_name") {
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
input[0] = Channel.of([
@@ -88,9 +81,6 @@ nextflow_process {
test("test_cat_fastq_paired_end_same_name") {
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
input[0] = Channel.of([
@@ -115,9 +105,129 @@ nextflow_process {
test("test_cat_fastq_single_end_single_file") {
when {
- params {
- outdir = "$outputDir"
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_cat_fastq_single_end - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_cat_fastq_paired_end - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true)]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_cat_fastq_single_end_same_name - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_cat_fastq_paired_end_same_name - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)]
+ ])
+ """
}
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_cat_fastq_single_end_single_file - stub") {
+
+ options "-stub"
+
+ when {
process {
"""
input[0] = Channel.of([
diff --git a/modules/nf-core/cat/fastq/tests/main.nf.test.snap b/modules/nf-core/cat/fastq/tests/main.nf.test.snap
index 43dfe28f..aec119a9 100644
--- a/modules/nf-core/cat/fastq/tests/main.nf.test.snap
+++ b/modules/nf-core/cat/fastq/tests/main.nf.test.snap
@@ -28,6 +28,10 @@
]
}
],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
"timestamp": "2024-01-17T17:30:39.816981"
},
"test_cat_fastq_single_end_same_name": {
@@ -59,6 +63,10 @@
]
}
],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
"timestamp": "2024-01-17T17:32:35.229332"
},
"test_cat_fastq_single_end_single_file": {
@@ -90,6 +98,10 @@
]
}
],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
"timestamp": "2024-01-17T17:34:00.058829"
},
"test_cat_fastq_paired_end_same_name": {
@@ -127,8 +139,123 @@
]
}
],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
"timestamp": "2024-01-17T17:33:33.031555"
},
+ "test_cat_fastq_single_end - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T12:07:28.244999"
+ },
+ "test_cat_fastq_paired_end_same_name - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T12:07:57.070911"
+ },
+ "test_cat_fastq_single_end_same_name - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T12:07:46.796254"
+ },
"test_cat_fastq_paired_end": {
"content": [
{
@@ -164,6 +291,86 @@
]
}
],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
"timestamp": "2024-01-17T17:32:02.270935"
+ },
+ "test_cat_fastq_paired_end - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T12:07:37.807553"
+ },
+ "test_cat_fastq_single_end_single_file - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,d42d6e24d67004608495883e00bd501b"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T12:14:51.861264"
}
}
\ No newline at end of file
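Every stub entry in this snapshot records `68b329da9893e34099c7d8ad5cb9c940` for the merged FASTQ files. That value is the MD5 of a single newline, which fits the `echo '' | gzip` placeholders if nf-test hashes the decompressed payload of `.gz` outputs. Treat that as an assumption about nf-test's behaviour. The csvtk stub, which uses `touch`, records `d41d8cd98f00b204e9800998ecf8427e`, the MD5 of an empty file. A small Python check of both values:

```python
import gzip
import hashlib


def stub_placeholder() -> bytes:
    # Same payload as `echo '' | gzip`: a gzip stream wrapping a single "\n".
    return gzip.compress(b"\n")


def md5_of_decompressed(data: bytes) -> str:
    # Assumption: the snapshot hashes the uncompressed content of .gz files,
    # which keeps the value independent of the gzip header timestamp.
    return hashlib.md5(gzip.decompress(data)).hexdigest()


print(md5_of_decompressed(stub_placeholder()))  # 68b329da9893e34099c7d8ad5cb9c940
print(hashlib.md5(b"").hexdigest())             # d41d8cd98f00b204e9800998ecf8427e (empty file from `touch`)
```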
diff --git a/modules/nf-core/csvtk/join/environment.yml b/modules/nf-core/csvtk/join/environment.yml
new file mode 100644
index 00000000..ea951bdb
--- /dev/null
+++ b/modules/nf-core/csvtk/join/environment.yml
@@ -0,0 +1,5 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::csvtk=0.30.0
diff --git a/modules/nf-core/csvtk/join/main.nf b/modules/nf-core/csvtk/join/main.nf
new file mode 100644
index 00000000..5f3afeea
--- /dev/null
+++ b/modules/nf-core/csvtk/join/main.nf
@@ -0,0 +1,49 @@
+process CSVTK_JOIN {
+ tag "$meta.id"
+ label 'process_single'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/csvtk:0.30.0--h9ee0642_0':
+ 'biocontainers/csvtk:0.30.0--h9ee0642_0' }"
+
+ input:
+ tuple val(meta), path(csv)
+
+ output:
+ tuple val(meta), path("${prefix}.${out_extension}"), emit: csv
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ prefix = task.ext.prefix ?: "${meta.id}"
+ out_extension = args.contains('--out-delimiter "\t"') || args.contains('-D "\t"') || args.contains("-D \$'\t'") ? "tsv" : "csv"
+ """
+ csvtk \\
+ join \\
+ $args \\
+ --num-cpus $task.cpus \\
+ --out-file ${prefix}.${out_extension} \\
+ $csv
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ csvtk: \$(echo \$( csvtk version | sed -e "s/csvtk v//g" ))
+ END_VERSIONS
+ """
+
+ stub:
+ prefix = task.ext.prefix ?: "${meta.id}"
+ out_extension = ['--out-delimiter "\t"', '-D "\t"', "-D \$'\t'"].any { (task.ext.args ?: '').contains(it) } ? "tsv" : "csv"
+ """
+ touch ${prefix}.${out_extension}
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ csvtk: \$(echo \$( csvtk version | sed -e "s/csvtk v//g" ))
+ END_VERSIONS
+ """
+}
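The `out_extension` expression picks `tsv` only when the arguments request a tab output delimiter. `\t` inside these Groovy strings, whether single- or double-quoted, is an escape for a real tab character, so the match is against an actual tab. Below is a Python sketch of the same rule. `csvtk_out_extension` is a hypothetical name, not part of the module.

```python
def csvtk_out_extension(args: str) -> str:
    """Return "tsv" if args request a tab output delimiter, else "csv"."""
    # Same three spellings the Groovy code checks; "\t" is a real tab here too.
    tab_output_flags = ('--out-delimiter "\t"', '-D "\t"', "-D $'\t'")
    return "tsv" if any(flag in args for flag in tab_output_flags) else "csv"


# The test config's ext.args: tab-separated input (-d) but comma output (-D).
test_args = "--fields 'ID;ID' -p -e -d \"\t\" -D \",\""
print(csvtk_out_extension(test_args))  # csv
```

With the test config's `ext.args`, only the input delimiter (`-d`) is a tab, so the output is `test.csv`, matching the snapshot. The check is case-sensitive, so `-d "\t"` does not trigger `tsv`.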
diff --git a/modules/nf-core/csvtk/join/meta.yml b/modules/nf-core/csvtk/join/meta.yml
new file mode 100644
index 00000000..d8671b17
--- /dev/null
+++ b/modules/nf-core/csvtk/join/meta.yml
@@ -0,0 +1,45 @@
+name: csvtk_join
+description: Join two or more CSV (or TSV) tables by selected fields into a single
+ table
+keywords:
+ - join
+ - tsv
+ - csv
+tools:
+ - csvtk:
+ description: A cross-platform, efficient, practical CSV/TSV toolkit
+ homepage: http://bioinf.shenwei.me/csvtk
+ documentation: http://bioinf.shenwei.me/csvtk
+ tool_dev_url: https://github.com/shenwei356/csvtk
+ licence: ["MIT"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - csv:
+ type: file
+ description: CSV/TSV formatted files
+ pattern: "*.{csv,tsv}"
+output:
+ - csv:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - ${prefix}.${out_extension}:
+ type: file
+ description: Joined CSV/TSV file
+ pattern: "*.{csv,tsv}"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@anoronh4"
+maintainers:
+ - "@anoronh4"
diff --git a/modules/nf-core/csvtk/join/tests/main.nf.test b/modules/nf-core/csvtk/join/tests/main.nf.test
new file mode 100644
index 00000000..3cf178c4
--- /dev/null
+++ b/modules/nf-core/csvtk/join/tests/main.nf.test
@@ -0,0 +1,64 @@
+nextflow_process {
+
+ name "Test Process CSVTK_JOIN"
+ script "../main.nf"
+ process "CSVTK_JOIN"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "csvtk"
+ tag "csvtk/join"
+
+ test("join - csv") {
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test' ], // meta map
+ [
+ file("https://github.com/nf-core/test-datasets/raw/bacass/bacass_hybrid.csv", checkIfExists: true),
+ file("https://github.com/nf-core/test-datasets/raw/bacass/bacass_short.csv", checkIfExists: true),
+ ]
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+ test("join - csv - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test' ], // meta map
+ [
+ file("https://github.com/nf-core/test-datasets/raw/bacass/bacass_hybrid.csv", checkIfExists: true),
+ file("https://github.com/nf-core/test-datasets/raw/bacass/bacass_short.csv", checkIfExists: true),
+ ]
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/csvtk/join/tests/main.nf.test.snap b/modules/nf-core/csvtk/join/tests/main.nf.test.snap
new file mode 100644
index 00000000..b124788b
--- /dev/null
+++ b/modules/nf-core/csvtk/join/tests/main.nf.test.snap
@@ -0,0 +1,60 @@
+{
+ "join - csv": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.csv:md5,d0ad82ca096c7e05eb9f9a04194c9e30"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,e76147e4eca968d23543e7007522f1d3"
+ ],
+ "csv": [
+ [
+ {
+ "id": "test"
+ },
+ "test.csv:md5,d0ad82ca096c7e05eb9f9a04194c9e30"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e76147e4eca968d23543e7007522f1d3"
+ ]
+ }
+ ],
+ "timestamp": "2024-05-21T15:45:44.045434"
+ },
+ "join - csv - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.csv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,e76147e4eca968d23543e7007522f1d3"
+ ],
+ "csv": [
+ [
+ {
+ "id": "test"
+ },
+ "test.csv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e76147e4eca968d23543e7007522f1d3"
+ ]
+ }
+ ],
+ "timestamp": "2024-05-21T15:45:55.59201"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/csvtk/join/tests/nextflow.config b/modules/nf-core/csvtk/join/tests/nextflow.config
new file mode 100644
index 00000000..1b14393a
--- /dev/null
+++ b/modules/nf-core/csvtk/join/tests/nextflow.config
@@ -0,0 +1,5 @@
+process {
+ withName: CSVTK_JOIN {
+ ext.args = "--fields 'ID;ID' -p -e -d \"\t\" -D \",\""
+ }
+}
diff --git a/modules/nf-core/csvtk/join/tests/tags.yml b/modules/nf-core/csvtk/join/tests/tags.yml
new file mode 100644
index 00000000..6c3a0fa6
--- /dev/null
+++ b/modules/nf-core/csvtk/join/tests/tags.yml
@@ -0,0 +1,2 @@
+csvtk/join:
+ - "modules/nf-core/csvtk/join/**"
diff --git a/modules/nf-core/fastp/environment.yml b/modules/nf-core/fastp/environment.yml
index 70389e66..26d4aca5 100644
--- a/modules/nf-core/fastp/environment.yml
+++ b/modules/nf-core/fastp/environment.yml
@@ -1,7 +1,5 @@
-name: fastp
channels:
- conda-forge
- bioconda
- - defaults
dependencies:
- bioconda::fastp=0.23.4
diff --git a/modules/nf-core/fastp/main.nf b/modules/nf-core/fastp/main.nf
index 4fc19b74..e1b9f565 100644
--- a/modules/nf-core/fastp/main.nf
+++ b/modules/nf-core/fastp/main.nf
@@ -10,6 +10,7 @@ process FASTP {
input:
tuple val(meta), path(reads)
path adapter_fasta
+ val discard_trimmed_pass
val save_trimmed_fail
val save_merged
@@ -18,9 +19,9 @@ process FASTP {
tuple val(meta), path('*.json') , emit: json
tuple val(meta), path('*.html') , emit: html
tuple val(meta), path('*.log') , emit: log
- path "versions.yml" , emit: versions
tuple val(meta), path('*.fail.fastq.gz') , optional:true, emit: reads_fail
tuple val(meta), path('*.merged.fastq.gz'), optional:true, emit: reads_merged
+ path "versions.yml" , emit: versions
when:
task.ext.when == null || task.ext.when
@@ -30,6 +31,8 @@ process FASTP {
def prefix = task.ext.prefix ?: "${meta.id}"
def adapter_list = adapter_fasta ? "--adapter_fasta ${adapter_fasta}" : ""
def fail_fastq = save_trimmed_fail && meta.single_end ? "--failed_out ${prefix}.fail.fastq.gz" : save_trimmed_fail && !meta.single_end ? "--failed_out ${prefix}.paired.fail.fastq.gz --unpaired1 ${prefix}_1.fail.fastq.gz --unpaired2 ${prefix}_2.fail.fastq.gz" : ''
+ def out_fq1 = discard_trimmed_pass ? '' : ( meta.single_end ? "--out1 ${prefix}.fastp.fastq.gz" : "--out1 ${prefix}_1.fastp.fastq.gz" )
+ def out_fq2 = discard_trimmed_pass ? '' : "--out2 ${prefix}_2.fastp.fastq.gz"
// Added soft-links to original fastqs for consistent naming in MultiQC
// Use single ended for interleaved. Add --interleaved_in in config.
if ( task.ext.args?.contains('--interleaved_in') ) {
@@ -59,7 +62,7 @@ process FASTP {
fastp \\
--in1 ${prefix}.fastq.gz \\
- --out1 ${prefix}.fastp.fastq.gz \\
+ $out_fq1 \\
--thread $task.cpus \\
--json ${prefix}.fastp.json \\
--html ${prefix}.fastp.html \\
@@ -81,8 +84,8 @@ process FASTP {
fastp \\
--in1 ${prefix}_1.fastq.gz \\
--in2 ${prefix}_2.fastq.gz \\
- --out1 ${prefix}_1.fastp.fastq.gz \\
- --out2 ${prefix}_2.fastp.fastq.gz \\
+ $out_fq1 \\
+ $out_fq2 \\
--json ${prefix}.fastp.json \\
--html ${prefix}.fastp.html \\
$adapter_list \\
@@ -103,14 +106,16 @@ process FASTP {
stub:
def prefix = task.ext.prefix ?: "${meta.id}"
def is_single_output = task.ext.args?.contains('--interleaved_in') || meta.single_end
- def touch_reads = is_single_output ? "${prefix}.fastp.fastq.gz" : "${prefix}_1.fastp.fastq.gz ${prefix}_2.fastp.fastq.gz"
- def touch_merged = (!is_single_output && save_merged) ? "touch ${prefix}.merged.fastq.gz" : ""
+ def touch_reads = (discard_trimmed_pass) ? "" : (is_single_output) ? "echo '' | gzip > ${prefix}.fastp.fastq.gz" : "echo '' | gzip > ${prefix}_1.fastp.fastq.gz ; echo '' | gzip > ${prefix}_2.fastp.fastq.gz"
+ def touch_merged = (!is_single_output && save_merged) ? "echo '' | gzip > ${prefix}.merged.fastq.gz" : ""
+ def touch_fail_fastq = (!save_trimmed_fail) ? "" : meta.single_end ? "echo '' | gzip > ${prefix}.fail.fastq.gz" : "echo '' | gzip > ${prefix}.paired.fail.fastq.gz ; echo '' | gzip > ${prefix}_1.fail.fastq.gz ; echo '' | gzip > ${prefix}_2.fail.fastq.gz"
"""
- touch $touch_reads
+ $touch_reads
+ $touch_fail_fastq
+ $touch_merged
touch "${prefix}.fastp.json"
touch "${prefix}.fastp.html"
touch "${prefix}.fastp.log"
- $touch_merged
cat <<-END_VERSIONS > versions.yml
"${task.process}":
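The new `discard_trimmed_pass` input is meant to stop fastp writing the reads that pass trimming, so that only the JSON, HTML and log reports are produced. This follows the `discard_trimmed_pass` description in `meta.yml`. Below is a minimal Python sketch of the intended `--out1`/`--out2` selection. `fastp_output_flags` is a hypothetical helper, and the file names follow the `prefix` conventions in the script above.

```python
def fastp_output_flags(prefix: str, single_end: bool, discard_trimmed_pass: bool) -> list:
    """Return the --out1/--out2 arguments intended for the fastp call."""
    if discard_trimmed_pass:
        # No output FASTQs requested; fastp still performs QC and writes reports.
        return []
    if single_end:
        # Single-end and interleaved input both write one trimmed file.
        return ["--out1", f"{prefix}.fastp.fastq.gz"]
    return [
        "--out1", f"{prefix}_1.fastp.fastq.gz",
        "--out2", f"{prefix}_2.fastp.fastq.gz",
    ]


print(fastp_output_flags("test", single_end=False, discard_trimmed_pass=False))
```

Returning an empty list rather than a placeholder keeps stray tokens out of the command line when reads are discarded.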
diff --git a/modules/nf-core/fastp/meta.yml b/modules/nf-core/fastp/meta.yml
index c22a16ab..159404d0 100644
--- a/modules/nf-core/fastp/meta.yml
+++ b/modules/nf-core/fastp/meta.yml
@@ -11,62 +11,100 @@ tools:
documentation: https://github.com/OpenGene/fastp
doi: 10.1093/bioinformatics/bty560
licence: ["MIT"]
+ identifier: biotools:fastp
input:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads.
- e.g. [ id:'test', single_end:false ]
- - reads:
- type: file
- description: |
- List of input FastQ files of size 1 and 2 for single-end and paired-end data,
- respectively. If you wish to run interleaved paired-end data, supply as single-end data
- but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module.
- - adapter_fasta:
- type: file
- description: File in FASTA format containing possible adapters to remove.
- pattern: "*.{fasta,fna,fas,fa}"
- - save_trimmed_fail:
- type: boolean
- description: Specify true to save files that failed to pass trimming thresholds ending in `*.fail.fastq.gz`
- - save_merged:
- type: boolean
- description: Specify true to save all merged reads to the a file ending in `*.merged.fastq.gz`
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads.
+ e.g. [ id:'test', single_end:false ]
+ - reads:
+ type: file
+ description: |
+ List of input FastQ files of size 1 and 2 for single-end and paired-end data,
+ respectively. If you wish to run interleaved paired-end data, supply as single-end data
+ but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module.
+ - - adapter_fasta:
+ type: file
+ description: File in FASTA format containing possible adapters to remove.
+ pattern: "*.{fasta,fna,fas,fa}"
+ - - discard_trimmed_pass:
+ type: boolean
+ description: Specify true to not write any reads that pass trimming thresholds.
+ This can be used to run fastp for the output report only.
+ - - save_trimmed_fail:
+ type: boolean
+ description: Specify true to save files that failed to pass trimming thresholds
+ ending in `*.fail.fastq.gz`
+ - - save_merged:
+ type: boolean
+ description: Specify true to save all merged reads to a file ending in `*.merged.fastq.gz`
output:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- reads:
- type: file
- description: The trimmed/modified/unmerged fastq reads
- pattern: "*fastp.fastq.gz"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.fastp.fastq.gz":
+ type: file
+ description: The trimmed/modified/unmerged fastq reads
+ pattern: "*fastp.fastq.gz"
- json:
- type: file
- description: Results in JSON format
- pattern: "*.json"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.json":
+ type: file
+ description: Results in JSON format
+ pattern: "*.json"
- html:
- type: file
- description: Results in HTML format
- pattern: "*.html"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.html":
+ type: file
+ description: Results in HTML format
+ pattern: "*.html"
- log:
- type: file
- description: fastq log file
- pattern: "*.log"
- - versions:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.log":
+ type: file
+ description: fastp log file
+ pattern: "*.log"
- reads_fail:
- type: file
- description: Reads the failed the preprocessing
- pattern: "*fail.fastq.gz"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.fail.fastq.gz":
+ type: file
+ description: Reads that failed the preprocessing
+ pattern: "*fail.fastq.gz"
- reads_merged:
- type: file
- description: Reads that were successfully merged
- pattern: "*.{merged.fastq.gz}"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.merged.fastq.gz":
+ type: file
+ description: Reads that were successfully merged
+ pattern: "*.{merged.fastq.gz}"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
authors:
- "@drpatelh"
- "@kevinmenden"
diff --git a/modules/nf-core/fastp/tests/main.nf.test b/modules/nf-core/fastp/tests/main.nf.test
index 6f1f4897..30dbb8aa 100644
--- a/modules/nf-core/fastp/tests/main.nf.test
+++ b/modules/nf-core/fastp/tests/main.nf.test
@@ -10,221 +10,290 @@ nextflow_process {
test("test_fastp_single_end") {
when {
- params {
- outdir = "$outputDir"
- }
+
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = false
-
input[0] = Channel.of([
[ id:'test', single_end:true ],
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
"""
}
}
then {
- def html_text = [ "Q20 bases:12.922000 K (92.984097%)",
- "single end (151 cycles)" ]
- def log_text = [ "Q20 bases: 12922(92.9841%)",
- "reads passed filter: 99" ]
- def read_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1",
- "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT",
- "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE
- { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { assert snapshot(process.out.json).match("test_fastp_single_end_json") },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { file(it[1]).getName() } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_single_end-_match")
- },
- { assert snapshot(process.out.versions).match("versions_single_end") }
+ { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.versions).match() }
)
}
}
- test("test_fastp_single_end-stub") {
-
- options '-stub'
+ test("test_fastp_paired_end") {
when {
- params {
- outdir = "$outputDir"
- }
+
process {
"""
adapter_fasta = []
+ save_trimmed_pass = true
save_trimmed_fail = false
save_merged = false
input[0] = Channel.of([
- [ id:'test', single_end:true ],
- [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
"""
}
}
then {
+ assertAll(
+ { assert process.success },
+ { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("Q30 bases: 12281(88.3716%)") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.versions).match() }
+ )
+ }
+ }
+ test("fastp test_fastp_interleaved") {
+
+ config './nextflow.interleaved.config'
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
+ """
+ }
+ }
+
+ then {
assertAll(
{ assert process.success },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { file(it[1]).getName() } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_single_end-for_stub_match")
- },
- { assert snapshot(process.out.versions).match("versions_single_end_stub") }
+ { assert path(process.out.html.get(0).get(1)).getText().contains("paired end (151 cycles + 151 cycles)") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 162") },
+ { assert process.out.reads_fail == [] },
+ { assert process.out.reads_merged == [] },
+ { assert snapshot(
+ process.out.reads,
+ process.out.json,
+ process.out.versions).match() }
)
}
}
- test("test_fastp_paired_end") {
+ test("test_fastp_single_end_trim_fail") {
when {
- params {
- outdir = "$outputDir"
+
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = []
+ input[2] = false
+ input[3] = true
+ input[4] = false
+ """
}
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.versions).match() }
+ )
+ }
+ }
+
+ test("test_fastp_paired_end_trim_fail") {
+
+ config './nextflow.save_failed.config'
+ when {
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = false
+ input[0] = Channel.of([
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)]
+ ])
+ input[1] = []
+ input[2] = false
+ input[3] = true
+ input[4] = false
+ """
+ }
+ }
+ then {
+ assertAll(
+ { assert process.success },
+ { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 162") },
+ { assert snapshot(
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.json,
+ process.out.versions).match() }
+ )
+ }
+ }
+
+ test("test_fastp_paired_end_merged") {
+
+ when {
+ process {
+ """
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = true
"""
}
}
then {
- def html_text = [ "Q20 bases: | 25.719000 K (93.033098%)",
- "The input has little adapter percentage (~0.000000%), probably it's trimmed before."]
- def log_text = [ "No adapter detected for read1",
- "Q30 bases: 12281(88.3716%)"]
- def json_text = ['"passed_filter_reads": 198']
- def read1_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1",
- "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT",
- "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE
- { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) }
- }
- },
- { read2_lines.each { read2_line ->
- { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { json_text.each { json_part ->
- { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) }
- }
- },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { it[1].collect { item -> file(item).getName() } } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_paired_end_match")
- },
- { assert snapshot(process.out.versions).match("versions_paired_end") }
+ { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("total reads: 75") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.versions).match() },
)
}
}
- test("test_fastp_paired_end-stub") {
-
- options '-stub'
+ test("test_fastp_paired_end_merged_adapterlist") {
when {
- params {
- outdir = "$outputDir"
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ])
+ input[2] = false
+ input[3] = false
+ input[4] = true
+ """
}
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert path(process.out.html.get(0).get(1)).getText().contains(" ") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("total bases: 13683") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.versions).match() }
+ )
+ }
+ }
+
+ test("test_fastp_single_end_qc_only") {
+
+ when {
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = false
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ],
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = []
+ input[2] = true
+ input[3] = false
+ input[4] = false
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.reads_merged,
+ process.out.versions).match() }
+ )
+ }
+ }
+ test("test_fastp_paired_end_qc_only") {
+
+ when {
+ process {
+ """
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = true
+ input[3] = false
+ input[4] = false
"""
}
}
@@ -232,114 +301,99 @@ nextflow_process {
then {
assertAll(
{ assert process.success },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { it[1].collect { item -> file(item).getName() } } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_paired_end-for_stub_match")
- },
- { assert snapshot(process.out.versions).match("versions_paired_end-stub") }
+ { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") },
+ { assert path(process.out.log.get(0).get(1)).getText().contains("Q30 bases: 12281(88.3716%)") },
+ { assert snapshot(
+ process.out.json,
+ process.out.reads,
+ process.out.reads,
+ process.out.reads_fail,
+ process.out.reads_fail,
+ process.out.reads_merged,
+ process.out.reads_merged,
+ process.out.versions).match() }
)
}
}
- test("fastp test_fastp_interleaved") {
+ test("test_fastp_single_end - stub") {
+
+ options "-stub"
- config './nextflow.interleaved.config'
when {
- params {
- outdir = "$outputDir"
+
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ],
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
+ """
}
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_fastp_paired_end - stub") {
+
+ options "-stub"
+
+ when {
+
process {
"""
adapter_fasta = []
+ save_trimmed_pass = true
save_trimmed_fail = false
save_merged = false
input[0] = Channel.of([
- [ id:'test', single_end:true ], // meta map
- [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ]
+ [ id:'test', single_end:false ], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
"""
}
}
then {
- def html_text = [ "Q20 bases: | 25.719000 K (93.033098%)",
- "paired end (151 cycles + 151 cycles)"]
- def log_text = [ "Q20 bases: 12922(92.9841%)",
- "reads passed filter: 162"]
- def read_lines = [ "@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1",
- "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT",
- "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE
- { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { assert snapshot(process.out.json).match("fastp test_fastp_interleaved_json") },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { file(it[1]).getName() } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_interleaved-_match")
- },
- { assert snapshot(process.out.versions).match("versions_interleaved") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("fastp test_fastp_interleaved-stub") {
+ test("fastp - stub test_fastp_interleaved") {
- options '-stub'
+ options "-stub"
config './nextflow.interleaved.config'
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = false
-
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = false
"""
}
}
@@ -347,277 +401,112 @@ nextflow_process {
then {
assertAll(
{ assert process.success },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { file(it[1]).getName() } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_interleaved-for_stub_match")
- },
- { assert snapshot(process.out.versions).match("versions_interleaved-stub") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("test_fastp_single_end_trim_fail") {
+ test("test_fastp_single_end_trim_fail - stub") {
+
+ options "-stub"
when {
- params {
- outdir = "$outputDir"
- }
+
process {
"""
- adapter_fasta = []
- save_trimmed_fail = true
- save_merged = false
-
input[0] = Channel.of([
[ id:'test', single_end:true ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = true
+ input[4] = false
"""
}
}
then {
- def html_text = [ "Q20 bases: | 12.922000 K (92.984097%)",
- "single end (151 cycles)"]
- def log_text = [ "Q20 bases: 12922(92.9841%)",
- "reads passed filter: 99" ]
- def read_lines = [ "@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1",
- "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT",
- "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE
- { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) }
- }
- },
- { failed_read_lines.each { failed_read_line ->
- { assert path(process.out.reads_fail.get(0).get(1)).linesGzip.contains(failed_read_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { assert snapshot(process.out.json).match("test_fastp_single_end_trim_fail_json") },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- { assert snapshot(process.out.versions).match("versions_single_end_trim_fail") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("test_fastp_paired_end_trim_fail") {
+ test("test_fastp_paired_end_trim_fail - stub") {
+
+ options "-stub"
config './nextflow.save_failed.config'
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
- adapter_fasta = []
- save_trimmed_fail = true
- save_merged = false
-
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = true
+ input[4] = false
"""
}
}
then {
- def html_text = [ "Q20 bases: | 25.719000 K (93.033098%)",
- "The input has little adapter percentage (~0.000000%), probably it's trimmed before."]
- def log_text = [ "No adapter detected for read1",
- "Q30 bases: 12281(88.3716%)"]
- def json_text = ['"passed_filter_reads": 162']
- def read1_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1",
- "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT",
- "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE
- { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) }
- }
- },
- { read2_lines.each { read2_line ->
- { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) }
- }
- },
- { failed_read2_lines.each { failed_read2_line ->
- { assert path(process.out.reads_fail.get(0).get(1).get(2)).linesGzip.contains(failed_read2_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { json_text.each { json_part ->
- { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) }
- }
- },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- { assert snapshot(process.out.versions).match("versions_paired_end_trim_fail") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("test_fastp_paired_end_merged") {
+ test("test_fastp_paired_end_merged - stub") {
+
+ options "-stub"
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = true
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = false
+ input[3] = false
+ input[4] = true
"""
}
}
then {
- def html_text = [ ""]
- def log_text = [ "Merged and filtered:",
- "total reads: 75",
- "total bases: 13683"]
- def json_text = ['"merged_and_filtered": {', '"total_reads": 75', '"total_bases": 13683']
- def read1_lines = [ "@ERR5069949.1066259 NS500628:121:HK3MMAFX2:1:11312:18369:8333/1",
- "CCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTGACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTC",
- "AAAAAEAEEAEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEE/EAEEEEEE/6EEEEEEEEEEAEEAEEE/EE/AEEAEEEEEAEEEA/EEAAEAE
- { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) }
- }
- },
- { read2_lines.each { read2_line ->
- { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) }
- }
- },
- { read_merged_lines.each { read_merged_line ->
- { assert path(process.out.reads_merged.get(0).get(1)).linesGzip.contains(read_merged_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { json_text.each { json_part ->
- { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) }
- }
- },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { it[1].collect { item -> file(item).getName() } } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_paired_end_merged_match")
- },
- { assert snapshot(process.out.versions).match("versions_paired_end_merged") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("test_fastp_paired_end_merged-stub") {
+ test("test_fastp_paired_end_merged_adapterlist - stub") {
- options '-stub'
+ options "-stub"
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
- adapter_fasta = []
- save_trimmed_fail = false
- save_merged = true
-
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ])
+ input[2] = false
+ input[3] = false
+ input[4] = true
"""
}
}
@@ -625,101 +514,63 @@ nextflow_process {
then {
assertAll(
{ assert process.success },
- {
- assert snapshot(
- (
- [process.out.reads[0][0].toString()] + // meta
- process.out.reads.collect { it[1].collect { item -> file(item).getName() } } +
- process.out.json.collect { file(it[1]).getName() } +
- process.out.html.collect { file(it[1]).getName() } +
- process.out.log.collect { file(it[1]).getName() } +
- process.out.reads_fail.collect { file(it[1]).getName() } +
- process.out.reads_merged.collect { file(it[1]).getName() }
- ).sort()
- ).match("test_fastp_paired_end_merged-for_stub_match")
- },
- { assert snapshot(process.out.versions).match("versions_paired_end_merged_stub") }
+ { assert snapshot(process.out).match() }
)
}
}
- test("test_fastp_paired_end_merged_adapterlist") {
+ test("test_fastp_single_end_qc_only - stub") {
+
+ options "-stub"
when {
- params {
- outdir = "$outputDir"
- }
process {
"""
- adapter_fasta = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ])
- save_trimmed_fail = false
- save_merged = true
+ input[0] = Channel.of([
+ [ id:'test', single_end:true ],
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
+ ])
+ input[1] = []
+ input[2] = true
+ input[3] = false
+ input[4] = false
+ """
+ }
+ }
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("test_fastp_paired_end_qc_only - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
input[0] = Channel.of([
[ id:'test', single_end:false ], // meta map
[ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
])
- input[1] = adapter_fasta
- input[2] = save_trimmed_fail
- input[3] = save_merged
+ input[1] = []
+ input[2] = true
+ input[3] = false
+ input[4] = false
"""
}
}
then {
- def html_text = [ ""]
- def log_text = [ "Merged and filtered:",
- "total reads: 75",
- "total bases: 13683"]
- def json_text = ['"merged_and_filtered": {', '"total_reads": 75', '"total_bases": 13683',"--adapter_fasta"]
- def read1_lines = ["@ERR5069949.1066259 NS500628:121:HK3MMAFX2:1:11312:18369:8333/1",
- "CCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTGACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTC",
- "AAAAAEAEEAEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEE/EAEEEEEE/6EEEEEEEEEEAEEAEEE/EE/AEEAEEEEEAEEEA/EEAAEAE
- { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) }
- }
- },
- { read2_lines.each { read2_line ->
- { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) }
- }
- },
- { read_merged_lines.each { read_merged_line ->
- { assert path(process.out.reads_merged.get(0).get(1)).linesGzip.contains(read_merged_line) }
- }
- },
- { html_text.each { html_part ->
- { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) }
- }
- },
- { json_text.each { json_part ->
- { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) }
- }
- },
- { log_text.each { log_part ->
- { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) }
- }
- },
- { assert snapshot(process.out.versions).match("versions_paired_end_merged_adapterlist") }
+ { assert snapshot(process.out).match() }
)
}
}
-}
+}
\ No newline at end of file
diff --git a/modules/nf-core/fastp/tests/main.nf.test.snap b/modules/nf-core/fastp/tests/main.nf.test.snap
index 3e876288..54be7e45 100644
--- a/modules/nf-core/fastp/tests/main.nf.test.snap
+++ b/modules/nf-core/fastp/tests/main.nf.test.snap
@@ -1,55 +1,178 @@
{
- "fastp test_fastp_interleaved_json": {
+ "test_fastp_single_end_qc_only - stub": {
"content": [
- [
- [
- {
- "id": "test",
- "single_end": true
- },
- "test.fastp.json:md5,b24e0624df5cc0b11cd5ba21b726fb22"
+ {
+ "0": [
+
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
- ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-03-18T16:19:15.063001"
+ "timestamp": "2024-07-05T14:31:10.841098"
},
- "test_fastp_paired_end_merged-for_stub_match": {
+ "test_fastp_paired_end": {
"content": [
[
[
- "test_1.fastp.fastq.gz",
- "test_2.fastp.fastq.gz"
- ],
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "test.merged.fastq.gz",
- "{id=test, single_end=false}"
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7",
+ "test_2.fastp.fastq.gz:md5,25cbdca08e2083dbd4f0502de6b62f39"
+ ]
+ ]
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-01-17T18:10:13.467574"
+ "timestamp": "2024-07-05T13:43:28.665779"
},
- "versions_interleaved": {
+ "test_fastp_paired_end_merged_adapterlist": {
"content": [
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,5914ca3f21ce162123a824e33e8564f6"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,54b726a55e992a869fd3fa778afe1672",
+ "test_2.fastp.fastq.gz:md5,29d3b33b869f7b63417b8ff07bb128ba"
+ ]
+ ]
+ ],
+ [
+
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,c873bb1ab3fa859dcc47306465e749d5"
+ ]
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:56:24.615634793"
+ "timestamp": "2024-07-05T13:44:18.210375"
},
- "test_fastp_single_end_json": {
+ "test_fastp_single_end_qc_only": {
"content": [
[
[
@@ -57,274 +180,1152 @@
"id": "test",
"single_end": true
},
- "test.fastp.json:md5,c852d7a6dba5819e4ac8d9673bedcacc"
+ "test.fastp.json:md5,5cc5f01e449309e0e689ed6f51a2294a"
]
- ]
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-03-18T16:18:43.526412"
- },
- "versions_paired_end": {
- "content": [
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:55:42.333545689"
+ "timestamp": "2024-07-05T13:44:27.380974"
},
- "test_fastp_paired_end_match": {
+ "test_fastp_paired_end_trim_fail": {
"content": [
[
[
- "test_1.fastp.fastq.gz",
- "test_2.fastp.fastq.gz"
- ],
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=false}"
- ]
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-02-01T12:03:06.431833729"
- },
- "test_fastp_interleaved-_match": {
- "content": [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,6ff32a64c5188b9a9192be1398c262c7",
+ "test_2.fastp.fastq.gz:md5,db0cb7c9977e94ac2b4b446ebd017a8a"
+ ]
+ ]
+ ],
[
- "test.fastp.fastq.gz",
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=true}"
- ]
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-03-18T16:19:15.111894"
- },
- "test_fastp_paired_end_merged_match": {
- "content": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test.paired.fail.fastq.gz:md5,409b687c734cedd7a1fec14d316e1366",
+ "test_1.fail.fastq.gz:md5,4f273cf3159c13f79e8ffae12f5661f6",
+ "test_2.fail.fastq.gz:md5,f97b9edefb5649aab661fbc9e71fc995"
+ ]
+ ]
+ ],
+ [
+
+ ],
[
[
- "test_1.fastp.fastq.gz",
- "test_2.fastp.fastq.gz"
- ],
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "test.merged.fastq.gz",
- "{id=test, single_end=false}"
- ]
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-02-01T12:08:44.496251446"
- },
- "versions_single_end_stub": {
- "content": [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,4c3268ddb50ea5b33125984776aa3519"
+ ]
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:55:27.354051299"
+ "timestamp": "2024-07-05T13:43:58.749589"
},
- "versions_interleaved-stub": {
+ "fastp - stub test_fastp_interleaved": {
"content": [
- [
- "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:56:46.535528418"
+ "timestamp": "2024-07-05T13:50:00.270029"
},
- "versions_single_end_trim_fail": {
+ "test_fastp_single_end - stub": {
"content": [
- [
- "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:59:03.724591407"
+ "timestamp": "2024-07-05T13:49:42.502789"
},
- "test_fastp_paired_end-for_stub_match": {
+ "test_fastp_paired_end_merged_adapterlist - stub": {
"content": [
- [
- [
- "test_1.fastp.fastq.gz",
- "test_2.fastp.fastq.gz"
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
],
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=false}"
- ]
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-01-17T18:07:15.398827"
+ "timestamp": "2024-07-05T13:54:53.458252"
},
- "versions_paired_end-stub": {
+ "test_fastp_paired_end_merged - stub": {
"content": [
- [
- "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:56:06.50017282"
+ "timestamp": "2024-07-05T13:50:27.689379"
},
- "versions_single_end": {
+ "test_fastp_paired_end_merged": {
"content": [
[
- "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
- ]
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-02-01T11:55:07.67921647"
- },
- "versions_paired_end_merged_stub": {
- "content": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,b712fd68ed0322f4bec49ff2a5237fcc"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,54b726a55e992a869fd3fa778afe1672",
+ "test_2.fastp.fastq.gz:md5,29d3b33b869f7b63417b8ff07bb128ba"
+ ]
+ ]
+ ],
+ [
+
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.merged.fastq.gz:md5,c873bb1ab3fa859dcc47306465e749d5"
+ ]
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:59:47.350653154"
+ "timestamp": "2024-07-05T13:44:08.68476"
},
- "test_fastp_interleaved-for_stub_match": {
+ "test_fastp_paired_end - stub": {
"content": [
- [
- "test.fastp.fastq.gz",
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=true}"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-01-17T18:08:06.127974"
+ "timestamp": "2024-07-05T13:49:51.679221"
},
- "versions_paired_end_trim_fail": {
+ "test_fastp_single_end": {
"content": [
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,c852d7a6dba5819e4ac8d9673bedcacc"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7"
+ ]
+ ],
+ [
+
+ ],
+ [
+
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:59:18.140484878"
+ "timestamp": "2024-07-05T13:43:18.834322"
},
- "test_fastp_single_end-for_stub_match": {
+ "test_fastp_single_end_trim_fail - stub": {
"content": [
- [
- "test.fastp.fastq.gz",
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=true}"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "reads_fail": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-01-17T18:06:00.244202"
+ "timestamp": "2024-07-05T14:05:36.898142"
},
- "test_fastp_single_end-_match": {
+ "test_fastp_paired_end_trim_fail - stub": {
"content": [
- [
- "test.fastp.fastq.gz",
- "test.fastp.html",
- "test.fastp.json",
- "test.fastp.log",
- "{id=test, single_end=true}"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test.paired.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_1.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "reads_fail": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ [
+ "test.paired.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_1.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940",
+ "test_2.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+ ]
+ ]
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-03-18T16:18:43.580336"
+ "timestamp": "2024-07-05T14:05:49.212847"
},
- "versions_paired_end_merged_adapterlist": {
+ "fastp test_fastp_interleaved": {
"content": [
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,217d62dc13a23e92513a1bd8e1bcea39"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,b24e0624df5cc0b11cd5ba21b726fb22"
+ ]
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T12:05:37.845370554"
+ "timestamp": "2024-07-05T13:43:38.910832"
},
- "versions_paired_end_merged": {
+ "test_fastp_single_end_trim_fail": {
"content": [
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.json:md5,9a7ee180f000e8d00c7fb67f06293eb5"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7"
+ ]
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.fail.fastq.gz:md5,3e4aaadb66a5b8fc9b881bf39c227abd"
+ ]
+ ],
+ [
+
+ ],
[
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-02-01T11:59:32.860543858"
+ "timestamp": "2024-07-05T13:43:48.22378"
},
- "test_fastp_single_end_trim_fail_json": {
+ "test_fastp_paired_end_qc_only": {
"content": [
[
[
{
"id": "test",
- "single_end": true
+ "single_end": false
},
- "test.fastp.json:md5,9a7ee180f000e8d00c7fb67f06293eb5"
+ "test.fastp.json:md5,623064a45912dac6f2b64e3f2e9901df"
]
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+
+ ],
+ [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
]
],
"meta": {
"nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nextflow": "24.04.2"
+ },
+ "timestamp": "2024-07-05T13:44:36.334938"
+ },
+ "test_fastp_paired_end_qc_only - stub": {
+ "content": [
+ {
+ "0": [
+
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "3": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "4": [
+
+ ],
+ "5": [
+
+ ],
+ "6": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "json": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "log": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "reads": [
+
+ ],
+ "reads_fail": [
+
+ ],
+ "reads_merged": [
+
+ ],
+ "versions": [
+ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.04.2"
},
- "timestamp": "2024-01-17T18:08:41.942317"
+ "timestamp": "2024-07-05T14:31:27.096468"
}
}
\ No newline at end of file
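The stub snapshots above repeat two checksums. The quick Python check below shows that `d41d8cd98f00b204e9800998ecf8427e` is the MD5 of empty input and `68b329da9893e34099c7d8ad5cb9c940` is the MD5 of a single newline. It assumes two things: that nf-test hashes the decompressed content of `.fastq.gz` files, and that the stub touches the report files while writing an empty `echo` through gzip for the reads. The stub script itself is not part of this diff, so treat those points as unverified.

```python
import gzip
import hashlib

# Checksums copied from the stub snapshots above.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"    # *.json / *.html / *.log stubs
NEWLINE_MD5 = "68b329da9893e34099c7d8ad5cb9c940"  # *.fastq.gz stubs

def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()

# An empty file (e.g. created with `touch`) hashes to EMPTY_MD5.
empty_digest = md5_hex(b"")

# A gzipped single newline (a stand-in for `echo "" | gzip`), hashed after
# decompression, which is assumed to match what nf-test does for .gz files.
stub_gz = gzip.compress(b"\n")
newline_digest = md5_hex(gzip.decompress(stub_gz))

print(empty_digest == EMPTY_MD5, newline_digest == NEWLINE_MD5)
```

If both comparisons print `True`, the stub checksums carry no information about the real tool output. That is why the non-stub tests snapshot real content hashes instead.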
diff --git a/modules/nf-core/fastqc/environment.yml b/modules/nf-core/fastqc/environment.yml
index 1787b38a..691d4c76 100644
--- a/modules/nf-core/fastqc/environment.yml
+++ b/modules/nf-core/fastqc/environment.yml
@@ -1,7 +1,5 @@
-name: fastqc
channels:
- conda-forge
- bioconda
- - defaults
dependencies:
- bioconda::fastqc=0.12.1
diff --git a/modules/nf-core/fastqc/main.nf b/modules/nf-core/fastqc/main.nf
index d79f1c86..d8989f48 100644
--- a/modules/nf-core/fastqc/main.nf
+++ b/modules/nf-core/fastqc/main.nf
@@ -26,7 +26,10 @@ process FASTQC {
def rename_to = old_new_pairs*.join(' ').join(' ')
def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ')
- def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB')
+ // The total amount of RAM allocated by FastQC is equal to the number of threads defined (--threads) times the amount of RAM defined (--memory)
+ // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222
+ // Dividing task.memory by task.cpus keeps the total within the amount of RAM requested by the label
+ def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') / task.cpus
// FastQC memory value allowed range (100 - 10000)
def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 100 : memory_in_mb)
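The Python sketch below reproduces the arithmetic in the changed lines. It assumes `task.memory` has already been converted to megabytes, as `MemoryUnit.toUnit('MB')` does. The helper name is hypothetical and the function is not part of the module. It splits the label's RAM across threads and then clamps the result to the 100 to 10000 MB range noted in the module comment.

```python
def fastqc_memory_per_thread_mb(task_memory_mb: float, cpus: int) -> float:
    """Hypothetical helper mirroring the Groovy logic in FASTQC.

    FastQC reserves (--threads x --memory) in total, so the label's memory
    is divided by the CPU count before being passed as --memory.
    """
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    per_thread = task_memory_mb / cpus  # Groovy '/' also yields a non-integer result
    # Clamp to FastQC's allowed --memory range (100 - 10000 MB).
    return min(10000, max(100, per_thread))

# 36 GB over 6 CPUs: 6144 MB per thread, 36864 MB in total, matching the request.
print(fastqc_memory_per_thread_mb(36864, 6))
```

Clamping means the total can still drift from the request at the extremes. With 1 GB and 12 CPUs, each thread is raised to the 100 MB floor, so FastQC may reserve about 1.2 GB in total.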
diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml
index ee5507e0..4827da7a 100644
--- a/modules/nf-core/fastqc/meta.yml
+++ b/modules/nf-core/fastqc/meta.yml
@@ -16,35 +16,44 @@ tools:
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
licence: ["GPL-2.0-only"]
+ identifier: biotools:fastqc
input:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- - reads:
- type: file
- description: |
- List of input FastQ files of size 1 and 2 for single-end and paired-end data,
- respectively.
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - reads:
+ type: file
+ description: |
+ List of input FastQ files of size 1 and 2 for single-end and paired-end data,
+ respectively.
output:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- html:
- type: file
- description: FastQC report
- pattern: "*_{fastqc.html}"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.html":
+ type: file
+ description: FastQC report
+ pattern: "*_{fastqc.html}"
- zip:
- type: file
- description: FastQC report archive
- pattern: "*_{fastqc.zip}"
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.zip":
+ type: file
+ description: FastQC report archive
+ pattern: "*_{fastqc.zip}"
- versions:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
authors:
- "@drpatelh"
- "@grst"
diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test
index 70edae4d..e9d79a07 100644
--- a/modules/nf-core/fastqc/tests/main.nf.test
+++ b/modules/nf-core/fastqc/tests/main.nf.test
@@ -23,17 +23,14 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it.
- // looks like this:
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039
-
- { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
- { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
- { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_single") }
+ { assert process.success },
+ // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it.
+ // looks like this:
+ // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039
+ { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
+ { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
+ { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
@@ -54,16 +51,14 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
- { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
- { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
- { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
- { assert path(process.out.html[0][1][0]).text.contains("File type | Conventional base calls | ") },
- { assert path(process.out.html[0][1][1]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_paired") }
+ { assert process.success },
+ { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
+ { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
+ { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
+ { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
+ { assert path(process.out.html[0][1][0]).text.contains("File type | Conventional base calls | ") },
+ { assert path(process.out.html[0][1][1]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
@@ -83,13 +78,11 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
- { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
- { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") }
+ { assert process.success },
+ { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
+ { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
+ { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
@@ -109,13 +102,11 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
- { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
- { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_bam") }
+ { assert process.success },
+ { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
+ { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
+ { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
@@ -138,22 +129,20 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
- { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
- { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" },
- { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" },
- { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
- { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
- { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" },
- { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" },
- { assert path(process.out.html[0][1][0]).text.contains("File type | Conventional base calls | ") },
- { assert path(process.out.html[0][1][1]).text.contains("File type | Conventional base calls | ") },
- { assert path(process.out.html[0][1][2]).text.contains("File type | Conventional base calls | ") },
- { assert path(process.out.html[0][1][3]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_multiple") }
+ { assert process.success },
+ { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
+ { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
+ { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" },
+ { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" },
+ { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
+ { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
+ { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" },
+ { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" },
+ { assert path(process.out.html[0][1][0]).text.contains("File type | Conventional base calls | ") },
+ { assert path(process.out.html[0][1][1]).text.contains("File type | Conventional base calls | ") },
+ { assert path(process.out.html[0][1][2]).text.contains("File type | Conventional base calls | ") },
+ { assert path(process.out.html[0][1][3]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
@@ -173,21 +162,18 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
-
- { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" },
- { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" },
- { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
-
- { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") }
+ { assert process.success },
+ { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" },
+ { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" },
+ { assert path(process.out.html[0][1]).text.contains("File type | Conventional base calls | ") },
+ { assert snapshot(process.out.versions).match() }
)
}
}
test("sarscov2 single-end [fastq] - stub") {
- options "-stub"
-
+ options "-stub"
when {
process {
"""
@@ -201,12 +187,123 @@ nextflow_process {
then {
assertAll (
- { assert process.success },
- { assert snapshot(process.out.html.collect { file(it[1]).getName() } +
- process.out.zip.collect { file(it[1]).getName() } +
- process.out.versions ).match("fastqc_stub") }
+ { assert process.success },
+ { assert snapshot(process.out).match() }
)
}
}
+ test("sarscov2 paired-end [fastq] - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [id: 'test', single_end: false], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("sarscov2 interleaved [fastq] - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [id: 'test', single_end: false], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true)
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("sarscov2 paired-end [bam] - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [id: 'test', single_end: false], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true)
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("sarscov2 multiple [fastq] - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [id: 'test', single_end: false], // meta map
+ [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ]
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("sarscov2 custom_prefix - stub") {
+
+ options "-stub"
+ when {
+ process {
+ """
+ input[0] = Channel.of([
+ [ id:'mysample', single_end:true ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ])
+ """
+ }
+ }
+
+ then {
+ assertAll (
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
}
diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap
index 86f7c311..d5db3092 100644
--- a/modules/nf-core/fastqc/tests/main.nf.test.snap
+++ b/modules/nf-core/fastqc/tests/main.nf.test.snap
@@ -1,88 +1,392 @@
{
- "fastqc_versions_interleaved": {
+ "sarscov2 custom_prefix": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:40:07.293713"
+ "timestamp": "2024-07-22T11:02:16.374038"
},
- "fastqc_stub": {
+ "sarscov2 single-end [fastq] - stub": {
"content": [
- [
- "test.html",
- "test.zip",
- "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
- ]
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
+ },
+ "timestamp": "2024-07-22T11:02:24.993809"
+ },
+ "sarscov2 custom_prefix - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "mysample",
+ "single_end": true
+ },
+ "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "mysample",
+ "single_end": true
+ },
+ "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "mysample",
+ "single_end": true
+ },
+ "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "mysample",
+ "single_end": true
+ },
+ "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:31:01.425198"
+ "timestamp": "2024-07-22T11:03:10.93942"
},
- "fastqc_versions_multiple": {
+ "sarscov2 interleaved [fastq]": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:40:55.797907"
+ "timestamp": "2024-07-22T11:01:42.355718"
},
- "fastqc_versions_bam": {
+ "sarscov2 paired-end [bam]": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:40:26.795862"
+ "timestamp": "2024-07-22T11:01:53.276274"
},
- "fastqc_versions_single": {
+ "sarscov2 multiple [fastq]": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:39:27.043675"
+ "timestamp": "2024-07-22T11:02:05.527626"
},
- "fastqc_versions_paired": {
+ "sarscov2 paired-end [fastq]": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
+ },
+ "timestamp": "2024-07-22T11:01:31.188871"
+ },
+ "sarscov2 paired-end [fastq] - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
+ },
+ "timestamp": "2024-07-22T11:02:34.273566"
+ },
+ "sarscov2 multiple [fastq] - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:39:47.584191"
+ "timestamp": "2024-07-22T11:03:02.304411"
},
- "fastqc_versions_custom_prefix": {
+ "sarscov2 single-end [fastq]": {
"content": [
[
"versions.yml:md5,e1cc25ca8af856014824abd842e93978"
]
],
"meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
+ },
+ "timestamp": "2024-07-22T11:01:19.095607"
+ },
+ "sarscov2 interleaved [fastq] - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
+ },
+ "timestamp": "2024-07-22T11:02:44.640184"
+ },
+ "sarscov2 paired-end [bam] - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "2": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "html": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.html:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,e1cc25ca8af856014824abd842e93978"
+ ],
+ "zip": [
+ [
+ {
+ "id": "test",
+ "single_end": false
+ },
+ "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.3"
},
- "timestamp": "2024-01-31T17:41:14.576531"
+ "timestamp": "2024-07-22T11:02:53.550742"
}
}
\ No newline at end of file
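Every stub snapshot above records `d41d8cd98f00b204e9800998ecf8427e` for the HTML and ZIP outputs. That is the MD5 digest of zero bytes. This is consistent with the stub creating empty placeholder files, for example with `touch`. A quick Python check, as a sketch:

```python
import hashlib

# Digest recorded for every stub output in the FastQC snapshots.
STUB_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# A zero-byte file (what `touch` creates) hashes to STUB_MD5.
print(md5_of(b"") == STUB_MD5)  # True
```

If a stub snapshot ever shows a different digest for a "touched" file, the stub is probably writing content rather than creating an empty file.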
diff --git a/modules/nf-core/gawk/environment.yml b/modules/nf-core/gawk/environment.yml
new file mode 100644
index 00000000..315f6dc6
--- /dev/null
+++ b/modules/nf-core/gawk/environment.yml
@@ -0,0 +1,5 @@
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - conda-forge::gawk=5.3.0
diff --git a/modules/nf-core/gawk/main.nf b/modules/nf-core/gawk/main.nf
new file mode 100644
index 00000000..ca468929
--- /dev/null
+++ b/modules/nf-core/gawk/main.nf
@@ -0,0 +1,55 @@
+process GAWK {
+ tag "$meta.id"
+ label 'process_single'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/gawk:5.3.0' :
+ 'biocontainers/gawk:5.3.0' }"
+
+ input:
+ tuple val(meta), path(input)
+ path(program_file)
+
+ output:
+ tuple val(meta), path("${prefix}.${suffix}"), emit: output
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: '' // args is used for the main arguments of the tool
+ def args2 = task.ext.args2 ?: '' // args2 is used to specify a program when no program file has been given
+ prefix = task.ext.prefix ?: "${meta.id}"
+ suffix = task.ext.suffix ?: "${input.getExtension()}"
+
+ program = program_file ? "-f ${program_file}" : "${args2}"
+
+ """
+ awk \\
+ ${args} \\
+ ${program} \\
+ ${input} \\
+ > ${prefix}.${suffix}
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//')
+ END_VERSIONS
+ """
+
+ stub:
+ prefix = task.ext.prefix ?: "${meta.id}"
+ suffix = task.ext.suffix ?: "${input.getExtension()}"
+ def create_cmd = suffix.endsWith("gz") ? "echo '' | gzip >" : "touch"
+
+ """
+ ${create_cmd} ${prefix}.${suffix}
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//')
+ END_VERSIONS
+ """
+}
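The `script:` and `stub:` blocks of `GAWK` choose between a program file and `ext.args2`, and derive the output name from `ext.prefix`/`ext.suffix`. The sketch below mirrors that selection logic in Python. `build_gawk_command` and `stub_command` are hypothetical helpers, not part of nf-core. The sketch assumes Nextflow's `getExtension()` returns the text after the last dot, and it emits a single line instead of the module's backslash-continued command.

```python
import os


def build_gawk_command(meta_id, input_name, program_file=None,
                       args="", args2="", prefix=None, suffix=None):
    """Hypothetical mirror of how the GAWK module builds its awk call.

    - prefix falls back to meta.id (like `task.ext.prefix ?: meta.id`)
    - suffix falls back to the input's last extension, e.g. "fai"
    - a program file, when given, wins over ext.args2
    """
    prefix = prefix or meta_id
    suffix = suffix or os.path.splitext(input_name)[1].lstrip(".")
    program = f"-f {program_file}" if program_file else args2
    parts = ["awk", args, program, input_name]
    # Drop empty pieces so an unset ext.args leaves no stray spaces.
    return " ".join(p for p in parts if p) + f" > {prefix}.{suffix}"


def stub_command(prefix, suffix):
    """Stub: pipe an empty line through gzip for *gz suffixes, else touch."""
    create = "echo '' | gzip >" if suffix.endswith("gz") else "touch"
    return f"{create} {prefix}.{suffix}"


print(build_gawk_command("test", "genome.fasta.fai",
                         program_file="program.txt", suffix="bed"))
```

Usage: with the test config (`ext.suffix = "bed"`) and a program file, the sketch yields `awk -f program.txt genome.fasta.fai > test.bed`. Any `ext.args2` is ignored in that case.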
diff --git a/modules/nf-core/gawk/meta.yml b/modules/nf-core/gawk/meta.yml
new file mode 100644
index 00000000..05170082
--- /dev/null
+++ b/modules/nf-core/gawk/meta.yml
@@ -0,0 +1,56 @@
+name: "gawk"
+description: |
+ If you are like many computer users, you would frequently like to make changes in various text files
+ wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest.
+ The job is easy with awk, especially the GNU implementation gawk.
+keywords:
+ - gawk
+ - awk
+ - txt
+ - text
+ - file parsing
+tools:
+ - "gawk":
+ description: "GNU awk"
+ homepage: "https://www.gnu.org/software/gawk/"
+ documentation: "https://www.gnu.org/software/gawk/manual/"
+ tool_dev_url: "https://www.gnu.org/prep/ftp.html"
+ licence: ["GPL v3"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - input:
+ type: file
+ description: The input file. Specify the logic to execute on this file
+ via `ext.args2` or in the program file
+ pattern: "*"
+ - - program_file:
+ type: file
+ description: Optional file containing logic for awk to execute. If you don't
+ wish to use a file, you can use `ext.args2` to specify the logic.
+ pattern: "*"
+output:
+ - output:
+ - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - ${prefix}.${suffix}:
+ type: file
+ description: The output file - specify the name of this file using `ext.prefix`
+ and the extension using `ext.suffix`
+ pattern: "*"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@nvnieuwk"
+maintainers:
+ - "@nvnieuwk"
diff --git a/modules/nf-core/gawk/tests/main.nf.test b/modules/nf-core/gawk/tests/main.nf.test
new file mode 100644
index 00000000..fce82ca9
--- /dev/null
+++ b/modules/nf-core/gawk/tests/main.nf.test
@@ -0,0 +1,56 @@
+nextflow_process {
+
+ name "Test Process GAWK"
+ script "../main.nf"
+ process "GAWK"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "gawk"
+
+ test("convert fasta to bed") {
+ config "./nextflow.config"
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test' ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true)
+ ]
+ input[1] = []
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+
+ test("convert fasta to bed with program file") {
+ config "./nextflow_with_program_file.config"
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test' ], // meta map
+ file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true)
+ ]
+ input[1] = Channel.of('BEGIN {FS="\t"}; {print \$1 FS "0" FS \$2}').collectFile(name:"program.txt")
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/gawk/tests/main.nf.test.snap b/modules/nf-core/gawk/tests/main.nf.test.snap
new file mode 100644
index 00000000..4f3a759c
--- /dev/null
+++ b/modules/nf-core/gawk/tests/main.nf.test.snap
@@ -0,0 +1,68 @@
+{
+ "convert fasta to bed with program file": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,842acc9870dc8ac280954047cb2aa23a"
+ ],
+ "output": [
+ [
+ {
+ "id": "test"
+ },
+ "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,842acc9870dc8ac280954047cb2aa23a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.03.0"
+ },
+ "timestamp": "2024-05-17T15:20:02.495430346"
+ },
+ "convert fasta to bed": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,842acc9870dc8ac280954047cb2aa23a"
+ ],
+ "output": [
+ [
+ {
+ "id": "test"
+ },
+ "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,842acc9870dc8ac280954047cb2aa23a"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.8.4",
+ "nextflow": "24.03.0"
+ },
+ "timestamp": "2024-05-17T15:19:53.291809648"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/gawk/tests/nextflow.config b/modules/nf-core/gawk/tests/nextflow.config
new file mode 100644
index 00000000..6e5d43a3
--- /dev/null
+++ b/modules/nf-core/gawk/tests/nextflow.config
@@ -0,0 +1,6 @@
+process {
+ withName: GAWK {
+ ext.suffix = "bed"
+ ext.args2 = '\'BEGIN {FS="\t"}; {print \$1 FS "0" FS \$2}\''
+ }
+}
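The awk program in this test config, `BEGIN {FS="\t"}; {print $1 FS "0" FS $2}`, turns each FASTA index (`.fai`) record into a BED interval covering the whole sequence. The fields are name, `0`, and length. Because BED is 0-based and half-open, an end coordinate equal to the sequence length spans every base. The following is an equivalent Python sketch for illustration only. `fai_to_bed` is a hypothetical name.

```python
def fai_to_bed(fai_lines):
    """Python equivalent of: BEGIN {FS="\\t"}; {print $1 FS "0" FS $2}

    A .fai record is: name, length, offset, linebases, linewidth.
    Output is a tab-separated BED line: name, 0, length.
    """
    bed = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        # awk yields an empty string for a missing field; mirror that.
        length = fields[1] if len(fields) > 1 else ""
        bed.append(f"{name}\t0\t{length}")
    return bed


print(fai_to_bed(["chr22\t40001\t7\t60\t61\n"]))
```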
diff --git a/modules/nf-core/gawk/tests/nextflow_with_program_file.config b/modules/nf-core/gawk/tests/nextflow_with_program_file.config
new file mode 100644
index 00000000..693ad419
--- /dev/null
+++ b/modules/nf-core/gawk/tests/nextflow_with_program_file.config
@@ -0,0 +1,5 @@
+process {
+ withName: GAWK {
+ ext.suffix = "bed"
+ }
+}
diff --git a/modules/nf-core/gawk/tests/tags.yml b/modules/nf-core/gawk/tests/tags.yml
new file mode 100644
index 00000000..72e4531d
--- /dev/null
+++ b/modules/nf-core/gawk/tests/tags.yml
@@ -0,0 +1,2 @@
+gawk:
+ - "modules/nf-core/gawk/**"
diff --git a/modules/nf-core/mirdeep2/mapper/environment.yml b/modules/nf-core/mirdeep2/mapper/environment.yml
new file mode 100644
index 00000000..fafc6663
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - "bioconda::mirdeep2=2.0.1.2"
diff --git a/modules/nf-core/mirdeep2/mapper/main.nf b/modules/nf-core/mirdeep2/mapper/main.nf
new file mode 100644
index 00000000..d52820a3
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/main.nf
@@ -0,0 +1,53 @@
+process MIRDEEP2_MAPPER {
+ tag "$meta.id"
+ label 'process_medium'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/mirdeep2:2.0.1.2--0':
+ 'biocontainers/mirdeep2:2.0.1.2--0' }"
+
+ input:
+ tuple val(meta), path(reads)
+ tuple val(meta2), path(index, stageAs: '*')
+
+ output:
+ tuple val(meta), path('*.fa'), path('*.arf'), emit: outputs
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def VERSION = '2.0.1'
+
+ """
+ mapper.pl \\
+ ${reads} \\
+ $args \\
+ -p ${index}/${meta2.id} \\
+ -s ${prefix}_collapsed.fa \\
+ -t ${prefix}_reads_collapsed_vs_${meta2.id}_genome.arf
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ mirdeep2: \$(echo "$VERSION")
+ END_VERSIONS
+ """
+
+ stub:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def VERSION = '2.0.1'
+ """
+ touch ${prefix}.fa
+ touch ${prefix}reads_vs_refdb.arf
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ mirdeep2: \$(echo "$VERSION")
+ END_VERSIONS
+ """
+}
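`MIRDEEP2_MAPPER` places `ext.args` directly after the reads. It then points `-p` at `${index}/${meta2.id}` and names the outputs after `prefix` and `meta2.id`. The Python sketch below rebuilds that argument list. `build_mapper_command` is hypothetical, and the sketch assumes the staged bowtie index files share `meta2.id` as their basename. The stub touches different names (`${prefix}.fa` and `${prefix}reads_vs_refdb.arf`, recorded as `test_readsreads_vs_refdb.arf` in the stub snapshot). Both sets of names still match the `*.fa`/`*.arf` output globs.

```python
import shlex


def build_mapper_command(reads, index_dir, genome_id, prefix, args=""):
    """Hypothetical mirror of the mapper.pl call in MIRDEEP2_MAPPER.

    index_dir plays the role of the staged `index` path and genome_id
    the role of meta2.id (assumed to be the bowtie index basename).
    """
    return [
        "mapper.pl",
        reads,
        *shlex.split(args),  # task.ext.args, e.g. "-c -j -l 18 -m -v"
        "-p", f"{index_dir}/{genome_id}",
        "-s", f"{prefix}_collapsed.fa",
        "-t", f"{prefix}_reads_collapsed_vs_{genome_id}_genome.arf",
    ]


cmd = build_mapper_command("reads.fa", "bowtie", "genome_cel_cluster",
                           "test_reads", "-c -j -l 18 -m -v")
print(" ".join(cmd))
```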
diff --git a/modules/nf-core/mirdeep2/mapper/meta.yml b/modules/nf-core/mirdeep2/mapper/meta.yml
new file mode 100644
index 00000000..a482c480
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/meta.yml
@@ -0,0 +1,59 @@
+name: "mirdeep2_mapper"
+description: |
+ miRDeep2 Mapper is a tool that prepares deep sequencing reads for downstream miRNA detection by collapsing reads, mapping them to a genome, and outputting the required files for miRNA discovery.
+keywords:
+ - mirdeep2
+ - mapper
+ - RNA sequencing
+tools:
+ - "mirdeep2":
+ description: |
+ miRDeep2 Mapper (`mapper.pl`) is part of the miRDeep2 suite. It collapses identical reads, maps them to a reference genome, and outputs both collapsed FASTA and ARF files for downstream miRNA detection and analysis.
+ homepage: "https://www.mdc-berlin.de/content/mirdeep2-documentation"
+ documentation: "https://www.mdc-berlin.de/content/mirdeep2-documentation"
+ tool_dev_url: "https://github.com/rajewsky-lab/mirdeep2"
+ doi: "10.1093/nar/gkn491"
+ licence: ["GPL V3"]
+ identifier: biotools:mirdeep2
+
+input:
+ - - meta:
+ type: map
+ description: Groovy Map containing sample information, e.g. `[ id:'sample1',
+ single_end:false ]`
+ - reads:
+ type: file
+ description: File containing the raw sequencing reads that need to be collapsed
+ and mapped to a reference genome.
+ pattern: "*.fa"
+ - - meta2:
+ type: map
+ description: Groovy Map containing information about the genome index.
+ - index:
+ type: file
+ description: Path to the genome index file used for mapping the reads to the
+ genome.
+ pattern: "*"
+output:
+ - outputs:
+ - meta:
+ type: map
+ description: Groovy Map containing sample information, e.g. `[ id:'sample1', single_end:false ]`
+ - "*.fa":
+ type: file
+ description: Collapsed reads in FASTA format.
+ pattern: "*.fa"
+ - "*.arf":
+ type: file
+ description: Alignment Read Format (ARF) file containing the mapping of reads
+ to the genome.
+ pattern: "*.arf"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions for tracking.
+ pattern: "versions.yml"
+authors:
+ - "@atrigila"
+maintainers:
+ - "@atrigila"
diff --git a/modules/nf-core/mirdeep2/mapper/tests/main.nf.test b/modules/nf-core/mirdeep2/mapper/tests/main.nf.test
new file mode 100644
index 00000000..62e3e615
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/tests/main.nf.test
@@ -0,0 +1,141 @@
+
+nextflow_process {
+
+ name "Test Process MIRDEEP2_MAPPER"
+ script "../main.nf"
+ process "MIRDEEP2_MAPPER"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "mirdeep2"
+ tag "bowtie/build"
+ tag "mirdeep2/mapper"
+ tag "seqkit/fq2fa"
+ tag "seqkit/replace"
+
+
+ setup {
+ run("BOWTIE_BUILD") {
+ script "../../../bowtie/build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'genome_cel_cluster' ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/cel_cluster.fa', checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ run("SEQKIT_FQ2FA") {
+ script "../../../seqkit/fq2fa/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'small_Clone1_N1' ], // meta map
+ file('https://github.com/nf-core/test-datasets/raw/smrnaseq/testdata/trimmed/small_Clone1_N1.fastp.fastq.gz', checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ run("SEQKIT_REPLACE") {
+ script "../../../seqkit/replace/main.nf"
+ config "./nextflow.config"
+ process {
+ """
+ input[0] = SEQKIT_FQ2FA.out.fasta
+ """
+ }
+ }
+
+ }
+
+ test("mirdeep2 - mapper - fasta celegans") {
+ config "./nextflow.config"
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test_reads', single_end:false ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/reads.fa', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out.versions).match() },
+
+ // md5sum not stable - IDs change while sequences are the same
+
+ // Assert TCACCGGGGGTACATCAGCTAA occurs once
+ { assert file(process.out.outputs[0][1]).readLines().findAll { it.contains("TCACCGGGGGTACATCAGCTAA") }.size() == 1 },
+
+ // Assert seq_347479_x287 occurs once
+ { assert file(process.out.outputs[0][1]).readLines().findAll { it.contains("seq_347479_x287") }.size() == 1 },
+
+ // Assert that specific content occurs 4 times
+ { assert file(process.out.outputs[0][2]).readLines().findAll { it.contains("21\t1\t21\ttcaccgggtgtaaatcagctt\tchrII:11534525-11540624\t21\t3535\t3555\ttcaccgggtgtaaatcagctt\t+\t0\tmmmmmmmmmmmmmmmmmmmmm") }.size() == 4 }
+ )
+ }
+
+ }
+
+ test("mirdeep2 - mapper - fasta smrnaseq") {
+ config "./nextflow.config"
+
+ when {
+ process {
+ """
+ input[0] = SEQKIT_REPLACE.out.fastx
+ input[1] = BOWTIE_BUILD.out.index
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+
+ // Assert read sequence occurs once
+ { assert file(process.out.outputs[0][1]).readLines().findAll { it.contains("TACCTGAGGTAGCAGGTTGTATAGTTGGGG") }.size() == 1 },
+
+ // Assert ID occurs once
+ { assert file(process.out.outputs[0][1]).readLines().findAll { it.contains("seq_996152_x1") }.size() == 1 }
+
+ )
+ }
+
+ }
+
+ test("mirdeep2 - fasta - stub") {
+
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [
+ [ id:'test_reads', single_end:false ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/reads.fa', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out).match() }
+ )
+ }
+
+ }
+
+}
diff --git a/modules/nf-core/mirdeep2/mapper/tests/main.nf.test.snap b/modules/nf-core/mirdeep2/mapper/tests/main.nf.test.snap
new file mode 100644
index 00000000..4c3697d9
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/tests/main.nf.test.snap
@@ -0,0 +1,51 @@
+{
+ "mirdeep2 - fasta - stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test_reads",
+ "single_end": false
+ },
+ "test_reads.fa:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "test_readsreads_vs_refdb.arf:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,33c794292d6772d67fa8001439394614"
+ ],
+ "outputs": [
+ [
+ {
+ "id": "test_reads",
+ "single_end": false
+ },
+ "test_reads.fa:md5,d41d8cd98f00b204e9800998ecf8427e",
+ "test_readsreads_vs_refdb.arf:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,33c794292d6772d67fa8001439394614"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-20T20:58:19.544297445"
+ },
+ "mirdeep2 - mapper - fasta celegans": {
+ "content": [
+ [
+ "versions.yml:md5,33c794292d6772d67fa8001439394614"
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.9.0",
+ "nextflow": "24.04.4"
+ },
+ "timestamp": "2024-09-17T17:41:05.101661825"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/mirdeep2/mapper/tests/nextflow.config b/modules/nf-core/mirdeep2/mapper/tests/nextflow.config
new file mode 100644
index 00000000..ec097561
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mapper/tests/nextflow.config
@@ -0,0 +1,11 @@
+process {
+ withName: 'MIRDEEP2_MAPPER' {
+ ext.args = "-c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -v"
+ }
+
+ withName: 'SEQKIT_REPLACE' {
+ ext.args = "-p '\s.+'"
+ ext.suffix = "fasta"
+ }
+
+}
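The `SEQKIT_REPLACE` step uses the pattern `\s.+` with seqkit's default empty replacement. This deletes everything from the first whitespace character onward in each FASTA header. The likely purpose is to give `mapper.pl` whitespace-free read identifiers. That rationale is an assumption, not something the test states. Python's `re` treats this pattern the same way Go's `regexp` does, so the effect can be sketched as follows. `strip_header` is a hypothetical helper.

```python
import re

# Same pattern as ext.args "-p '\s.+'" for SEQKIT_REPLACE.
PATTERN = re.compile(r"\s.+")


def strip_header(header: str) -> str:
    """Drop the first whitespace and everything after it (empty replacement)."""
    return PATTERN.sub("", header)


print(strip_header("small_Clone1_N1_1 length=30 extra"))  # small_Clone1_N1_1
```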
diff --git a/modules/nf-core/mirdeep2/mirdeep2/environment.yml b/modules/nf-core/mirdeep2/mirdeep2/environment.yml
new file mode 100644
index 00000000..fafc6663
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mirdeep2/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - "bioconda::mirdeep2=2.0.1.2"
diff --git a/modules/nf-core/mirdeep2/mirdeep2/main.nf b/modules/nf-core/mirdeep2/mirdeep2/main.nf
new file mode 100644
index 00000000..66c85968
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mirdeep2/main.nf
@@ -0,0 +1,64 @@
+process MIRDEEP2_MIRDEEP2 {
+ tag "$meta.id"
+ label 'process_medium'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/mirdeep2:2.0.1.2--0':
+ 'biocontainers/mirdeep2:2.0.1.2--0' }"
+
+ input:
+ tuple val(meta), path(processed_reads), path(genome_mappings)
+ tuple val(meta2), path(fasta)
+ tuple val(meta3), path(mature), path(hairpin), path(mature_other_species)
+
+ output:
+ tuple val(meta), path("result*.{bed,csv,html}") , emit: outputs
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def VERSION = '2.0.1'
+ def mature_species = mature ? "${mature}" : "none"
+ def mature_other = mature_other_species ? "${mature_other_species}": "none"
+ def precursors = hairpin ? "${hairpin}" : "none"
+
+ """
+ miRDeep2.pl \\
+ $processed_reads \\
+ $fasta \\
+ $genome_mappings \\
+ $mature_species \\
+ $mature_other \\
+ $precursors \\
+ $args
+
+ mv result_*.bed result_${prefix}.bed
+ mv result_*.csv result_${prefix}.csv
+ mv result_*.html result_${prefix}.html
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ mirdeep2: \$(echo "$VERSION")
+ END_VERSIONS
+ """
+
+ stub:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def VERSION = '2.0.1'
+ """
+ touch result_${prefix}.html
+ touch result_${prefix}.bed
+ touch result_${prefix}.csv
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ mirdeep2: \$(echo "$VERSION")
+ END_VERSIONS
+ """
+}
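`MIRDEEP2_MIRDEEP2` passes six positional arguments to `miRDeep2.pl` in this order: reads, genome, mappings, mature (this species), mature (other species), and precursors. Any optional reference that is empty is replaced by the literal `none`, as with the `[]` passed for `mature_other_species` in the test. The sketch below reproduces that ordering and the names left after the `mv result_*` renames. Both helpers are hypothetical.

```python
import shlex


def build_mirdeep2_command(processed_reads, fasta, genome_mappings,
                           mature=None, hairpin=None,
                           mature_other_species=None, args=""):
    """Hypothetical mirror of the miRDeep2.pl call in MIRDEEP2_MIRDEEP2."""

    def or_none(path):
        # Empty/None inputs become the literal "none", as in the module.
        return path if path else "none"

    return [
        "miRDeep2.pl",
        processed_reads,
        fasta,
        genome_mappings,
        or_none(mature),                # known mature miRNAs, this species
        or_none(mature_other_species),  # known mature miRNAs, other species
        or_none(hairpin),               # precursor (hairpin) sequences
        *shlex.split(args),
    ]


def renamed_results(prefix):
    """File names left after `mv result_*.{bed,csv,html} result_<prefix>.*`."""
    return [f"result_{prefix}.{ext}" for ext in ("bed", "csv", "html")]


print(build_mirdeep2_command("reads_collapsed.fa", "cel_cluster.fa",
                             "reads_vs_genome.arf", mature="mature.fa",
                             hairpin="precursors.fa", mature_other_species=[]))
```

The `mv` renames assume exactly one `result_*` file per extension in the work directory. With several matches, `mv` would fail because the target is not a directory.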
diff --git a/modules/nf-core/mirdeep2/mirdeep2/meta.yml b/modules/nf-core/mirdeep2/mirdeep2/meta.yml
new file mode 100644
index 00000000..adf14101
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mirdeep2/meta.yml
@@ -0,0 +1,76 @@
+name: "mirdeep2_mirdeep2"
+description: |
+ miRDeep2 is a tool for identifying known and novel miRNAs in deep sequencing data by analyzing sequenced RNAs. It integrates the mapping of sequencing reads to the genome and predicts miRNA precursors and mature miRNAs.
+keywords:
+ - mirdeep2
+ - miRNA
+ - RNA sequencing
+tools:
+ - "mirdeep2":
+ description: |
+ miRDeep2 is a tool that discovers microRNA genes by analyzing sequenced RNAs.
+ It includes three main scripts: `miRDeep2.pl`, `mapper.pl`, and `quantifier.pl` for comprehensive miRNA detection and quantification.
+ homepage: "https://www.mdc-berlin.de/content/mirdeep2-documentation"
+ documentation: "https://www.mdc-berlin.de/content/mirdeep2-documentation"
+ tool_dev_url: "https://github.com/rajewsky-lab/mirdeep2"
+ doi: "10.1093/nar/gkn491"
+ licence: ["GPL V3"]
+ identifier: biotools:mirdeep2
+
+input:
+ - - meta:
+ type: map
+ description: Groovy Map containing sample information, e.g. `[ id:'sample1',
+ single_end:false ]`
+ - processed_reads:
+ type: file
+ description: FASTA file containing the processed sequencing reads.
+ pattern: "*.fa"
+ - genome_mappings:
+ type: file
+ description: ARF format file with mapped reads to the genome.
+ pattern: "*.arf"
+ - - meta2:
+ type: map
+ description: Groovy Map for genome FASTA file metadata, e.g. `[ id:'genome']`
+ - fasta:
+ type: file
+ description: FASTA file of the corresponding genome.
+ pattern: "*.fa"
+ - - meta3:
+ type: map
+ description: Groovy Map for miRNA metadata, e.g. `[ id:'mirbase', single_end:false
+ ]`
+ - mature:
+ type: file
+ description: FASTA file containing known mature miRNAs of the species being
+ analyzed.
+ pattern: "*.fa"
+ - hairpin:
+ type: file
+ description: FASTA file containing hairpin sequences (miRNA precursors).
+ pattern: "*.fa"
+ - mature_other_species:
+ type: file
+ description: FASTA file containing known mature miRNAs of other species.
+ pattern: "*.fa"
+output:
+ - outputs:
+ - meta:
+ type: map
+ description: Groovy Map containing sample information, e.g. `[ id:'sample1',
+ single_end:false ]`
+ - result*.{bed,csv,html}:
+ type: file
+ description: Output files, including BED, CSV, and HTML results files with an
+ overview of detected miRNAs.
+ pattern: "result*.{bed,csv,html}"
+ - versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+authors:
+ - "@atrigila"
+maintainers:
+ - "@atrigila"
diff --git a/modules/nf-core/mirdeep2/mirdeep2/tests/main.nf.test b/modules/nf-core/mirdeep2/mirdeep2/tests/main.nf.test
new file mode 100644
index 00000000..b7b73ec1
--- /dev/null
+++ b/modules/nf-core/mirdeep2/mirdeep2/tests/main.nf.test
@@ -0,0 +1,111 @@
+nextflow_process {
+
+ name "Test Process MIRDEEP2_MIRDEEP2"
+ script "../main.nf"
+ process "MIRDEEP2_MIRDEEP2"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "mirdeep2"
+ tag "mirdeep2/mirdeep2"
+ tag "bowtie/build"
+ tag "mirdeep2/mapper"
+
+
+ setup {
+ run("BOWTIE_BUILD") {
+ script "../../../bowtie/build/main.nf"
+ process {
+ """
+ input[0] = [
+ [ id:'genome_cel_cluster' ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/cel_cluster.fa', checkIfExists: true)
+ ]
+ """
+ }
+ }
+
+ run("MIRDEEP2_MAPPER") {
+ script "../../../mirdeep2/mapper/main.nf"
+ config "./nextflow.config"
+
+ process {
+ """
+ input[0] = [
+ [ id:'test_reads', single_end:false ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/reads.fa', checkIfExists: true)
+ ]
+ input[1] = BOWTIE_BUILD.out.index
+ """
+ }
+ }
+
+ }
+
+ test("mirdeep2 - mirdeep2 - fa") {
+
+ when {
+ process {
+ """
+ input[0] = MIRDEEP2_MAPPER.out.outputs
+ input[1] = [
+ [ id:'genome_cel_cluster' ], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/cel_cluster.fa', checkIfExists: true)
+ ]
+ input[2] = [
+ [ id:'hairpin_mature'], // meta map
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/mature_ref_this_species.fa', checkIfExists: true),
+ file('https://github.com/rajewsky-lab/mirdeep2/raw/master/tutorial_dir/precursors_ref_this_species.fa', checkIfExists: true),
+ []
+ ]
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(process.out.versions,
+ path(process.out.outputs.get(0).get(1)[2]).readLines().last().contains(' |