diff --git a/.editorconfig b/.editorconfig new file mode 100644 index 0000000..d31bd2a --- /dev/null +++ b/.editorconfig @@ -0,0 +1,31 @@ +root = true + +[*] +charset = utf-8 +end_of_line = lf +insert_final_newline = true +trim_trailing_whitespace = true +indent_size = 4 +indent_style = space + +[*.{yml,yaml}] +indent_size = 2 + +[*.json] +insert_final_newline = unset + +# These files are edited and tested upstream in nf-core/modules +[/modules/nf-core/**] +charset = unset +end_of_line = unset +insert_final_newline = unset +trim_trailing_whitespace = unset +indent_style = unset +indent_size = unset + +[/assets/email*] +indent_size = unset + +[/bin/*.{R,r}] +indent_style = space +indent_size = 2 diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml index 030138a..191fabd 100644 --- a/.github/.dockstore.yml +++ b/.github/.dockstore.yml @@ -3,3 +3,4 @@ version: 1.2 workflows: - subclass: nfl primaryDescriptorPath: /nextflow.config + publish: True diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 0d3b729..1d3f596 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -18,8 +18,9 @@ If you'd like to write some code for nf-core/scflow, the standard workflow is as 1. Check that there isn't already an issue about your idea in the [nf-core/scflow issues](https://github.com/nf-core/scflow/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/scflow repository](https://github.com/nf-core/scflow) to your GitHub account -3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) +4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -30,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t There are typically two types of tests that run: -### Lint Tests +### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. -### Pipeline Tests +### Pipeline tests Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
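For context, the same minimal test that CI runs can also be exercised locally from the root of a pipeline clone. A minimal sketch, assuming Docker is installed (the `test,docker` profile combination is the one used in this pipeline's own PR checklist later in this diff):

```console
# run the pipeline on the bundled minimal test dataset using Docker containers
nextflow run . -profile test,docker
```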
@@ -55,3 +56,73 @@ These tests are run both with the latest available version of `Nextflow` and als ## Getting help For further information/help, please consult the [nf-core/scflow documentation](https://nf-co.re/scflow/usage) and don't hesitate to get in touch on the nf-core Slack [#scflow](https://nfcore.slack.com/channels/scflow) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Pipeline contribution conventions + +To make the nf-core/scflow code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. + +### Adding a new step + +If you wish to contribute a new step, please use the following coding standards: + +1. Define the corresponding input channel into your new process from the expected previous process channel +2. Write the process block (see below). +3. Define the output channel if needed (see below). +4. Add any new flags/options to `nextflow.config` with a default (see below). +5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`). +6. Add any new flags/options to the help message (for integer/text parameters, print the corresponding `nextflow.config` parameter in the help text). +7. Add sanity checks for all relevant parameters. +8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`. +9. Run local tests to check that the new code works properly and as expected. +10. Add a new test command in `.github/workflows/ci.yml`. +11. If applicable, add a [MultiQC](https://multiqc.info/) module. +12. Update the MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean-up patterns, General Statistics Table column order, and module figures are in the right order. +13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`. + +### Default values + +Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. + +Once there, use `nf-core schema build` to add to `nextflow_schema.json`. + +### Default processes resource requirements + +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. + +The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. + +### Naming schemes + +Please use the following naming schemes to make it easy to understand what is going where. + +* initial process channel: `ch_output_from_<process>` +* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>` + +### Nextflow version bumping + +If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . 
[min-nf-version]` + +### Software version reporting + +If you add a new tool to the pipeline, please ensure you add the information of the tool to the `scrape_software_versions` process. + +Add something like the following to the script block of the process: + +```bash +<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt 2>&1 || true +``` + +or + +```bash +<YOUR_TOOL> --help | head -n 1 &> v_<YOUR_TOOL>.txt 2>&1 || true +``` + +You then need to edit the script `bin/scrape_software_versions.py` to: + +1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` plus the version number, e.g. `v2.1.1` +2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC. + +### Images and figures + +For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 88a1faa..717946f 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,13 +1,25 @@ +--- +name: Bug report +about: Report something that is broken or incorrect +labels: bug +--- + +## Check Documentation + +I have checked the following places for my error: + +- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) +- [ ] [nf-core/scflow pipeline documentation](https://nf-co.re/scflow/usage) + ## Description of the bug @@ -23,6 +35,13 @@ Steps to reproduce the behaviour: +## Log files + +Have you provided the following extra information/files: + +- [ ] The command used to run the pipeline +- [ ] The `.nextflow.log` file + ## System - Hardware: @@ -32,13 +51,12 @@ Steps to reproduce the behaviour: ## Nextflow Installation -- Version: +- Version: ## Container engine -- Engine: +- Engine: - version: -- Image tag: ## Additional context diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000..94a859d --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,8 @@ +blank_issues_enabled: false +contact_links: + - name: Join nf-core + url: https://nf-co.re/join + about: Please join the nf-core community here + - name: "Slack #scflow channel" + url: https://nfcore.slack.com/channels/scflow + about: Discussion about the nf-core/scflow pipeline diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 2116e2e..200c476 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,3 +1,9 @@ +--- +name: Feature request +about: Suggest an idea for the nf-core/scflow pipeline +labels: enhancement +--- + + diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md ## PR checklist -- [ ] This comment contains a description of changes (with reason) -- [ ] `CHANGELOG.md` is updated +- [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] Documentation in `docs` is updated -- [ ] If necessary, also make a PR on the [nf-core/scflow branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/scflow) + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/scflow/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/scflow _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. 
+- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Usage Documentation in `docs/usage.md` is updated. +- [ ] Output Documentation in `docs/output.md` is updated. +- [ ] `CHANGELOG.md` is updated. +- [ ] `README.md` is updated (including new tool citations and authors/contributors). diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml deleted file mode 100644 index 96b12a7..0000000 --- a/.github/markdownlint.yml +++ /dev/null @@ -1,5 +0,0 @@ -# Markdownlint configuration file -default: true, -line-length: false -no-duplicate-header: - siblings_only: true diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 9dfa5d0..f8073aa 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,42 +1,29 @@ name: nf-core AWS full size tests # This workflow is triggered on published releases. -# It can be additionally triggered manually with GitHub actions workflow dispatch. +# It can be additionally triggered manually with the GitHub Actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: release: types: [published] workflow_dispatch: - jobs: - run-awstest: + run-tower: name: Run AWS full tests if: github.repository == 'nf-core/scflow' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 + - name: Launch workflow via tower + uses: nf-core/tower-action@master with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job - # TODO nf-core: You can customise AWS full pipeline tests as required - # Add full size test data (but still relatively small datasets for few samples) - # on the `test_full.config` test runs with only one set of parameters - # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-scflow \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/scflow", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/scflow/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/scflow/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/scflow/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/scflow/results-${{ github.sha }}" + } + profiles: '[ "test_full", "aws_tower" ]' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 6ec5573..f5b5945 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -1,39 +1,27 @@ name: nf-core AWS test -# This workflow is triggered on push to the master branch. 
-# It can be additionally triggered manually with GitHub actions workflow dispatch. -# It runs the -profile 'test' on AWS batch. +# This workflow can be triggered manually with the GitHub Actions workflow dispatch button. +# It runs the -profile 'test' on AWS batch on: workflow_dispatch: - jobs: - run-awstest: + run-tower: name: Run AWS tests if: github.repository == 'nf-core/scflow' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 + - name: Launch workflow via tower + uses: nf-core/tower-action@master + with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job - # TODO nf-core: You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-scflow \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/scflow", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/scflow/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/scflow/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/scflow/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/scflow/results-${{ github.sha }}" + } + profiles: '[ "test", "aws_tower" ]' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index 9e9e903..21e5584 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -2,7 +2,7 @@ name: nf-core branch protection # This workflow is triggered on PRs to master branch on the repository # It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` on: - pull_request: + pull_request_target: branches: [master] jobs: @@ -13,7 +13,7 @@ jobs: - name: Check PRs if: github.repository == 'nf-core/scflow' run: | - { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/scflow ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/scflow ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] # If the above check failed, post a comment on the PR explaining the failure @@ -23,13 +23,22 @@ jobs: uses: mshick/add-pr-comment@v1 with: message: | + ## This PR is against the `master` branch :x: + + * Do not close this PR + * Click _Edit_ and change the `base` to `dev` + * This CI test will remain failed until you push a new commit + + --- + Hi @${{ github.event.pull_request.user.login }}, - It looks like this pull-request is has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch. 
+ It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch. The `master` branch on nf-core repositories should always contain code from the latest release. - Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch. + Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch. You do not need to close this PR; you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page. + Note that even after this, the test will continue to show as failing until you push a new commit. Thanks again for your contribution! repo-token: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index f14f412..5ba6b78 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -8,6 +8,9 @@ on: release: types: [published] +# Uncomment if we need an edge release of Nextflow again +# env: NXF_EDGE: 1 + jobs: test: name: Run workflow tests @@ -20,36 +23,27 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['19.10.0', ''] + nxf_ver: ['21.04.2', ''] steps: - name: Check out pipeline code uses: actions/checkout@v2 - - name: Check if Dockerfile or Conda environment changed - uses: technote-space/get-diff-action@v1 - with: - PREFIX_FILTER: | - Dockerfile - environment.yml - - - name: Build new docker image - if: env.GIT_DIFF - run: docker build --no-cache . -t nfcore/scflow:dev - - - name: Pull docker image - if: ${{ !env.GIT_DIFF }} - run: | - docker pull nfcore/scflow:dev - docker tag nfcore/scflow:dev nfcore/scflow:dev - - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Run pipeline with test data - # TODO nf-core: You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix run: | nextflow run ${GITHUB_WORKSPACE} -profile test,docker + + - name: Build new docker image + if: ${{ github.event.workflow_run.conclusion == 'thiswillnevertrigger' }} + run: docker build --no-cache . -t almurphy/scfdev:dev + - name: Pull docker image + if: ${{ github.event.workflow_run.conclusion == 'thiswillnevertrigger' }} + run: | + docker pull almurphy/scfdev:dev + docker tag almurphy/scfdev:dev almurphy/scfdev:dev diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 8e8d5bb..3b44877 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -18,7 +18,49 @@ jobs: - name: Install markdownlint run: npm install -g markdownlint-cli - name: Run Markdownlint - run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + run: markdownlint . + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## Markdown linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. 
+ To fix this CI test, please run: + + * Install `markdownlint-cli` + * On Mac: `brew install markdownlint-cli` + * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`) + * Fix the markdown errors + * Automatically: `markdownlint . --fix` + * Manually resolve anything left from `markdownlint .` + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! + repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + + EditorConfig: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + + - uses: actions/setup-node@v1 + with: + node-version: '10' + + - name: Install editorconfig-checker + run: npm install -g editorconfig-checker + + - name: Run ECLint check + run: editorconfig-checker -exclude README.md $(git ls-files | grep -v test) + YAML: runs-on: ubuntu-latest steps: @@ -29,7 +71,33 @@ jobs: - name: Install yaml-lint run: npm install -g yaml-lint - name: Run yaml-lint - run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml") + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## YAML linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. + To fix this CI test, please run: + + * Install `yaml-lint` + * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`) + * Fix the YAML errors + * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")` + * Fix any reported errors in your YAML files + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! 
+ repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + nf-core: runs-on: ubuntu-latest steps: @@ -38,6 +106,8 @@ jobs: uses: actions/checkout@v2 - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ @@ -57,12 +127,19 @@ jobs: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} + run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + + - name: Save PR number + if: ${{ always() }} + run: echo ${{ github.event.pull_request.number }} > PR_number.txt - name: Upload linting log file artifact if: ${{ always() }} uses: actions/upload-artifact@v2 with: - name: linting-log-file - path: lint_log.txt + name: linting-logs + path: | + lint_log.txt + lint_results.md + PR_number.txt diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml new file mode 100644 index 0000000..90f03c6 --- /dev/null +++ b/.github/workflows/linting_comment.yml @@ -0,0 +1,29 @@ + +name: nf-core linting comment +# This workflow is triggered after the linting action is complete +# It posts an automated comment to the PR, even if the PR is coming from a fork + +on: + workflow_run: + workflows: ["nf-core linting"] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - name: Download lint results + uses: dawidd6/action-download-artifact@v2 + with: + workflow: linting.yml + + - name: Get PR number + id: pr_number + run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)" + + - name: Post PR comment + uses: marocchino/sticky-pull-request-comment@v2 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + number: ${{ steps.pr_number.outputs.pr_number }} + path: linting-logs/lint_results.md + diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub.yml deleted file mode 100644 index d7c3e61..0000000 --- a/.github/workflows/push_dockerhub.yml +++ /dev/null @@ -1,40 +0,0 @@ -name: nf-core Docker push -# This builds the docker image and pushes it to DockerHub -# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) -on: - push: - branches: - - dev - release: - types: [published] - -jobs: - push_dockerhub: - name: Push new Docker image to Docker Hub - runs-on: ubuntu-latest - # Only run for the nf-core repo, for releases and merged PRs - if: ${{ github.repository == 'nf-core/scflow' }} - env: - DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} - DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 - - - name: Build new docker image - run: docker build --no-cache . 
-t nfcore/scflow:latest - - - name: Push Docker image to DockerHub (dev) - if: ${{ github.event_name == 'push' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker tag nfcore/scflow:latest nfcore/scflow:dev - docker push nfcore/scflow:dev - - - name: Push Docker image to DockerHub (release) - if: ${{ github.event_name == 'release' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker push nfcore/scflow:latest - docker tag nfcore/scflow:latest nfcore/scflow:${{ github.event.release.tag_name }} - docker push nfcore/scflow:${{ github.event.release.tag_name }} diff --git a/.gitignore b/.gitignore index 8427b91..3c8db70 100644 --- a/.gitignore +++ b/.gitignore @@ -3,7 +3,6 @@ work/ data/ results/ .DS_Store -tests/ testing/ testing* *.pyc @@ -12,4 +11,3 @@ data/ results/ singularity/ *.sif - diff --git a/.markdownlint.yml b/.markdownlint.yml new file mode 100644 index 0000000..9e605fc --- /dev/null +++ b/.markdownlint.yml @@ -0,0 +1,14 @@ +# Markdownlint configuration file +default: true +line-length: false +ul-indent: + indent: 4 +no-duplicate-header: + siblings_only: true +no-inline-html: + allowed_elements: + - img + - p + - kbd + - details + - summary diff --git a/.nf-core.yml b/.nf-core.yml new file mode 100644 index 0000000..17acfd2 --- /dev/null +++ b/.nf-core.yml @@ -0,0 +1,15 @@ +lint: + files_unchanged: + - Dockerfile + + files_exist: + - igenomes.config + - modules/local/get_software_versions.nf + - Dockerfile + + schema_params: + - input + +actions_awsfulltest: False +pipeline_todos: False +actions_ci: False diff --git a/CHANGELOG.md b/CHANGELOG.md index 6ac2a0c..f64400e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0dev - [date] +## v0.7.0dev - [date] Initial release of nf-core/scflow, created with the [nf-core](https://nf-co.re/) template. diff --git a/CITATIONS.md b/CITATIONS.md new file mode 100644 index 0000000..7a2a4dd --- /dev/null +++ b/CITATIONS.md @@ -0,0 +1,32 @@ +# nf-core/scflow: Citations + +## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. + +## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) + +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools + +* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + +* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/) + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +## Software packaging/containerisation tools + +* [Anaconda](https://anaconda.com) + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. 
+ +* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + +* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + +* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 405fb1b..f4fd052 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -1,46 +1,111 @@ -# Contributor Covenant Code of Conduct +# Code of Conduct at nf-core (v1.0) ## Our Pledge -In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. +In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of: -## Our Standards +- Age +- Body size +- Familial status +- Gender identity and expression +- Geographical location +- Level of experience +- Nationality and national origins +- Native language +- Physical and neurological ability +- Race or ethnicity +- Religion +- Sexual identity and orientation +- Socioeconomic status -Examples of behavior that contributes to creating a positive environment include: +Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance. -* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members +## Preamble -Examples of unacceptable behavior by participants include: +> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. "We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will be amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply. 
-* The use of sexualized language or imagery and unwelcome sexual attention or advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a professional setting +An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva. + +nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals. + +We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc. + +Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities. + +We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC. + +Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re ## Our Responsibilities -Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. +The safety officer is responsible for clarifying the standards of acceptable behavior and is expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour. + +The safety officer, in consultation with the nf-core core team, has the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and be subject to the same actions as others in violation of the CoC. + +## When and where does this Code of Conduct apply? + +Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes but is not limited to the following listed alphabetically and therefore in no order of preference: + +- Communicating with an official project email address. +- Communicating with community members within the nf-core Slack channel. +- Participating in hackathons organised by nf-core (both online and in-person events). +- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence. +- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). 
This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc. +- Representing nf-core on social media. This includes both official and personal accounts. + +## nf-core cares 😊 + +nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order): + +- Ask for consent before sharing another community member’s personal information (including photographs) on social media. +- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity. +- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !) +- Demonstrate empathy towards other community members. (We don’t all have the same amount of time to dedicate to nf-core. If tasks are pending, don’t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.) +- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let’s do this the best we can) +- Focus on what is best for the team and the community. (When in doubt, ask) +- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn. +- Introduce yourself to members of the community. (We’ve all been outsiders and we know that talking to strangers can be hard for some, but remember we’re interested in getting to know you and your visions for open science!) +- Show appreciation and **provide clear feedback**. (This is especially important because we don’t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**) +- Take breaks when you feel like you need them. +- Use welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.) + +## nf-core frowns on 😕 + +The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging in or advocating for any of the following could result in expulsion from nf-core workspaces. + +- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom. +- “Doxing”, i.e. posting (or threatening to post) another person’s personal identifying information online. +- Spamming or trolling of individuals on social media. +- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention. +- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience. + +### Online Trolling + +The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable; reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately. 
+ +All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls. + +## Procedures for Reporting CoC violations -Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. +If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible. -## Scope +You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s). -This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. +Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety manager, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course. -## Enforcement +All reports will be handled with utmost discretion and confidentiality. -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. +## Attribution and Acknowledgements -Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. +- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4) +- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition) +- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/) +- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla) ## Attribution -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] +## Changelog -[homepage]: https://contributor-covenant.org ### v1.0 - March 12th, 2021 -[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/ +- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC. 
diff --git a/Dockerfile b/Dockerfile deleted file mode 100644 index fe24217..0000000 --- a/Dockerfile +++ /dev/null @@ -1,17 +0,0 @@ -FROM nfcore/base:dev -LABEL authors="Dr Combiz Khozoie" \ - description="Docker image containing all software requirements for the nf-core/scflow pipeline" - -# Install the conda environment -COPY environment.yml / -RUN conda env create --quiet -f /environment.yml && conda clean -a - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-scflow-1.0dev/bin:$PATH - -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-scflow-1.0dev > nf-core-scflow-1.0dev.yml - -# Instruct R processes to use these empty files instead of clashing with a local version -RUN touch .Rprofile -RUN touch .Renviron diff --git a/README.md b/README.md index 2691f74..f469759 100644 --- a/README.md +++ b/README.md @@ -1,52 +1,102 @@ # ![nf-core/scflow](docs/images/nf-core-scflow_logo.png) -**Complete analysis workflow for single-cell/nuclei RNA-sequencing data.**. +[![GitHub Actions CI Status](https://github.com/nf-core/scflow/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/scflow/actions?query=workflow%3A%22nf-core+CI%22) +[![GitHub Actions Linting Status](https://github.com/nf-core/scflow/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/scflow/actions?query=workflow%3A%22nf-core+linting%22) +[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/scflow/results) +[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX) -[![GitHub Actions CI Status](https://github.com/nf-core/scflow/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/scflow/actions) -[![GitHub Actions Linting Status](https://github.com/nf-core/scflow/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/scflow/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.2-23aa62.svg?labelColor=000000)](https://www.nextflow.io/) +[![Run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) +[![Run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) +[![Run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) -[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/) -[![Docker](https://img.shields.io/docker/automated/nfcore/scflow.svg)](https://hub.docker.com/r/nfcore/scflow) -[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23scflow-4A154B?logo=slack)](https://nfcore.slack.com/channels/scflow) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23scflow-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/scflow) +[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core) +[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core) ## Introduction -The pipeline is built using 
[Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible. +**nf-core/scflow** is a bioinformatics pipeline for scalable, reproducible, best-practice analyses of single-cell/nuclei RNA-sequencing data. + +The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. Full case/control sc/sn-RNAseq analyses can be orchestrated with a single line of code on a local workstation, high-performance computing cluster (HPC), or on Cloud services including Google Cloud, Amazon Web Services, Microsoft Azure, and Kubernetes. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. + +Each new release of **nf-core/scflow** triggers automated continuous integration tests which run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/scflow/results). + +## Pipeline summary + +The **nf-core/scflow** pipeline takes post-demultiplexed sparse gene-cell count matrices as input and performs a complete case/control analysis including the following steps: - + +![nf-core/scflow](docs/images/scflow_workflow.png) + +(a) Individual sample quality control including ambient RNA profiling, thresholding, and doublet/multiplet identification. +(b) Merged quality control including inter-sample quality metrics and sample outlier identification. +(c) Dataset integration with visualization and quantitative metrics of integration performance. +(d) Flexible dimensionality reduction with UMAP and/or tSNE. +(e) Clustering using Leiden/Louvain community detection. +(f) Automated cell-type annotation with rich cell-type metrics and marker gene characterization. +(g) Flexible differential gene expression for categorical and numerical dependent variables. +(h) Impacted pathway analysis with multiple methods and databases. +(i) Dirichlet modeling of cell-type composition changes. + +Additionally, a high-quality, fully annotated, quality-controlled SingleCellExperiment (SCE) object is output for further downstream tertiary analyses. Simply read it into R with the `read_sce()` function in the [scFlow](https://www.github.com/combiz/scFlow) R package. + +Interactive HTML reports are generated for each analytical step indicated (grey icon). See the manuscript for examples (see citations below). + +Analyses are efficiently parallelized where possible (steps a, g, h, and i) and all steps benefit from the Nextflow cache, enabling parameter tuning with pipeline resume, i.e. you can stop the pipeline at any time, revise analytical parameters, and resume with the `-resume` parameter with only impacted/downstream steps restarted. This is particularly useful for optimizing parameters for quality-control, clustering, dimensionality reduction, or to manually revise automated cell-type annotations. + +For more details, see the pre-print: [scFlow: A Scalable and Reproducible Analysis Pipeline for Single-Cell RNA Sequencing Data](https://doi.org/10.22541/au.162912533.38489960/v1). ## Quick Start -1. 
Install [`nextflow`](https://nf-co.re/usage/installation) + +### Analyse a test dataset + +Try the pipeline on an in-built, minimal test dataset (all inputs will be automatically downloaded): - -2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`Podman`](https://podman.io/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ +1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`) + +2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility. 3. Download the pipeline and test it on a minimal dataset with a single command: - ```bash - nextflow run nf-core/scflow -profile test, + ```console + nextflow run nf-core/scflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> ``` - > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > - If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs. + > - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. + +### Analyse your own data -4. Start running your own analysis! +The **nf-core/scflow** pipeline requires three inputs: (1) a two-column manifest file with paths to gene-cell matrices and a unique sample key; (2) a sample sheet with sample information for each input matrix in the manifest file; and (3) a parameters configuration file ([see parameter documentation](https://nf-co.re/scflow/dev/parameters)). 
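For illustration only, a sketch of the first two inputs. The sample keys, paths, and metadata columns below are hypothetical; per `assets/schema_input.json` and `bin/check_inputs.r` later in this diff, each sample sheet row must carry a `manifest` value matching a manifest `key`, and each manifest path must be an existing directory or URL:

```console
$ cat Manifest.tsv        # hypothetical example: unique key + path to a gene-cell matrix
key     filepath
sampa   /data/cellranger/sampa/outs/raw_feature_bc_matrix
sampb   /data/cellranger/sampb/outs/raw_feature_bc_matrix

$ cat Samplesheet.tsv     # hypothetical example: one metadata row per manifest key
manifest    diagnosis    sex
sampa       control      M
sampb       case         F
```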
- +A complete, automated, scalable, and reproducible case-control analysis can then be performed with a single line of code: - + +1. Start running your own analysis! ```bash - nextflow run nf-core/scflow -profile --input '*_R{1,2}.fastq.gz' --genome GRCh37 + nextflow run nf-core/scflow \ + --manifest Manifest.tsv \ + --input Samplesheet.tsv \ + -c scflow_params.config \ + -profile local ``` -See [usage docs](https://nf-co.re/scflow/usage) for all of the available options when running the pipeline. +Switching from a local workstation analysis to a Cloud-based analysis can be achieved simply by changing the `-profile` parameter. For example, a Google Cloud analysis with automated staging of input matrices from Cloud storage (e.g. a Google Storage Bucket) can be achieved using `-profile gcp`. Additionally, pre-configured institutional profiles for a range of university and research institution HPC systems are readily available via nf-core [configs](https://github.com/nf-core/configs). ## Documentation -The nf-core/scflow pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/scflow/usage) and [output](https://nf-co.re/scflow/output). +The nf-core/scflow pipeline comes with documentation about the pipeline [usage](https://nf-co.re/scflow/usage), [parameters](https://nf-co.re/scflow/parameters) and [output](https://nf-co.re/scflow/output). - +A general usage manual is available at [https://combiz.github.io/scflow-manual/](https://combiz.github.io/scflow-manual/). Code for the underlying scFlow R package toolkit is available on [GitHub](https://github.com/combiz/scflow) with associated function documentation at [https://combiz.github.io/scFlow](https://combiz.github.io/scFlow). All code is open-source and available under the GNU General Public License v3.0 (GPL-3). ## Credits -nf-core/scflow was originally written by Dr Combiz Khozoie. +nf-core/scflow was originally written by Dr Combiz Khozoie for use at the UK Dementia Research Institute and the Department of Brain Sciences at Imperial College London. + +Dr Nurun Fancy and Dr Mahdi M. Marjaneh made valuable contributions to the impacted pathway analysis and integration (LIGER) modules, respectively. + +Many thanks to others who have helped out along the way too, including (but not limited to): Paolo Di Tommaso, Philip A. Ewels, Harshil Patel, Alexander Peltzer, and Maxime Ulysse Garcia, and lab members including Johanna Jackson, Amy Smith, Karen Davey, and Stergios Tsartsalis. ## Contributions and Support @@ -54,10 +104,17 @@ If you would like to contribute to this pipeline, please see the [contributing g For further information or help, don't hesitate to get in touch on the [Slack `#scflow` channel](https://nfcore.slack.com/channels/scflow) (you can join with [this invite](https://nf-co.re/join/slack)). -## Citation +## Citations + +If you use nf-core/scflow for your analysis, please cite it as follows: + +> **scFlow: A Scalable and Reproducible Analysis Pipeline for Single-Cell RNA Sequencing Data.** +> +> Combiz Khozoie, Nurun Fancy, Mahdi M. Marjaneh, Alan E. Murphy, Paul M. Matthews, Nathan Skene +> +> _bioRxiv_ 2021 August 19. doi: [10.22541/au.162912533.38489960/v1](https://doi.org/10.22541/au.162912533.38489960/v1). - - +An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file and in the analysis reports. 
You can cite the `nf-core` publication as follows:
@@ -66,4 +123,3 @@ You can cite the `nf-core` publication as follows:
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
-> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
diff --git a/assets/email_template.html b/assets/email_template.html
index 30324c2..5f3b38f 100644
--- a/assets/email_template.html
+++ b/assets/email_template.html
@@ -1,6 +1,5 @@
-
diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
deleted file mode 100644
index b229496..0000000
--- a/assets/multiqc_config.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-report_comment: >
-    This report has been generated by the nf-core/scflow
-    analysis pipeline. For information about how to interpret these results, please see the
-    documentation.
-report_section_order:
-    software_versions:
-        order: -1000
-    nf-core-scflow-summary:
-        order: -1001
-
-export_plots: true
diff --git a/assets/nf-core-scflow_logo.png b/assets/nf-core-scflow_logo.png
index 0db6026..9b34f32 100644
Binary files a/assets/nf-core-scflow_logo.png and b/assets/nf-core-scflow_logo.png differ
diff --git a/assets/schema_input.json b/assets/schema_input.json
new file mode 100644
index 0000000..c2327a8
--- /dev/null
+++ b/assets/schema_input.json
@@ -0,0 +1,20 @@
+{
+    "$schema": "http://json-schema.org/draft-07/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/scflow/master/assets/schema_input.json",
+    "title": "nf-core/scflow pipeline - params.input schema",
+    "description": "Schema for the file provided with params.input",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "manifest": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Manifest key must be provided and cannot contain spaces"
+            }
+        },
+        "required": [
+            "manifest"
+        ]
+    }
+}
diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt
index a147f2e..de966bf 100644
--- a/assets/sendmail_template.txt
+++ b/assets/sendmail_template.txt
@@ -14,16 +14,16 @@ Content-Transfer-Encoding: base64
Content-ID:
Content-Disposition: inline; filename="nf-core-scflow_logo.png"

-<% out << new File("$baseDir/assets/nf-core-scflow_logo.png").
-  bytes.
-  encodeBase64().
-  toString().
-  tokenize( '\n' )*.
-  toList()*.
-  collate( 76 )*.
-  collect { it.join() }.
-  flatten().
-  join( '\n' ) %>
+<% out << new File("$projectDir/assets/nf-core-scflow_logo.png").
+    bytes.
+    encodeBase64().
+    toString().
+    tokenize( '\n' )*.
+    toList()*.
+    collate( 76 )*.
+    collect { it.join() }.
+    flatten().
+    join( '\n' ) %>

<% if (mqcFile){
@@ -37,15 +37,15 @@
Content-ID:
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

${mqcFileObj.
-  bytes.
-  encodeBase64().
-  toString().
-  tokenize( '\n' )*.
-  toList()*.
-  collate( 76 )*.
-  collect { it.join() }.
-  flatten().
-  join( '\n' )}
+    bytes.
+    encodeBase64().
+    toString().
+    tokenize( '\n' )*.
+    toList()*.
+    collate( 76 )*.
+    collect { it.join() }.
+    flatten().
+ join( '\n' )} """ }} %> diff --git a/bin/check_inputs.r b/bin/check_inputs.r index 1c66638..3196fa4 100755 --- a/bin/check_inputs.r +++ b/bin/check_inputs.r @@ -1,5 +1,5 @@ #!/usr/bin/env Rscript -# Check the manifest and samplesheet inputs are complete +# Check the manifest and input samplesheet inputs are complete # Combiz Khozoie ## ............................................................................ @@ -16,16 +16,16 @@ parser <- ArgumentParser() required <- parser$add_argument_group("Required", "required arguments") required$add_argument( - "--samplesheet", + "--input", help = "full path to the sample sheet tsv file", - metavar = "SampleSheet.tsv", + metavar = "SampleSheet.tsv", required = TRUE ) required$add_argument( "--manifest", help = "full path to the manifest file", - metavar = "manifest", + metavar = "manifest", required = TRUE ) @@ -34,47 +34,53 @@ required$add_argument( args <- parser$parse_args() -if(!file.exists(args$samplesheet)) { - stop("The samplesheet was not found.") +if (!file.exists(args$input)) { + stop("The input samplesheet was not found.") } -if(!file.exists(args$manifest)) { +if (!file.exists(args$manifest)) { stop("The manifest was not found.") } -samplesheet <- read.delim(args$samplesheet) +input <- read.delim(args$input) manifest <- read.delim(args$manifest) # check manifest paths exist -dir_exists <- purrr::pmap_lgl(manifest, ~ dir.exists(as.character(..2))) -if(!all(dir_exists)){ + +check_exists <- function(filepath) { + RCurl::url.exists(filepath) | dir.exists(filepath) +} + +dir_exists <- purrr::pmap_lgl(manifest, ~ check_exists(as.character(..2))) + +if (!all(dir_exists)) { cat("The following paths were not found: -\n") - print(manifest[!dir_exists,]) + print(manifest[!dir_exists, ]) stop("Folder paths specified in the manifest were not found.") } else { cat("✓ All paths specified in the manifest were found.\n") } -# check samplesheet data present for all keys in manifest -key_in_samplesheet <- purrr::map_lgl( - manifest$key, - ~ . %in% samplesheet$manifest +# check input samplesheet data present for all keys in manifest +key_in_input <- purrr::map_lgl( + manifest$key, + ~ . 
%in% input$manifest ) -if(!(all(key_in_samplesheet))) { - cat("Samplesheet data was not found for the following keys: - \n") - print(manifest[!key_in_samplesheet, ]$key) - stop("Sample sheet does not contain data for all keys in manifest.") +if (!(all(key_in_input))) { + cat("Input samplesheet data was not found for the following keys: - \n") + print(manifest[!key_in_input, ]$key) + stop("Input sample sheet does not contain data for all keys in manifest.") } else { - cat("✓ Samplesheet contains data for all keys in the manifest.\n") + cat("✓ Input samplesheet contains data for all keys in the manifest.\n") } cat("Checks passed!\n") # write the same manifest back out -write.table(manifest, - "checked_manifest.txt", +write.table(manifest, + "checked_manifest.txt", sep = "\t", quote = FALSE, - col.names = TRUE, + col.names = TRUE, row.names = FALSE) diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py deleted file mode 100755 index a26d1ff..0000000 --- a/bin/markdown_to_html.py +++ /dev/null @@ -1,91 +0,0 @@ -#!/usr/bin/env python -from __future__ import print_function -import argparse -import markdown -import os -import sys -import io - - -def convert_markdown(in_fn): - input_md = io.open(in_fn, mode="r", encoding="utf-8").read() - html = markdown.markdown( - "[TOC]\n" + input_md, - extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"], - extension_configs={ - "pymdownx.b64": {"base_path": os.path.dirname(in_fn)}, - "pymdownx.highlight": {"noclasses": True}, - "toc": {"title": "Table of Contents"}, - }, - ) - return html - - -def wrap_html(contents): - header = """ - - - - - -
- """ - footer = """ -
- - - """ - return header + contents + footer - - -def parse_args(args=None): - parser = argparse.ArgumentParser() - parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.") - parser.add_argument( - "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout." - ) - return parser.parse_args(args) - - -def main(args=None): - args = parse_args(args) - converted_md = convert_markdown(args.mdfile.name) - html = wrap_html(converted_md) - args.out.write(html) - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/bin/merge_tables.r b/bin/merge_tables.r index bd1b16e..0e72412 100755 --- a/bin/merge_tables.r +++ b/bin/merge_tables.r @@ -22,7 +22,7 @@ required <- parser$add_argument_group("Required", "required arguments") required$add_argument( "--filepaths", help = "-paths to tsv files", - metavar = "1.tsv,2.tsv", + metavar = "1.tsv,2.tsv", required = TRUE ) diff --git a/bin/scflow_annotate_integrated.R b/bin/scflow_annotate_integrated.R index c4f5306..4e823a6 100755 --- a/bin/scflow_annotate_integrated.R +++ b/bin/scflow_annotate_integrated.R @@ -1,85 +1,85 @@ -#!/usr/bin/env Rscript -#' Annotate integrated, dims reduced and clustered sce object -# Mahdi Moradi Marjaneh - -# ____________________________________________________________________________ -# Initialization #### - -options(mc.cores = future::availableCores()) - -## ............................................................................ -## Load packages #### -library(argparse) -library(scFlow) - -## ............................................................................ -## Parse command-line arguments #### - -# create parser object -parser <- ArgumentParser() - -# specify options -required <- parser$add_argument_group("Required", "required arguments") -optional <- parser$add_argument_group("Optional", "required arguments") - -required$add_argument( - "--sce_path", - help = "-path to the SingleCellExperiment", - metavar = "dir", - required = TRUE -) - -required$add_argument( - "--categorical_covariates", - help = "-categorical covariates", - metavar = "individual,diagnosis,region,sex", - required = TRUE -) - -### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. -### Pre-process args #### - -args <- parser$parse_args() -args <- purrr::map(args, function(x) { -if (length(x) == 1) { -if (toupper(x) == "TRUE") { -return(TRUE) -} -if (toupper(x) == "FALSE") { -return(FALSE) -} -if (toupper(x) == "NULL") { -return(NULL) -} -} -return(x) -}) - -## ............................................................................ -## Annotate integrated sce #### - -sce <- read_sce(args$sce_path) - -sce <- annotate_integrated_sce( -sce, -categorical_covariates = args$categorical_covariates -) - -dir.create(file.path(getwd(), "integration_report")) - -report_integrated_sce( - sce = sce, - report_folder_path = file.path(getwd(), "integration_report"), - report_file = "integrate_reduceDims_cluster_report_scflow", -) - -print("Annotation complete, saving outputs..") - -## ............................................................................ 
-## Save Outputs #### - -# Save SingleCellExperiment -write_sce( -sce = sce, -folder_path = file.path(getwd(), "integrated_sce") -) \ No newline at end of file +#!/usr/bin/env Rscript +#' Annotate integrated, dims reduced and clustered sce object +# Mahdi Moradi Marjaneh + +# ____________________________________________________________________________ +# Initialization #### + +options(mc.cores = future::availableCores()) + +## ............................................................................ +## Load packages #### +library(argparse) +library(scFlow) + +## ............................................................................ +## Parse command-line arguments #### + +# create parser object +parser <- ArgumentParser() + +# specify options +required <- parser$add_argument_group("Required", "required arguments") +optional <- parser$add_argument_group("Optional", "required arguments") + +required$add_argument( + "--sce_path", + help = "-path to the SingleCellExperiment", + metavar = "dir", + required = TRUE +) + +required$add_argument( + "--categorical_covariates", + help = "-categorical covariates", + metavar = "individual,diagnosis,region,sex", + required = TRUE +) + +### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. +### Pre-process args #### + +args <- parser$parse_args() +args <- purrr::map(args, function(x) { +if (length(x) == 1) { +if (toupper(x) == "TRUE") { +return(TRUE) +} +if (toupper(x) == "FALSE") { +return(FALSE) +} +if (toupper(x) == "NULL") { +return(NULL) +} +} +return(x) +}) + +## ............................................................................ +## Annotate integrated sce #### + +sce <- read_sce(args$sce_path) + +sce <- annotate_integrated_sce( +sce, +categorical_covariates = args$categorical_covariates +) + +dir.create(file.path(getwd(), "integration_report")) + +report_integrated_sce( + sce = sce, + report_folder_path = file.path(getwd(), "integration_report"), + report_file = "integrate_reduceDims_cluster_report_scflow", +) + +print("Annotation complete, saving outputs..") + +## ............................................................................ 
+## Save Outputs #### + +# Save SingleCellExperiment +write_sce( +sce = sce, +folder_path = file.path(getwd(), "integrated_sce") +) diff --git a/bin/scflow_cluster.r b/bin/scflow_cluster.r index 0005cc7..7c5ccbd 100755 --- a/bin/scflow_cluster.r +++ b/bin/scflow_cluster.r @@ -28,48 +28,48 @@ optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( "--sce_path", help = "-path to the SingleCellExperiment", - metavar = "dir", + metavar = "dir", required = TRUE ) required$add_argument( "--cluster_method", help = "method to use for clustering", - metavar = "louvain", + metavar = "louvain", required = TRUE ) required$add_argument( "--reduction_method", help = "reduced dimension embedding to use for clustering", - metavar = "UMAP", + metavar = "UMAP", required = TRUE ) required$add_argument( "--res", - type = "double", + type = "double", default = 0.00001, help = "clustering resolution", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--k", - type = "integer", + type = "integer", default = 100, help = "the number of kNN", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--louvain_iter", - type = "integer", + type = "integer", default = 1, help = "number of iterations used for Louvain clustering", - metavar = "N", + metavar = "N", required = TRUE ) @@ -84,13 +84,13 @@ args <- parser$parse_args() sce <- read_sce(args$sce_path, read_metadata = TRUE) sce <- cluster_sce( - sce, - cluster_method = args$cluster_method, - reduction_method = args$reduction_method, - res = args$res, - k = args$k, - louvain_iter = args$louvain_iter - ) + sce, + cluster_method = args$cluster_method, + reduction_method = args$reduction_method, + res = args$res, + k = args$k, + louvain_iter = args$louvain_iter +) ## ............................................................................ ## Save Outputs #### @@ -100,7 +100,7 @@ write_sce( sce = sce, folder_path = file.path(getwd(), "clustered_sce"), write_metadata = TRUE - ) +) ## ............................................................................ ## Clean up #### diff --git a/bin/scflow_dge.r b/bin/scflow_dge.r index 7f086ac..1893c7a 100755 --- a/bin/scflow_dge.r +++ b/bin/scflow_dge.r @@ -5,14 +5,12 @@ # ____________________________________________________________________________ # Initialization #### -options(mc.cores = future::availableCores()) -print(future::availableCores()) ## ............................................................................ ## Load packages #### library(argparse) -library(scFlow) library(cli) +# Note: scFlow is loaded after the mc.cores option is defined/overriden below ## ............................................................................ ## Parse command-line arguments #### @@ -155,15 +153,57 @@ required$add_argument( required = TRUE ) +required$add_argument( + "--species", + help = "the biological species (e.g. 
mouse, human)", + default = "human", + required = TRUE +) + +required$add_argument( + "--max_cores", + default = NULL, + help = "override for lower cpu core usage", + metavar = "N", + required = TRUE +) + # get command line options, if help option encountered print help and exit, # otherwise if options not found on command line then set defaults args <- parser$parse_args() + +options("scflow_species" = args$species) + args$rescale_numerics <- as.logical(args$rescale_numerics) args$pseudobulk <- as.logical(args$pseudobulk) args$force_run <- as.logical(args$force_run) -if(tolower(args$random_effects_var) == "null") args$random_effects_var <- NULL +if (tolower(args$random_effects_var) == "null") args$random_effects_var <- NULL + +args$max_cores <- if (toupper(args$max_cores) == "NULL") NULL else { + as.numeric(as.character(args$max_cores)) +} + args$confounding_vars <- strsplit(args$confounding_vars, ",")[[1]] +# ____________________________________________________________________________ +# Delay Package Loading for Optional Max Cores Override + +n_cores <- future::availableCores(methods = "mc.cores") + +if (is.null(args$max_cores)) { + options(mc.cores = n_cores) +} else { + options(mc.cores = min(args$max_cores, n_cores)) +} + +cli::cli_alert(sprintf( + "Using %s cores on system with %s available cores.", + getOption("mc.cores"), + n_cores +)) + +library(scFlow) + # ____________________________________________________________________________ # Start DE #### @@ -180,7 +220,9 @@ if (args$pseudobulk) { pb_str <- "_pb" sce_subset <- pseudobulk_sce( sce_subset, - keep_vars = c(args$dependent_var, args$confounding_vars, args$random_effects_var), + keep_vars = c( + args$dependent_var, args$confounding_vars, args$random_effects_var + ), assay_name = "counts", celltype_var = args$celltype_var, sample_var = args$sample_var @@ -203,17 +245,9 @@ de_results <- perform_de( pval_cutoff = args$pval_cutoff, mast_method = args$mast_method, force_run = args$force_run, - ensembl_mapping_file = args$ensembl_mappings - ) - -new_dirs <- c( - "de_table", - "de_report", - "de_plot", - "de_plot_data") - -#make dirs -purrr::walk(new_dirs, ~ dir.create(file.path(getwd(), .))) + ensembl_mapping_file = args$ensembl_mappings, + species = getOption("scflow_species") +) file_name <- paste0(args$celltype, "_", args$de_method, pb_str, "_") @@ -221,30 +255,20 @@ file_name <- paste0(args$celltype, "_", for (result in names(de_results)) { if (dim(de_results[[result]])[[1]] > 0) { write.table(de_results[[result]], - file = file.path(getwd(), "de_table", - paste0(file_name, result, "_DE.tsv")), + file = file.path(getwd(), + paste0(file_name, result, "_DE.tsv")), quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE) - report_de(de_results[[result]], - report_folder_path = file.path(getwd(), "de_report"), - report_file = paste0(file_name, result, "_scflow_de_report")) - - png(file.path(getwd(), "de_plot", + report_folder_path = file.path(getwd()), + report_file = paste0(file_name, result, "_scflow_de_report")) + print("report generated") + png(file.path(getwd(), paste0(file_name, result, "_volcano_plot.png")), width = 247, height = 170, units = "mm", res = 600) print(attr(de_results[[result]], "plot")) dev.off() - p <- attr(de_results[[result]], "plot") - plot_data <- p$data - write.table(p$data, - file = file.path(getwd(), "de_plot_data", - paste0(file_name, result, ".tsv")), - quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE) - } else { print(sprintf("No DE genes found for %s", result)) - } + } } - - diff 
--git a/bin/scflow_dirichlet.r b/bin/scflow_dirichlet.r index 95c55d6..ed0bdee 100755 --- a/bin/scflow_dirichlet.r +++ b/bin/scflow_dirichlet.r @@ -69,7 +69,7 @@ required$add_argument( # otherwise if options not found on command line then set defaults args <- parser$parse_args() args$var_order <- strsplit(args$var_order, ",")[[1]] -if(tolower(args$var_order) == "null") { args$var_order <- NULL } +if (tolower(args$var_order) == "null") { args$var_order <- NULL } # ____________________________________________________________________________ # Start #### @@ -77,12 +77,12 @@ if(tolower(args$var_order) == "null") { args$var_order <- NULL } sce <- read_sce(args$sce_path) results <- model_celltype_freqs( - sce, - unique_id_var = args$unique_id_var, - celltype_var = args$celltype_var, - dependent_var = args$dependent_var, - ref_class = args$ref_class, - var_order = args$var_order + sce, + unique_id_var = args$unique_id_var, + celltype_var = args$celltype_var, + dependent_var = args$dependent_var, + ref_class = args$ref_class, + var_order = args$var_order ) ## ............................................................................ @@ -90,7 +90,7 @@ results <- model_celltype_freqs( new_dirs <- c( "dirichlet_report" - ) +) #make dirs purrr::walk(new_dirs, ~ dir.create(file.path(getwd(), .))) @@ -98,12 +98,12 @@ purrr::walk(new_dirs, ~ dir.create(file.path(getwd(), .))) report_celltype_model( results, report_folder_path = file.path( - getwd(), + getwd(), "dirichlet_report" - ), + ), report_file = paste0( - args$celltype_var, - args$dependent_var, - "dirichlet_report", - sep = "_") -) \ No newline at end of file + args$celltype_var, + args$dependent_var, + "dirichlet_report", + sep = "_") +) diff --git a/bin/scflow_finalize_sce.r b/bin/scflow_finalize_sce.r index 2ea92b4..58813af 100755 --- a/bin/scflow_finalize_sce.r +++ b/bin/scflow_finalize_sce.r @@ -25,42 +25,42 @@ optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( "--sce_path", help = "-path to the SingleCellExperiment", - metavar = "dir", + metavar = "dir", required = TRUE ) required$add_argument( "--celltype_mappings", help = "path to a tsv file with revised celltype mappings", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--clusters_colname", help = "name of the column with cluster numbers", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--celltype_var", help = "name of the column with celltype names", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--unique_id_var", help = "name of the column with unique sample ids", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--facet_vars", help = "names of variables to examine in the celltype metrics report", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) @@ -68,17 +68,45 @@ required$add_argument( required$add_argument( "--input_reduced_dim", help = "name of the reduced dimension slot to use for plots in the report", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--metric_vars", help = "names of variables to examine in the celltype metrics report", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) +required$add_argument( + "--top_n", + default = 5, + type = "integer", + required = TRUE, + help = "The number of top marker genes", + metavar = "N" +) + +required$add_argument( + "--reddimplot_pointsize", + default = 0.1, + type = 
"double", + required = TRUE, + help = "Point size for reduced dimension plots", + metavar = "N" +) + +required$add_argument( + "--reddimplot_alpha", + default = 0.2, + type = "double", + required = TRUE, + help = "Alpha value for reduced dimension plots", + metavar = "N" +) + + ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. ### Pre-process args #### @@ -86,20 +114,23 @@ args <- parser$parse_args() args$facet_vars <- strsplit(args$facet_vars, ",")[[1]] args$metric_vars <- strsplit(args$metric_vars, ",")[[1]] +options("scflow_reddimplot_pointsize" = args$reddimplot_pointsize) +options("scflow_reddimplot_alpha" = args$reddimplot_alpha) + ## ............................................................................ ## Start #### sce <- read_sce(args$sce_path) if (file.exists(args$celltype_mappings)) { - celltype_mappings <- read_celltype_mappings(args$celltype_mappings) - sce <- map_custom_celltypes( - sce, - mappings = celltype_mappings, - clusters_colname = args$clusters_colname - ) + celltype_mappings <- read_celltype_mappings(args$celltype_mappings) + sce <- map_custom_celltypes( + sce, + mappings = celltype_mappings, + clusters_colname = args$clusters_colname + ) } else { - print("Revised cell-type mappings not provided, using auto-annotations") + print("Revised cell-type mappings not provided, using auto-annotations") } sce <- annotate_celltype_metrics( @@ -109,7 +140,8 @@ sce <- annotate_celltype_metrics( unique_id_var = args$unique_id_var, facet_vars = args$facet_vars, input_reduced_dim = args$input_reduced_dim, - metric_vars = args$metric_vars + metric_vars = args$metric_vars, + top_n = args$top_n ) dir.create(file.path(getwd(), "celltype_metrics_report")) @@ -123,20 +155,81 @@ report_celltype_metrics( ## ............................................................................ 
## Save Outputs #### +### Save cell-types/n_cells for NextFlow tags celltypes <- as.data.frame(SummarizedExperiment::colData(sce)) %>% dplyr::count(cluster_celltype) colnames(celltypes) <- c("celltype", "n_cells") write.table( - data.frame(celltypes), - file = "celltypes.tsv", + data.frame(celltypes), + file = "celltypes.tsv", row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t") -# Save SingleCellExperiment +### Save Marker Gene Plots +folder_path <- file.path(getwd(), "celltype_marker_plots") +dir.create(folder_path) + +for (group in names(sce@metadata$markers)) { + pwidth <- max(10, + length( + unique(sce@metadata$markers[[group]]$marker_plot$data$Group) + ) + ) + pheight <- length( + unique(sce@metadata$markers[[group]]$marker_plot$data$Gene) + ) + p <- sce@metadata$markers[[group]]$marker_plot + plot_file_name <- paste0(group, "_markers") + # save PNG + png(file.path(folder_path, paste0(plot_file_name, ".png")), + width = pwidth * 12, height = pheight * 5, units = "mm", res = 600) + print(p) + dev.off() + + # save PDF + ggsave( + file.path(folder_path, paste0(group, ".pdf")), + p, + width = pwidth * 12, + height = pheight * 5, + units = "mm", + scale = 1 + ) + +} + +### Save Marker Gene Tables +folder_path <- file.path(getwd(), "celltype_marker_tables") +dir.create(folder_path) +for (group in names(sce@metadata$markers)) { + + marker_test_file_name <- paste0(group, "_markers_test.tsv") + top_markers_file_name <- paste0(group, "_top_markers.tsv") + + write.table( + sce@metadata$markers[[group]]$marker_test_res, + file = file.path(folder_path, marker_test_file_name), + row.names = FALSE, + col.names = TRUE, + sep = "\t" + ) + + write.table( + sce@metadata$markers[[group]]$top_specific_markers, + file = file.path(folder_path, top_markers_file_name), + row.names = FALSE, + col.names = TRUE, + sep = "\t" + ) + +} + + +### Save SingleCellExperiment write_sce( sce = sce, folder_path = file.path(getwd(), "final_sce") - ) +) ## ............................................................................ 
## Clean up #### diff --git a/bin/scflow_integrate.r b/bin/scflow_integrate.r index 45018bc..b9879e0 100755 --- a/bin/scflow_integrate.r +++ b/bin/scflow_integrate.r @@ -24,260 +24,258 @@ required <- parser$add_argument_group("Required", "required arguments") optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( -"--sce_path", -help = "-path to the SingleCellExperiment", -metavar = "dir", -required = TRUE + "--sce_path", + help = "-path to the SingleCellExperiment", + metavar = "dir", + required = TRUE ) required$add_argument( -"--method", -required = TRUE, -help ="The integration method to use", -metavar = "Liger" + "--method", + required = TRUE, + help = "The integration method to use", + metavar = "Liger" ) required$add_argument( -"--unique_id_var", -required = TRUE, -help ="Unique id variable", -metavar = "manifest" + "--unique_id_var", + required = TRUE, + help = "Unique id variable", + metavar = "manifest" ) required$add_argument( -"--take_gene_union", -default = FALSE, -required = TRUE, -help ="Whether to fill out raw.data matrices with union of genes across all datasets (filling in 0 for missing data)", -metavar = "Boolean" + "--take_gene_union", + default = FALSE, + required = TRUE, + help = "Whether to fill out raw.data matrices with union of genes", + metavar = "Boolean" ) required$add_argument( -"--remove_missing", -default = TRUE, -required = TRUE, -help ="Whether to remove cells not expressing any measured genes, and genes not expressed in any cells", -metavar = "Boolean" + "--remove_missing", + default = TRUE, + required = TRUE, + help = "Remove non-expressive genes and cells", + metavar = "Boolean" ) required$add_argument( -"--num_genes", -default = 3000, -type = "integer", -required = TRUE, -help ="Number of genes to find for each dataset", -metavar = "N" + "--num_genes", + default = 3000, + type = "integer", + required = TRUE, + help = "Number of genes to find for each dataset", + metavar = "N" ) required$add_argument( -"--combine", -default = "union", -required = TRUE, -help ="How to combine variable genes across experiments", -metavar = "union,intersect" + "--combine", + default = "union", + required = TRUE, + help = "How to combine variable genes across experiments", + metavar = "union,intersect" ) required$add_argument( -"--keep_unique", -default = FALSE, -required = TRUE, -help ="Keep genes that occur (i.e., there is a corresponding column in raw.data) only in one dataset", -metavar = "Boolean" + "--keep_unique", + default = FALSE, + required = TRUE, + help = "Keep genes that occur only in one dataset", + metavar = "Boolean" ) required$add_argument( -"--capitalize", -default = FALSE, -required = TRUE, -help ="Capitalize gene names to match homologous genes(ie. across species)", -metavar = "Boolean" + "--capitalize", + default = FALSE, + required = TRUE, + help = "Capitalize gene names to match homologous genes(i.e. 
across species)", + metavar = "Boolean" ) required$add_argument( -"--use_cols", -default = TRUE, -required = TRUE, -help ="Treat each column as a cell", -metavar = "Boolean" + "--use_cols", + default = TRUE, + required = TRUE, + help = "Treat each column as a cell", + metavar = "Boolean" ) required$add_argument( -"--k", -default = 30, -type = "integer", -required = TRUE, -help ="Inner dimension of factorization (number of factors)", -metavar = "N" + "--k", + default = 30, + type = "integer", + required = TRUE, + help = "Inner dimension of factorization (number of factors)", + metavar = "N" ) required$add_argument( -"--lambda", -default = 5.0, -type = "double", -required = TRUE, -help ="Regularization parameter. Larger values penalize dataset-specific effects more strongly (ie. alignment should increase as lambda increases)", -metavar = "N" + "--lambda", + default = 5.0, + type = "double", + required = TRUE, + help = "Regularization parameter", + metavar = "N" ) required$add_argument( -"--thresh", -default = 0.0001, -type = "double", -required = TRUE, -help ="Convergence threshold. Convergence occurs when |obj0-obj|/(mean(obj0,obj)) < thresh", -metavar = "N" + "--thresh", + default = 0.0001, + type = "double", + required = TRUE, + help = "Convergence threshold.", + metavar = "N" ) required$add_argument( -"--max_iters", -default = 100, -type = "integer", -required = TRUE, -help ="Maximum number of block coordinate descent iterations to perform", -metavar = "N" + "--max_iters", + default = 100, + type = "integer", + required = TRUE, + help = "Maximum number of block coordinate descent iterations to perform", + metavar = "N" ) required$add_argument( -"--nrep", -default = 1, -type = "integer", -required = TRUE, -help ="Number of restarts to perform", -metavar = "N" + "--nrep", + default = 1, + type = "integer", + required = TRUE, + help = "Number of restarts to perform", + metavar = "N" ) required$add_argument( -"--rand_seed", -default = 1, -type = "integer", -required = TRUE, -help ="Random seed to allow reproducible results", -metavar = "N" + "--rand_seed", + default = 1, + type = "integer", + required = TRUE, + help = "Random seed to allow reproducible results", + metavar = "N" ) required$add_argument( -"--knn_k", -default = 20, -type = "integer", -required = TRUE, -help ="Number of nearest neighbors for within-dataset knn graph", -metavar = "N" + "--knn_k", + default = 20, + type = "integer", + required = TRUE, + help = "Number of nearest neighbors for within-dataset knn graph", + metavar = "N" ) required$add_argument( -"--k2", -default = 500, -type = "integer", -required = TRUE, -help ="Horizon parameter for shared nearest factor graph", -metavar = "N" + "--k2", + default = 500, + type = "integer", + required = TRUE, + help = "Horizon parameter for shared nearest factor graph", + metavar = "N" ) required$add_argument( -"--prune_thresh", -default = 0.2, -type = "double", -required = TRUE, -help ="Minimum allowed edge weight. Any edges below this are removed (given weight 0)", -metavar = "N" + "--prune_thresh", + default = 0.2, + type = "double", + required = TRUE, + help = "Minimum allowed edge weight. 
Any edges below this are removed", + metavar = "N" ) required$add_argument( -"--ref_dataset", -default = '', -required = TRUE, -help ="Name of dataset to use as a reference for normalization", -metavar = "ref" + "--ref_dataset", + default = "", + required = TRUE, + help = "Name of dataset to use as a reference for normalization", + metavar = "ref" ) required$add_argument( -"--min_cells", -default = 2, -type = "integer", -required = TRUE, -help ="Minimum number of cells to consider a cluster shared across datasets", -metavar = "N" + "--min_cells", + default = 2, + type = "integer", + required = TRUE, + help = "Minimum number of cells to consider a cluster shared across datasets", + metavar = "N" ) required$add_argument( -"--quantiles", -default = 50, -type = "integer", -required = TRUE, -help ="Number of quantiles to use for quantile normalization", -metavar = "N" + "--quantiles", + default = 50, + type = "integer", + required = TRUE, + help = "Number of quantiles to use for quantile normalization", + metavar = "N" ) required$add_argument( -"--nstart", -default = 10, -type = "integer", -required = TRUE, -help ="Number of times to perform Louvain community detection with different random starts", -metavar = "N" + "--nstart", + default = 10, + type = "integer", + required = TRUE, + help = "Number of times to perform Louvain community detection", + metavar = "N" ) required$add_argument( -"--resolution", -default = 1, -type = "double", -required = TRUE, -help ="Controls the number of communities detected (Higher resolution -> more communities)", -metavar = "N" + "--resolution", + default = 1, + type = "double", + required = TRUE, + help = "Controls the number of communities detected", + metavar = "N" ) required$add_argument( -"--dims_use", -default = "null", -required = TRUE, -help ="Indices of factors to use for shared nearest factor determination", -metavar = "Indices" + "--dims_use", + default = "null", + required = TRUE, + help = "Indices of factors to use for shared nearest factor determination", + metavar = "Indices" ) required$add_argument( -"--dist_use", -default = "CR", -required = TRUE, -help ="Distance metric to use in calculating nearest neighbors", -metavar = "CR" + "--dist_use", + default = "CR", + required = TRUE, + help = "Distance metric to use in calculating nearest neighbors", + metavar = "CR" ) required$add_argument( -"--center", -default = FALSE, -required = TRUE, -help ="Centers the data when scaling factors (useful for less sparse modalities like methylation data)", -metavar = "Boolean" + "--center", + default = FALSE, + required = TRUE, + help = "Centers the data when scaling factors", + metavar = "Boolean" ) required$add_argument( -"--small_clust_thresh", -default = 0, -type = "double", -required = TRUE, -help ="Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment", -metavar = "N" + "--small_clust_thresh", + default = 0, + type = "double", + required = TRUE, + help = "Extracts small clusters loading highly on single factor", + metavar = "N" ) - - ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 
### Pre-process args #### args <- parser$parse_args() args <- purrr::map(args, function(x) { -if (length(x) == 1) { -if (toupper(x) == "TRUE") { -return(TRUE) -} -if (toupper(x) == "FALSE") { -return(FALSE) -} -if (toupper(x) == "NULL") { -return(NULL) -} -} -return(x) + if (length(x) == 1) { + if (toupper(x) == "TRUE") { + return(TRUE) + } + if (toupper(x) == "FALSE") { + return(FALSE) + } + if (toupper(x) == "NULL") { + return(NULL) + } + } + return(x) }) @@ -287,43 +285,43 @@ return(x) sce <- read_sce(args$sce_path) sce <- integrate_sce( -sce, -method = args$method, -unique_id_var = args$unique_id_var, -take_gene_union = args$take_gene_union, -remove.missing = args$remove_missing, -make.sparse = T, -num_genes = args$num_genes, -combine = args$combine, -keep_unique = args$keep_unique, -capitalize = args$capitalize, -use_cols = args$use_cols, -k = args$k, -lambda = args$lambda, -thresh = args$thresh, -max_iters = args$max_iters, -nrep = args$nrep, -H_init = NULL, -W_init = NULL, -V_init = NULL, -rand_seed = args$rand_seed, -knn_k = args$knn_k, -k2 = args$k2, -prune_thresh = args$prune_thresh, -ref_dataset = args$ref_dataset, -min_cells = args$min_cells, -quantiles = args$quantiles, -nstart = args$nstart, -resolution = args$resolution, -dims_use = args$dims_use, -dist_use = args$dist_use, -center = args$center, -small_clust_thresh = args$small_clust_thresh, -do_plot = FALSE, -id_number = NULL, -print_obj = FALSE, -print_mod = FALSE, -print_align_summary = FALSE + sce, + method = args$method, + unique_id_var = args$unique_id_var, + take_gene_union = args$take_gene_union, + remove.missing = args$remove_missing, + make.sparse = T, + num_genes = args$num_genes, + combine = args$combine, + keep_unique = args$keep_unique, + capitalize = args$capitalize, + use_cols = args$use_cols, + k = args$k, + lambda = args$lambda, + thresh = args$thresh, + max_iters = args$max_iters, + nrep = args$nrep, + H_init = NULL, + W_init = NULL, + V_init = NULL, + rand_seed = args$rand_seed, + knn_k = args$knn_k, + k2 = args$k2, + prune_thresh = args$prune_thresh, + ref_dataset = args$ref_dataset, + min_cells = args$min_cells, + quantiles = args$quantiles, + nstart = args$nstart, + resolution = args$resolution, + dims_use = args$dims_use, + dist_use = args$dist_use, + center = args$center, + small_clust_thresh = args$small_clust_thresh, + do_plot = FALSE, + id_number = NULL, + print_obj = FALSE, + print_mod = FALSE, + print_align_summary = FALSE ) ## ............................................................................ 
@@ -331,7 +329,7 @@ print_align_summary = FALSE # Save SingleCellExperiment write_sce( -sce = sce, -folder_path = file.path(getwd(), "integrated_sce"), -write_metadata = TRUE + sce = sce, + folder_path = file.path(getwd(), "integrated_sce"), + write_metadata = TRUE ) diff --git a/bin/scflow_ipa.r b/bin/scflow_ipa.r index 516c1ea..0cca3a8 100755 --- a/bin/scflow_ipa.r +++ b/bin/scflow_ipa.r @@ -31,14 +31,6 @@ required$add_argument( default = "NULL" ) -required$add_argument( - "--reference_file", - help = "full path to the reference gene file", - metavar = ".tsv", - required = TRUE, - default = "NULL" -) - required$add_argument( "--enrichment_tool", help = "one or more enrichment tools", @@ -63,22 +55,6 @@ required$add_argument( default = "KEGG" ) -required$add_argument( - "--is_output", - help = "Whether to return output in a directory", - metavar = "logical", - required = TRUE, - default = "TRUE" -) - -required$add_argument( - "--output_dir", - help = "full path to the dir", - metavar = "current dir", - required = TRUE, - default = "./" -) - ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. @@ -108,31 +84,30 @@ args <- purrr::map(args, function(x) { ## ............................................................................ ## Start impacted pathway analysis(IPA) #### -output_dir <- file.path(args$output_dir, "ipa") +output_dir <- file.path(getwd(), "ipa") +report_dir <- file.path(getwd()) + dir.create(output_dir) +dir.create(report_dir) for (gene_file in args$gene_file) { - enrichment_result <- find_impacted_pathways( gene_file = gene_file, enrichment_tool = args$enrichment_tool, enrichment_method = args$enrichment_method, enrichment_database = args$enrichment_database, - is_output = args$is_output, + is_output = TRUE, output_dir = output_dir ) - report_name <- tools::file_path_sans_ext(gene_file) report_fp <- paste0(report_name, "_scflow_ipa_report") - report_impacted_pathway( - res = enrichment_result, - report_folder_path = output_dir, - report_file = report_fp - ) - - cli::cli_text(c( - "{cli::col_green(symbol$tick)} Analysis complete, output is found at: ", - "{.file {output_dir}}" - )) -} \ No newline at end of file + res = enrichment_result, + report_folder_path = report_dir, + report_file = report_fp + ) + cli::cli_text(c( + "{cli::col_green(symbol$tick)} Analysis complete, output is found at: ", + "{.file {output_dir}}" + )) +} diff --git a/bin/scflow_map_celltypes.r b/bin/scflow_map_celltypes.r index 6e8d686..5c913dd 100755 --- a/bin/scflow_map_celltypes.r +++ b/bin/scflow_map_celltypes.r @@ -26,38 +26,69 @@ optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( "--sce_path", help = "-path to the SingleCellExperiment", - metavar = "dir", + metavar = "dir", required = TRUE ) required$add_argument( "--ctd_folder", help = "path to a folder containing ewce ctd files", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--clusters_colname", help = "the sce colData variable storing cluster numbers", - metavar = "foo/bar", + metavar = "foo/bar", required = TRUE ) required$add_argument( "--cells_to_sample", - type = "integer", + type = "integer", default = 10000, help = "the number of cells to sample with ewce", - metavar = "N", + metavar = "N", required = TRUE ) +required$add_argument( + "--species", + help = "the biological species (e.g. 
mouse, human)", + default = "human", + required = TRUE +) + +required$add_argument( + "--reddimplot_pointsize", + default = 0.1, + type = "double", + required = TRUE, + help = "Point size for reduced dimension plots", + metavar = "N" +) + +required$add_argument( + "--reddimplot_alpha", + default = 0.2, + type = "double", + required = TRUE, + help = "Alpha value for reduced dimension plots", + metavar = "N" +) + + + ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. ### Pre-process args #### args <- parser$parse_args() +options("scflow_species" = args$species) +options("scflow_reddimplot_pointsize" = args$reddimplot_pointsize) +options("scflow_reddimplot_alpha" = args$reddimplot_alpha) + ## ............................................................................ ## Start #### @@ -66,11 +97,12 @@ cat(print(tempdir())) sce <- read_sce(args$sce_path) sce <- map_celltypes_sce( - sce, + sce, ctd_folder = args$ctd_folder, clusters_colname = args$clusters_colname, - cells_to_sample = args$cells_to_sample - ) + cells_to_sample = args$cells_to_sample, + species = args$species +) ## ............................................................................ ## Save Outputs #### @@ -81,7 +113,7 @@ write_celltype_mappings(sce, folder_path = getwd()) write_sce( sce = sce, folder_path = file.path(getwd(), "celltype_mapped_sce") - ) +) ## ............................................................................ ## Clean up #### diff --git a/bin/scflow_merge.r b/bin/scflow_merge.r index cea5278..3006617 100755 --- a/bin/scflow_merge.r +++ b/bin/scflow_merge.r @@ -5,8 +5,6 @@ # ____________________________________________________________________________ # Initialization #### -#options(mc.cores = parallel::detectCores()) - ## ............................................................................ ## Load packages #### library(argparse) @@ -26,50 +24,60 @@ optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( "--sce_paths", help = "-paths to SingleCellExperiment folders", - metavar = "dir,dir2", + metavar = "dir,dir2", required = TRUE ) required$add_argument( "--ensembl_mappings", help = "path to ensembl mappings file", - metavar = "tsv", + metavar = "tsv", required = TRUE ) required$add_argument( "--unique_id_var", help = "unique id variable", - metavar = "manifest", + metavar = "manifest", required = TRUE ) required$add_argument( "--plot_vars", help = "variables to plot", - metavar = "total_features_by_counts,pc_mito", + metavar = "total_features_by_counts,pc_mito", required = TRUE ) required$add_argument( "--facet_vars", help = "variables to facet/subset by", - metavar = "total_features_by_counts,pc_mito", + metavar = "total_features_by_counts,pc_mito", required = TRUE ) required$add_argument( "--outlier_vars", help = "variables to apply adaptive thresholding", - metavar = "total_features_by_counts,total_counts", + metavar = "total_features_by_counts,total_counts", + required = TRUE +) + +required$add_argument( + "--species", + help = "the biological species (e.g. mouse, human)", + default = "human", required = TRUE ) + ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 
### Pre-process args #### args <- parser$parse_args() +options("scflow_species" = args$species) + args$sce_paths <- strsplit(args$sce_paths, ",")[[1]] args$facet_vars <- strsplit(args$facet_vars, ",")[[1]] args$outlier_vars <- strsplit(args$outlier_vars, ",")[[1]] @@ -89,16 +97,16 @@ args <- purrr::map(args, function(x) { ## Start Merge #### print(sprintf( - "Reading %sx SingleCellExperiment's", + "Reading %sx SingleCellExperiment's", length(args$sce_paths)) - ) +) sce_l <- lapply(args$sce_paths, read_sce) sce <- merge_sce( sce_l, ensembl_mapping_file = args$ensembl_mappings - ) +) dir.create(file.path(getwd(), "merged_report")) @@ -121,9 +129,9 @@ report_merged_sce( dir.create(file.path(getwd(), "merge_plots")) # Save merged plots (images) -for(rd_name in setdiff(names(sce@metadata$pseudobulk_rd_plots), "UMAP3D")) { - png(file.path(getwd(), "merge_plots", - paste0(args$unique_id_var, "_", rd_name, ".png")), +for (rd_name in setdiff(names(sce@metadata$pseudobulk_rd_plots), "UMAP3D")) { + png(file.path(getwd(), "merge_plots", + paste0(args$unique_id_var, "_", rd_name, ".png")), width = 247, height = 170, units = "mm", res = 600) print(sce@metadata$pseudobulk_rd_plots[[rd_name]]) dev.off() @@ -131,30 +139,30 @@ for(rd_name in setdiff(names(sce@metadata$pseudobulk_rd_plots), "UMAP3D")) { dir.create(file.path(getwd(), "pb_plots")) # Save pb plots (images) -for(rd_name in names(sce@metadata$pseudobulk_plots)) { - png(file.path(getwd(), "pb_plots", - paste0(args$unique_id_var, "_", rd_name, ".png")), +for (rd_name in names(sce@metadata$pseudobulk_plots)) { + png(file.path(getwd(), "pb_plots", + paste0(args$unique_id_var, "_", rd_name, ".png")), width = 247, height = 170, units = "mm", res = 600) print(sce@metadata$pseudobulk_rd_plots[[rd_name]]) dev.off() } dir.create(file.path(getwd(), "merge_summary_plots")) -# save multi-sample summary plots -for(plot_var in names(sce@metadata$merged_plots)) { - for(plot in names(sce@metadata$merged_plots[[plot_var]])) { +# save multi-sample summary plots +for (plot_var in names(sce@metadata$merged_plots)) { + for (plot in names(sce@metadata$merged_plots[[plot_var]])) { if (plot_var == plot) { plot_caption <- sprintf( - "%s_by_%s", + "%s_by_%s", plot_var, sce@metadata$merge_qc_params$unique_id_var) } else { plot_caption <- sprintf( - "%s_by_%s", + "%s_by_%s", plot_var, strsplit(plot, "_vs_")[[1]][[2]]) } - png(file.path(getwd(), "merge_summary_plots", - paste0(args$unique_id_var, "_", plot_caption, ".png")), - width = 247, height = 170, units = "mm", res = 600) + png(file.path(getwd(), "merge_summary_plots", + paste0(args$unique_id_var, "_", plot_caption, ".png")), + width = 247, height = 170, units = "mm", res = 600) print(sce@metadata$merged_plots[[plot_var]][[plot]]) dev.off() } @@ -164,9 +172,8 @@ for(plot_var in names(sce@metadata$merged_plots)) { write_sce( sce = sce, folder_path = file.path(getwd(), "merged_sce") - ) +) ## ............................................................................ 
## Clean up #### - diff --git a/bin/scflow_perform_de.r b/bin/scflow_perform_de.r deleted file mode 100755 index 7f086ac..0000000 --- a/bin/scflow_perform_de.r +++ /dev/null @@ -1,250 +0,0 @@ -#!/usr/bin/env Rscript -# Perform differential gene expression on a SingleCellExperiment Object -# Combiz Khozoie - -# ____________________________________________________________________________ -# Initialization #### - -options(mc.cores = future::availableCores()) -print(future::availableCores()) - -## ............................................................................ -## Load packages #### -library(argparse) -library(scFlow) -library(cli) - -## ............................................................................ -## Parse command-line arguments #### - -# create parser object -parser <- ArgumentParser() - -# specify options -required <- parser$add_argument_group("Required", "required arguments") -optional <- parser$add_argument_group("Optional", "required arguments") - -required$add_argument( - "--sce", - help = "path to SingleCellExperiment directory", - metavar = "/dir/sce/", - required = TRUE -) - -required$add_argument( - "--celltype", - help = "celltype to subset for DE analysis", - metavar = "DESEQ2PB", - required = TRUE -) - -required$add_argument( - "--de_method", - help = "differential gene expression method", - metavar = "MAST", - required = TRUE -) - -required$add_argument( - "--mast_method", - help = "differential gene expression sub-method for MAST", - metavar = "bayesglm", - required = TRUE -) - -required$add_argument( - "--min_counts", - type = "integer", - default = 1, - help = "minimum library size (counts) per cell", - metavar = "N", - required = TRUE -) - -required$add_argument( - "--min_cells_pc", - type = "double", - default = 0.10, - metavar = "N", - help = "minimum percentage of cells with min_counts" -) - -required$add_argument( - "--rescale_numerics", - help = "rescale numeric variables in the model (lgl)", - metavar = "TRUE", - required = TRUE -) - -required$add_argument( - "--pseudobulk", - help = "perform pseudobulking option (lgl)", - metavar = "TRUE", - required = TRUE -) - -required$add_argument( - "--celltype_var", - help = "celltype variable", - metavar = "cluster_celltype", - required = TRUE -) - -required$add_argument( - "--sample_var", - help = "sample variable", - metavar = "manifest", - required = TRUE -) - -required$add_argument( - "--force_run", - help = "force run if non-full-rank (lgl)", - metavar = "TRUE", - required = TRUE -) - -required$add_argument( - "--dependent_var", - help = "dependent variable", - metavar = "group", - required = TRUE -) - -required$add_argument( - "--ref_class", - help = "reference class within dependent variable", - metavar = "Control", - required = TRUE -) - -required$add_argument( - "--confounding_vars", - help = "confounding variables", - metavar = "age,sex,pc_mito", - required = TRUE -) - -required$add_argument( - "--random_effects_var", - help = "random effects variable", - metavar = "individual", - required = TRUE -) - -required$add_argument( - "--fc_threshold", - type = "double", - default = 1.1, - metavar = "number", - help = "Absolute fold-change cutoff for DE [default %(default)s]" -) - -required$add_argument( - "--pval_cutoff", - type = "double", - default = 0.05, - metavar = "number", - help = "p-value cutoff for DE [default %(default)s]" -) - -required$add_argument( - "--ensembl_mappings", - help = "path to ensembl mappings file", - metavar = "tsv", - required = TRUE -) - -# get command line options, 
if help option encountered print help and exit, -# otherwise if options not found on command line then set defaults -args <- parser$parse_args() -args$rescale_numerics <- as.logical(args$rescale_numerics) -args$pseudobulk <- as.logical(args$pseudobulk) -args$force_run <- as.logical(args$force_run) -if(tolower(args$random_effects_var) == "null") args$random_effects_var <- NULL -args$confounding_vars <- strsplit(args$confounding_vars, ",")[[1]] - -# ____________________________________________________________________________ -# Start DE #### - -write(sprintf( - "##### Starting DE of %s cells with %s", - args$celltype, args$demethod -), stdout()) - -sce <- read_sce(args$sce) - -sce_subset <- sce[, sce$cluster_celltype == args$celltype] - -if (args$pseudobulk) { - pb_str <- "_pb" - sce_subset <- pseudobulk_sce( - sce_subset, - keep_vars = c(args$dependent_var, args$confounding_vars, args$random_effects_var), - assay_name = "counts", - celltype_var = args$celltype_var, - sample_var = args$sample_var - ) -} else { - pb_str <- "" -} - -de_results <- perform_de( - sce_subset, - de_method = args$de_method, - min_counts = args$min_counts, - min_cells_pc = args$min_cells_pc, - rescale_numerics = args$rescale_numerics, - dependent_var = args$dependent_var, - ref_class = args$ref_class, - confounding_vars = args$confounding_vars, - random_effects_var = args$random_effects_var, - fc_threshold = args$fc_threshold, - pval_cutoff = args$pval_cutoff, - mast_method = args$mast_method, - force_run = args$force_run, - ensembl_mapping_file = args$ensembl_mappings - ) - -new_dirs <- c( - "de_table", - "de_report", - "de_plot", - "de_plot_data") - -#make dirs -purrr::walk(new_dirs, ~ dir.create(file.path(getwd(), .))) - -file_name <- paste0(args$celltype, "_", - args$de_method, pb_str, "_") - -for (result in names(de_results)) { - if (dim(de_results[[result]])[[1]] > 0) { - write.table(de_results[[result]], - file = file.path(getwd(), "de_table", - paste0(file_name, result, "_DE.tsv")), - quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE) - - report_de(de_results[[result]], - report_folder_path = file.path(getwd(), "de_report"), - report_file = paste0(file_name, result, "_scflow_de_report")) - - png(file.path(getwd(), "de_plot", - paste0(file_name, result, "_volcano_plot.png")), - width = 247, height = 170, units = "mm", res = 600) - print(attr(de_results[[result]], "plot")) - dev.off() - - p <- attr(de_results[[result]], "plot") - plot_data <- p$data - write.table(p$data, - file = file.path(getwd(), "de_plot_data", - paste0(file_name, result, ".tsv")), - quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE) - - } else { - print(sprintf("No DE genes found for %s", result)) - } -} - - diff --git a/bin/scflow_plot_reddim_genes.r b/bin/scflow_plot_reddim_genes.r index 2a3f936..222072f 100755 --- a/bin/scflow_plot_reddim_genes.r +++ b/bin/scflow_plot_reddim_genes.r @@ -23,30 +23,49 @@ optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( "--sce_path", help = "-path to the SingleCellExperiment", - metavar = "dir", + metavar = "dir", required = TRUE ) required$add_argument( "--reddim_genes_yml", help = "-path to the yml file with genes of interest", - metavar = "dir", + metavar = "dir", required = TRUE ) required$add_argument( "--reduction_methods", help = "reduced dimension embedding(s) to use for plots", - metavar = "UMAP", + metavar = "UMAP", required = TRUE ) +required$add_argument( + "--reddimplot_pointsize", + default = 0.1, + type = "double", + 
required = TRUE, + help = "Point size for reduced dimension plots", + metavar = "N" +) + +required$add_argument( + "--reddimplot_alpha", + default = 0.2, + type = "double", + required = TRUE, + help = "Alpha value for reduced dimension plots", + metavar = "N" +) ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. ### Pre-process args #### args <- parser$parse_args() args$reduction_methods <- strsplit(args$reduction_methods, ",")[[1]] +options("scflow_reddimplot_pointsize" = args$reddimplot_pointsize) +options("scflow_reddimplot_alpha" = args$reddimplot_alpha) ## ............................................................................ ## Start #### @@ -70,10 +89,10 @@ for (reddim in args$reduction_method) { if (gene %in% SummarizedExperiment::rowData(sce)$gene) { p <- plot_reduced_dim_gene( sce, - reduced_dim = reddim, + reduced_dim = reddim, gene = gene - ) - png(file.path(folder_path, paste0(gene, ".png")), + ) + png(file.path(folder_path, paste0(gene, ".png")), width = 170, height = 170, units = "mm", res = 600) print(p) dev.off() diff --git a/bin/scflow_qc.r b/bin/scflow_qc.r index 2d74b7e..726a29a 100755 --- a/bin/scflow_qc.r +++ b/bin/scflow_qc.r @@ -23,124 +23,124 @@ required <- parser$add_argument_group("Required", "required arguments") optional <- parser$add_argument_group("Optional", "required arguments") required$add_argument( - "--samplesheet", - help = "full path to the sample sheet tsv file", - metavar = "SampleSheet.tsv", + "--input", + help = "full path to the input sample sheet tsv file", + metavar = "SampleSheet.tsv", required = TRUE ) required$add_argument( "--key_colname", help = "sample sheet column name with unique sample identifiers", - metavar = "manifest", + metavar = "manifest", required = TRUE ) required$add_argument( "--key", help = "unique identifier in sample sheet column specified by key_colname", - metavar = "hirol", + metavar = "hirol", required = TRUE ) required$add_argument( "--factor_vars", help = "sample sheet variables to treat as factors", - metavar = "hirol", + metavar = "hirol", required = TRUE ) required$add_argument( "--mat_path", help = "folder path of sparse matrix (cellranger output)", - metavar = "out", + metavar = "out", required = TRUE ) required$add_argument( "--ensembl_mappings", help = "path to ensembl mappings file", - metavar = "tsv", + metavar = "tsv", required = TRUE ) required$add_argument( "--min_library_size", - type = "integer", + type = "integer", default = 300, help = "minimum library size (counts) per cell", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--max_library_size", help = "maximum library size (counts) per cell or adaptive", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--min_features", - type = "integer", + type = "integer", default = 100, help = "minimum features (expressive genes) per cell", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--max_features", help = "maximum features (expressive genes) per cell or adaptive", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--max_mito", - help = "maximum proportion of counts mapping to mitochondrial genes or adaptive", - metavar = "N", + help = "maximum proportion of counts mapping to mt genes or adaptive", + metavar = "N", required = TRUE ) required$add_argument( "--min_ribo", - type = "double", + type = "double", default = 0.0, help = "minimum proportion of counts mapping to ribosomal genes", - metavar = "N", + metavar = 
"N", required = TRUE ) required$add_argument( "--max_ribo", - type = "double", + type = "double", default = 1.0, help = "maximum proportion of counts mapping to ribosomal genes", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--min_counts", - type = "integer", + type = "integer", default = 2, help = "expressive genes must have >=min_counts in >=min_cells", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--min_cells", - type = "integer", + type = "integer", default = 2, help = "expressive genes must have >=min_counts in >=min_cells", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( - "--drop_unmapped", + "--drop_unmapped", default = "TRUE", help = "drop genes which could not be mapped to gene names (lgl)", required = TRUE @@ -149,71 +149,71 @@ required$add_argument( required$add_argument( "--drop_mito", help = "drop mitochondrial genes (lgl)", - metavar = "TRUE", + metavar = "TRUE", required = TRUE ) required$add_argument( "--drop_ribo", help = "drop ribosomal genes (lgl)", - metavar = "TRUE", + metavar = "TRUE", required = TRUE ) required$add_argument( "--nmads", - type = "double", + type = "double", default = 3.5, help = "number of median absolute deviations for adaptive thresholding", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--find_singlets", help = "run a singlet finding algorithm (lgl)", - metavar = "TRUE", + metavar = "TRUE", required = TRUE ) required$add_argument( "--singlets_method", help = "method to identify singlets", - metavar = "doubletfinder", + metavar = "doubletfinder", required = TRUE ) required$add_argument( "--vars_to_regress_out", help = "variables to regress out before finding singlets", - metavar = "nCount_RNA,pc_mito", + metavar = "nCount_RNA,pc_mito", required = TRUE ) required$add_argument( "--pca_dims", - type = "integer", + type = "integer", default = 10, help = "number of principal components for singlet finding", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--var_features", - type = "integer", + type = "integer", default = 2000, help = "number of variable features for singlet finding", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--doublet_rate", - type = "double", + type = "double", default = 0.075, help = "estimated doublet rate", - metavar = "N", + metavar = "N", required = TRUE ) @@ -224,19 +224,26 @@ required$add_argument( required = TRUE ) +required$add_argument( + "--dpk", + default = "NULL", + help = "doublets per thousands cells increment if doublet_rate is 0", + required = TRUE +) + required$add_argument( "--find_cells", help = "run empty drops (ambient RNA) algorithm (lgl)", - metavar = "TRUE", + metavar = "TRUE", required = TRUE ) required$add_argument( "--lower", - type = "integer", + type = "integer", default = 100, help = "lower parameter for empty drops", - metavar = "N", + metavar = "N", required = TRUE ) @@ -249,66 +256,78 @@ required$add_argument( required$add_argument( "--alpha_cutoff", - type = "double", + type = "double", default = 0.0001, help = "alpha cutoff for emptyDrops algorithm", - metavar = "N", + metavar = "N", required = TRUE ) required$add_argument( "--niters", - type = "integer", + type = "integer", default = 10000, - help = "number of iterations to use for the Monte Carlo p-value calculations.", - metavar = "N", + help = "number of iterations for the Monte Carlo p-value calculations.", + metavar = "N", required = TRUE ) required$add_argument( 
"--expect_cells", - type = "integer", + type = "integer", help = "number of expected cells for emptydrops automated retain estimate", default = 3000, metavar = "N", required = TRUE ) +required$add_argument( + "--species", + help = "the biological species (e.g. mouse, human)", + default = "human", + required = TRUE +) + ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. ### Pre-process args #### args <- parser$parse_args() -args[startsWith(names(args), "drop_")] <- + +options("scflow_species" = args$species) + +args[startsWith(names(args), "drop_")] <- as.logical(args[startsWith(names(args), "drop_")]) args$max_library_size <- ifelse( - args$max_library_size == "adaptive", - args$max_library_size, + args$max_library_size == "adaptive", + args$max_library_size, as.numeric(as.character(args$max_library_size)) - ) +) args$max_features <- ifelse( - args$max_features == "adaptive", - args$max_features, + args$max_features == "adaptive", + args$max_features, as.numeric(as.character(args$max_features)) - ) +) args$max_mito <- ifelse( - args$max_mito == "adaptive", - args$max_mito, + args$max_mito == "adaptive", + args$max_mito, as.numeric(as.character(args$max_mito)) - ) -args$pK <- if(toupper(args$pK) == "NULL") NULL else { +) +args$pK <- if (toupper(args$pK) == "NULL") NULL else { as.numeric(as.character(args$pK)) } +args$dpk <- if (toupper(args$dpk) == "NULL") NULL else { + as.numeric(as.character(args$dpk)) +} -if(toupper(args$retain) == "NULL") { +if (toupper(args$retain) == "NULL") { args$retain <- NULL -} else if(toupper(args$retain) == "AUTO") { - args$retain <- "auto" - } else { +} else if (toupper(args$retain) == "AUTO") { + args$retain <- "auto" +} else { args$retain <- as.numeric(as.character(args$retain)) } args$find_singlets <- as.logical(args$find_singlets) -args$factor_vars <- strsplit(args$factor_vars, ",")[[1]] args$vars_to_regress_out <- strsplit(args$vars_to_regress_out, ",")[[1]] args <- purrr::map(args, function(x) { if (length(x) == 1) { @@ -319,6 +338,16 @@ args <- purrr::map(args, function(x) { return(x) }) +if (!is.null(args$factor_vars)) { + args$factor_vars <- strsplit(args$factor_vars, ",")[[1]] + col_classes <- rep("factor", length(args$factor_vars)) + names(col_classes) <- args$factor_vars +} else { + col_classes <- NA +} + + + ## ............................................................................ 
 ## Start QC ####
 
@@ -326,13 +355,10 @@ cli::boxx(paste0("Analysing: ", args$key), float = "center")
 
 mat <- scFlow::read_sparse_matrix(args$mat_path)
 
-col_classes <- rep("factor", length(args$factor_vars))
-names(col_classes) <- args$factor_vars
-
 metadata <- read_metadata(
   unique_key = args$key,
   key_colname = args$key_colname,
-  samplesheet_path = args$samplesheet,
+  samplesheet_path = args$input,
   col_classes = col_classes
 )
 
@@ -367,16 +393,17 @@ sce <- annotate_sce(
   nmads = args$nmads,
   annotate_genes = TRUE,
   annotate_cells = TRUE,
-  ensembl_mapping_file = args$ensembl_mappings
+  ensembl_mapping_file = args$ensembl_mappings,
+  species = args$species
 )
 
 sce <- filter_sce(
-  sce, 
-  filter_genes = TRUE, 
+  sce,
+  filter_genes = TRUE,
   filter_cells = TRUE
 )
 
-if(args$find_singlets) {
+if (args$find_singlets) {
   sce <- find_singlets(
     sce = sce,
     singlet_find_method = args$singlets_method,
@@ -384,13 +411,13 @@ if(args$find_singlets) {
     pca_dims = args$pca_dims,
     var_features = args$var_features,
     doublet_rate = args$doublet_rate,
+    dpk = args$dpk,
     pK = args$pK,
     num.cores = future::availableCores()
   )
-
   sce <- filter_sce(
-    sce, 
-    filter_genes = TRUE, 
+    sce,
+    filter_genes = TRUE,
     filter_cells = TRUE
   )
 }
 
@@ -399,7 +426,8 @@ dir.create(file.path(getwd(), "qc_report"))
 
 report_qc_sce(
   sce = sce,
-  report_folder_path = file.path(getwd(), "qc_report"),
+  #report_folder_path = file.path(getwd(), "qc_report"),
+  report_folder_path = file.path(getwd()),
   report_file = paste0(args$key, "_scflow_qc_report")
 )
 
@@ -412,7 +440,7 @@ print("Analysis complete, saving outputs..")
 write_sce(
   sce = sce,
   folder_path = file.path(getwd(), paste0(args$key, "_sce"))
-  )
+)
 
 new_dirs <- c(
   "qc_plot_data",
@@ -426,7 +454,7 @@ purrr::walk(new_dirs, ~ dir.create(file.path(getwd(), .)))
 for (df in names(sce@metadata$qc_plot_data)) {
   write.table(
     sce@metadata$qc_plot_data[[df]],
-    file.path(getwd(), "qc_plot_data", 
+    file.path(getwd(), "qc_plot_data",
       paste0(args$key, "_", df, ".tsv")),
     sep = "\t", col.names = TRUE, row.names = FALSE)
@@ -435,24 +463,24 @@ for (df in names(sce@metadata$qc_plot_data)) {
 # Save QC summary table
 write.table(
   cbind(sce@metadata$metadata, sce@metadata$qc_summary),
-  file.path(getwd(), "qc_summary", 
+  file.path(getwd(), "qc_summary",
     paste0(args$key, "_qc_summary.tsv")),
   sep = "\t", col.names = TRUE, row.names = FALSE)
 
 # Save QC plots (images)
-for(pname in names(sce@metadata$qc_plots)) {
-  png(file.path(getwd(), "qc_plots", 
-    paste0(args$key, "_", pname, ".png")),
+for (pname in names(sce@metadata$qc_plots)) {
+  png(file.path(getwd(), "qc_plots",
+    paste0(args$key, "_", pname, ".png")),
     width = 247, height = 170, units = "mm", res = 600)
   print(sce@metadata$qc_plots[[pname]])
   dev.off()
 }
 
 # Save doublet finder plots, square
-for(pname in names(sce@metadata$qc_plots$doublet_finder)) {
-  png(file.path(getwd(), "qc_plots", 
-    paste0(args$key, "_", pname, "_doublet_finder.png")),
+for (pname in names(sce@metadata$qc_plots$doublet_finder)) {
+  png(file.path(getwd(), "qc_plots",
+    paste0(args$key, "_", pname, "_doublet_finder.png")),
     width = 170, height = 170, units = "mm", res = 600)
   print(sce@metadata$qc_plots$doublet_finder[[pname]])
   dev.off()
@@ -460,6 +488,3 @@ for(pname in names(sce@metadata$qc_plots$doublet_finder)) {
 
 ## ............................................................................
 ## Clean up ####
-
-# Clear biomart cache
-#biomaRt::biomartCacheClear()
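Note: a recurring pattern in `scflow_qc.r` above is accepting either the literal string "adaptive" or a number for thresholds such as `--max_library_size`. A standalone sketch of that coercion (values and the `coerce_adaptive` name are illustrative, not part of the script):

```r
# Keep the sentinel string "adaptive" as-is; coerce anything else to numeric.
# ifelse() evaluates both branches, so as.numeric("adaptive") would emit a
# harmless NA-coercion warning (as in the script); suppress it here.
coerce_adaptive <- function(x) {
  ifelse(x == "adaptive", x, suppressWarnings(as.numeric(as.character(x))))
}

coerce_adaptive("adaptive")  # "adaptive" (resolved downstream by scFlow)
coerce_adaptive("10000")     # 10000
```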
diff --git a/bin/scflow_reduce_dims.r b/bin/scflow_reduce_dims.r
index 4d0f9af..3ed29ae 100755
--- a/bin/scflow_reduce_dims.r
+++ b/bin/scflow_reduce_dims.r
@@ -27,267 +27,267 @@ optional <- parser$add_argument_group("Optional", "required arguments")
 required$add_argument(
   "--sce_path",
   help = "-path to the SingleCellExperiment",
-  metavar = "dir", 
+  metavar = "dir",
   required = TRUE
 )
 
 required$add_argument(
   "--input_reduced_dim",
   help = "input reducedDim to use for further dim reds",
-  metavar = "PCA,Liger", 
+  metavar = "PCA,Liger",
   required = TRUE
 )
 
 required$add_argument(
   "--reduction_methods",
   help = "methods to use for dimensionality reduction",
-  metavar = "PCA,tSNE,UMAP,UMAP3D", 
+  metavar = "PCA,tSNE,UMAP,UMAP3D",
   required = TRUE
 )
 
 required$add_argument(
   "--vars_to_regress_out",
   help = "variables to regress out before finding singlets",
-  metavar = "nCount_RNA,pc_mito", 
+  metavar = "nCount_RNA,pc_mito",
   required = TRUE
 )
 
 required$add_argument(
   "--pca_dims",
-  type = "integer", 
+  type = "integer",
   default = 20,
   help = "the number of PCA dimensions used",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--n_neighbors",
-  type = "integer", 
+  type = "integer",
   default = 30,
   help = "the number of nearest neighbors",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--n_components",
-  type = "integer", 
+  type = "integer",
   default = 2,
   help = "the number of UMAP dimensions (2 or 3)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--init",
   help = "type of initialization for UMAP coordinates (uwot)",
-  metavar = "pca", 
+  metavar = "pca",
   required = TRUE
 )
 
 required$add_argument(
   "--metric",
   help = "type of distance metric for nearest neighbours (uwot)",
-  metavar = "euclidean", 
+  metavar = "euclidean",
   required = TRUE
 )
 
 required$add_argument(
   "--n_epochs",
-  type = "integer", 
+  type = "integer",
   default = 500,
   help = "number of epochs for optimization of embeddings (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--learning_rate",
-  type = "double", 
+  type = "double",
   default = 1.0,
   help = "initial learning rate used in optimization (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--min_dist",
-  type = "double", 
+  type = "double",
   default = 0.3,
   help = "effective minimum distance between embedded points (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--spread",
-  type = "double", 
+  type = "double",
   default = 1.0,
   help = "effective scale of embedded points (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--set_op_mix_ratio",
-  type = "double", 
+  type = "double",
   default = 1.0,
-  help = "interpolation between fuzzy union and intersection set operation (uwot)",
-  metavar = "N", 
+  help = "interpolation between fuzzy union and intersection set (uwot)",
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--local_connectivity",
-  type = "integer", 
+  type = "integer",
   default = 1,
   help = "number of nearest neighbours assumed connected locally (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--repulsion_strength",
-  type = "double", 
+  type = "double",
   default = 1.0,
   help = "weighting applied to negative samples in optimization (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--negative_sample_rate",
-  type = "double", 
+  type = "double",
   default = 5.0,
   help = "number of negative edge samples per positive edge (uwot)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--fast_sgd",
   help = "faster but less reproducible UMAP (uwot) (lgl)",
-  metavar = "FALSE", 
+  metavar = "FALSE",
   required = TRUE
 )
 
 required$add_argument(
   "--dims",
-  type = "integer", 
+  type = "integer",
   default = 30,
   help = "the number of dimensions to output (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--initial_dims",
-  type = "integer", 
+  type = "integer",
   default = 50,
   help = "the number of dimensions retained in the PCA init (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--perplexity",
-  type = "integer", 
+  type = "integer",
   default = 30,
   help = "perplexity parameter (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--theta",
-  type = "double", 
+  type = "double",
   default = 0.5,
   help = "speed / accuracy trade-off (increase for less accuracy) (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--max_iter",
-  type = "integer", 
+  type = "integer",
   default = 1000,
   help = "number of iterations (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--pca_center",
   help = "should data be centered before pca (rtsne) (lgl)",
-  metavar = "TRUE", 
+  metavar = "TRUE",
   required = TRUE
 )
 
 required$add_argument(
   "--pca_scale",
   help = "should data be scaled before pca (rtsne) (lgl)",
-  metavar = "FALSE", 
+  metavar = "FALSE",
   required = TRUE
 )
 
 required$add_argument(
   "--normalize",
   help = "should data be normalized before distance calculations (rtsne) (lgl)",
-  metavar = "TRUE", 
+  metavar = "TRUE",
   required = TRUE
 )
 
 required$add_argument(
   "--stop_lying_iter",
-  type = "integer", 
+  type = "integer",
   default = 250,
   help = "iteration after which perplexities are no longer exaggerated (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--mom_switch_iter",
-  type = "integer", 
+  type = "integer",
   default = 250,
   help = "iteration after which final momentum is used (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--momentum",
-  type = "double", 
+  type = "double",
   default = 0.5,
   help = "momentum used in the first part of the optimization (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--final_momentum",
-  type = "double", 
+  type = "double",
   default = 0.8,
   help = "momentum used in the final part of the optimization (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--eta",
-  type = "double", 
+  type = "double",
   default = 200.0,
   help = "learning rate (rtsne)",
-  metavar = "N", 
+  metavar = "N",
   required = TRUE
 )
 
 required$add_argument(
   "--exaggeration_factor",
-  type = "double", 
+  type = "double",
   default = 12.0,
-  help = "Exaggeration factor used in the first part of the optimization (rtsne)",
-  metavar = "N", 
+  help = "Exaggeration factor used in early optimization (rtsne)",
+  metavar = "N",
   required = TRUE
 )
 
@@ -346,7 +346,7 @@ sce <- reduce_dims_sce(
   final_momentum = args$final_momentum,
   eta = args$eta,
   exaggeration_factor = args$exaggeration_factor
-  )
+)
 
 ## ............................................................................
 ## Save Outputs ####
diff --git a/bin/scflow_report_integrated.r b/bin/scflow_report_integrated.r
index 21f83e7..94c5ad4 100755
--- a/bin/scflow_report_integrated.r
+++ b/bin/scflow_report_integrated.r
@@ -28,7 +28,7 @@ optional <- parser$add_argument_group("Optional", "required arguments")
 required$add_argument(
   "--sce_path",
   help = "-path to the SingleCellExperiment",
-  metavar = "dir", 
+  metavar = "dir",
   required = TRUE
 )
 
@@ -42,25 +42,47 @@ required$add_argument(
 required$add_argument(
   "--input_reduced_dim",
   help = "reduced dimension embedding to use for the integration report",
-  metavar = "UMAP", 
+  metavar = "UMAP",
   required = TRUE
 )
 
+required$add_argument(
+  "--reddimplot_pointsize",
+  default = 0.1,
+  type = "double",
+  required = TRUE,
+  help = "Point size for reduced dimension plots",
+  metavar = "N"
+)
+
+required$add_argument(
+  "--reddimplot_alpha",
+  default = 0.2,
+  type = "double",
+  required = TRUE,
+  help = "Alpha value for reduced dimension plots",
+  metavar = "N"
+)
+
 ### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
 ### Pre-process args ####
 
 args <- parser$parse_args()
 
 args$categorical_covariates <- strsplit(args$categorical_covariates, ",")[[1]]
 
+options("scflow_reddimplot_pointsize" = args$reddimplot_pointsize)
+options("scflow_reddimplot_alpha" = args$reddimplot_alpha)
+
+
 ## ............................................................................
 ## Start ####
 
 sce <- read_sce(args$sce_path, read_metadata = TRUE)
 
 sce <- annotate_integrated_sce(
-  sce, 
-  categorical_covariates = args$categorical_covariates, 
-  input_reduced_dim = args$input_reduced_dim 
+  sce,
+  categorical_covariates = args$categorical_covariates,
+  input_reduced_dim = args$input_reduced_dim
 )
 
 ## ............................................................................
diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
index 87feb98..1da80cd 100755
--- a/bin/scrape_software_versions.py
+++ b/bin/scrape_software_versions.py
@@ -1,36 +1,18 @@
 #!/usr/bin/env python
 from __future__ import print_function
-from collections import OrderedDict
-import re
+import os
 
-# TODO nf-core: Add additional regexes for new tools in process get_software_versions
-regexes = {
-    "nf-core/scflow": ["v_pipeline.txt", r"(\S+)"],
-    "Nextflow": ["v_nextflow.txt", r"(\S+)"],
-    "FastQC": ["v_fastqc.txt", r"FastQC v(\S+)"],
-    "MultiQC": ["v_multiqc.txt", r"multiqc, version (\S+)"],
-}
-results = OrderedDict()
-results["nf-core/scflow"] = 'N/A'
-results["Nextflow"] = 'N/A'
-results["FastQC"] = 'N/A'
-results["MultiQC"] = 'N/A'
+results = {}
+version_files = [x for x in os.listdir(".") if x.endswith(".version.txt")]
+for version_file in version_files:
 
-# Search each file using its regex
-for k, v in regexes.items():
-    try:
-        with open(v[0]) as x:
-            versions = x.read()
-            match = re.search(v[1], versions)
-            if match:
-                results[k] = "v{}".format(match.group(1))
-    except IOError:
-        results[k] = False
+    software = version_file.replace(".version.txt", "")
+    if software == "pipeline":
+        software = "nf-core/scflow"
 
-# Remove software set to false in results
-for k in list(results):
-    if not results[k]:
-        del results[k]
+    with open(version_file) as fin:
+        version = fin.read().strip()
+    results[software] = version
 
 # Dump to YAML
 print(
@@ -44,11 +26,11 @@
""" ) -for k, v in results.items(): +for k, v in sorted(results.items()): print("
{}
{}
".format(k, v)) print("
") -# Write out regexes as csv file: -with open("software_versions.csv", "w") as f: - for k, v in results.items(): +# Write out as tsv file: +with open("software_versions.tsv", "w") as f: + for k, v in sorted(results.items()): f.write("{}\t{}\n".format(k, v)) diff --git a/bin/scrape_software_versions.r b/bin/scrape_software_versions.r new file mode 100755 index 0000000..8977e5b --- /dev/null +++ b/bin/scrape_software_versions.r @@ -0,0 +1,22 @@ +#!/usr/bin/env Rscript +# Scrape versions of R package dependencies +# Combiz Khozoie + +# Obtain script arguments (output file path) +args <- commandArgs(trailingOnly = TRUE) +assertthat::not_empty(args) + +# Retrieve package versions according with the nf-core format +pkg_versions <- tibble::tibble( + Package = names(installed.packages()[, 3]), + Version = paste0("v", unname(installed.packages()[, 3])) +) + +# Write out package versions as a tsv file +write.table( + pkg_versions, + file = args[1], + row.names = FALSE, + col.names = FALSE, + sep = "\t" +) diff --git a/conf/awsbatch.config b/conf/awsbatch.config deleted file mode 100644 index 14af586..0000000 --- a/conf/awsbatch.config +++ /dev/null @@ -1,18 +0,0 @@ -/* - * ------------------------------------------------- - * Nextflow config file for running on AWS batch - * ------------------------------------------------- - * Base config needed for running with -profile awsbatch - */ -params { - config_profile_name = 'AWSBATCH' - config_profile_description = 'AWSBATCH Cloud Profile' - config_profile_contact = 'Alexander Peltzer (@apeltzer)' - config_profile_url = 'https://aws.amazon.com/de/batch/' -} - -aws.region = params.awsregion -process.executor = 'awsbatch' -process.queue = params.awsqueue -executor.awscli = '/home/ec2-user/miniconda/bin/aws' -params.tracedir = './' diff --git a/conf/base.config b/conf/base.config index 4ea277f..3b0559d 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,51 +1,60 @@ /* - * ------------------------------------------------- - * nf-core/scflow Nextflow base config file - * ------------------------------------------------- - * A 'blank slate' config file, appropriate for general - * use on most high performace compute environments. - * Assumes that all software is installed and available - * on the PATH. Runs in `local` mode - all jobs will be - * run on the logged in environment. - */ +======================================================================================== + nf-core/scflow Nextflow base config file +======================================================================================== + A 'blank slate' config file, appropriate for general use on most high performance + compute environments. Assumes that all software is installed and available on + the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. +---------------------------------------------------------------------------------------- +*/ process { - // TODO nf-core: Check the defaults for all processes - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + // TODO nf-core: Check the defaults for all processes + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 
diff --git a/conf/awsbatch.config b/conf/awsbatch.config
deleted file mode 100644
index 14af586..0000000
--- a/conf/awsbatch.config
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * -------------------------------------------------
- *  Nextflow config file for running on AWS batch
- * -------------------------------------------------
- * Base config needed for running with -profile awsbatch
- */
-params {
-  config_profile_name = 'AWSBATCH'
-  config_profile_description = 'AWSBATCH Cloud Profile'
-  config_profile_contact = 'Alexander Peltzer (@apeltzer)'
-  config_profile_url = 'https://aws.amazon.com/de/batch/'
-}
-
-aws.region = params.awsregion
-process.executor = 'awsbatch'
-process.queue = params.awsqueue
-executor.awscli = '/home/ec2-user/miniconda/bin/aws'
-params.tracedir = './'
diff --git a/conf/base.config b/conf/base.config
index 4ea277f..3b0559d 100644
--- a/conf/base.config
+++ b/conf/base.config
@@ -1,51 +1,60 @@
 /*
- * -------------------------------------------------
- *  nf-core/scflow Nextflow base config file
- * -------------------------------------------------
- * A 'blank slate' config file, appropriate for general
- * use on most high performace compute environments.
- * Assumes that all software is installed and available
- * on the PATH. Runs in `local` mode - all jobs will be
- * run on the logged in environment.
- */
+========================================================================================
+    nf-core/scflow Nextflow base config file
+========================================================================================
+    A 'blank slate' config file, appropriate for general use on most high performance
+    compute environments. Assumes that all software is installed and available on
+    the PATH. Runs in `local` mode - all jobs will be run on the logged in environment.
+----------------------------------------------------------------------------------------
+*/
 
 process {
 
-  // TODO nf-core: Check the defaults for all processes
-  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
-  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
-  time = { check_max( 4.h * task.attempt, 'time' ) }
+  // TODO nf-core: Check the defaults for all processes
+  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
+  memory = { check_max( 6.GB * task.attempt, 'memory' ) }
+  time = { check_max( 4.h * task.attempt, 'time' ) }
 
-  errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
-  maxRetries = 1
-  maxErrors = '-1'
+  errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
+  maxRetries = 1
+  maxErrors = '-1'
 
-  // Process-specific resource requirements
-  // NOTE - Only one of the labels below are used in the fastqc process in the main script.
-  //        If possible, it would be nice to keep the same label naming convention when
-  //        adding in your processes.
-  // TODO nf-core: Customise requirements for specific processes.
-  // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
-  withLabel:process_low {
-    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
-    memory = { check_max( 14.GB * task.attempt, 'memory' ) }
-    time = { check_max( 6.h * task.attempt, 'time' ) }
-  }
-  withLabel:process_medium {
-    cpus = { check_max( 6 * task.attempt, 'cpus' ) }
-    memory = { check_max( 42.GB * task.attempt, 'memory' ) }
-    time = { check_max( 8.h * task.attempt, 'time' ) }
-  }
-  withLabel:process_high {
-    cpus = { check_max( 12 * task.attempt, 'cpus' ) }
-    memory = { check_max( 84.GB * task.attempt, 'memory' ) }
-    time = { check_max( 10.h * task.attempt, 'time' ) }
-  }
-  withLabel:process_long {
-    time = { check_max( 20.h * task.attempt, 'time' ) }
-  }
-  withName:get_software_versions {
-    cache = false
-  }
-
+  // Process-specific resource requirements
+  // NOTE - Please try and re-use the labels below as much as possible.
+  //        These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
+  //        If possible, it would be nice to keep the same label naming convention when
+  //        adding in your local modules too.
+  withLabel:process_tiny {
+    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
+    memory = { check_max( 6.GB * task.attempt, 'memory' ) }
+    time = { check_max( 1.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_low {
+    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
+    memory = { check_max( 12.GB * task.attempt, 'memory' ) }
+    time = { check_max( 4.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_medium {
+    cpus = { check_max( 6 * task.attempt, 'cpus' ) }
+    memory = { check_max( 36.GB * task.attempt, 'memory' ) }
+    time = { check_max( 8.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_high {
+    cpus = { check_max( 12 * task.attempt, 'cpus' ) }
+    memory = { check_max( 72.GB * task.attempt, 'memory' ) }
+    time = { check_max( 16.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_long {
+    time = { check_max( 20.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_high_memory {
+    memory = { check_max( 200.GB * task.attempt, 'memory' ) }
+  }
+  withLabel:error_ignore {
+    errorStrategy = 'ignore'
+  }
+  withLabel:error_retry {
+    errorStrategy = 'retry'
+    maxRetries = 2
+  }
 }
diff --git a/conf/igenomes.config b/conf/igenomes.config
new file mode 100644
index 0000000..855948d
--- /dev/null
+++ b/conf/igenomes.config
@@ -0,0 +1,432 @@
+/*
+========================================================================================
+    Nextflow config file for iGenomes paths
+========================================================================================
+    Defines reference genomes using iGenome paths.
+    Can be used by any config that customises the base path using:
+        $params.igenomes_base / --igenomes_base
+----------------------------------------------------------------------------------------
+*/
+
+params {
+    // illumina iGenomes reference file paths
+    genomes {
+        'GRCh37' {
+            fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
+            mito_name = "MT"
+            macs_gsize = "2.7e9"
+            blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
+        }
+        'GRCh38' {
+            fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+            macs_gsize = "2.7e9"
+            blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
+        }
+        'GRCm38' {
+            fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt"
+            mito_name = "MT"
+            macs_gsize = "1.87e9"
+            blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed"
+        }
+        'TAIR10' {
+            fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt"
+            mito_name = "Mt"
+        }
+        'EB2' {
+            fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt"
+        }
+        'UMD3.1' {
+            fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'WBcel235' {
+            fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed"
+            mito_name = "MtDNA"
+            macs_gsize = "9e7"
+        }
+        'CanFam3.1' {
+            fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'GRCz10' {
+            fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed"
+            mito_name = "MT"
+        }
+        'BDGP6' {
+            fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed"
+            mito_name = "M"
+            macs_gsize = "1.2e8"
+        }
+        'EquCab2' {
+            fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'EB1' {
+            fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt"
+        }
+        'Galgal4' {
+            fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed"
+            mito_name = "MT"
+        }
+        'Gm01' {
+            fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt"
+        }
+        'Mmul_1' {
+            fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'IRGSP-1.0' {
+            fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed"
+            mito_name = "Mt"
+        }
+        'CHIMP2.1.4' {
+            fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'Rnor_5.0' {
+            fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.bed"
+            mito_name = "MT"
+        }
+        'Rnor_6.0' {
+            fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed"
+            mito_name = "MT"
+        }
+        'R64-1-1' {
+            fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed"
+            mito_name = "MT"
+            macs_gsize = "1.2e7"
+        }
+        'EF2' {
+            fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt"
+            mito_name = "MT"
+            macs_gsize = "1.21e7"
+        }
+        'Sbi1' {
+            fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt"
+        }
+        'Sscrofa10.2' {
+            fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt"
+            mito_name = "MT"
+        }
+        'AGPv3' {
+            fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed"
+            mito_name = "Mt"
+        }
+        'hg38' {
+            fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+            macs_gsize = "2.7e9"
+            blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
+        }
+        'hg19' {
+            fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt"
+            mito_name = "chrM"
+            macs_gsize = "2.7e9"
+            blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed"
+        }
+        'mm10' {
+            fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt"
+            mito_name = "chrM"
+            macs_gsize = "1.87e9"
+            blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed"
+        }
+        'bosTau8' {
+            fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+        }
+        'ce10' {
+            fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt"
+            mito_name = "chrM"
+            macs_gsize = "9e7"
+        }
+        'canFam3' {
+            fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt"
+            mito_name = "chrM"
+        }
+        'danRer10' {
+            fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+            macs_gsize = "1.37e9"
+        }
+        'dm6' {
+            fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+            macs_gsize = "1.2e8"
+        }
+        'equCab2' {
+            fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt"
+            mito_name = "chrM"
+        }
+        'galGal4' {
+            fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt"
+            mito_name = "chrM"
+        }
+        'panTro4' {
+            fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt"
+            mito_name = "chrM"
+        }
+        'rn6' {
+            fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed"
+            mito_name = "chrM"
+        }
+        'sacCer3' {
+            fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/"
+            readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt"
+            mito_name = "chrM"
+            macs_gsize = "1.2e7"
+        }
+        'susScr3' {
+            fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa"
+            bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa"
+            bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/"
+            star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/"
+            bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/"
+            gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf"
+            bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed"
+            readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt"
+            mito_name = "chrM"
+        }
+    }
+}
diff --git a/conf/modules.config b/conf/modules.config
index 578dc0f..92362c1 100644
--- a/conf/modules.config
+++ b/conf/modules.config
@@ -1,36 +1,46 @@
 /*
- * --------------------------------------------------
- *  Config file for defining DSL2 per module options
- * --------------------------------------------------
- *
- *   Available keys to override module options:
- *       args           = Additional arguments appended to command in module.
- *       args2          = Second set of arguments appended to command in module (multi-tool modules).
- *       publish_dir    = Directory to publish results.
- *       publish_by_id  = Publish results in separate folders by meta.id value.
- *       publish_files  = Groovy map where key = "file_ext" and value = "directory" to publish results for that file extension
- *                        The value of "directory" is appended to the standard "publish_dir" path as defined above.
- *                        If publish_files == null (unspecified)  - All files are published.
- *                        If publish_files == false               - No files are published.
- *       suffix         = File name suffix for output files.
- *
- */
+========================================================================================
+    Config file for defining DSL2 per module options
+========================================================================================
+    Available keys to override module options:
+        args            = Additional arguments appended to command in module.
+        args2           = Second set of arguments appended to command in module (multi-tool modules).
+        args3           = Third set of arguments appended to command in module (multi-tool modules).
+        publish_dir     = Directory to publish results.
+        publish_by_meta = Groovy list of keys available in meta map to append as directories to "publish_dir" path
+                          If publish_by_meta = true                 - Value of ${meta['id']} is appended as a directory to "publish_dir" path
+                          If publish_by_meta = ['id', 'custompath'] - If "id" is in meta map and "custompath" isn't then "${meta['id']}/custompath/"
+                                                                      is appended as a directory to "publish_dir" path
+                          If publish_by_meta = false / null         - No directories are appended to "publish_dir" path
+        publish_files   = Groovy map where key = "file_ext" and value = "directory" to publish results for that file extension
+                          The value of "directory" is appended to the standard "publish_dir" path as defined above.
+                          If publish_files = null (unspecified)     - All files are published.
+                          If publish_files = false                  - No files are published.
+        suffix          = File name suffix for output files.
+---------------------------------------------------------------------------------------- +*/ params { - modules { + modules { + + 'scflow_checkinputs' { + publish_dir = '' + publish_files = false + } 'scflow_qc' { publish_dir = 'quality_control' publish_files = [ - 'qc_report':'', - 'qc_plot_data':'tables', - 'qc_plots':'plots', - '*_sce':'sce/individual_sce/' + 'html':'../../reports/qc', + 'qc_plot_data':'', + 'qc_plots':'', + 'sce':'' ] + publish_by_id = true } - 'scflow_merge_qc_summaries' { + 'scflow_mergeqctables' { publish_dir = 'quality_control' publish_files = ['tsv':''] } @@ -38,9 +48,9 @@ params { 'scflow_merge' { publish_dir = 'merged' publish_files = [ - 'merged_report':'merged_report', - 'merge_plots':'plots', - 'merge_summary_plots':'plots' + 'merged_report':'../reports', + 'merge_plots':'', + 'merge_summary_plots':'' ] } @@ -49,7 +59,7 @@ params { publish_files = false } - 'scflow_reduce_dims' { + 'scflow_reducedims' { publish_dir = '' publish_files = false } @@ -59,15 +69,15 @@ params { publish_files = false } - 'scflow_report_integrated' { - publish_dir = '' - publish_files = ['integration_report',''] + 'scflow_reportintegrated' { + publish_dir = 'integration' + publish_files = ['integration_report':'../reports'] } - 'scflow_map_celltypes' { - publish_dir = 'Tables' + 'scflow_mapcelltypes' { + publish_dir = 'tables' publish_files = [ - 'celltype_mappings.tsv':'Celltype_Mappings' + 'celltype_mappings.tsv':'celltype_mappings' ] } @@ -76,39 +86,47 @@ params { publish_files = [ 'final_sce':'SCE', 'celltypes.tsv':'', - 'celltype_metrics_report':'' + 'celltype_metrics_report':'../reports', + 'celltype_marker_plots':'../celltype_markers', + 'celltype_marker_tables':'../celltype_markers' ] } 'scflow_dge' { publish_dir = 'DGE' publish_files = [ - 'de_table':'tables', - 'de_report':'DGE_report', - 'de_plot':'plots', - 'de_plot_data':'tables' + 'tsv':'', + 'html':'../../reports/DGE', + 'png':'de_plots' ] + publish_by_id = true } 'scflow_ipa' { publish_dir = 'IPA' publish_files = [ - 'ipa':'' + 'ipa':'', + 'html':'../reports/IPA' ] + publish_by_id = true } 'scflow_dirichlet' { publish_dir = 'dirichlet' publish_files = [ - 'dirichlet_report':'' + 'dirichlet_report':'../reports' ] } - 'scflow_plot_reddim_genes' { + 'scflow_plotreddimgenes' { publish_dir = 'plots' publish_files = [ 'reddim_gene_plots':'' ] } + + 'get_software_versions' { + publish_dir = 'pipeline_info' + } } -} \ No newline at end of file +} diff --git a/conf/scflow_analysis.config b/conf/scflow_analysis.config index 68f8a84..ebba67d 100644 --- a/conf/scflow_analysis.config +++ b/conf/scflow_analysis.config @@ -1,164 +1,168 @@ params { + // DEFAULT PARAMETERS + // * = multiple comma-separated variables allowed - // DEFAULT PARAMETERS - // * = multiple comma-separated variables allowed + // Options: Quality-Control + qc_key_colname = 'manifest' + qc_factor_vars = 'individual' // * + qc_min_library_size = 100 + qc_max_library_size = 'adaptive' // if numeric, pass as string + qc_min_features = 100 + qc_max_features = 'adaptive' // if numeric, pass as string + qc_max_mito = 'adaptive' // if numeric, pass as string + qc_min_ribo = 0 + qc_max_ribo = 1 + qc_min_counts = 2 + qc_min_cells = 2 + qc_drop_unmapped = 'true' + qc_drop_mito = 'true' + qc_drop_ribo = 'true' + qc_nmads = 4.0 - // Options: Quality-Control - qc_key_colname = 'manifest' - qc_factor_vars = 'individual' // * - qc_min_library_size = 100 - qc_max_library_size = 'adaptive' - qc_min_features = 100 - qc_max_features = 'adaptive' - qc_max_mito = 'adaptive' - 
qc_min_ribo = 0 - qc_max_ribo = 1 - qc_min_counts = 2 - qc_min_cells = 2 - qc_drop_unmapped = true - qc_drop_mito = true - qc_drop_ribo = true - qc_nmads = 4.0 + // Options: Ambient RNA Profiling + amb_find_cells = 'false' + amb_lower = 100 + amb_retain = 'auto' // if numeric, pass as string + amb_alpha_cutoff = 0.001 + amb_niters = 10000 + amb_expect_cells = 3000 - // Options: Ambient RNA Profiling - amb_find_cells = false - amb_lower = 100 - amb_retain = 12000 - amb_alpha_cutoff = 0.001 - amb_niters = 10000 - amb_expect_cells = 3000 + // Options: Multiplet Identification + mult_find_singlets = 'false' + mult_singlets_method = 'doubletfinder' + mult_vars_to_regress_out = 'nCount_RNA,pc_mito' // * + mult_pca_dims = 10 + mult_var_features = 2000 + mult_doublet_rate = 0 + mult_dpk = 8 + mult_pK = 0.02 - // Options: Multiplet Identification - mult_find_singlets = false - mult_singlets_method = 'doubletfinder' - mult_vars_to_regress_out = 'nCount_RNA,pc_mito' // * - mult_pca_dims = 10 - mult_var_features = 2000 - mult_doublet_rate = 0 - mult_dpk = 8 - mult_pK = 0.02 + // Options: Integration + integ_method = 'Liger' + integ_unique_id_var = 'manifest' + integ_take_gene_union = 'false' + integ_remove_missing = 'true' + integ_num_genes = 3000 + integ_combine = 'union' + integ_keep_unique = 'false' + integ_capitalize = 'false' + integ_use_cols = 'true' + integ_k = 30 + integ_lambda = 5.0 + integ_thresh = 0.0001 + integ_max_iters = 100 + integ_nrep = 1 + integ_rand_seed = 1 + integ_knn_k = 20 + integ_k2 = 500 + integ_prune_thresh = 0.2 + integ_ref_dataset = 'NULL' + integ_min_cells = 2 + integ_quantiles = 50 + integ_nstart = 10 + integ_resolution = 1 + integ_dims_use = 'NULL' + integ_dist_use = 'CR' + integ_center = 'false' + integ_small_clust_thresh = 0 - // Options: Integration - integ_method = 'Liger' - integ_unique_id_var = 'manifest' - integ_take_gene_union = false - integ_remove_missing = true - integ_num_genes = 3000 - integ_combine = 'union' - integ_keep_unique = false - integ_capitalize = false - integ_use_cols = true - integ_k = 30 - integ_lambda = 5.0 - integ_thresh = 0.0001 - integ_max_iters = 100 - integ_nrep = 1 - integ_rand_seed = 1 - integ_knn_k = 20 - integ_k2 = 500 - integ_prune_thresh = 0.2 - integ_ref_dataset = 'NULL' - integ_min_cells = 2 - integ_quantiles = 50 - integ_nstart = 10 - integ_resolution = 1 - integ_dims_use = 'NULL' - integ_dist_use = 'CR' - integ_center = false - integ_small_clust_thresh = 0 + // Options: Integration report + integ_categorical_covariates = 'manifest,diagnosis,sex' // * + integ_input_reduced_dim = 'UMAP' - // Options: Integration report - integ_categorical_covariates = 'manifest,diagnosis,sex' // * - integ_input_reduced_dim = 'UMAP' + // Options: Merge + merge_plot_vars = 'total_features_by_counts,total_counts,pc_mito,pc_ribo' + merge_facet_vars = 'NULL' // * + merge_outlier_vars = 'total_features_by_counts,total_counts' // * - // Options: Merge - merge_plot_vars = 'total_features_by_counts,total_counts,pc_mito,pc_ribo' - merge_facet_vars = 'NULL' // * - merge_outlier_vars = 'total_features_by_counts,total_counts' // * + // Options: Dimensionality Reduction + reddim_input_reduced_dim = 'PCA,Liger' // * + reddim_reduction_methods = 'tSNE,UMAP,UMAP3D' // * + reddim_vars_to_regress_out = 'nCount_RNA,pc_mito' // * + // umap + reddim_umap_pca_dims = 30 + reddim_umap_n_neighbors = 35 + reddim_umap_n_components = 2 + reddim_umap_init = 'spectral' + reddim_umap_metric = 'euclidean' + reddim_umap_n_epochs = 200 + reddim_umap_learning_rate = 1 + 
reddim_umap_min_dist = 0.4 + reddim_umap_spread = 0.85 + reddim_umap_set_op_mix_ratio = 1 + reddim_umap_local_connectivity = 1 + reddim_umap_repulsion_strength = 1 + reddim_umap_negative_sample_rate = 5 + reddim_umap_fast_sgd = 'false' + // tsne + reddim_tsne_dims = 2 + reddim_tsne_initial_dims = 50 + reddim_tsne_perplexity = 150 + reddim_tsne_theta = 0.5 + reddim_tsne_stop_lying_iter = 250 + reddim_tsne_mom_switch_iter = 250 + reddim_tsne_max_iter = 1000 + reddim_tsne_pca_center = 'true' + reddim_tsne_pca_scale = 'false' + reddim_tsne_normalize = 'true' + reddim_tsne_momentum = 0.5 + reddim_tsne_final_momentum = 0.8 + reddim_tsne_eta = 1000 + reddim_tsne_exaggeration_factor = 12 - // Options: Dimensionality Reduction - reddim_input_reduced_dim = 'PCA,Liger' // * - reddim_reduction_methods = 'tSNE,UMAP,UMAP3D' // * - reddim_vars_to_regress_out = 'nCount_RNA,pc_mito' // * - // umap - reddim_umap_pca_dims = 30 - reddim_umap_n_neighbors = 35 - reddim_umap_n_components = 2 - reddim_umap_init = 'spectral' - reddim_umap_metric = 'euclidean' - reddim_umap_n_epochs = 200 - reddim_umap_learning_rate = 1 - reddim_umap_min_dist = 0.4 - reddim_umap_spread = 0.85 - reddim_umap_set_op_mix_ratio = 1 - reddim_umap_local_connectivity = 1 - reddim_umap_repulsion_strength = 1 - reddim_umap_negative_sample_rate = 5 - reddim_umap_fast_sgd = false - // tsne - reddim_tsne_dims = 2 - reddim_tsne_initial_dims = 50 - reddim_tsne_perplexity = 150 - reddim_tsne_theta = 0.5 - reddim_tsne_stop_lying_iter = 250 - reddim_tsne_mom_switch_iter = 250 - reddim_tsne_max_iter = 1000 - reddim_tsne_pca_center = true - reddim_tsne_pca_scale = false - reddim_tsne_normalize = true - reddim_tsne_momentum = 0.5 - reddim_tsne_final_momentum = 0.8 - reddim_tsne_eta = 1000 - reddim_tsne_exaggeration_factor = 12 + // Options: Clustering + clust_cluster_method = 'leiden' + clust_reduction_method = 'UMAP_Liger' + clust_res = 0.01 + clust_k = 100 + clust_louvain_iter = 1 - // Options: Clustering - clust_cluster_method = 'leiden' - clust_reduction_method = 'UMAP_Liger' - clust_res = 0.001 - clust_k = 50 - clust_louvain_iter = 1 + // Options: Celltype Annotation + cta_clusters_colname = 'clusters' + cta_cells_to_sample = 10000 + // Options: Celltype Metrics Report + cta_unique_id_var = 'manifest' + cta_clusters_colname = 'clusters' + cta_celltype_var = 'cluster_celltype' + cta_facet_vars = 'manifest,diagnosis,sex' + cta_metric_vars = 'pc_mito,pc_ribo,total_counts,total_features_by_counts' + cta_top_n = 5 - // Options: Celltype Annotation - cta_clusters_colname = 'clusters' - cta_cells_to_sample = 10000 - // Options: Celltype Metrics Report - cta_unique_id_var = 'individual' - cta_clusters_colname = 'clusters' - cta_celltype_var = 'cluster_celltype' - cta_facet_vars = 'manifest,diagnosis,sex' - cta_metric_vars = 'pc_mito,pc_ribo,total_counts,total_features_by_counts' + // Options: Differential Gene Expression + dge_de_method = 'MASTZLM' // * + dge_mast_method = 'bayesglm' + dge_min_counts = 1 + dge_min_cells_pc = 0.1 + dge_rescale_numerics = 'true' + dge_pseudobulk = 'false' + dge_celltype_var = 'cluster_celltype' + dge_sample_var = 'manifest' + dge_dependent_var = 'diagnosis' + dge_ref_class = 'Control' + dge_confounding_vars = 'cngeneson' // * + dge_random_effects_var = 'NULL' + dge_fc_threshold = 1.1 + dge_pval_cutoff = 0.05 + dge_force_run = 'false' + dge_max_cores = 'null' - // Options: Differential Gene Expression - dge_de_method = 'MASTZLM' // * - dge_mast_method = 'bayesglm' - dge_min_counts = 1 - dge_min_cells_pc = 0.1 - 
dge_rescale_numerics = true - dge_pseudobulk = false - dge_celltype_var = 'cluster_celltype' - dge_sample_var = 'manifest' - dge_dependent_var = 'diagnosis' - dge_ref_class = 'Controls' - dge_confounding_vars = 'cngeneson' // * - dge_random_effects_var = 'NULL' - dge_fc_threshold = 1.1 - dge_pval_cutoff = 0.05 - dge_force_run = false + // Options: Integrated Pathway Analysis + ipa_enrichment_tool = 'WebGestaltR' + ipa_enrichment_method = 'ORA' + ipa_enrichment_database = 'GO_Biological_Process' // * - // Options: Integrated Pathway Analysis - ipa_reference_file = 'NULL' - ipa_enrichment_tool = 'WebGestaltR' - ipa_enrichment_method = 'ORA' - ipa_enrichment_database = 'GO_Biological_Process' // * + // Options: Dirichlet Modeling + dirich_unique_id_var = 'individual' + dirich_celltype_var = 'cluster_celltype' + dirich_dependent_var = 'diagnosis' + dirich_ref_class = 'Control' + dirich_var_order = 'NULL' // * - // Options: Dirichlet Modeling - dirich_unique_id_var = 'individual' - dirich_celltype_var = 'cluster_celltype' - dirich_dependent_var = 'diagnosis' - dirich_ref_class = 'Controls' - dirich_var_order = 'NULL' // * - - // Options: Plots (Reduced Dim) - plotreddim_reduction_methods = 'UMAP_Liger' // * + // Options: Plots (Reduced Dim) + plotreddim_reduction_methods = 'UMAP_Liger' // * + reddimplot_pointsize = 0.1 + reddimplot_alpha = 0.2 + // Misc + species = 'human' } diff --git a/conf/scflow_analysis_bck.config b/conf/scflow_analysis_bck.config deleted file mode 100644 index 27e1330..0000000 --- a/conf/scflow_analysis_bck.config +++ /dev/null @@ -1,164 +0,0 @@ -params { - - // DEFAULT PARAMETERS - // * = multiple comma-separated variables allowed - - // Options: Quality-Control - qc_key_colname = 'manifest' - qc_factor_vars = 'seqdate' // * - qc_min_library_size = 250 - qc_max_library_size = 'adaptive' - qc_min_features = 100 - qc_max_features = 'adaptive' - qc_max_mito = 'adaptive' - qc_min_ribo = 0 - qc_max_ribo = 1 - qc_min_counts = 2 - qc_min_cells = 2 - qc_drop_unmapped = true - qc_drop_mito = true - qc_drop_ribo = true - qc_nmads = 4.0 - - // Options: Ambient RNA Profiling - amb_find_cells = true - amb_lower = 100 - amb_retain = 12000 - amb_alpha_cutoff = 0.001 - amb_niters = 10000 - amb_expect_cells = 3000 - - // Options: Multiplet Identification - mult_find_singlets = true - mult_singlets_method = 'doubletfinder' - mult_vars_to_regress_out = 'nCount_RNA,pc_mito' // * - mult_pca_dims = 10 - mult_var_features = 2000 - mult_doublet_rate = 0 - mult_dpk = 8 - mult_pK = 0.02 - - // Options: Integration - integ_method = 'Liger' - integ_unique_id_var = 'manifest' - integ_take_gene_union = false - integ_remove_missing = true - integ_num_genes = 3000 - integ_combine = 'union' - integ_keep_unique = false - integ_capitalize = false - integ_use_cols = true - integ_k = 30 - integ_lambda = 5.0 - integ_thresh = 0.0001 - integ_max_iters = 100 - integ_nrep = 1 - integ_rand_seed = 1 - integ_knn_k = 20 - integ_k2 = 500 - integ_prune_thresh = 0.2 - integ_ref_dataset = 'NULL' - integ_min_cells = 2 - integ_quantiles = 50 - integ_nstart = 10 - integ_resolution = 1 - integ_dims_use = 'NULL' - integ_dist_use = 'CR' - integ_center = false - integ_small_clust_thresh = 0 - - // Options: Integration report - integ_categorical_covariates = 'individual,diagnosis,region,sex' // * - integ_input_reduced_dim = 'UMAP' - - // Options: Merge - merge_plot_vars = 'total_features_by_counts,total_counts,pc_mito,pc_ribo' - merge_facet_vars = 'NULL' // * - merge_outlier_vars = 'total_features_by_counts,total_counts' // * - 
- // Options: Dimensionality Reduction - reddim_input_reduced_dim = 'PCA,Liger' // * - reddim_reduction_methods = 'tSNE,UMAP,UMAP3D' // * - reddim_vars_to_regress_out = 'nCount_RNA,pc_mito' // * - // umap - reddim_umap_pca_dims = 30 - reddim_umap_n_neighbors = 35 - reddim_umap_n_components = 2 - reddim_umap_init = 'spectral' - reddim_umap_metric = 'euclidean' - reddim_umap_n_epochs = 200 - reddim_umap_learning_rate = 1 - reddim_umap_min_dist = 0.4 - reddim_umap_spread = 0.85 - reddim_umap_set_op_mix_ratio = 1 - reddim_umap_local_connectivity = 1 - reddim_umap_repulsion_strength = 1 - reddim_umap_negative_sample_rate = 5 - reddim_umap_fast_sgd = false - // tsne - reddim_tsne_dims = 2 - reddim_tsne_initial_dims = 50 - reddim_tsne_perplexity = 150 - reddim_tsne_theta = 0.5 - reddim_tsne_stop_lying_iter = 250 - reddim_tsne_mom_switch_iter = 250 - reddim_tsne_max_iter = 1000 - reddim_tsne_pca_center = true - reddim_tsne_pca_scale = false - reddim_tsne_normalize = true - reddim_tsne_momentum = 0.5 - reddim_tsne_final_momentum = 0.8 - reddim_tsne_eta = 1000 - reddim_tsne_exaggeration_factor = 12 - - // Options: Clustering - clust_method = 'leiden' - clust_reduction_method = 'UMAP_Liger' - clust_res = 0.001 - clust_k = 50 - clust_louvain_iter = 1 - - // Options: Celltype Annotation - cta_clusters_colname = 'clusters' - cta_cells_to_sample = 10000 - // Options: Celltype Metrics Report - cta_unique_id_var = 'individual' - cta_clusters_colname = 'clusters' - cta_celltype_var = 'cluster_celltype' - cta_facet_vars = 'manifest,diagnosis,sex,capdate,prepdate,seqdate' - cta_metric_vars = 'pc_mito,pc_ribo,total_counts,total_features_by_counts' - - // Options: Differential Gene Expression - dge_de_method = 'MASTZLM' // * - dge_mast_method = 'bayesglm' - dge_min_counts = 1 - dge_min_cells_pc = 0.1 - dge_rescale_numerics = true - dge_pseudobulk = false - dge_celltype_var = 'cluster_celltype' - dge_sample_var = 'manifest' - dge_dependent_var = 'group' - dge_ref_class = 'Control' - dge_confounding_vars = 'cngeneson,seqdate,pc_mito' // * - dge_random_effects_var = 'NULL' - dge_fc_threshold = 1.1 - dge_pval_cutoff = 0.05 - dge_force_run = false - - // Options: Integrated Pathway Analysis - ipa_reference_file = 'NULL' - ipa_enrichment_tool = 'WebGestaltR' - ipa_enrichment_method = 'ORA' - ipa_enrichment_database = 'GO_Biological_Process' // * - - // Options: Dirichlet Modeling - dirich_unique_id_var = 'individual' - dirich_celltype_var = 'cluster_celltype' - dirich_dependent_var = 'group' - dirich_ref_class = 'Control' - dirich_var_order = 'Control,Low,High' // * - - // Options: Plots (Reduced Dim) - plotreddim_reduction_methods = 'UMAP_Liger' // * - -} diff --git a/conf/test.config b/conf/test.config index 6e5469c..945ee35 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1,26 +1,30 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/scflow -profile test, - */ +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/scflow -profile test, + +---------------------------------------------------------------------------------------- +*/ params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB + max_memory = 6.7.GB max_time = 48.h - // Input data - // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets - // TODO nf-core: Give any required params for the test so that command line flags are not needed - single_end = false - input_paths = [ - ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], - ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] - ] + input = "https://raw.githubusercontent.com/nf-core/test-datasets/scflow/refs/SampleSheet.tsv" + manifest = "https://raw.githubusercontent.com/nf-core/test-datasets/scflow/refs/Manifest.txt" + ensembl_mappings = "https://raw.githubusercontent.com/nfancy/test-datasets/scflow/assets/ensembl_mappings.tsv" + ctd_path = "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip" + reddim_genes_yml = "https://raw.githubusercontent.com/nf-core/test-datasets/scflow/refs/reddim_genes.yml" + + reddimplot_pointsize = 1 + reddimplot_alpha = 0.8 + } diff --git a/conf/test_full.config b/conf/test_full.config index 6b15817..2c2c2ad 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -1,22 +1,22 @@ /* - * ------------------------------------------------- - * Nextflow config file for running full-size tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a full size pipeline test. Use as follows: - * nextflow run nf-core/scflow -profile test_full, - */ +======================================================================================== + Nextflow config file for running full-size tests +======================================================================================== + Defines input files and everything required to run a full size pipeline test. + Use as follows: + nextflow run nf-core/scflow -profile test_full, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Full test profile' - config_profile_description = 'Full test dataset to check pipeline function' + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' + + /* + Use as follows: + nextflow run nf-core/scflow -profile test_full, - // Input data for full size test - // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. 
SRA) - TODO nf-core: Give any required params for the test so that command line flags are not needed - single_end = false - input_paths = [ - ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], - ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] - ] + ---------------------------------------------------------------------------------------- + */ } diff --git a/docs/README.md b/docs/README.md index bf32009..62522da 100644 --- a/docs/README.md +++ b/docs/README.md @@ -3,8 +3,8 @@ The nf-core/scflow documentation is split into the following pages: * [Usage](usage.md) - * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. + * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. * [Output](output.md) - * An overview of the different results produced by the pipeline and how to interpret them. + * An overview of the different results produced by the pipeline and how to interpret them. You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/nf-core-scflow_logo.png b/docs/images/nf-core-scflow_logo.png index d4257f7..46a0ca8 100644 Binary files a/docs/images/nf-core-scflow_logo.png and b/docs/images/nf-core-scflow_logo.png differ diff --git a/docs/images/scflow_workflow.png b/docs/images/scflow_workflow.png new file mode 100644 index 0000000..388f2c8 Binary files /dev/null and b/docs/images/scflow_workflow.png differ diff --git a/docs/output.md b/docs/output.md index d428306..85230c6 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,63 +1,157 @@ # nf-core/scflow: Output -## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/scflow/output](https://nf-co.re/scflow/output) +## Introduction -> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ +This document describes the output produced by the pipeline. Key outputs include interactive HTML reports for major analytical steps, flat-file tables, and publication-quality plots. A fully-annotated SingleCellExperiment (SCE) object is also output for optional downstream tertiary analysis, along with individual SCEs for each sample after quality control. -## Introduction +The pipeline will create the directories listed below during an analysis run. All paths are relative to the top-level results directory. -This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. ## Pipeline overview -The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and automates a case/control scRNA-seq analysis using the following steps: - +* [Check inputs](#check_inputs) - Checks the input sample sheet and manifest files (`SCFLOW_CHECKINPUTS`) +* [Quality control](#qc) - Quality control of gene-cell matrices for each individual sample (`SCFLOW_QC`) +* [Merged summary](#merged) - Merging of individual-sample QC tables and quality control of the merged SingleCellExperiment (SCE) object (`SCFLOW_MERGEQCTABLES` and `SCFLOW_MERGE`) +* [Integration](#integration) - Calculation of latent metagene factors for sample integration of the merged SCE (`SCFLOW_INTEGRATE`, `SCFLOW_REPORTINTEGRATED`) +* [Dimension reduction](#dimension_reduction) - Dimension reduction for the merged SCE using UMAP or tSNE (`SCFLOW_REDUCEDIMS`) +* [Clustering](#clustering) - Community detection to identify clusters of cells using the Louvain/Leiden algorithm (`SCFLOW_CLUSTER`) +* [Cell-type annotation](#celltype_annotation) - Automated cell-type annotation of the clustered SCE, identification of cell-type marker genes, and calculation of relevant metrics (`SCFLOW_MAPCELLTYPES`, `SCFLOW_FINALIZE`) +* [Differential gene expression analysis](#DGE) - Performs differential gene expression analysis and generates result tables and plots (`SCFLOW_DGE`) +* [Impacted pathway analysis](#IPA) - Performs impacted pathway analysis and generates result tables and plots (`SCFLOW_IPA`) +* [Dirichlet](#dirichlet) - Performs differential analysis of cell-type composition (`SCFLOW_DIRICHLET`) +* [Reports](#reports) - Interactive HTML reports describing results from major analytical steps of the pipeline (`SCFLOW_QC`, `SCFLOW_MERGE`, `SCFLOW_REPORTINTEGRATED`, `SCFLOW_FINALIZE`, `SCFLOW_DGE`, `SCFLOW_IPA`, `SCFLOW_DIRICHLET`) +* [Additional plots](#plots) - User-specified gene plots highlighting the expression of genes in cells plotted in reduced dimensional space +* [Additional tables](#tables) - Tables of cell-type mappings for each cluster +* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution (`GET_SOFTWARE_VERSIONS`) -## Pipeline overview +### Quality control + +
+Output files + +* `quality_control/` + * `merged.tsv` : A `.tsv` file containing detailed individual-sample QC metrics for all samples. + +* `quality_control//` + * `qc_plot_data` : `.tsv` files for major QC values for plotting. + * `qc_plots` : `.png` files included in the `.html` reports for major QC steps. + * `sce//` : Post-QC SCE for an individual sample. It is possible to use the `read_sce()` function of the scFlow R package to read in this object. + +
+ +### Merged summary + +
+Output files + +* `merged/` + * `merged_plots/` : Pseudobulk plots from the `*_scflow_merged_report.html` report. + * `merge_summary_plots/` : High-resolution plots from the `*_scflow_merged_report.html` report. + +
+ +### Integration, dimension reduction and clustering + +
+Output files + +The process `SCFLOW_REPORTINTEGRATED` saves an interactive integration and clustering HTML report to the `reports/` folder. + +
-The pipeline is built using [Nextflow](https://www.nextflow.io/) -and processes data using the following steps: +### Celltype annotation -* [FastQC](#fastqc) - Read quality control -* [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline -* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +
+Output files -## FastQC +* `celltype_markers/celltype_marker_plots/` + * `*.pdf` : `.pdf` images of the marker gene plots for the `clusters` and `cluster_celltype` variables. + * `*.png` : `.png` images of the marker gene plots for the `clusters` and `cluster_celltype` variables. -[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. +* `celltype_markers/celltype_marker_tables/` + * `*.tsv` : `.tsv` files of all and top n marker genes for the `clusters` and `cluster_celltype` variables. -For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). +* `final/` + * `SCE/final_sce` : The directory containing the final SCE with all metadata, dimensionality reductions, clustering, and cell-type annotations. It is possible to use the `read_sce()` function of the scFlow R package to read in this object. + * `celltypes.tsv` : A `.tsv` file giving the final list of cell-types and the number of nuclei/cells per cell-type. -**Output files:** +
-* `fastqc/` - * `*_fastqc.html`: FastQC report containing quality metrics for your untrimmed raw fastq files. -* `fastqc/zips/` - * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. +### Differential gene expression analysis -> **NB:** The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. +
+Output files -## MultiQC +* `DGE//` + * `*.tsv` : A `.tsv` file containing all genes with statistical results of the fitted model, including logFC, adjusted p-value, etc. + * `de_plots` : Directory containing the volcano plot used in the `scflow_de_report.html` report. -[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarizing all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. +
-The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. +### Impacted pathway analysis -For more information about how to use MultiQC reports, see [https://multiqc.info](https://multiqc.info). +
+Output files -**Output files:** +* `IPA//` + * `//` : Directory containing a `.png` plot of the top 10 significantly impacted pathways and a `.tsv` file containing all significantly impacted pathways. -* `multiqc/` - * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. - * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. - * `multiqc_plots/`: directory containing static images from the report in various formats. +
-## Pipeline information +### Dirichlet -[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. +
+Output files +The process `SCFLOW_DIRICHLET` saves a differential cell-composition report to the `reports/` folder. -**Output files:** +
+ +### Reports + +
+Output files + +* `reports/` + * `qc/*_scflow_qc_report.html` : Per-sample QC reports containing post-QC summaries, key parameters used, QC plots, etc. + * `merged_report/*_scflow_merged_report.html` : The merged summary report containing inter-sample QC metrics. + * `integration_report/integrate_report_scflow.html` : The integration report describing key parameters for integration and visual and quantitative outputs of integration performance. + * `celltype_metrics_report/scflow_celltype_metrics_report.html` : The cell-type metrics report, including cluster and cell-type annotations, marker genes, and additional metrics. + * `DGE/*scflow_de_report.html` : Individual reports for each differential gene expression model fit for each cell-type. + * `IPA/*scflow_ipa_report.html` : Individual reports from impacted pathway analysis with plots and tables of enrichment results. + * `dirichlet_report/*dirichlet_report.html` : Dirichlet report for differential cell-type composition analysis. + +
+ +### Additional plots + +
+Output files + +* `plots/reddim_gene_plots` + * `/` : Directories of plots of gene expression in 2D space for each gene in the `reddim_genes.yml` file. + +
+ +### Additional tables + +
+Output files + +* `tables/celltype_mappings/` + * `celltype_mappings.tsv` : A `.tsv` file with the automated cell-type annotations generated by the process `SCFLOW_MAPCELLTYPES`. Optionally copy this file to a new location, update it, and re-run the analysis with the `--celltype_mappings` parameter to manually revise cell-type annotations for clusters (see the example below). + +
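+The revised mappings could, for example, be supplied via a small custom config passed to the pipeline with `-c` (a minimal sketch; the file path is a hypothetical placeholder): + +```nextflow +// custom.config -- sketch only; point the path at your revised file +params { + celltype_mappings = '/path/to/revised/celltype_mappings.tsv' +} +``` + +Equivalently, the same value can be given directly on the command line as `--celltype_mappings /path/to/revised/celltype_mappings.tsv`. +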
+ +### Pipeline information + +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline and provide you with other information such as launch commands, run times, and resource usage. + +
+Output files * `pipeline_info/` - * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`. - * Documentation for interpretation of results in HTML format: `results_description.html`. + * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.svg`. + * Reports generated by the pipeline: `software_versions.tsv`. + +
diff --git a/docs/usage.md b/docs/usage.md index 089eb97..5ce96f6 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,111 +1,151 @@ # nf-core/scflow: Usage -## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/scflow/usage](https://nf-co.re/scflow/usage) - -> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ - ## Introduction - +The **nf-core/scflow** pipeline is designed to orchestrate a reproducible case/control analysis of scRNA-seq data following best practices, at scale, from quality control through to insight discovery. -## Running the pipeline +A pipeline run with **nf-core/scflow** requires three inputs: (1) a two-column manifest file with paths to gene-cell matrices and a unique sample key; (2) a sample sheet with sample information for each input matrix in the manifest file; and (3) a parameters configuration file (documentation for each parameter is available at [https://nf-co.re/scflow/dev/parameters](https://nf-co.re/scflow/dev/parameters)). -The typical command for running the pipeline is as follows: +A complete, automated, scalable, and reproducible case-control analysis can then be performed with a single line of code: - ```bash -nextflow run nf-core/scflow --input '*_R{1,2}.fastq.gz' -profile docker +nextflow run nf-core/scflow \ +--manifest Manifest.tsv \ +--input Samplesheet.tsv \ +-c scflow_params.config \ +-profile local ``` -This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. +## Pipeline inputs -Note that the pipeline will create the following files in your working directory: +### Manifest file -```bash -work # Directory containing the nextflow working files -results # Finished results (configurable, see below) -.nextflow_log # Log file from Nextflow -# Other nextflow hidden files, eg. history of pipeline runs and old logs. -``` +You will need to create a manifest file with paths to the sparse matrices you would like to analyse. Use the `manifest` parameter to specify its location: - -### Updating the pipeline +`--manifest '[path to manifest file]'` -When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: +This is a tab-separated values (`.tsv`) file with two columns: `key` and `filepath`. This file specifies the locations of the input files for an analysis, together with a unique key that identifies a single sample (row) in the sample sheet (discussed below). The manifest file may contain fewer samples than the sample sheet and can be revised based on QC etc. without changing the sample sheet.
-```bash -nextflow pull nf-core/scflow +An example manifest file: - + +| key | filepath | +| ----- | -------------------------------------------------- | +| tavij | MS/BATCH1_outputs/C36/outs/raw_feature_bc_matrix | +| gavon | MS/BATCH1_outputs/C48/outs/raw_feature_bc_matrix | +| sisos | MS/BATCH1_outputs/C54/outs/raw_feature_bc_matrix | +| vubul | MS/BATCH1_outputs/MS411/outs/raw_feature_bc_matrix | +| larim | MS/BATCH1_outputs/MS430/outs/raw_feature_bc_matrix | +| famuv | MS/BATCH1_outputs/MS461/outs/raw_feature_bc_matrix | +| pobas | MS/BATCH1_outputs/MS513/outs/raw_feature_bc_matrix | +| dovim | MS/BATCH1_outputs/MS527/outs/raw_feature_bc_matrix | +| honiz | MS/BATCH1_outputs/MS530/outs/raw_feature_bc_matrix | +| kurus | MS/BATCH1_outputs/MS535/outs/raw_feature_bc_matrix | + +In this example, the `key` column is a single-word proquint (pronounceable quintuplet) identifier generated using the `ids` package in R. Any unique identifier is valid (avoid spaces and special characters). + +The `filepath` column of the manifest file should point to folders containing `matrix.mtx.gz`, `features.tsv.gz`, and `barcodes.tsv.gz` for individual samples. These are the raw or filtered gene-cell matrices output by CellRanger (additional inputs will be supported in the future). + +### Samplesheet input + +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `input` parameter to specify its location: - + +```console +--input '[path to samplesheet file]' ``` -### Reproducibility - -It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. +An example samplesheet file: - +| individual | group | diagnosis | sex | age | capdate | prepdate | seqdate | manifest | +| ---------- | ------- | --------- | ---- | ---- | -------- | -------- | ------- | -------- | +| C36 | Control | Control | M | 68 | 20180802 | 20180803 | 201808 | tavij | +| C48 | Control | Control | M | 68 | 20180803 | 20180803 | 201808 | gavon | +| C54 | Control | Control | M | 66 | 20180806 | 20180807 | 201808 | sisos | +| PDC05 | Control | Control | M | 58 | 20181002 | 20181008 | 201811 | hajov | +| MS527 | High | MS | M | 47 | 20180807 | 20180808 | 201808 | dovim | +| MS535 | High | MS | F | 65 | 20180806 | 20180807 | 201808 | kurus | +| MS430 | Low | MS | F | 61 | 20180802 | 20180803 | 201808 | larim | +| MS461 | Low | MS | M | 43 | 20180803 | 20180803 | 201808 | famuv | +| MS530 | Low | MS | M | 42 | 20180806 | 20180807 | 201808 | honiz | + +As sample sheet values may be used in figures and figure legends generated by the pipeline, relatively brief values in PascalCase are recommended. For example, `Low` is preferable to `low`, and `MS` is preferable to `MultipleSclerosis`. Spaces are not supported (e.g. `Multiple Sclerosis`). -First, go to the [nf-core/scflow releases page](https://github.com/nf-core/scflow/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. +### Parameters file -This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
+The parameter configuration for an analysis is defined in the `scflow_analysis.config` file in the `conf` directory. This file defines over 130 tunable analysis parameters. Defaults are recommended for most parameters, while those defining experimental design should be configured before a run. The pipeline supports parameter tuning with cache-based workflow resume using the `-resume` Nextflow feature. This feature is highly recommended for optimizing parameters which are highly sensitive to dataset size/quality differences (e.g. clustering, dimensionality reduction). -## Core Nextflow arguments +The documentation for parameters is available in human-readable format [here](https://nf-co.re/scflow/parameters) or, for version-specific parameters, here: `https://nf-co.re/scflow/{version}/parameters`. -> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). +For your first analysis, we recommend default parameters; however, parameters containing sample-sheet-specific variables should be updated before starting. These include: - -### `-profile` +* `qc_factor_vars` -- this parameter will explicitly set the variable type of named sample sheet variables to factors, to override the assumption by R that table columns which contain only numbers are numeric or integers. In the sample sheet example above, the *capdate*, *prepdate*, and *seqdate* variables should all be explicitly overridden with `qc_factor_vars='capdate,prepdate,seqdate'`. +* `integ_categorical_covariates` -- add sample sheet variables which may be sources of sample-to-sample variance here in order to examine integration performance, e.g. `integ_categorical_covariates='manifest,diagnosis,sex,group,seqdate'`. +* `merge_facet_vars` -- optionally add categorical sample sheet variables which may be sources of batch effects here for assessment in the post-merge QC report, e.g. `merge_facet_vars='seqdate,capdate'`. +* `cta_facet_vars` -- optionally add categorical sample sheet variables here to evaluate cell-type metrics across classes in the cell-type metrics report, e.g. `cta_facet_vars='manifest,diagnosis,sex'`. +* For differential gene expression (`dge_` prefix parameters) and Dirichlet differential cell-type composition (`dirich_` prefix parameters), set the `dependent_var` as the sample sheet variable of interest (e.g. diagnosis), and specify the reference class (`ref_class`) of the variable (e.g. control) if the variable is categorical. Typically the `dge_confounding_vars` would include both cellular and sample level co-variates (e.g. `dge_confounding_vars='cngeneson,pc_mito,seqdate'`). -Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. +For more details of experimental parameters, see the [parameters documentation](https://nf-co.re/scflow/parameters) or the [scFlow Manual](https://combiz.github.io/scflow-manual/); a minimal example of setting these parameters is sketched below.
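+As an illustration, the experimental-design parameters above could be set in the parameters config supplied with `-c` (a minimal sketch; the values are taken from the example sample sheet above and must be adapted to your own study design): + +```nextflow +// scflow_params.config -- example values only +params { + qc_factor_vars = 'capdate,prepdate,seqdate' + integ_categorical_covariates = 'manifest,diagnosis,sex,group,seqdate' + merge_facet_vars = 'seqdate,capdate' + cta_facet_vars = 'manifest,diagnosis,sex' + dge_dependent_var = 'diagnosis' + dge_ref_class = 'Control' + dge_confounding_vars = 'cngeneson,pc_mito,seqdate' + dirich_dependent_var = 'diagnosis' + dirich_ref_class = 'Control' +} +``` +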
-Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Conda) - see below. +## General Nextflow -> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. +### Profiles -The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). +The `-profile` parameter provides configuration presets for different compute environments. Switching from a local workstation analysis to a Cloud-based analysis can be achieved simply by changing the profile. For example, a Google Cloud analysis with automated staging of input matrices from Cloud storage (e.g. a Google Storage Bucket) can be achieved using `-profile gcp`. Additionally, pre-configured institutional profiles for a range of university and research institution HPC systems are readily available via [nf-core/configs](https://github.com/nf-core/configs) and will be loaded dynamically at runtime. To check if your system is available in these configs, please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). -Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! -They are loaded in sequence, so later profiles can overwrite earlier profiles. +Additionally, the `test` profile includes a complete configuration to test the pipeline (including a dynamically loaded minimal dataset and associated inputs: a sample sheet, manifest file, and parameters). This can be run with: - -If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. +`nextflow run nf-core/scflow -profile test,` -* `docker` - * A generic configuration profile to be used with [Docker](https://docker.com/) - * Pulls software from Docker Hub: [`nfcore/scflow`](https://hub.docker.com/r/nfcore/scflow/) -* `singularity` - * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) - * Pulls software from Docker Hub: [`nfcore/scflow`](https://hub.docker.com/r/nfcore/scflow/) -* `podman` - * A generic configuration profile to be used with [Podman](https://podman.io/) - * Pulls software from Docker Hub: [`nfcore/scflow`](https://hub.docker.com/r/nfcore/scflow/) -* `conda` - * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity or Podman. - * A generic configuration profile to be used with [Conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) -* `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters +### Updating the pipeline + +When you run a `nextflow run` command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: + +`nextflow pull nf-core/scflow` + +### Reproducibility + +It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline.
If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. + +First, go to the **nf-core/scflow** releases page and find the latest version number - numeric only (e.g. 1.0.0). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.0.0`. -### `-resume` +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. Each release will also be associated with a citable DOI. -Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. +### Additional options + +`-resume` + +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. -### `-c` +`-c` + +Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. We highly recommend utilizing [Nextflow Tower](https://tower.nf/) for pipeline monitoring; your token can be passed with a custom config file. +## Custom configuration -Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. ### Resource requests -#### Custom resource requests +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. -Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. +For example, if the nf-core/scflow pipeline is failing after multiple re-submissions of the `SCFLOW_DGE` process due to an exit code of `137`, this would indicate an out-of-memory issue. -Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config: +To bypass this error you would need to find exactly which resources are set by the `SCFLOW_DGE` process. The quickest way is to search for `process SCFLOW_DGE` in the [nf-core/scflow Github repo](https://github.com/nf-core/scflow/search?q=process+SCFLOW_DGE).
We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so, based on the search results, the file we want is `modules/local/process/scflow/dge.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure a subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/combiz/nf-core-scflow/blob/05c2a0cfc8338bc3d6edf9fafe119438eb230589/conf/base.config#L40), which in this case defines 72GB of memory. Provided you haven't set any other standard nf-core parameters to **cap** the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline, we can try to bypass the `SCFLOW_DGE` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. ```nextflow process { withName: SCFLOW_DGE { memory = 100.GB } } ``` > **NB:** We specify just the process name i.e. `SCFLOW_DGE` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `SCFLOW:SCFLOW_DGE`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow. -If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition above). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. +### nf-core/configs + +In most cases, you will only need to create a custom config as a one-off, but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter.
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and an amendment to [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. + +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). @@ -118,11 +158,11 @@ The Nextflow `-bg` flag launches Nextflow in the background, detached from your Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time. Some HPC setups also allow you to run nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs). -#### Nextflow memory requirements +### Nextflow memory requirements In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`): -```bash +```console NXF_OPTS='-Xms1g -Xmx4g' ``` diff --git a/environment.yml b/environment.yml deleted file mode 100644 index bd76ff2..0000000 --- a/environment.yml +++ /dev/null @@ -1,15 +0,0 @@ -# You can use this file to create a conda environment for this pipeline: -# conda env create -f environment.yml -name: nf-core-scflow-1.0dev -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - conda-forge::python=3.7.3 - - conda-forge::markdown=3.1.1 - - conda-forge::pymdown-extensions=6.0 - - conda-forge::pygments=2.5.2 - # TODO nf-core: Add required software dependencies here - - bioconda::fastqc=0.11.8 - - bioconda::multiqc=1.7 diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy new file mode 100755 index 0000000..8d6920d --- /dev/null +++ b/lib/NfcoreSchema.groovy @@ -0,0 +1,517 @@ +// +// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template.
+// + +import org.everit.json.schema.Schema +import org.everit.json.schema.loader.SchemaLoader +import org.everit.json.schema.ValidationException +import org.json.JSONObject +import org.json.JSONTokener +import org.json.JSONArray +import groovy.json.JsonSlurper +import groovy.json.JsonBuilder + +class NfcoreSchema { + + // + // Resolve Schema path relative to main workflow directory + // + public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') { + return "${workflow.projectDir}/${schema_filename}" + } + + // + // Function to loop over all parameters defined in schema and check + // whether the given parameters adhere to the specifications + // + /* groovylint-disable-next-line UnusedPrivateMethodParameter */ + public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { + def has_error = false + //=====================================================================// + // Check for nextflow core params and unexpected params + def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text + def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') + def nf_params = [ + // Options for base `nextflow` command + 'bg', + 'c', + 'C', + 'config', + 'd', + 'D', + 'dockerize', + 'h', + 'log', + 'q', + 'quiet', + 'syslog', + 'v', + 'version', + + // Options for `nextflow run` command + 'ansi', + 'ansi-log', + 'bg', + 'bucket-dir', + 'c', + 'cache', + 'config', + 'dsl2', + 'dump-channels', + 'dump-hashes', + 'E', + 'entry', + 'latest', + 'lib', + 'main-script', + 'N', + 'name', + 'offline', + 'params-file', + 'pi', + 'plugins', + 'poll-interval', + 'pool-size', + 'profile', + 'ps', + 'qs', + 'queue-size', + 'r', + 'resume', + 'revision', + 'stdin', + 'stub', + 'stub-run', + 'test', + 'w', + 'with-charliecloud', + 'with-conda', + 'with-dag', + 'with-docker', + 'with-mpi', + 'with-notification', + 'with-podman', + 'with-report', + 'with-singularity', + 'with-timeline', + 'with-tower', + 'with-trace', + 'with-weblog', + 'without-docker', + 'without-podman', + 'work-dir' + ] + def unexpectedParams = [] + + // Collect expected parameters from the schema + def expectedParams = [] + for (group in schemaParams) { + for (p in group.value['properties']) { + expectedParams.push(p.key) + } + } + + for (specifiedParam in params.keySet()) { + // nextflow params + if (nf_params.contains(specifiedParam)) { + log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'" + has_error = true + } + // unexpected params + def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' + def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } + def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() + def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { + // Temporarily remove camelCase/camel-case params #1035 + def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} + if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ + unexpectedParams.push(specifiedParam) + } + } + } + + //=====================================================================// + // Validate parameters against the schema + InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() + JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) + + // Remove anything that's in params.schema_ignore_params + raw_schema = removeIgnoredParams(raw_schema, params) + + Schema schema = SchemaLoader.load(raw_schema) + + // Clean the parameters + def cleanedParams = cleanParameters(params) + + // Convert to JSONObject + def jsonParams = new JsonBuilder(cleanedParams) + JSONObject params_json = new JSONObject(jsonParams.toString()) + + // Validate + try { + schema.validate(params_json) + } catch (ValidationException e) { + println '' + log.error 'ERROR: Validation of pipeline parameters failed!' + JSONObject exceptionJSON = e.toJSON() + printExceptions(exceptionJSON, params_json, log) + println '' + has_error = true + } + + // Check for unexpected parameters + if (unexpectedParams.size() > 0) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + println '' + def warn_msg = 'Found unexpected parameters:' + for (unexpectedParam in unexpectedParams) { + warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}" + } + log.warn warn_msg + log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}" + println '' + } + + if (has_error) { + System.exit(1) + } + } + + // + // Beautify parameters for --help + // + public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + Integer num_hidden = 0 + String output = '' + output += 'Typical pipeline command:\n\n' + output += " ${colors.cyan}${command}${colors.reset}\n\n" + Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + Integer max_chars = paramsMaxChars(params_map) + 1 + Integer desc_indent = max_chars + 14 + Integer dec_linewidth = 160 - desc_indent + for (group in params_map.keySet()) { + Integer num_params = 0 + String group_output = colors.underlined + colors.bold + group + colors.reset + '\n' + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (group_params.get(param).hidden && !params.show_hidden_params) { + num_hidden += 1 + continue; + } + def type = '[' + group_params.get(param).type + ']' + def description = group_params.get(param).description + def defaultValue = 
group_params.get(param).default ? " [default: " + group_params.get(param).default.toString() + "]" : '' + def description_default = description + colors.dim + defaultValue + colors.reset + // Wrap long description texts + // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap + if (description_default.length() > dec_linewidth){ + List olines = [] + String oline = "" // " " * indent + description_default.split(" ").each() { wrd -> + if ((oline.size() + wrd.size()) <= dec_linewidth) { + oline += wrd + " " + } else { + olines += oline + oline = wrd + " " + } + } + olines += oline + description_default = olines.join("\n" + " " * desc_indent) + } + group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n' + num_params += 1 + } + group_output += '\n' + if (num_params > 0){ + output += group_output + } + } + if (num_hidden > 0){ + output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset + } + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Groovy Map summarising parameters/workflow options used by the pipeline + // + public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') { + // Get a selection of core Nextflow workflow options + def Map workflow_summary = [:] + if (workflow.revision) { + workflow_summary['revision'] = workflow.revision + } + workflow_summary['runName'] = workflow.runName + if (workflow.containerEngine) { + workflow_summary['containerEngine'] = workflow.containerEngine + } + if (workflow.container) { + workflow_summary['container'] = workflow.container + } + workflow_summary['launchDir'] = workflow.launchDir + workflow_summary['workDir'] = workflow.workDir + workflow_summary['projectDir'] = workflow.projectDir + workflow_summary['userName'] = workflow.userName + workflow_summary['profile'] = workflow.profile + workflow_summary['configFiles'] = workflow.configFiles.join(', ') + + // Get pipeline parameters defined in JSON Schema + def Map params_summary = [:] + def blacklist = ['hostnames'] + def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + for (group in params_map.keySet()) { + def sub_params = new LinkedHashMap() + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (params.containsKey(param) && !blacklist.contains(param)) { + def params_value = params.get(param) + def schema_value = group_params.get(param).default + def param_type = group_params.get(param).type + if (schema_value != null) { + if (param_type == 'string') { + if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { + def sub_string = schema_value.replace('\$projectDir', '') + sub_string = sub_string.replace('\${projectDir}', '') + if (params_value.contains(sub_string)) { + schema_value = params_value + } + } + if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) { + def sub_string = schema_value.replace('\$params.outdir', '') + sub_string = sub_string.replace('\${params.outdir}', '') + if ("${params.outdir}${sub_string}" == params_value) { + schema_value = params_value + } + } + } + } + + // We have a default in the schema, and this isn't it + if (schema_value != null && params_value != schema_value) { + sub_params.put(param, params_value) + } + // No default in the schema, and this isn't 
empty + else if (schema_value == null && params_value != "" && params_value != null && params_value != false) { + sub_params.put(param, params_value) + } + } + } + params_summary.put(group, sub_params) + } + return [ 'Core Nextflow options' : workflow_summary ] << params_summary + } + + // + // Beautify parameters for summary and return as string + // + public static String paramsSummaryLog(workflow, params) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + String output = '' + def params_map = paramsSummaryMap(workflow, params) + def max_chars = paramsMaxChars(params_map) + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + if (group_params) { + output += colors.bold + group + colors.reset + '\n' + for (param in group_params.keySet()) { + output += "  " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' + } + output += '\n' + } + } + output += "!! Only displaying parameters that differ from the pipeline defaults !!\n" + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Loop over nested exceptions and print the causingException + // + private static void printExceptions(ex_json, params_json, log) { + def causingExceptions = ex_json['causingExceptions'] + if (causingExceptions.length() == 0) { + def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ + // Missing required param + if (m.matches()) { + log.error "* Missing required parameter: --${m[0][1]}" + } + // Other base-level error + else if (ex_json['pointerToViolation'] == '#') { + log.error "* ${ex_json['message']}" + } + // Error with specific param + else { + def param = ex_json['pointerToViolation'] - ~/^#\// + def param_val = params_json[param].toString() + log.error "* --${param}: ${ex_json['message']} (${param_val})" + } + } + for (ex in causingExceptions) { + printExceptions(ex, params_json, log) + } + } + + // + // Remove an element from a JSONArray + // + private static JSONArray removeElement(json_array, element) { + def list = [] + int len = json_array.length() + for (int i=0;i<len;i++){ + list.add(json_array.get(i)) + } + list.remove(element) + JSONArray jsArray = new JSONArray(list) + return jsArray + } + + // + // Remove ignored parameters + // + private static JSONObject removeIgnoredParams(raw_schema, params) { + // Remove anything that's in params.schema_ignore_params + params.schema_ignore_params.split(',').each{ ignore_param -> + if(raw_schema.keySet().contains('definitions')){ + raw_schema.definitions.each { definition -> + for (key in definition.keySet()){ + if (definition[key].get("properties").keySet().contains(ignore_param)){ + // Remove the param to ignore + definition[key].get("properties").remove(ignore_param) + // If the param was required, change this + if (definition[key].has("required")) { + def cleaned_required = removeElement(definition[key].required, ignore_param) + definition[key].put("required", cleaned_required) + } + } + } + } + } + if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) { + raw_schema.get("properties").remove(ignore_param) + } + if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) { + def cleaned_required = removeElement(raw_schema.required, ignore_param) + raw_schema.put("required", cleaned_required) + } + } + return raw_schema + } + + // + // Clean and check parameters relative to Nextflow native classes + // + private static Map cleanParameters(params) { + def new_params = params.getClass().newInstance(params) + for (p in params) { + // remove anything evaluating to false + if (!p['value']) { + new_params.remove(p.key) + } + // Cast MemoryUnit to String + if (p['value'].getClass() == nextflow.util.MemoryUnit) { + 
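// The cast below matters because memory-type params are string-typed in the JSON schema: illustratively, a nextflow.util.MemoryUnit value such as 8.GB stringifies to '8 GB', which then validates against the schema. +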
new_params.replace(p.key, p['value'].toString()) + } + // Cast Duration to String + if (p['value'].getClass() == nextflow.util.Duration) { + new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) + } + // Cast LinkedHashMap to String + if (p['value'].getClass() == LinkedHashMap) { + new_params.replace(p.key, p['value'].toString()) + } + } + return new_params + } + + // + // This function tries to read a JSON params file + // + private static LinkedHashMap paramsLoad(String json_schema) { + def params_map = new LinkedHashMap() + try { + params_map = paramsRead(json_schema) + } catch (Exception e) { + println "Could not read parameters settings from JSON. $e" + params_map = new LinkedHashMap() + } + return params_map + } + + // + // Method to actually read in JSON file using Groovy. + // Group (as Key), values are all parameters + // - Parameter1 as Key, Description as Value + // - Parameter2 as Key, Description as Value + // .... + // Group + // - + private static LinkedHashMap paramsRead(String json_schema) throws Exception { + def json = new File(json_schema).text + def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') + def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') + /* Tree looks like this in nf-core schema + * definitions <- this is what the first get('definitions') gets us + group 1 + title + description + properties + parameter 1 + type + description + parameter 2 + type + description + group 2 + title + description + properties + parameter 1 + type + description + * properties <- parameters can also be ungrouped, outside of definitions + parameter 1 + type + description + */ + + // Grouped params + def params_map = new LinkedHashMap() + schema_definitions.each { key, val -> + def Map group = schema_definitions."$key".properties // Gets the property object of the group + def title = schema_definitions."$key".title + def sub_params = new LinkedHashMap() + group.each { innerkey, value -> + sub_params.put(innerkey, value) + } + params_map.put(title, sub_params) + } + + // Ungrouped params + def ungrouped_params = new LinkedHashMap() + schema_properties.each { innerkey, value -> + ungrouped_params.put(innerkey, value) + } + params_map.put("Other parameters", ungrouped_params) + + return params_map + } + + // + // Get maximum number of characters across all parameter names + // + private static Integer paramsMaxChars(params_map) { + Integer max_chars = 0 + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (param.size() > max_chars) { + max_chars = param.size() + } + } + } + return max_chars + } +} diff --git a/lib/NfcoreTemplate.groovy b/lib/NfcoreTemplate.groovy new file mode 100755 index 0000000..44551e0 --- /dev/null +++ b/lib/NfcoreTemplate.groovy @@ -0,0 +1,270 @@ +// +// This file holds several functions used within the nf-core pipeline template. +// + +import org.yaml.snakeyaml.Yaml + +class NfcoreTemplate { + + // + // Check AWS Batch related parameters have been specified correctly + // + public static void awsBatch(workflow, params) { + if (workflow.profile.contains('awsbatch')) { + // Check params.awsqueue and params.awsregion have been set if running on AWSBatch + assert (params.awsqueue && params.awsregion) : "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" 
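+ // Illustrative launch satisfying both assertions (queue, region and bucket names are hypothetical): + // nextflow run nf-core/scflow -profile awsbatch --awsqueue my-batch-queue --awsregion eu-west-1 --outdir s3://my-bucket/results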
+ // Check outdir paths to be S3 buckets if running on AWSBatch + assert params.outdir.startsWith('s3:') : "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + } + } + + // + // Check params.hostnames + // + public static void hostName(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (params.hostnames) { + try { + def hostname = "hostname".execute().text.trim() + params.hostnames.each { prof, hnames -> + hnames.each { hname -> + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { + log.info "=${colors.yellow}====================================================${colors.reset}=\n" + + "${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" + + " but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" + + " ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" + + "=${colors.yellow}====================================================${colors.reset}=" + } + } + } + } catch (Exception e) { + log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}." + } + } + } + + // + // Construct and send completion email + // + public static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) { + + // Set up the e-mail variables + def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + } + + def summary = [:] + for (group in summary_params.keySet()) { + summary << summary_params[group] + } + + def misc_fields = [:] + misc_fields['Date Started'] = workflow.start + misc_fields['Date Completed'] = workflow.complete + misc_fields['Pipeline script file path'] = workflow.scriptFile + misc_fields['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build + misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary << misc_fields + + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { + if (mqc_report.size() > 1) { + log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + } + mqc_report = mqc_report[0] + } + } + } catch (all) { + if (multiqc_report) { + log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + } + } + + // Check if we are only sending emails on failure + def email_address = params.email + if (!params.email && 
params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$projectDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$projectDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def sf = new File("$projectDir/assets/sendmail_template.txt") + def sendmail_template = engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + Map colors = logColours(params.monochrome_logs) + if (email_address) { + try { + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" + } catch (all) { + // Catch failures and try with plaintext + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File(output_d, "pipeline_report.html") + output_hf.withWriter { w -> w << email_html } + def output_tf = new File(output_d, "pipeline_report.txt") + output_tf.withWriter { w -> w << email_txt } + } + + // + // Print pipeline summary on completion + // + public static void summary(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (workflow.success) { + if (workflow.stats.ignoredCount == 0) { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" + } else { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + } + } else { + hostName(workflow, params, log) + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + } + + // + // ANSII Colours used for terminal logging + // + public static Map logColours(Boolean monochrome_logs) { + Map colorcodes = [:] + + // Reset / Meta + colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" + colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" + colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" + colorcodes['hidden'] = monochrome_logs ? 
'' : "\033[8m" + + // Regular Colors + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + + // Bold + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + + // Underline + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + + // High Intensity + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + + // Bold High Intensity + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" + + return colorcodes + } + + // + // Does what is says on the tin + // + public static String dashedLine(monochrome_logs) { + Map colors = logColours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + // + // nf-core logo + // + public static String logo(workflow, monochrome_logs) { + Map colors = logColours(monochrome_logs) + String.format( + """\n + ${dashedLine(monochrome_logs)} + ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} + ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} + ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} + ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} + ${colors.green}`._,._,\'${colors.reset} + ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} + ${dashedLine(monochrome_logs)} + """.stripIndent() + ) + } +} diff --git a/lib/Utils.groovy b/lib/Utils.groovy new file mode 100755 index 0000000..18173e9 --- /dev/null +++ b/lib/Utils.groovy @@ -0,0 +1,47 @@ +// +// This file holds several Groovy functions that could be useful for any Nextflow pipeline +// + +import org.yaml.snakeyaml.Yaml + +class Utils { + + // + // When running with -profile conda, warn if channels have not been set-up appropriately + // + public static void checkCondaChannels(log) { + Yaml parser = new Yaml() + def channels = [] + try { + def config = parser.load("conda config --show channels".execute().text) + channels = config.channels + } catch(NullPointerException | IOException e) { + log.warn "Could not verify conda channel configuration." + return + } + + // Check that all channels are present + def required_channels = ['conda-forge', 'bioconda', 'defaults'] + def conda_check_failed = !required_channels.every { ch -> ch in channels } + + // Check that they are in the right order + conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda')) + conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults')) + + if (conda_check_failed) { + log.warn "=============================================================================\n" + + " There is a problem with your Conda configuration!\n\n" + + " You will need to set-up the conda-forge and bioconda channels correctly.\n" + + " Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n" + + " NB: The order of the channels matters!\n" + + "===================================================================================" + } + } + + // + // Join module args with appropriate spacing + // + public static String joinModuleArgs(args_list) { + return ' ' + args_list.join(' ') + } +} diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy new file mode 100755 index 0000000..3a0d09d --- /dev/null +++ b/lib/WorkflowMain.groovy @@ -0,0 +1,94 @@ +// +// This file holds several functions specific to the main.nf workflow in the nf-core/scflow pipeline +// + +class WorkflowMain { + + // + // Citation string for pipeline + // + public static String citation(workflow) { + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + + // TODO nf-core: Add Zenodo DOI for pipeline after first release + //"* The pipeline\n" + + //" https://doi.org/10.5281/zenodo.XXXXXXX\n\n" + + "* The nf-core framework\n" + + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + + "* Software dependencies\n" + + " 
https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + } + + // + // Print help to screen if required + // + public static String help(workflow, params, log) { + def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker" + def help_string = '' + help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs) + help_string += NfcoreSchema.paramsHelp(workflow, params, command) + help_string += '\n' + citation(workflow) + '\n' + help_string += NfcoreTemplate.dashedLine(params.monochrome_logs) + return help_string + } + + // + // Print parameter summary log to screen + // + public static String paramsSummaryLog(workflow, params, log) { + def summary_log = '' + summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs) + summary_log += NfcoreSchema.paramsSummaryLog(workflow, params) + summary_log += '\n' + citation(workflow) + '\n' + summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs) + return summary_log + } + + // + // Validate parameters and print summary to screen + // + public static void initialise(workflow, params, log) { + // Print help to screen if required + if (params.help) { + log.info help(workflow, params, log) + System.exit(0) + } + + // Validate workflow parameters via the JSON schema + if (params.validate_params) { + NfcoreSchema.validateParameters(workflow, params, log) + } + + // Print parameter summary log to screen + log.info paramsSummaryLog(workflow, params, log) + + // Check that conda channels are set-up correctly + if (params.enable_conda) { + Utils.checkCondaChannels(log) + } + + // Check AWS batch settings + NfcoreTemplate.awsBatch(workflow, params) + + // Check the hostnames against configured profiles + NfcoreTemplate.hostName(workflow, params, log) + + // Check input has been provided + if (!params.input) { + log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.csv'" + System.exit(1) + } + } + + // + // Get attribute from genome config file e.g. fasta + // + public static String getGenomeAttribute(params, attribute) { + def val = '' + if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) { + if (params.genomes[ params.genome ].containsKey(attribute)) { + val = params.genomes[ params.genome ][ attribute ] + } + } + return val + } +} diff --git a/lib/WorkflowScflow.groovy b/lib/WorkflowScflow.groovy new file mode 100755 index 0000000..67f0a7f --- /dev/null +++ b/lib/WorkflowScflow.groovy @@ -0,0 +1,21 @@ +// +// This file holds several functions specific to the workflow/scflow.nf in the nf-core/scflow pipeline +// +class WorkflowScflow { + + // + // Check and validate parameters + // + public static void initialise(params, log) { + + if (params.dge_mast_method == 'glmer' && params.dge_random_effects_var == 'NULL') { + log.error "The glmer method requires a random effects variable." + System.exit(1) + } + + if (params.dge_mast_method == 'glm' && params.dge_random_effects_var != 'NULL') { + log.error "The glm method can not fit a random effects variable." 
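+ // Illustrative valid combinations implied by these checks: '--dge_mast_method glmer' together with a real '--dge_random_effects_var', or '--dge_mast_method glm' with '--dge_random_effects_var NULL' for a fixed-effects fit.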
+ System.exit(1) + } + } +} diff --git a/lib/nfcore_external_java_deps.jar b/lib/nfcore_external_java_deps.jar new file mode 100644 index 0000000..805c8bb Binary files /dev/null and b/lib/nfcore_external_java_deps.jar differ diff --git a/main.nf b/main.nf index 7803a28..c9118ac 100644 --- a/main.nf +++ b/main.nf @@ -1,1020 +1,32 @@ #!/usr/bin/env nextflow -nextflow.preview.dsl=2 - /* ======================================================================================== - nf-core/scflow + nf-core/scflow ======================================================================================== - nf-core/scflow Analysis Pipeline. - #### Homepage / Documentation - https://github.com/nf-core/scflow + Github : https://github.com/nf-core/scflow + Website: https://nf-co.re/scflow + Slack : https://nfcore.slack.com/channels/scflow ---------------------------------------------------------------------------------------- */ -include { getSoftwareName;initOptions;getPathFromList;saveFiles } from './modules/local/process/functions.nf' -params.options = [:] - -def helpMessage() { - // TODO nf-core: Add to this help message with new command line parameters - log.info nfcoreHeader() - log.info""" - - Usage: - - The typical command for running the pipeline is as follows: - - nextflow run nf-core/scflow --manifest "refs/Manifest.txt" --samplesheet "refs/SampleSheet.tsv" -params-file conf/nfx-params.json - - Mandatory arguments: - --manifest [file] Path to Manifest.txt file (must be surrounded with quotes) - --samplesheet [file] Path to SampleSheet.tsv file (must be surrounded with quotes) - -params-file [file] Path to nfx-params.json file - -profile [str] Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, test, awsbatch, and more - TODO: This feature will be available when configs are submitted to nf-core - - References If not specified in the configuration file or you wish to overwrite any of the references - --ensembl_mappings [file] Path to ensembl_mappings file - --ctd_folder [path] Path to the folder containing .ctd files for celltype annotation - - Other options: - --outdir [file] The output directory where the results will be saved - --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful - -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic - - AWSBatch options: - --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion [str] The AWS Region for your AWS Batch job to run on - --awscli [str] Path to the AWS CLI tool - """.stripIndent() -} - -// Show help message -if (params.help) { - helpMessage() - exit 0 -} - -/* - * SET UP CONFIGURATION VARIABLES - */ - -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} - -// Check AWS batch settings -if (workflow.profile.contains('awsbatch')) { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" 
- // Check outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" - // Prevent trace files to be stored on S3 since S3 does not support rolling files. - if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." -} - -/* - * Create a channel for input read files - */ - if (params.manifest) { ch_manifest = file(params.manifest, checkIfExists: true) } - if (params.samplesheet) { ch_samplesheet = file(params.samplesheet, checkIfExists: true) } - if (params.samplesheet) { ch_samplesheet2 = file(params.samplesheet, checkIfExists: true) } // copy for qc - if (params.ctd_folder) { ch_ctd_folder = file(params.ctd_folder, checkIfExists: true) } - if (params.celltype_mappings) { ch_celltype_mappings = file(params.celltype_mappings, checkIfExists: false) } - if (params.ensembl_mappings) { ch_ensembl_mappings = file(params.ensembl_mappings, checkIfExists: false) } - if (params.ensembl_mappings) { ch_ensembl_mappings2 = file(params.ensembl_mappings, checkIfExists: false) } - if (params.ensembl_mappings) { ch_ensembl_mappings3 = file(params.ensembl_mappings, checkIfExists: false) } - if (params.reddim_genes_yml) { ch_reddim_genes_yml = file(params.reddim_genes_yml, checkIfExists: false) } - -// Header log info -log.info nfcoreHeader() -def summary = [:] -if (workflow.revision) summary['Pipeline Release'] = workflow.revision -summary['Run Name'] = custom_runName ?: workflow.runName -summary['Manifest'] = params.manifest -summary['SampleSheet'] = params.samplesheet -summary['Run EmptyDrops'] = params.amb_find_cells ? "Yes" : "No" -summary['Find Singlets'] = params.mult_find_singlets ? "Yes ($params.singlets.singlets_method)" : 'No' -summary['Dimension Reds.'] = params.reddim_reduction_methods -summary['Clustering Input'] = params.clust_reduction_method -summary['DGE Method'] = params.dge_de_method == "MASTZLM" ? 
"$params.dge_de_method ($params.dge_mast_method)": "$params.dge_de_method" -summary['DGE Dependent Var']= params.dge_dependent_var -summary['DGE Ref Class'] = params.dge_ref_class -summary['DGE Confound Vars']= params.dge_confounding_vars -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Output dir'] = params.outdir -summary['Launch dir'] = workflow.launchDir -summary['Working dir'] = workflow.workDir -summary['Script dir'] = workflow.projectDir -summary['User'] = workflow.userName -if (workflow.profile.contains('awsbatch')) { - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue - summary['AWS CLI'] = params.awscli -} -summary['Config Profile'] = workflow.profile -if (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description -if (params.config_profile_contact) summary['Config Profile Contact'] = params.config_profile_contact -if (params.config_profile_url) summary['Config Profile URL'] = params.config_profile_url -summary['Config Files'] = workflow.configFiles.join(', ') -if (params.email || params.email_on_fail) { - summary['E-mail Address'] = params.email - summary['E-mail on failure'] = params.email_on_fail - summary['MultiQC maxsize'] = params.max_multiqc_email_size -} -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "-\033[2m--------------------------------------------------\033[0m-" - -// Check the hostnames against configured profiles -checkHostname() - -Channel.from(summary.collect{ [it.key, it.value] }) - .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" } - .reduce { a, b -> return [a, b].join("\n ") } - .map { x -> """ - id: 'nf-core-scflow-summary' - description: " - this information is collected when the pipeline is started." - section_name: 'nf-core/scflow Workflow Summary' - section_href: 'https://github.com/nf-core/scflow' - plot_type: 'html' - data: | - <dl class=\"dl-horizontal\"> - $x - </dl>
- """.stripIndent() } - .set { ch_workflow_summary } - -/* - * Parse software version numbers - */ -process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, - saveAs: { filename -> - if (filename.indexOf(".csv") > 0) filename - else null - } - - output: - file 'software_versions_mqc.yaml' into ch_software_versions_yaml - file "software_versions.csv" - - script: - // TODO nf-core: Get all tools to print their version number here - """ - echo $workflow.manifest.version > v_pipeline.txt - echo $workflow.nextflow.version > v_nextflow.txt - R --version > v_R.txt - scrape_software_versions.py &> software_versions_mqc.yaml - """ -} - -/* - * Check manifest and samplesheet inputs are valid - */ - process SCFLOW_CHECK_INPUTS { - - tag 'SCFLOW_CHECK_INPUTS' - label 'process_tiny' - //container 'google/cloud-sdk:alpine' - - echo true - - input: - path manifest - path samplesheet - - output: - path 'checked_manifest.txt', emit: checked_manifest - - script: - - """ - check_inputs.r \ - --samplesheet $samplesheet \ - --manifest $manifest - """ - -} - -/* - * Single Sample QC - */ -process SCFLOW_QC { - - tag "${key}" - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - errorStrategy { task.attempt <= 3 ? 'retry' : 'finish' } - maxRetries 3 - - echo false - - input: - tuple val(key), path(mat_path) - path samplesheet - path ensembl_mappings - - output: - path 'qc_report/*.html' , emit: qc_report - path 'qc_plot_data/*.tsv' , emit: qc_plot_data - path 'qc_summary/*.tsv' , emit: qc_summary - path 'qc_plots/*.png' , emit: qc_plots - path '*_sce' , emit: qc_sce - - script: - """ - scflow_qc.r \ - --samplesheet ${samplesheet} \ - --mat_path ${mat_path} \ - --key ${key} \ - --key_colname ${params.qc_key_colname} \ - --factor_vars ${params.qc_factor_vars} \ - --ensembl_mappings ${ensembl_mappings} \ - --min_library_size ${params.qc_min_library_size} \ - --max_library_size ${params.qc_max_library_size} \ - --min_features ${params.qc_min_features} \ - --max_features ${params.qc_max_features} \ - --max_mito ${params.qc_max_mito} \ - --min_ribo ${params.qc_min_ribo} \ - --max_ribo ${params.qc_max_ribo} \ - --min_counts ${params.qc_min_counts} \ - --min_cells ${params.qc_min_cells} \ - --drop_unmapped ${params.qc_drop_unmapped} \ - --drop_mito ${params.qc_drop_mito} \ - --drop_ribo ${params.qc_drop_ribo} \ - --nmads ${params.qc_nmads} \ - --find_singlets ${params.mult_find_singlets} \ - --singlets_method ${params.mult_singlets_method} \ - --vars_to_regress_out ${params.mult_vars_to_regress_out} \ - --pca_dims ${params.mult_pca_dims} \ - --var_features ${params.mult_var_features} \ - --doublet_rate ${params.mult_doublet_rate} \ - --pK ${params.mult_pK} \ - --find_cells ${params.amb_find_cells} \ - --lower ${params.amb_lower} \ - --retain ${params.amb_retain} \ - --alpha_cutoff ${params.amb_alpha_cutoff} \ - --niters ${params.amb_niters} \ - --expect_cells ${params.amb_expect_cells} - - """ -} - -/* - * Merge individual quality-control tsv summaries into combined tsv file - */ -process SCFLOW_MERGE_QC_SUMMARIES { - - tag 'merged' - label 'process_tiny' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - input: - path qcs_tsv - - output: 
- path '*.tsv', emit: qc_summary - - script: - """ - - merge_tables.r \ - --filepaths ${qcs_tsv.join(',')} - - """ - -} - -/* - * Merge quality-control passed SCEs - */ -process SCFLOW_MERGE { - - tag 'merged' - label 'process_medium' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - input: - path qc_passed_sces - path ensembl_mappings - - output: - path 'merged_sce/' , emit: merged_sce - path 'merge_plots/*.png' , emit: merge_plots - path 'merge_summary_plots/*.png', emit: merge_summary_plots - path 'merged_report/*.html' , emit: merged_report - - script: - """ - - scflow_merge.r \ - --sce_paths ${qc_passed_sces.join(',')} \ - --ensembl_mappings ${ensembl_mappings} \ - --unique_id_var ${params.qc_key_colname} \ - --plot_vars ${params.merge_plot_vars} \ - --facet_vars ${params.merge_facet_vars} \ - --outlier_vars ${params.merge_outlier_vars} - - """ - -} - -/* - * Integrate data for batch-effect correction - */ -process SCFLOW_INTEGRATE { - - tag 'merged' - label 'process_medium' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - input: - path sce - - output: - path 'integrated_sce/', emit: integrated_sce - - script: - """ - - scflow_integrate.r \ - --sce_path ${sce} \ - --method ${params.integ_method} \ - --unique_id_var ${params.integ_unique_id_var} \ - --take_gene_union ${params.integ_take_gene_union} \ - --remove_missing ${params.integ_remove_missing} \ - --num_genes ${params.integ_num_genes} \ - --combine ${params.integ_combine} \ - --keep_unique ${params.integ_keep_unique} \ - --capitalize ${params.integ_capitalize} \ - --use_cols ${params.integ_use_cols} \ - --k ${params.integ_k} \ - --lambda ${params.integ_lambda} \ - --thresh ${params.integ_thresh} \ - --max_iters ${params.integ_max_iters} \ - --nrep ${params.integ_nrep} \ - --rand_seed ${params.integ_rand_seed} \ - --knn_k ${params.integ_knn_k} \ - --k2 ${params.integ_k2} \ - --prune_thresh ${params.integ_prune_thresh} \ - --ref_dataset ${params.integ_ref_dataset} \ - --min_cells ${params.integ_min_cells} \ - --quantiles ${params.integ_quantiles} \ - --nstart ${params.integ_nstart} \ - --resolution ${params.integ_resolution} \ - --dims_use ${params.integ_dims_use} \ - --dist_use ${params.integ_dist_use} \ - --center ${params.integ_center} \ - --small_clust_thresh ${params.integ_small_clust_thresh} - - """ - -} - -/* - * Perform dimensionality reduction - */ -process SCFLOW_REDUCE_DIMS { - - tag 'merged' - label 'process_medium' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - input: - path sce - - output: - path 'reddim_sce/', emit: reddim_sce - - script: - """ - - scflow_reduce_dims.r \ - --sce_path ${sce} \ - --input_reduced_dim ${params.reddim_input_reduced_dim} \ - --reduction_methods ${params.reddim_reduction_methods} \ - --vars_to_regress_out ${params.reddim_vars_to_regress_out} \ - --pca_dims ${params.reddim_umap_pca_dims} \ - --n_neighbors ${params.reddim_umap_n_neighbors} \ - --n_components ${params.reddim_umap_n_components} \ - --init ${params.reddim_umap_init} \ - --metric ${params.reddim_umap_metric} \ - --n_epochs ${params.reddim_umap_n_epochs} \ 
- --learning_rate ${params.reddim_umap_learning_rate} \ - --min_dist ${params.reddim_umap_min_dist} \ - --spread ${params.reddim_umap_spread} \ - --set_op_mix_ratio ${params.reddim_umap_set_op_mix_ratio} \ - --local_connectivity ${params.reddim_umap_local_connectivity} \ - --repulsion_strength ${params.reddim_umap_repulsion_strength} \ - --negative_sample_rate ${params.reddim_umap_negative_sample_rate} \ - --fast_sgd ${params.reddim_umap_fast_sgd} \ - --dims ${params.reddim_tsne_dims} \ - --initial_dims ${params.reddim_tsne_initial_dims} \ - --perplexity ${params.reddim_tsne_perplexity} \ - --theta ${params.reddim_tsne_theta} \ - --stop_lying_iter ${params.reddim_tsne_stop_lying_iter} \ - --mom_switch_iter ${params.reddim_tsne_mom_switch_iter} \ - --max_iter ${params.reddim_tsne_max_iter} \ - --pca_center ${params.reddim_tsne_pca_center} \ - --pca_scale ${params.reddim_tsne_pca_scale} \ - --normalize ${params.reddim_tsne_normalize} \ - --momentum ${params.reddim_tsne_momentum} \ - --final_momentum ${params.reddim_tsne_final_momentum} \ - --eta ${params.reddim_tsne_eta} \ - --exaggeration_factor ${params.reddim_tsne_exaggeration_factor} - - - """ - -} - -/* - * Cluster cells - */ -process SCFLOW_CLUSTER { - - tag 'merged' - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - input: - path sce - - output: - path 'clustered_sce/' , emit: clustered_sce - - script: - """ - - scflow_cluster.r \ - --sce_path ${sce} \ - --cluster_method ${params.clust_cluster_method} \ - --reduction_method ${params.clust_reduction_method} \ - --res ${params.clust_res} \ - --k ${params.clust_k} \ - --louvain_iter ${params.clust_louvain_iter} - - """ - -} - +nextflow.enable.dsl = 2 /* - * Generate integration report - */ -process SCFLOW_REPORT_INTEGRATED { - - tag "merged" - label 'process_medium' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - input: - path( sce ) - - output: - path 'integration_report/', emit: integration_report - - script: - """ - scflow_report_integrated.r \ - --sce_path ${sce} \ - --categorical_covariates ${params.integ_categorical_covariates} \ - --input_reduced_dim ${params.integ_input_reduced_dim} - """ - -} - - -/* - * Annotate cluster celltypes - */ -process SCFLOW_MAP_CELLTYPES { - - tag 'merged' - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - input: - path sce - path ctd_folder - - output: - path 'celltype_mapped_sce/' , emit: celltype_mapped_sce - path 'celltype_mappings.tsv', emit: celltype_mappings - - script: - """ - - scflow_map_celltypes.r \ - --sce_path ${sce} \ - --ctd_folder ${ctd_folder} \ - --clusters_colname ${params.cta_clusters_colname} \ - --cells_to_sample ${params.cta_cells_to_sample} - - """ - -} - -/* - * Generate final SCE with optionally revised cell-types - */ -process SCFLOW_FINALIZE { - - tag 'merged' - label 'process_high' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - echo true - - 
input: - path sce - path celltype_mappings - - output: - path 'final_sce/' , emit: final_sce - path 'celltypes.tsv' , emit: celltypes - path 'celltype_metrics_report' , emit: celltype_metrics_report - - - script: - - """ - scflow_finalize_sce.r \ - --sce_path ${sce} \ - --celltype_mappings ${celltype_mappings} \ - --clusters_colname ${params.cta_clusters_colname} \ - --celltype_var ${params.cta_celltype_var} \ - --unique_id_var ${params.cta_unique_id_var} \ - --facet_vars ${params.cta_facet_vars} \ - --input_reduced_dim ${params.clust_reduction_method} \ - --metric_vars ${params.cta_metric_vars} - """ - -} - -/* - * Generate 2D reduced dimension plots of gene expression - */ -process SCFLOW_PLOT_REDDIM_GENES { - - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - input: - path sce - path reddim_genes_yml - - output: - path 'reddim_gene_plots/', emit: reddim_gene_plots - - script: - - """ - scflow_plot_reddim_genes.r \ - --sce ${sce} \ - --reduction_methods ${params.plotreddim_reduction_methods} \ - --reddim_genes_yml ${reddim_genes_yml} - - """ -} - -/* - * Perform differential gene expression - */ -process SCFLOW_DGE { - - tag "${celltype} (${n_cells_str} cells) | ${de_method}" - label 'process_medium' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - maxRetries 3 - - input: - path sce - each de_method - each ct_tuple - path ensembl_mappings - - output: - path 'de_table/*.tsv' , emit: de_table, optional: true - path 'de_report/*.html' , emit: de_report - path 'de_plot/*.png' , emit: de_plot - path 'de_plot_data/*.tsv' , emit: de_plot_data - - script: - celltype = ct_tuple[0] - n_cells = ct_tuple[1].toInteger() - n_cells_str = (Math.round(n_cells * 100) / 100000).round(1).toString() + 'k' - - """ - echo "celltype: ${celltype} n_cells: ${n_cells_str}" - scflow_dge.r \ - --sce ${sce} \ - --celltype ${celltype} \ - --de_method ${de_method} \ - --mast_method ${params.dge_mast_method} \ - --min_counts ${params.dge_min_counts} \ - --min_cells_pc ${params.dge_min_cells_pc} \ - --rescale_numerics ${params.dge_rescale_numerics} \ - --force_run ${params.dge_force_run} \ - --pseudobulk ${params.dge_pseudobulk} \ - --celltype_var ${params.dge_celltype_var} \ - --sample_var ${params.dge_sample_var} \ - --dependent_var ${params.dge_dependent_var} \ - --ref_class ${params.dge_ref_class} \ - --confounding_vars ${params.dge_confounding_vars} \ - --random_effects_var ${params.dge_random_effects_var} \ - --fc_threshold ${params.dge_fc_threshold} \ - --ensembl_mappings ${ensembl_mappings} - - """ -} - -/* - * Integrated pathway analysis of differentially expressed genes - */ -process SCFLOW_IPA { - - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - input: - path de_table - //each de_method - //each celltype +======================================================================================== + VALIDATE & PRINT PARAMETER SUMMARY +======================================================================================== +*/ - output: - path 'ipa/**/*' , emit: ipa_results , optional: true, type: 
'dir' - path 'ipa/*.html' , emit: ipa_report , optional: true +WorkflowMain.initialise(workflow, params, log) - script: - """ - scflow_ipa.r \ - --gene_file ${de_table.join(',')} \ - --reference_file ${params.ipa_reference_file} \ - --enrichment_tool ${params.ipa_enrichment_tool} \ - --enrichment_method ${params.ipa_enrichment_method} \ - --enrichment_database ${params.ipa_enrichment_database} - - """ +// Execute a single named workflow +include { SCFLOW } from './workflows/scflow' +workflow { + SCFLOW () } /* - * Dirichlet modeling of relative cell-type abundance - */ -process SCFLOW_DIRICHLET { - - tag "merged" - label 'process_low' - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } - - - echo true - - input: - path sce - - output: - path 'dirichlet_report', emit: dirichlet_report - - script: - """ - scflow_dirichlet.r \ - --sce_path ${sce} \ - --unique_id_var ${params.dirich_unique_id_var} \ - --celltype_var ${params.dirich_celltype_var} \ - --dependent_var ${params.dirich_dependent_var} \ - --ref_class ${params.dirich_ref_class} \ - --var_order ${params.dirich_var_order} - """ - -} - - -workflow { - - main: - SCFLOW_CHECK_INPUTS ( - ch_manifest, - ch_samplesheet - ) - - SCFLOW_QC ( - SCFLOW_CHECK_INPUTS.out.checked_manifest.splitCsv( - header:['key', 'filepath'], - skip: 1, sep: '\t' - ) - .map { row -> tuple(row.key, row.filepath) }, - ch_samplesheet2, - ch_ensembl_mappings - ) - - SCFLOW_MERGE_QC_SUMMARIES ( - SCFLOW_QC.out.qc_summary.collect() - ) - - SCFLOW_MERGE ( - SCFLOW_QC.out.qc_sce.collect(), - ch_ensembl_mappings2 - ) - - SCFLOW_INTEGRATE ( - SCFLOW_MERGE.out.merged_sce - ) - - SCFLOW_REDUCE_DIMS ( - SCFLOW_INTEGRATE.out.integrated_sce - ) - - SCFLOW_CLUSTER ( - SCFLOW_REDUCE_DIMS.out.reddim_sce - ) - - SCFLOW_REPORT_INTEGRATED ( - SCFLOW_CLUSTER.out.clustered_sce - ) - - SCFLOW_MAP_CELLTYPES ( - SCFLOW_CLUSTER.out.clustered_sce, - ch_ctd_folder - ) - - SCFLOW_FINALIZE ( - SCFLOW_MAP_CELLTYPES.out.celltype_mapped_sce, - ch_celltype_mappings - ) - - SCFLOW_DGE ( - SCFLOW_FINALIZE.out.final_sce, - params.dge_de_method, - SCFLOW_FINALIZE.out.celltypes.splitCsv ( - header:['celltype', 'n_cells'], skip: 1, sep: '\t' - ) - .map { row -> tuple(row.celltype, row.n_cells) }, - ch_ensembl_mappings3 - ) - - SCFLOW_IPA ( - SCFLOW_DGE.out.de_table - ) - - SCFLOW_DIRICHLET ( - SCFLOW_FINALIZE.out.final_sce - ) - - SCFLOW_PLOT_REDDIM_GENES ( - SCFLOW_CLUSTER.out.clustered_sce, - ch_reddim_genes_yml - ) - - /* - publish: - SCFLOW_CHECK_INPUTS.out.checked_manifest to: "$params.outdir/", mode: 'copy', overwrite: 'true' - // Quality-control - SCFLOW_QC.out.qc_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - SCFLOW_QC.out.qc_plot_data to: "$params.outdir/Tables/Quality_Control/", mode: 'copy', overwrite: 'true' - SCFLOW_QC.out.qc_plots to: "$params.outdir/Plots/Quality_Control/", mode: 'copy', overwrite: 'true' - SCFLOW_QC.out.qc_sce to: "$params.outdir/SCE/Individual/", mode: 'copy', overwrite: 'true' - SCFLOW_MERGE_QC_SUMMARIES.out.qc_summary to: "$params.outdir/Tables/Merged/", mode: 'copy', overwrite: 'true' - // Merged SCE - SCFLOW_MERGE.out.merged_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - SCFLOW_MERGE.out.merge_plots to: "$params.outdir/Plots/Merged/", mode: 'copy', overwrite: 'true' - SCFLOW_MERGE.out.merge_summary_plots to: "$params.outdir/Plots/Merged/", mode: 'copy', overwrite: 
'true' - // cluster - SCFLOW_CLUSTER.out.integration_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - // ct - SCFLOW_MAP_CELLTYPES.out.celltype_mappings to: "$params.outdir/Tables/Celltype_Mappings", mode: 'copy', overwrite: 'true' - // final - SCFLOW_FINALIZE.out.final_sce to: "$params.outdir/SCE/", mode: 'copy', overwrite: 'true' - SCFLOW_FINALIZE.out.celltypes to: "$params.outdir/Tables/Celltype_Mappings", mode: 'copy', overwrite: 'true' - SCFLOW_FINALIZE.out.celltype_metrics_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - // DE - SCFLOW_DGE.out.de_table to: "$params.outdir/Tables/DGE", mode: 'copy', optional: true, overwrite: 'true' - SCFLOW_DGE.out.de_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - SCFLOW_DGE.out.de_plot to: "$params.outdir/Plots/DGE/", mode: 'copy', overwrite: 'true' - SCFLOW_DGE.out.de_plot_data to: "$params.outdir/Tables/DGE", mode: 'copy', overwrite: 'true' - // IPA - SCFLOW_IPA.out.ipa_results to: "$params.outdir/Tables/", mode: 'copy', optional: true, overwrite: 'true' - SCFLOW_IPA.out.ipa_report to: "$params.outdir/Reports/", mode: 'copy', optional: true, overwrite: 'true' - // Dirichlet - SCFLOW_DIRICHLET.out.dirichlet_report to: "$params.outdir/Reports/", mode: 'copy', overwrite: 'true' - // plots - SCFLOW_PLOT_REDDIM_GENES.out.reddim_gene_plots to: "$params.outdir/Plots/", mode: 'copy', overwrite: 'true' +======================================================================================== + THE END +======================================================================================== */ -} - -/* - * Completion e-mail notification - */ -workflow.onComplete { - - // Set up the e-mail variables - def subject = "[nf-core/scflow] Successful: $workflow.runName" - if (!workflow.success) { - subject = "[nf-core/scflow] FAILED: $workflow.runName" - } - def email_fields = [:] - email_fields['version'] = workflow.manifest.version - email_fields['runName'] = custom_runName ?: workflow.runName - email_fields['success'] = workflow.success - email_fields['dateComplete'] = workflow.complete - email_fields['duration'] = workflow.duration - email_fields['exitStatus'] = workflow.exitStatus - email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') - email_fields['errorReport'] = (workflow.errorReport ?: 'None') - email_fields['commandLine'] = workflow.commandLine - email_fields['projectDir'] = workflow.projectDir - email_fields['summary'] = summary - email_fields['summary']['Date Started'] = workflow.start - email_fields['summary']['Date Completed'] = workflow.complete - email_fields['summary']['Pipeline script file path'] = workflow.scriptFile - email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - email_fields['summary']['Nextflow Version'] = workflow.nextflow.version - email_fields['summary']['Nextflow Build'] = workflow.nextflow.build - email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - - // TODO nf-core: If not using MultiQC, strip out this code (including params.max_multiqc_email_size) - // On success try attach the multiqc report - def mqc_report = null - try { - if (workflow.success) { - mqc_report = ch_multiqc_report.getVal() - if 
(mqc_report.getClass() == ArrayList) { - log.warn "[nf-core/scflow] Found multiple reports from process 'multiqc', will use only one" - mqc_report = mqc_report[0] - } - } - } catch (all) { - log.warn "[nf-core/scflow] Could not attach MultiQC report to summary email" - } - - // Check if we are only sending emails on failure - email_address = params.email - if (!params.email && params.email_on_fail && !workflow.success) { - email_address = params.email_on_fail - } - - // Render the TXT template - def engine = new groovy.text.GStringTemplateEngine() - def tf = new File("$baseDir/assets/email_template.txt") - def txt_template = engine.createTemplate(tf).make(email_fields) - def email_txt = txt_template.toString() - - // Render the HTML template - def hf = new File("$baseDir/assets/email_template.html") - def html_template = engine.createTemplate(hf).make(email_fields) - def email_html = html_template.toString() - - // Render the sendmail template - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] - def sf = new File("$baseDir/assets/sendmail_template.txt") - def sendmail_template = engine.createTemplate(sf).make(smail_fields) - def sendmail_html = sendmail_template.toString() - - // Send the HTML e-mail - if (email_address) { - try { - if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } - // Try to send HTML e-mail using sendmail - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "[nf-core/scflow] Sent summary e-mail to $email_address (sendmail)" - } catch (all) { - // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] - if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) { - mail_cmd += [ '-A', mqc_report ] - } - mail_cmd.execute() << email_html - log.info "[nf-core/scflow] Sent summary e-mail to $email_address (mail)" - } - } - - // Write summary e-mail HTML to a file - def output_d = new File("${params.outdir}/pipeline_info/") - if (!output_d.exists()) { - output_d.mkdirs() - } - def output_hf = new File(output_d, "pipeline_report.html") - output_hf.withWriter { w -> w << email_html } - def output_tf = new File(output_d, "pipeline_report.txt") - output_tf.withWriter { w -> w << email_txt } - - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_red = params.monochrome_logs ? '' : "\033[0;31m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - - if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" - log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" - log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" - } - - if (workflow.success) { - log.info "-${c_purple}[nf-core/scflow]${c_green} Pipeline completed successfully${c_reset}-" - } else { - checkHostname() - log.info "-${c_purple}[nf-core/scflow]${c_red} Pipeline completed with errors${c_reset}-" - } - -} - - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? 
'' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/scflow v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? '' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? '' : "\033[1;93m" - if (params.hostnames) { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" - } - } - } - } -} diff --git a/modules.json b/modules.json new file mode 100644 index 0000000..2462038 --- /dev/null +++ b/modules.json @@ -0,0 +1,8 @@ +{ + "name": "nf-core/scflow", + "homePage": "https://github.com/nf-core/scflow", + "repos": { + "nf-core/modules": { + } + } +} diff --git a/modules/local/functions.nf b/modules/local/functions.nf new file mode 100644 index 0000000..da9da09 --- /dev/null +++ b/modules/local/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? 
ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/local/get_software_versions.nf b/modules/local/get_software_versions.nf new file mode 100644 index 0000000..7c83440 --- /dev/null +++ b/modules/local/get_software_versions.nf @@ -0,0 +1,24 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process GET_SOFTWARE_VERSIONS { + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } + + tag 'Version Info' + label 'process_tiny' + //cache false + + output: + path 'software_versions.tsv' , emit: tsv + + script: // This script is bundled with the pipeline, in nf-core/scflow/bin/ + """ + echo $workflow.manifest.version > pipeline.version.txt + echo $workflow.nextflow.version > nextflow.version.txt + scrape_software_versions.r software_versions.tsv + """ +} diff --git a/modules/local/process/functions.nf b/modules/local/process/functions.nf index 54dc8fe..d25eea8 100644 --- a/modules/local/process/functions.nf +++ b/modules/local/process/functions.nf @@ -56,4 +56,4 @@ def saveFiles(Map args) { return "${getPathFromList(path_list)}/$args.filename" } } -} \ No newline at end of file +} diff --git a/modules/local/process/scflow/checkinputs.nf b/modules/local/process/scflow/checkinputs.nf new file mode 100644 index 0000000..ecdaa97 --- /dev/null +++ b/modules/local/process/scflow/checkinputs.nf @@ -0,0 +1,38 @@ +/* + * Check that the input manifest and samplesheet are valid + */ + + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_CHECKINPUTS { + tag 'SCFLOW_CHECKINPUTS' + label 'process_tiny' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + //container "combiz/scflow-docker:0.6.1" + + input: + path manifest + path input + + output: + path 'checked_manifest.txt', emit: checked_manifest + + script: + def software = getSoftwareName(task.process) + + """ + check_inputs.r \\ + --input $input \\ + --manifest $manifest + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/cluster.nf b/modules/local/process/scflow/cluster.nf new file mode 100644 index 0000000..9c2ab10 --- /dev/null +++ b/modules/local/process/scflow/cluster.nf @@ -0,0 +1,38 @@ +/* + * Cluster cells + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' +
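+// Note on the shared options pattern (an illustrative, hypothetical sketch,
+// not part of the pipeline source): the calling workflow is expected to pass
+// per-module tool arguments via addParams(), along the lines of:
+//
+//   include { SCFLOW_CLUSTER } from './modules/local/process/scflow/cluster' addParams( options: [ args: '--cluster_method leiden' ] )
+//
+// The '--cluster_method leiden' value above is a made-up example, not a
+// pipeline default. initOptions() (imported above) fills in any missing keys,
+// $options.args is then spliced verbatim into the script block below, and the
+// publishDir saveAs closure delegates to saveFiles(), which publishes outputs
+// under the software name derived from task.process ('scflow' here).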
+params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_CLUSTER { + tag 'MERGED' + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + //container 'combiz/scflow-docker:0.6.1' + + input: + path sce + + output: + path 'clustered_sce/' , emit: clustered_sce, type: 'dir' + + script: + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + + scflow_cluster.r \ + $options.args \ + --sce_path ${sce} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/dge.nf b/modules/local/process/scflow/dge.nf new file mode 100644 index 0000000..7459e3b --- /dev/null +++ b/modules/local/process/scflow/dge.nf @@ -0,0 +1,56 @@ +/* + * Differential gene expression + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_DGE { + tag "${celltype} (${n_cells_str} cells) | ${de_method}" + label 'process_medium' + errorStrategy 'ignore' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:"${celltype}_${de_method}") } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + each de_method + each ct_tuple + path ensembl_mappings + + output: + path '*.tsv' , emit: de_table , optional: true + path '*.html' , emit: de_report , optional: true + path '*_volcano_plot.png' , emit: de_plot , optional: true + //path 'de_plot_data/' , emit: de_plot_data , optional: true + + script: + celltype = ct_tuple[0] + n_cells = ct_tuple[1].toInteger() + n_cells_str = (Math.round(n_cells * 100) / 100000).round(1).toString() + 'k' + // e.g. n_cells = 12345 gives n_cells_str = "12.3k" for the tag above + def software = getSoftwareName(task.process) + + """ + echo "celltype: ${celltype} n_cells: ${n_cells_str}" + export MC_CORES=${task.cpus} + export MKL_NUM_THREADS=1 + export NUMEXPR_NUM_THREADS=1 + export OMP_NUM_THREADS=1 + export OPENBLAS_NUM_THREADS=1 + export VECLIB_MAXIMUM_THREADS=1 + scflow_dge.r \ + $options.args \ + --sce ${sce} \ + --celltype ${celltype} \ + --de_method ${de_method} \ + --ensembl_mappings ${ensembl_mappings} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/dirichlet.nf b/modules/local/process/scflow/dirichlet.nf new file mode 100644 index 0000000..01a7dfc --- /dev/null +++ b/modules/local/process/scflow/dirichlet.nf @@ -0,0 +1,38 @@ +/* + * Dirichlet modeling of relative cell-type abundance + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_DIRICHLET { + tag 'DIRICHLET' + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + + output: +
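+    // 'dirichlet_report' is the report directory written by scflow_dirichlet.r
+    // for the Dirichlet model of relative cell-type abundance described above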
path 'dirichlet_report', emit: dirichlet_report + + script: + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + + scflow_dirichlet.r \ + $options.args \ + --sce_path ${sce} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/finalize.nf b/modules/local/process/scflow/finalize.nf new file mode 100644 index 0000000..fec2380 --- /dev/null +++ b/modules/local/process/scflow/finalize.nf @@ -0,0 +1,44 @@ +/* + * Generate final SCE with optionally revised cell-types + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_FINALIZE { + tag 'MERGED' + label 'process_high' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + path celltype_mappings + + output: + path 'final_sce' , emit: final_sce, type: 'dir' + path 'celltypes.tsv' , emit: celltypes + path 'celltype_metrics_report' , emit: celltype_metrics_report, type: 'dir' + path 'celltype_marker_tables' , emit: celltype_marker_tables, type: 'dir' + path 'celltype_marker_plots' , emit: celltype_marker_plots, type: 'dir' + + script: + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + + scflow_finalize_sce.r \ + $options.args \ + --sce_path ${sce} \ + --celltype_mappings ${celltype_mappings} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/functions.nf b/modules/local/process/scflow/functions.nf new file mode 100644 index 0000000..d25eea8 --- /dev/null +++ b/modules/local/process/scflow/functions.nf @@ -0,0 +1,59 @@ +/* + * ----------------------------------------------------- + * Utility functions used in nf-core DSL2 module files + * ----------------------------------------------------- + */ + +/* + * Extract name of software tool from process name using $task.process + */ +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +/* + * Function to initialise default values and to generate a Groovy Map of available options for nf-core modules + */ +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.publish_by_id = args.publish_by_id ?: false + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +/* + * Tidy up and join elements of a list to return a path string + */ +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +/* + * Function to save/publish module results + */ +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if 
(ioptions.publish_by_id) { + path_list.add(args.publish_id) + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/local/process/scflow/integrate.nf b/modules/local/process/scflow/integrate.nf new file mode 100644 index 0000000..fc564ff --- /dev/null +++ b/modules/local/process/scflow/integrate.nf @@ -0,0 +1,39 @@ +/* + * Integrate data for batch-effect correction + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_INTEGRATE { + tag 'MERGED' + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + + output: + path 'integrated_sce/', emit: integrated_sce + + script: + def software = getSoftwareName(task.process) + + + """ + export MC_CORES=${task.cpus} + + scflow_integrate.r \ + $options.args \ + --sce_path ${sce} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/ipa.nf b/modules/local/process/scflow/ipa.nf new file mode 100644 index 0000000..79c1169 --- /dev/null +++ b/modules/local/process/scflow/ipa.nf @@ -0,0 +1,42 @@ +/* + * Integrated pathway analysis of differentially expressed genes + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_IPA { + tag "${de_table_basename}" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path de_table + + output: + path '*_ipa' , emit: ipa_results , optional: true, type: 'dir' + path '*.html' , emit: ipa_report , optional: true + + script: + de_table_basename = "${de_table.baseName}" + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + + scflow_ipa.r \ + $options.args \ + --gene_file ${de_table.join(',')} + + mv ipa ${de_table_basename}_ipa + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/mapcelltypes.nf b/modules/local/process/scflow/mapcelltypes.nf new file mode 100644 index 0000000..fafeac1 --- /dev/null +++ b/modules/local/process/scflow/mapcelltypes.nf @@ -0,0 +1,45 @@ +/* + * Annotate cluster celltypes + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_MAPCELLTYPES { + tag 'MERGED' + label 'process_low' + publishDir "${params.outdir}", + mode: 
params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + path ctd_path + + output: + path 'celltype_mapped_sce/' , emit: celltype_mapped_sce, type: 'dir' + path 'celltype_mappings.tsv', emit: celltype_mappings + + script: + def software = getSoftwareName(task.process) + + + """ + export MC_CORES=${task.cpus} + + mkdir ctd_folder && unzip ${ctd_path} -d ./ctd_folder + + + scflow_map_celltypes.r \ + $options.args \ + --sce_path ${sce} \ + --ctd_folder ctd_folder + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/merge.nf b/modules/local/process/scflow/merge.nf new file mode 100644 index 0000000..8c3744e --- /dev/null +++ b/modules/local/process/scflow/merge.nf @@ -0,0 +1,42 @@ +/* + * Merge quality-control passed SCEs + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_MERGE { + tag 'MERGED' + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path qc_passed_sces + path ensembl_mappings + + output: + path 'merged_sce/' , emit: merged_sce , type: 'dir' + path 'merge_plots' , emit: merge_plots , type: 'dir' + path 'merge_summary_plots' , emit: merge_summary_plots , type: 'dir' + path 'merged_report' , emit: merged_report , type: 'dir' + + script: + def software = getSoftwareName(task.process) + + """ + + scflow_merge.r \ + $options.args \ + --sce_paths ${qc_passed_sces.join(',')} \ + --ensembl_mappings ${ensembl_mappings} \ + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/mergeqctables.nf b/modules/local/process/scflow/mergeqctables.nf new file mode 100644 index 0000000..166b5ab --- /dev/null +++ b/modules/local/process/scflow/mergeqctables.nf @@ -0,0 +1,37 @@ +/* + * Merge individual quality-control tsv summaries into combined tsv file + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_MERGEQCTABLES { + tag 'MERGEQCTABLES' + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path qcs_tsv + + output: + path '*.tsv', emit: qc_summary + + script: + def software = getSoftwareName(task.process) + + + """ + merge_tables.r \ + $options.args \ + --filepaths ${qcs_tsv.join(',')} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/plotreddimgenes.nf b/modules/local/process/scflow/plotreddimgenes.nf new 
file mode 100644 index 0000000..d027f56 --- /dev/null +++ b/modules/local/process/scflow/plotreddimgenes.nf @@ -0,0 +1,40 @@ +/* + * Generate 2D reduced dimension plots of gene expression + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_PLOTREDDIMGENES { + tag 'MERGED' + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + path reddim_genes_yml + + output: + path 'reddim_gene_plots/', emit: reddim_gene_plots + + script: + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + + scflow_plot_reddim_genes.r \ + $options.args \ + --sce ${sce} \ + --reddim_genes_yml ${reddim_genes_yml} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/qc.nf b/modules/local/process/scflow/qc.nf new file mode 100644 index 0000000..f54a53c --- /dev/null +++ b/modules/local/process/scflow/qc.nf @@ -0,0 +1,63 @@ +/* + * Single Sample QC + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_QC { + tag "${key}" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:"${key}") } + +// container 'combiz/scflow-docker:0.6.1' + + input: + tuple val(key), path(mat_path) + path input + path ensembl_mappings + + output: + path '*.html' , emit: qc_report + path 'qc_plot_data' , emit: qc_plot_data, type: 'dir' + path 'qc_summary/*.tsv' , emit: qc_summary + path 'qc_plots' , emit: qc_plots, type: 'dir' + path 'sce/*_sce' , emit: qc_sce, type: 'dir' + + script: + def software = getSoftwareName(task.process) + + + """ + export MC_CORES=${task.cpus} + + if [[ -d ${mat_path} ]]; then + echo "${mat_path} is a directory" + MATPATH=${mat_path} + elif [[ -f ${mat_path} ]]; then + echo "${mat_path} is a file" + mkdir mat_folder && unzip ${mat_path} -d ./mat_folder + MATPATH=mat_folder + else + echo "${mat_path} is not valid" + MATPATH=${mat_path} + exit 1 + fi + + scflow_qc.r \ + $options.args \ + --input ${input} \ + --mat_path \${MATPATH} \ + --key ${key} \ + --ensembl_mappings ${ensembl_mappings} + + mkdir sce; mv ${key}_sce sce/ + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/reducedims.nf b/modules/local/process/scflow/reducedims.nf new file mode 100644 index 0000000..9b421b1 --- /dev/null +++ b/modules/local/process/scflow/reducedims.nf @@ -0,0 +1,43 @@ +/* + * Perform dimensionality reduction + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_REDUCEDIMS { + tag 'MERGED' + label 'process_medium' + publishDir "${params.outdir}", + mode: 
params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path sce + + output: + path 'reddim_sce/', emit: reddim_sce + + script: + def software = getSoftwareName(task.process) + + """ + export MC_CORES=${task.cpus} + export MKL_NUM_THREADS=1 + export NUMEXPR_NUM_THREADS=1 + export OMP_NUM_THREADS=1 + export OPENBLAS_NUM_THREADS=1 + export VECLIB_MAXIMUM_THREADS=1 + + scflow_reduce_dims.r \ + $options.args \ + --sce_path ${sce} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/process/scflow/reportintegrated.nf b/modules/local/process/scflow/reportintegrated.nf new file mode 100644 index 0000000..c284480 --- /dev/null +++ b/modules/local/process/scflow/reportintegrated.nf @@ -0,0 +1,39 @@ +/* + * Generate integration HTML report + */ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process SCFLOW_REPORTINTEGRATED { + tag 'MERGED' + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + // container 'combiz/scflow-docker:0.6.1' + + input: + path( sce ) + + output: + path 'integration_report', emit: integration_report, type: 'dir' + + script: + def software = getSoftwareName(task.process) + + + """ + export MC_CORES=${task.cpus} + + scflow_report_integrated.r \ + $options.args \ + --sce_path ${sce} + + scflow_version=\$(Rscript -e 'cat(as.character(utils::packageVersion("scFlow")))'); echo "scFlow \${scflow_version}" > "scFlow_\${scflow_version}.version.txt" + """ +} diff --git a/modules/local/samplesheet_check.nf b/modules/local/samplesheet_check.nf new file mode 100644 index 0000000..d6adc20 --- /dev/null +++ b/modules/local/samplesheet_check.nf @@ -0,0 +1,31 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process SAMPLESHEET_CHECK { + tag "$samplesheet" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + input: + path samplesheet + + output: + path '*.csv' + + script: // This script is bundled with the pipeline, in nf-core/scflow/bin/ + """ + check_samplesheet.py \\ + $samplesheet \\ + samplesheet.valid.csv + """ +} diff --git a/nextflow.config b/nextflow.config index b8223b0..834c279 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,155 +1,195 @@ /* - * ------------------------------------------------- - * nf-core/scflow Nextflow config file - * ------------------------------------------------- - * Default config options for all environments. 
- */ - -// Global default params, used in configs -params { - - // nf-core-scflow Workflow flags - samplesheet = "./refs/SampleSheet.tsv" - manifest = "./refs/Manifest.txt" - ensembl_mappings = "./src/ensembl-ids/ensembl_mappings.tsv" - ctd_folder = "./refs/ctd" - celltype_mappings = "./conf/celltype_mappings.tsv" - reddim_genes_yml = "./conf/reddim_genes.yml" - workDir = "/rds/general/user/$USER/ephemeral/tmp" - - outdir = './results' - publish_dir_mode = 'copy' - - // Boilerplate options - name = false - email = false - email_on_fail = false - plaintext_email = false - monochrome_logs = false - help = false - tracedir = "${params.outdir}/pipeline_info" - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - hostnames = false - config_profile_description = false - config_profile_contact = false - config_profile_url = false - - // Defaults only, expecting to be overwritten - max_memory = 128.GB - max_cpus = 16 - max_time = 240.h +======================================================================================== + nf-core/scflow Nextflow config file +======================================================================================== + Default config options for all compute environments +---------------------------------------------------------------------------------------- +*/ +manifest { + name = 'nf-core/scflow' + author = 'Dr Combiz Khozoie' + homePage = 'https://github.com/nf-core/scflow' + description = 'Complete analysis workflow for single-cell/nuclei RNA-sequencing data.' + mainScript = 'main.nf' + nextflowVersion = '>=21.04.2' + version = '0.7.0dev' } // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -//process.container = 'nfcore/scflow:dev' -process.container = 'combiz/scflow-docker:0.6.0' +process.container = 'almurphy/scfdev:dev' + +//workDir = "/rds/general/user/$USER/ephemeral/tmp" +workDir = './work' + +// Global default params, used in configs +params { + // nf-core-scflow Workflow flags + manifest = './refs/Manifest.txt' + input = './refs/SampleSheet.tsv' + ensembl_mappings = './src/ensembl-ids/ensembl_mappings.tsv' + ctd_path = 'https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip' + celltype_mappings = './conf/celltype_mappings.tsv' + reddim_genes_yml = './conf/reddim_genes.yml' + + // Boilerplate options + outdir = './results' + tracedir = "${params.outdir}/pipeline_info" + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + help = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'input,modules' + enable_conda = false + singularity_pull_docker_container = false + + // Config options + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + hostnames = [:] + config_profile_description = null + config_profile_contact = null + config_profile_url = null + config_profile_name = null + + // Defaults only, expecting to be overwritten + max_memory = 128.GB + max_cpus = 16 + max_time = 240.h +} // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// SCFLOW DEFAULT ANALYSIS PARAMETERS -includeConfig 'conf/scflow_analysis.config' - // Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config' +// SCFLOW DEFAULT ANALYSIS PARAMETERS +includeConfig 'conf/scflow_analysis.config' + // Load nf-core custom profiles from different Institutions try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" + includeConfig "${params.custom_config_base}/nfcore_custom.config" } catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } profiles { - conda { process.conda = "$baseDir/environment.yml" } - debug { process.beforeScript = 'echo $HOSTNAME' } - docker { - docker.enabled = true - // Avoid this error: - // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. - // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 - // once this is established and works well, nextflow might implement this behavior as new default. - docker.runOptions = '-u \$(id -u):\$(id -g)' - } - singularity { - singularity.enabled = true - singularity.autoMounts = true - } - podman { - podman.enabled = true - } - test { includeConfig 'conf/test.config' } + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + params.enable_conda = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } // Export these variables to prevent local Python/R libraries from conflicting with those in the container env { - PYTHONNOUSERSITE = 1 - R_PROFILE_USER = "/.Rprofile" - R_ENVIRON_USER = "/.Renviron" + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = '/.Rprofile' + R_ENVIRON_USER = '/.Renviron' } // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] +def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { - enabled = true - file = "${params.tracedir}/execution_timeline.html" + enabled = true + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { - enabled = true - file = "${params.tracedir}/execution_report.html" + enabled = true + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { - enabled = true - file = "${params.tracedir}/execution_trace.txt" + enabled = true + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { - enabled = true - file = "${params.tracedir}/pipeline_dag.svg" -} - -manifest { - name = 
'nf-core/scflow' - author = 'Dr Combiz Khozoie' - homePage = 'https://github.com/nf-core/scflow' - description = 'Complete analysis workflow for single-cell/nuclei RNA-sequencing data.' - mainScript = 'main.nf' - nextflowVersion = '>=20.01.0' - version = '0.6.0dev' + enabled = true + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" } // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" + return obj + } } - } } diff --git a/nextflow_schema.json b/nextflow_schema.json index 6c9c851..f209156 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -11,38 +11,63 @@ "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data.", "properties": { - "samplesheet": { + "manifest": { + "type": "string", + "default": "./refs/Manifest.txt", + "fa_icon": "fas fa-table", + "description": "The .tsv file specifying sample matrix filepaths." + }, + "input": { "type": "string", "default": "./refs/SampleSheet.tsv", - "fa_icon": "fas fa-table" + "fa_icon": "fas fa-table", + "description": "The .tsv file specifying sample metadata." 
}, "ensembl_mappings": { "type": "string", - "default": "./src/ensembl-ids/ensembl_mappings.tsv", - "fa_icon": "fas fa-table" + "default": "https://raw.githubusercontent.com/nf-core/test-datasets/scflow/assets/ensembl_mappings.tsv", + "fa_icon": "fas fa-table", + "description": "Optional tsv file containing mappings between ensembl_gene_id and gene_name values." }, - "celltype_mappings": { + "ctd_path": { "type": "string", - "default": "./conf/celltype_mappings.tsv", - "fa_icon": "fas fa-table" + "default": "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip", + "description": "Cell-type annotation reference file path.", + "help_text": "This is a zip file containing cell-type annotation reference files for the EWCE package.", + "fa_icon": "fas fa-file-archive" }, - "ctd_folder": { + "celltype_mappings": { "type": "string", - "default": "./refs/ctd", - "fa_icon": "fas fa-folder" + "default": "./conf/celltype_mappings.tsv", + "fa_icon": "fas fa-table", + "description": "Optional tsv file specifying manual revisions of cell-type annotations." }, "reddim_genes_yml": { "type": "string", "default": "./conf/reddim_genes.yml", - "fa_icon": "fas fa-list" + "fa_icon": "fas fa-list", + "description": "Optional list of genes of interest in YML format for plotting of gene expression." + }, + "species": { + "type": "string", + "default": "human", + "fa_icon": "fas fa-male", + "description": "Input sample species.", + "help_text": "Currently, \"human\" and \"mouse\" are supported." + }, + "outdir": { + "type": "string", + "default": "./results", + "description": "Output directory.", + "fa_icon": "fas fa-folder-open" + } }, "required": [ - "samplesheet", + "manifest", + "input", "ensembl_mappings", - "celltype_mappings", - "ctd_folder", - "reddim_genes_yml" + "ctd_path", + "species" ], "help_text": "" }, @@ -63,7 +88,7 @@ "default": "seqdate", "description": "The sample sheet variables to treat as factors.", "help_text": "All sample sheet columns with numbers which should be treated as factors should be specified here separated by commas.
Examples include columns with dates, numeric sample identifiers, etc.", - "fa_icon": "fas fa-quote-left" + "fa_icon": "fas fa-layer-group" }, "qc_min_library_size": { "type": "integer", @@ -92,18 +117,22 @@ "qc_min_ribo": { "type": "number", "description": "Minimum proportion of counts mapping to ribosomal genes.", - "fa_icon": "fas fa-greater-than-equal" - }, - "qc_max_mito": { - "type": "string", - "default": "adaptive", - "description": "Maximum proportion of counts mapping to mitochondrial genes.", - "fa_icon": "fas fa-less-than-equal" + "fa_icon": "fas fa-greater-than-equal", + "minimum": 0, + "maximum": 1 }, "qc_max_ribo": { "type": "number", "default": 1, "description": "Maximum proportion of counts mapping to ribosomal genes.", + "fa_icon": "fas fa-less-than-equal", + "minimum": 0, + "maximum": 1 + }, + "qc_max_mito": { + "type": "string", + "default": "adaptive", + "description": "Maximum proportion of counts mapping to mitochondrial genes.", "fa_icon": "fas fa-less-than-equal" }, "qc_min_counts": { @@ -121,21 +150,22 @@ "fa_icon": "fas fa-greater-than-equal" }, "qc_drop_unmapped": { - "type": "boolean", - "default": true, + "type": "string", + "default": "true", "description": "Option to drop unmapped genes.", "fa_icon": "fas fa-cut" }, "qc_drop_mito": { - "type": "boolean", - "default": true, + "type": "string", + "default": "true", "description": "Option to drop mitochondrial genes.", "fa_icon": "fas fa-cut" }, "qc_drop_ribo": { - "type": "boolean", + "type": "string", "description": "Option to drop ribosomal genes.", - "fa_icon": "fas fa-cut" + "fa_icon": "fas fa-cut", + "default": "false" }, "qc_nmads": { "type": "number", @@ -153,9 +183,8 @@ "qc_max_library_size", "qc_min_features", "qc_max_features", - "qc_min_ribo", - "qc_max_mito", "qc_max_ribo", + "qc_max_mito", "qc_min_counts", "qc_min_cells", "qc_drop_unmapped", @@ -172,26 +201,55 @@ "properties": { "amb_find_cells": { "type": "string", - "default": "true" + "default": "true", + "description": "Enable ambient RNA / empty droplet profiling.", + "fa_icon": "fas fa-cut" }, "amb_retain": { - "type": "integer", - "default": 12000 + "type": "string", + "default": "auto", + "help_text": "A numeric scalar specifying the threshold for the total UMI count above which all barcodes are assumed to contain cells, or \"auto\" for automated estimation based on the data.", + "description": "Upper UMI counts threshold for true cell annotation.", + "pattern": "^(\\d+|auto)$", + "fa_icon": "fas fa-less-than-equal" }, "amb_lower": { "type": "integer", - "default": 100 + "default": 100, + "help_text": "A numeric scalar specifying the lower bound on the total UMI count, at or below which all barcodes are assumed to correspond to empty droplets.", + "description": "Lower UMI counts threshold for empty droplet annotation.", + "fa_icon": "fas fa-greater-than-equal" }, "amb_alpha_cutoff": { "type": "number", - "default": 0.001 + "default": 0.001, + "description": "The maximum FDR for the emptyDrops algorithm.", + "fa_icon": "fas fa-less-than-equal" }, "amb_niters": { "type": "integer", - "default": 10000 + "default": 10000, + "help_text": "An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations for the emptyDrops algorithm.", + "description": "Number of Monte Carlo p-value iterations.", + "fa_icon": "fas fa-recycle" + }, + "amb_expect_cells": { + "type": "integer", + "default": 3000, + "description": "Expected number of cells per sample.", + "help_text": "If the \"retain\" parameter is set to
\"auto\" (recommended), then this parameter is used to identify the optimal value for \"retain\" for the emptyDrops algorithm.", + "fa_icon": "fas fa-greater-than-equal" + } }, - "fa_icon": "far fa-chart-bar" + "fa_icon": "far fa-chart-bar", + "required": [ + "amb_find_cells", + "amb_retain", + "amb_lower", + "amb_alpha_cutoff", + "amb_niters", + "amb_expect_cells" + ] }, "multiplet_identification": { "title": "Multiplet Identification", @@ -201,37 +259,67 @@ "properties": { "mult_find_singlets": { "type": "string", - "default": "true" + "default": "true", + "description": "Enable doublet/multiplet identification.", + "fa_icon": "fas fa-cut" }, "mult_singlets_method": { "type": "string", - "default": "doubletfinder" + "default": "doubletfinder", + "description": "Algorithm to use for doublet/multiplet identification.", + "fa_icon": "fas fa-toolbox" }, "mult_vars_to_regress_out": { "type": "string", - "default": "nCount_RNA,pc_mito" + "default": "nCount_RNA,pc_mito", + "description": "Variables to regress out for dimensionality reduction.", + "fa_icon": "fas fa-layer-group" }, "mult_pca_dims": { "type": "integer", - "default": 10 + "default": 10, + "description": "Number of PCA dimensions to use.", + "fa_icon": "fas fa-calculator" }, "mult_var_features": { "type": "integer", - "default": 2000 + "default": 2000, + "description": "The top n most variable features to use.", + "fa_icon": "fas fa-calculator" }, "mult_doublet_rate": { - "type": "integer" + "type": "number", + "description": "A fixed doublet rate.", + "help_text": "Use a fixed doublet rate (e.g. 0.075 to specify that 7.5% of all cells should be marked as doublets), or set to 0 to use the \"dpk\" method (recommended).", + "fa_icon": "fas fa-calculator" }, "mult_dpk": { "type": "integer", - "default": 8 + "default": 8, + "description": "Doublets per thousand cells increment.", + "help_text": "The doublets per thousand cells increment specifies the expected doublet rate based on the number of cells, i.e. with a dpk of 8 (recommended by 10X), a dataset with 1000 cells is expected to contain 8 doublets per thousand cells, a dataset with 2000 cells is expected to contain 16 doublets per thousand cells, and a dataset with 10000 cells is expected to contain 80 doublets per thousand cells (or 800 doublets in total). If the \"doublet_rate\" parameter is manually specified this recommended incremental behaviour is overridden.", + "minimum": 0, + "maximum": 1000, + "fa_icon": "fas fa-calculator" }, "mult_pK": { "type": "number", - "default": 0.02 + "default": 0.02, + "description": "Specify a pK value instead of parameter sweep.", + "help_text": "The optimal pK value used by the doubletFinder algorithm is determined following a compute-intensive parameter sweep. The parameter sweep can be overridden by manually specifying a pK value.", + "fa_icon": "fas fa-calculator" } }, - "fa_icon": "fas fa-adjust" + "fa_icon": "fas fa-adjust", + "required": [ + "mult_find_singlets", + "mult_singlets_method", + "mult_vars_to_regress_out", + "mult_pca_dims", + "mult_var_features", + "mult_dpk", + "mult_pK" + ] }, "merge": { "title": "Merge", @@ -241,18 +329,32 @@ "properties": { "merge_plot_vars": { "type": "string", - "default": "total_features_by_counts,total_counts,pc_mito,pc_ribo" + "default": "total_features_by_counts,total_counts,pc_mito,pc_ribo", + "description": "Numeric variables for inter-sample metrics.", + "help_text": "A comma-separated list of numeric variables which differ between individual cells of each sample.
The merged sample report will include plots facilitating between-sample comparisons for each of these numeric variables.", + "fa_icon": "fas fa-layer-group" }, "merge_facet_vars": { "type": "string", - "default": "NULL" + "default": "NULL", + "description": "Categorical variables for further subsetting of plots.", + "help_text": "A comma-separated list of categorical variables. The merged sample report will include additional plots of sample metrics subset by each of these variables (e.g. sex, diagnosis).", + "fa_icon": "fas fa-layer-group" }, "merge_outlier_vars": { "type": "string", - "default": "total_features_by_counts,total_counts" + "default": "total_features_by_counts,total_counts", + "description": "Numeric variables for outlier identification.", + "help_text": "The merged report will include tables highlighting samples that are putative outliers for each of these numeric variables.", + "fa_icon": "fas fa-layer-group" } }, - "fa_icon": "fas fa-object-ungroup" + "fa_icon": "fas fa-object-ungroup", + "required": [ + "merge_plot_vars", + "merge_facet_vars", + "merge_outlier_vars" + ] }, "integration": { "title": "Integration", @@ -262,117 +364,235 @@ "properties": { "integ_method": { "type": "string", - "default": "Liger" + "default": "Liger", + "description": "Choice of integration method.", + "fa_icon": "fas fa-toolbox" }, "integ_unique_id_var": { "type": "string", - "default": "manifest" + "default": "manifest", + "description": "Unique sample identifier variable.", + "fa_icon": "fas fa-key" }, "integ_take_gene_union": { - "type": "string" + "type": "string", + "default": "false", + "description": "Fill out matrices with union of genes.", + "help_text": "See rliger::createLiger(). Whether to fill out raw.data matrices with union of genes across all datasets (filling in 0 for missing data) (requires make.sparse = TRUE) (default FALSE).", + "fa_icon": "fas fa-cut" }, "integ_remove_missing": { "type": "string", - "default": "true" + "default": "true", + "description": "Remove non-expressing cells/genes.", + "help_text": "See rliger::createLiger(). Whether to remove cells not expressing any measured genes, and genes not expressed in any cells (if take.gene.union = TRUE, removes only genes not expressed in any dataset) (default TRUE).", + "fa_icon": "fas fa-cut" }, "integ_num_genes": { "type": "integer", - "default": 3000 + "default": 3000, + "description": "Number of genes to find for each dataset.", + "help_text": "See rliger::selectGenes(). Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes.", + "fa_icon": "fas fa-calculator" }, "integ_combine": { "type": "string", - "default": "union" + "default": "union", + "description": "How to combine variable genes across experiments.", + "help_text": "See rliger::selectGenes().
Either \"union\" or \"intersection\".", + "fa_icon": "fas fa-calculator" }, "integ_keep_unique": { - "type": "string" + "type": "string", + "default": "false", + "description": "Keep unique genes.", + "help_text": "See rliger::selectGenes().", + "fa_icon": "fas fa-cut" }, "integ_capitalize": { - "type": "string" + "type": "string", + "default": "false", + "description": "Capitalize gene names to match homologous genes.", + "help_text": "See rliger::selectGenes().", + "fa_icon": "fab fa-adn" }, "integ_use_cols": { "type": "string", - "default": "true" + "default": "true", + "description": "Treat each column as a cell.", + "help_text": "See rliger::removeMissingObs().", + "fa_icon": "fas fa-columns" }, "integ_k": { "type": "integer", - "default": 30 + "default": 30, + "description": "Inner dimension of factorization (n factors).", + "help_text": "See rliger::optimizeALS(). Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure.", + "fa_icon": "fas fa-calculator" }, "integ_lambda": { - "type": "integer", - "default": 5 + "type": "number", + "default": 5, + "description": "Regularization parameter.", + "help_text": "See rliger::optimizeALS(). Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine most appropriate value for balancing dataset alignment and agreement (default 5.0).", + "fa_icon": "fas fa-calculator" }, "integ_thresh": { "type": "number", - "default": 0.0001 + "default": 0.0001, + "description": "Convergence threshold.", + "help_text": "See rliger::optimizeALS().", + "fa_icon": "fas fa-calculator" }, "integ_max_iters": { "type": "integer", - "default": 100 + "default": 100, + "description": "Maximum number of block coordinate descent iterations.", + "help_text": "See rliger::optimizeALS().", + "fa_icon": "fas fa-less-than-equal" }, "integ_nrep": { "type": "integer", - "default": 1 + "default": 1, + "description": "Number of restarts to perform.", + "help_text": "See rliger::optimizeALS().", + "fa_icon": "fas fa-calculator" }, "integ_rand_seed": { "type": "integer", - "default": 1 + "default": 1, + "description": "Random seed for reproducible results.", + "fa_icon": "fas fa-calculator" }, "integ_knn_k": { "type": "integer", - "default": 20 + "default": 20, + "description": "Number of nearest neighbours for within-dataset knn graph.", + "help_text": "See rliger::quantile_norm().", + "fa_icon": "fas fa-calculator" }, "integ_k2": { "type": "integer", - "default": 500 + "default": 500, + "description": "Horizon parameter for shared nearest factor graph.", + "help_text": "See rliger::quantileAlignSNF(). Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs).", + "fa_icon": "fas fa-calculator" }, "integ_prune_thresh": { "type": "number", - "default": 0.2 + "default": 0.2, + "description": "Minimum allowed edge weight.", + "help_text": "See rliger::quantileAlignSNF().", + "fa_icon": "fas fa-greater-than-equal" }, "integ_ref_dataset": { "type": "string", - "default": "NULL" + "default": "NULL", + "description": "Name of dataset to use as a reference.", + "help_text": "See rliger::quantile_norm(). Name of dataset to use as a \"reference\" for normalization.
By default, the dataset with the largest number of cells is used.", + "fa_icon": "fas fa-quote-left" }, "integ_min_cells": { "type": "integer", - "default": 2 + "default": 2, + "description": "Minimum number of cells to consider a cluster shared across datasets.", + "help_text": "See rliger::quantile_norm().", + "fa_icon": "fas fa-greater-than-equal" }, "integ_quantiles": { "type": "integer", - "default": 50 + "default": 50, + "description": "Number of quantiles to use for normalization.", + "help_text": "See rliger::quantile_norm().", + "fa_icon": "fas fa-calculator" }, "integ_nstart": { "type": "integer", - "default": 10 + "default": 10, + "description": "Number of times to perform Louvain community detection.", + "help_text": "See rliger::quantileAlignSNF(). Number of times to perform Louvain community detection with different random starts (default 10).", + "fa_icon": "fas fa-recycle" }, "integ_resolution": { "type": "integer", - "default": 1 + "default": 1, + "description": "Controls the number of communities detected.", + "help_text": "See rliger::quantileAlignSNF().", + "fa_icon": "fas fa-calculator" }, "integ_dims_use": { "type": "string", - "default": "NULL" + "default": "NULL", + "description": "Indices of factors to use for shared nearest factor determination.", + "help_text": "See rliger::quantile_norm().", + "fa_icon": "fas fa-calculator" }, "integ_dist_use": { "type": "string", - "default": "CR" + "default": "CR", + "description": "Distance metric to use in calculating nearest neighbour.", + "help_text": "See rliger::quantileAlignSNF(). Default \"CR\".", + "fa_icon": "fas fa-digital-tachograph" }, "integ_center": { - "type": "string" + "type": "string", + "default": "false", + "description": "Center the data when scaling factors.", + "help_text": "See rliger::quantile_norm().", + "fa_icon": "fas fa-compress-arrows-alt" }, "integ_small_clust_thresh": { - "type": "integer" + "type": "integer", + "help_text": "See rliger::quantileAlignSNF(). 
Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 \u2013 no small cluster extraction).", +                    "description": "Small cluster extraction cells threshold.", +                    "fa_icon": "fas fa-calculator"                 },                 "integ_categorical_covariates": {                     "type": "string", -                    "default": "individual,diagnosis,region,sex" +                    "default": "individual,diagnosis,region,sex", +                    "description": "Categorical variables for integration report metrics.", +                    "help_text": "The integration report will provide plots and integration metrics for these categorical variables.", +                    "fa_icon": "fas fa-layer-group"                 },                 "integ_input_reduced_dim": {                     "type": "string", -                    "default": "UMAP" +                    "default": "UMAP", +                    "description": "Reduced dimension embedding for the integration report.", +                    "help_text": "The integration report will provide plots with and without integration using this embedding.", +                    "fa_icon": "fas fa-chess-board"                 }             }, -            "fa_icon": "far fa-object-group" +            "fa_icon": "far fa-object-group", +            "required": [ +                "integ_method", +                "integ_unique_id_var", +                "integ_take_gene_union", +                "integ_remove_missing", +                "integ_num_genes", +                "integ_combine", +                "integ_keep_unique", +                "integ_capitalize", +                "integ_use_cols", +                "integ_k", +                "integ_lambda", +                "integ_thresh", +                "integ_max_iters", +                "integ_nrep", +                "integ_rand_seed", +                "integ_knn_k", +                "integ_k2", +                "integ_prune_thresh", +                "integ_ref_dataset", +                "integ_min_cells", +                "integ_quantiles", +                "integ_nstart", +                "integ_resolution", +                "integ_dims_use", +                "integ_dist_use", +                "integ_center", +                "integ_categorical_covariates", +                "integ_input_reduced_dim" +            ]         },         "dimensionality_reduction": {             "title": "Dimensionality Reduction", @@ -382,128 +602,274 @@             "properties": {                 "reddim_input_reduced_dim": {                     "type": "string", -                    "default": "PCA,Liger" +                    "default": "PCA,Liger", +                    "description": "Input matrix for dimension reduction.", +                    "fa_icon": "fas fa-chess-board"                 },                 "reddim_reduction_methods": {                     "type": "string", -                    "default": "tSNE,UMAP,UMAP3D" +                    "default": "tSNE,UMAP,UMAP3D", +                    "description": "Dimension reduction outputs to generate.", +                    "help_text": "Typically 'UMAP,UMAP3D' or 'tSNE'.", +                    "fa_icon": "fas fa-toolbox"                 },                 "reddim_vars_to_regress_out": {                     "type": "string", -                    "default": "nCount_RNA,pc_mito" +                    "default": "nCount_RNA,pc_mito", +                    "description": "Variables to regress out before dimension reduction.", +                    "fa_icon": "fas fa-layer-group"                 },                 "reddim_umap_pca_dims": {                     "type": "integer", -                    "default": 30 +                    "default": 30, +                    "description": "Number of PCA dimensions.", +                    "help_text": "See uwot::umap().", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_n_neighbors": {                     "type": "integer", -                    "default": 35 +                    "default": 35, +                    "description": "Number of nearest neighbours to use.", +                    "help_text": "See uwot::umap().", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_n_components": {                     "type": "integer", -                    "default": 2 +                    "default": 2, +                    "description": "The dimension of the space to embed into.", +                    "help_text": "See uwot::umap(). The dimension of the space to embed into. 
This defaults to 2 to provide easy visualization, but can reasonably be set to any integer value in the range 2 to 100.", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_init": {                     "type": "string", -                    "default": "spectral" +                    "default": "spectral", +                    "description": "Type of initialization for the coordinates.", +                    "help_text": "See uwot::umap().", +                    "enum": [ +                        "spectral", +                        "normlaplacian", +                        "random", +                        "lvrandom", +                        "laplacian", +                        "pca", +                        "spca", +                        "agspectral" +                    ], +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_metric": {                     "type": "string", -                    "default": "euclidean" +                    "default": "euclidean", +                    "description": "Distance metric for finding nearest neighbours.", +                    "help_text": "See uwot::umap().", +                    "enum": [ +                        "euclidean", +                        "cosine", +                        "manhattan", +                        "hamming", +                        "correlation", +                        "categorical" +                    ], +                    "fa_icon": "fas fa-digital-tachograph"                 },                 "reddim_umap_n_epochs": {                     "type": "integer", -                    "default": 200 +                    "default": 200, +                    "description": "Number of epochs to use during optimization of embedded coordinates.", +                    "help_text": "See uwot::umap().", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_learning_rate": {                     "type": "integer", -                    "default": 1 +                    "default": 1, +                    "description": "Initial learning rate used in optimization of coordinates.", +                    "help_text": "See uwot::umap().", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_min_dist": {                     "type": "number", -                    "default": 0.4 +                    "default": 0.4, +                    "description": "Effective minimum distance between embedded points.", +                    "help_text": "See uwot::umap(). Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.", +                    "fa_icon": "fas fa-greater-than-equal"                 },                 "reddim_umap_spread": {                     "type": "number", -                    "default": 0.85 +                    "default": 0.85, +                    "description": "Effective scale of embedded points.", +                    "help_text": "See uwot::umap(). In combination with min_dist, this determines how clustered/clumped the embedded points are.", +                    "fa_icon": "fas fa-arrows-alt-h"                 },                 "reddim_umap_set_op_mix_ratio": { -                    "type": "integer", -                    "default": 1 +                    "type": "number", +                    "default": 1, +                    "description": "Interpolation to combine local fuzzy sets.", +                    "help_text": "See uwot::umap(). The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.", +                    "minimum": 0, +                    "maximum": 1, +                    "fa_icon": "fas fa-adjust"                 },                 "reddim_umap_local_connectivity": {                     "type": "integer", -                    "default": 1 +                    "default": 1, +                    "description": "Local connectivity required.", +                    "help_text": "See uwot::umap(). The local connectivity required \u2013 i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value the more connected the manifold becomes locally.", +                    "fa_icon": "fas fa-calculator"                 },                 "reddim_umap_repulsion_strength": {                     "type": "integer", -                    "default": 1 +                    "default": 1, +                    "description": "Weighting applied to negative samples in embedding optimization.", +                    "help_text": "See uwot::umap(). Weighting applied to negative samples in low dimensional embedding optimization. 
Values higher than one will result in greater weight being given to negative samples.", + "fa_icon": "fas fa-calculator" }, "reddim_umap_negative_sample_rate": { "type": "integer", - "default": 5 + "default": 5, + "description": "Number of negative edge samples to use per positive edge sample.", + "help_text": "See uwot::umap(). The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.", + "fa_icon": "fas fa-calculator" }, "reddim_umap_fast_sgd": { - "type": "string" + "type": "string", + "default": "false", + "description": "Use fast SGD.", + "help_text": "See uwot::umap(). Setting this to TRUE will speed up the stochastic optimization phase, but give a potentially less accurate embedding, and which will not be exactly reproducible even with a fixed seed. For visualization, fast_sgd = TRUE will give perfectly good results. For more generic dimensionality reduction, it's safer to leave fast_sgd = FALSE.", + "fa_icon": "fas fa-skiing" }, "reddim_tsne_dims": { "type": "integer", - "default": 2 + "default": 2, + "description": "Output dimensionality.", + "help_text": "See Rtsne::Rtsne().", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_initial_dims": { "type": "integer", - "default": 50 + "default": 50, + "description": "Number of dimensions retained in the initial PCA step.", + "help_text": "See Rtsne::Rtsne().", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_perplexity": { "type": "integer", - "default": 150 + "default": 150, + "description": "Perplexity parameter.", + "help_text": "See Rtsne::Rtsne().", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_theta": { "type": "number", - "default": 0.5 + "default": 0.5, + "description": "Speed/accuracy trade-off.", + "help_text": "See Rtsne::Rtsne(). Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5).", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_stop_lying_iter": { "type": "integer", - "default": 250 + "default": 250, + "description": "Iteration after which perplexities are no longer exaggerated.", + "help_text": "See Rtsne::Rtsne(). Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0).", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_mom_switch_iter": { "type": "integer", - "default": 250 + "default": 250, + "description": "Iteration after which the final momentum is used.", + "help_text": "See Rtsne::Rtsne(). Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0).", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_max_iter": { "type": "integer", - "default": 1000 + "default": 1000, + "description": "Number of iterations.", + "help_text": "See Rtsne::Rtsne(). ", + "fa_icon": "fas fa-less-than-equal" }, "reddim_tsne_pca_center": { "type": "string", - "default": "true" + "default": "true", + "description": "Center data before PCA.", + "help_text": "See Rtsne::Rtsne(). Should data be centered before pca is applied? (default: TRUE)", + "fa_icon": "fas fa-compress-arrows-alt" }, "reddim_tsne_pca_scale": { - "type": "string" + "type": "string", + "default": "false", + "description": "Scale data before PCA.", + "help_text": "See Rtsne::Rtsne(). Should data be scaled before pca is applied? 
(default: FALSE).", + "fa_icon": "fas fa-balance-scale" }, "reddim_tsne_normalize": { "type": "string", - "default": "true" + "default": "true", + "description": "Normalize data before distance calculations.", + "help_text": "See Rtsne::Rtsne(). Should data be normalized internally prior to distance calculations with normalize_input? (default: TRUE)", + "fa_icon": "fas fa-balance-scale" }, "reddim_tsne_momentum": { "type": "number", - "default": 0.5 + "default": 0.5, + "description": "Momentum used in the first part of optimization.", + "help_text": "See Rtsne::Rtsne(). ", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_final_momentum": { "type": "number", - "default": 0.8 + "default": 0.8, + "description": "Momentum used in the final part of optimization.", + "help_text": "See Rtsne::Rtsne(). ", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_eta": { "type": "integer", - "default": 1000 + "default": 1000, + "description": "Learning rate.", + "help_text": "See Rtsne::Rtsne(). ", + "fa_icon": "fas fa-calculator" }, "reddim_tsne_exaggeration_factor": { "type": "integer", - "default": 12 + "default": 12, + "description": "Exaggeration factor used in the first part of the optimization.", + "help_text": "See Rtsne::Rtsne(). Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0).", + "fa_icon": "fas fa-calculator" } }, - "fa_icon": "fas fa-cubes" + "fa_icon": "fas fa-cubes", + "required": [ + "reddim_input_reduced_dim", + "reddim_reduction_methods", + "reddim_vars_to_regress_out", + "reddim_umap_pca_dims", + "reddim_umap_n_neighbors", + "reddim_umap_n_components", + "reddim_umap_init", + "reddim_umap_metric", + "reddim_umap_n_epochs", + "reddim_umap_learning_rate", + "reddim_umap_min_dist", + "reddim_umap_spread", + "reddim_umap_set_op_mix_ratio", + "reddim_umap_local_connectivity", + "reddim_umap_repulsion_strength", + "reddim_umap_negative_sample_rate", + "reddim_umap_fast_sgd", + "reddim_tsne_dims", + "reddim_tsne_initial_dims", + "reddim_tsne_perplexity", + "reddim_tsne_theta", + "reddim_tsne_stop_lying_iter", + "reddim_tsne_mom_switch_iter", + "reddim_tsne_max_iter", + "reddim_tsne_pca_center", + "reddim_tsne_pca_scale", + "reddim_tsne_normalize", + "reddim_tsne_momentum", + "reddim_tsne_final_momentum", + "reddim_tsne_eta", + "reddim_tsne_exaggeration_factor" + ] }, "clustering": { "title": "Clustering", @@ -511,28 +877,48 @@ "description": "Parameters used to tune louvain/leiden clustering.", "default": "", "properties": { - "clust_method": { + "clust_cluster_method": { "type": "string", - "default": "leiden" + "default": "leiden", + "description": "Clustering method.", + "help_text": "Specify \"leiden\" or \"louvain\".", + "fa_icon": "fas fa-toolbox" }, "clust_reduction_method": { "type": "string", - "default": "UMAP_Liger" + "default": "UMAP_Liger", + "description": "Reduced dimension input(s) for clustering.", + "help_text": "One or more of \"UMAP\", \"tSNE\", \"PCA\", \"LSI\".", + "fa_icon": "fas fa-chess-board" }, "clust_res": { "type": "number", - "default": 0.001 + "default": 0.001, + "description": "The resolution of clustering.", + "fa_icon": "fas fa-calculator" }, "clust_k": { "type": "integer", - "default": 50 + "default": 50, + "description": "Integer number of nearest neighbours for clustering.", + "help_text": "Integer number of nearest neighbors to use when creating the k nearest neighbor graph for Louvain/Leiden clustering. 
k is related to the resolution of the clustering result; a bigger k will result in lower resolution and vice versa.", +                    "fa_icon": "fas fa-calculator"                 },                 "clust_louvain_iter": {                     "type": "integer", -                    "default": 1 +                    "default": 1, +                    "description": "The number of iterations for clustering.", +                    "fa_icon": "fas fa-recycle"                 }             }, -            "fa_icon": "fas fa-braille" +            "fa_icon": "fas fa-braille", +            "required": [ +                "clust_cluster_method", +                "clust_reduction_method", +                "clust_res", +                "clust_k", +                "clust_louvain_iter" +            ]         },         "cell_type_annotation": {             "title": "Cell-type Annotation", @@ -542,26 +928,57 @@             "properties": {                 "cta_clusters_colname": {                     "type": "string", -                    "default": "clusters" +                    "default": "clusters", +                    "description": "SingleCellExperiment clusters colData variable name.", +                    "fa_icon": "fas fa-quote-left"                 },                 "cta_cells_to_sample": {                     "type": "integer", -                    "default": 10000 +                    "default": 10000, +                    "description": "Max cells to sample.", +                    "fa_icon": "fas fa-calculator"                 },                 "cta_unique_id_var": {                     "type": "string", -                    "default": "individual" +                    "default": "individual", +                    "description": "Unique sample ID variable in the sample metadata.", +                    "fa_icon": "fas fa-key"                 },                 "cta_celltype_var": {                     "type": "string", -                    "default": "cluster_celltype" +                    "default": "cluster_celltype", +                    "description": "SingleCellExperiment cell-type colData variable name.", +                    "fa_icon": "fas fa-quote-left"                 },                 "cta_facet_vars": {                     "type": "string", -                    "default": "manifest,diagnosis,sex,capdate,prepdate,seqdate" +                    "default": "manifest,diagnosis,sex,capdate,prepdate,seqdate", +                    "description": "Cell-type metrics for categorical variables.", +                    "fa_icon": "fas fa-layer-group" +                }, +                "cta_metric_vars": { +                    "type": "string", +                    "default": "pc_mito,pc_ribo,total_counts,total_features_by_counts", +                    "description": "Cell-type metrics for numeric variables.", +                    "fa_icon": "fas fa-layer-group" +                }, +                "cta_top_n": { +                    "type": "integer", +                    "default": 5, +                    "description": "Number of top marker genes for plot/table generation.", +                    "fa_icon": "fas fa-calculator"                 }             }, -            "fa_icon": "fas fa-brain" +            "fa_icon": "fas fa-brain", +            "required": [ +                "cta_clusters_colname", +                "cta_cells_to_sample", +                "cta_unique_id_var", +                "cta_celltype_var", +                "cta_facet_vars", +                "cta_metric_vars", +                "cta_top_n" +            ]         },         "differential_gene_expression": {             "title": "Differential Gene Expression", @@ -569,66 +986,143 @@             "description": "Parameters for differential gene expression.",             "default": "",             "properties": { -                "dge_method": { +                "dge_de_method": {                     "type": "string", -                    "default": "MASTZLM" +                    "default": "MASTZLM", +                    "description": "Differential gene expression method.", +                    "fa_icon": "fas fa-toolbox"                 },                 "dge_mast_method": {                     "type": "string", -                    "default": "bayesglm" +                    "default": "bayesglm", +                    "help_text": "See MAST::zlm(). Either 'glm', 'glmer' or 'bayesglm'.", +                    "description": "MAST method.", +                    "enum": [ +                        "glm", +                        "glmer", +                        "bayesglm" +                    ], +                    "fa_icon": "fas fa-toolbox"                 },                 "dge_min_counts": {                     "type": "integer", -                    "default": 1 +                    "default": 1, +                    "description": "Expressive gene minimum counts.", +                    "help_text": "Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression.", +                    "fa_icon": "fas fa-greater-than-equal"                 },                 "dge_min_cells_pc": {                     "type": "number", -                    "default": 0.1 +                    "default": 0.1, +                    "minimum": 0, +                    "maximum": 1, +                    "description": "Expressive gene minimum cells fraction.", +                    "help_text": "Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression. Default 0.1 (i.e. 
10% of cells).", + "fa_icon": "fas fa-greater-than-equal" }, "dge_rescale_numerics": { "type": "string", - "default": "true" + "default": "true", + "description": "Re-scale numeric covariates.", + "help_text": "Re-scaling and centring numeric covariates in a model can improve model performance.", + "fa_icon": "fas fa-balance-scale" }, "dge_pseudobulk": { - "type": "string" + "type": "string", + "default": "false", + "description": "Pseudobulked differential gene expression.", + "help_text": "Perform differential gene expression on a smaller matrix where counts are first summed across all cells within a sample (defined by dge_sample_var level).", + "fa_icon": "far fa-object-group" }, "dge_celltype_var": { "type": "string", - "default": "cluster_celltype" + "default": "cluster_celltype", + "description": "Cell-type annotation variable name.", + "help_text": "Differential gene expression is performed separately for each cell-type of this colData variable.", + "fa_icon": "fas fa-quote-left" }, "dge_sample_var": { "type": "string", - "default": "manifest" + "default": "manifest", + "description": "Unique sample identifier variable.", + "fa_icon": "fas fa-key" }, "dge_dependent_var": { "type": "string", - "default": "group" + "default": "group", + "description": "Dependent variable of DGE model.", + "help_text": "The dependent variable may be a categorical (e.g. diagnosis) or a numeric (e.g. histopathology score) variable.", + "fa_icon": "fas fa-quote-left" }, "dge_ref_class": { "type": "string", - "default": "Control" + "default": "Control", + "help_text": "If a categorical dependent variable is specified, then the reference class of the dependent variable is specified here (e.g. 'Control').", + "description": "Reference class of categorical dependent variable.", + "fa_icon": "fas fa-quote-left" }, "dge_confounding_vars": { "type": "string", - "default": "cngeneson,seqdate,pc_mito" + "default": "cngeneson,seqdate,pc_mito", + "description": "Confounding variables.", + "help_text": "A comma-separated list of confounding variables to account for in the DGE model.", + "fa_icon": "fas fa-layer-group" }, "dge_random_effects_var": { "type": "string", - "default": "NULL" + "default": "NULL", + "description": "Random effect confounding variable.", + "help_text": "If specified, the term `+ (1 | x ) +`is added to the model, where x is the specified random effects variable.", + "fa_icon": "fas fa-quote-left" }, "dge_fc_threshold": { "type": "number", - "default": 1.1 + "default": 1.1, + "description": "Fold-change threshold for plotting.", + "help_text": "This absolute fold-change cut-off value is used in plots (e.g. volcano) and the DGE report.", + "fa_icon": "fas fa-calculator" }, "dge_pval_cutoff": { "type": "number", - "default": 0.05 + "default": 0.05, + "description": "Adjusted p-value cutoff.", + "help_text": "The adjusted p-value cutoff value is used in plots (e.g. volcano) and the DGE report.", + "fa_icon": "fas fa-less-than-equal" }, "dge_force_run": { - "type": "string" + "type": "string", + "default": "false", + "description": "Force model fit for non-full rank.", + "help_text": "A non-full rank model specification will return an error; to override this to return a warning only, set to TRUE.", + "fa_icon": "fas fa-exclamation" + }, + "dge_max_cores": { + "type": "string", + "default": "'null'", + "description": "Maximum CPU cores.", + "help_text": "The default value of 'null' utilizes all available CPU cores. 
As each additional CPU core increases the number of genes simultaneously fit, the RAM/memory demand increases concomitantly. Manually overriding this parameter can reduce the memory demands of parallelization across multiple cores.", +                    "fa_icon": "fas fa-microchip"                 }             }, -            "fa_icon": "fas fa-chart-bar" +            "fa_icon": "fas fa-chart-bar", +            "required": [ +                "dge_de_method", +                "dge_mast_method", +                "dge_min_counts", +                "dge_min_cells_pc", +                "dge_rescale_numerics", +                "dge_pseudobulk", +                "dge_celltype_var", +                "dge_sample_var", +                "dge_dependent_var", +                "dge_ref_class", +                "dge_confounding_vars", +                "dge_random_effects_var", +                "dge_fc_threshold", +                "dge_pval_cutoff", +                "dge_force_run", +                "dge_max_cores" +            ]         },         "impacted_pathway_analysis": {             "title": "Impacted Pathway Analysis", @@ -636,24 +1130,37 @@             "description": "Parameters for impacted pathway analysis of differentially expressed genes.",             "default": "",             "properties": { -                "ipa_reference_file": { -                    "type": "string", -                    "default": "NULL" -                },                 "ipa_enrichment_tool": {                     "type": "string", -                    "default": "WebGestaltR" +                    "default": "WebGestaltR", +                    "description": "Pathway enrichment tool(s) to use.", +                    "enum": [ +                        "WebGestaltR", +                        "ROntoTools", +                        "enrichR" +                    ], +                    "fa_icon": "fas fa-toolbox"                 },                 "ipa_enrichment_method": {                     "type": "string", -                    "default": "ORA" +                    "default": "ORA", +                    "description": "Enrichment method.", +                    "fa_icon": "fas fa-layer-group"                 },                 "ipa_enrichment_database": {                     "type": "string", -                    "default": "GO_Biological_Process" +                    "default": "GO_Biological_Process", +                    "description": "Database(s) to use for enrichment.", +                    "help_text": "See scFlow::list_databases(). Name of the database(s) for enrichment. Examples include \"GO_Biological_Process\", \"GO_Cellular_Component\", \"GO_Molecular_Function\", \"KEGG\", \"Reactome\", \"Wikipathway\".", +                    "fa_icon": "fas fa-layer-group"                 }             }, -            "fa_icon": "fas fa-project-diagram" +            "fa_icon": "fas fa-project-diagram", +            "required": [ +                "ipa_enrichment_tool", +                "ipa_enrichment_method", +                "ipa_enrichment_database" +            ]         },         "dirichlet_modeling": {             "title": "Dirichlet Modeling", @@ -663,26 +1170,44 @@             "properties": {                 "dirich_unique_id_var": {                     "type": "string", -                    "default": "individual" +                    "default": "individual", +                    "description": "Unique sample identifier.", +                    "fa_icon": "fas fa-key"                 },                 "dirich_celltype_var": {                     "type": "string", -                    "default": "cluster_celltype" +                    "default": "cluster_celltype", +                    "description": "Cell-type annotation variable name.", +                    "fa_icon": "fas fa-quote-left"                 },                 "dirich_dependent_var": {                     "type": "string", -                    "default": "group" +                    "default": "group", +                    "description": "Dependent variable of Dirichlet model.", +                    "fa_icon": "fas fa-quote-left"                 },                 "dirich_ref_class": {                     "type": "string", -                    "default": "Control" +                    "default": "Control", +                    "description": "Reference class of categorical dependent variable.", +                    "fa_icon": "fas fa-quote-left"                 },                 "dirich_var_order": {                     "type": "string", -                    "default": "Control,Low,High" +                    "default": "Control,Low,High", +                    "description": "Order of dependent variable classes.", +                    "help_text": "For plotting and reports, the order of classes for the dependent variable can be manually specified (e.g. 
'Control,Low,High').", + "fa_icon": "fas fa-layer-group" } }, - "fa_icon": "fas fa-chart-pie" + "fa_icon": "fas fa-chart-pie", + "required": [ + "dirich_unique_id_var", + "dirich_celltype_var", + "dirich_dependent_var", + "dirich_ref_class", + "dirich_var_order" + ] }, "general_plotting": { "title": "General - Plotting", @@ -692,68 +1217,31 @@ "properties": { "plotreddim_reduction_methods": { "type": "string", - "default": "UMAP_Liger" - } - }, - "fa_icon": "fas fa-chart-area" - }, - "generic_options": { - "title": "Generic options", - "type": "object", - "fa_icon": "fas fa-file-import", - "description": "Less common options for the pipeline, typically set in a config file.", - "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", - "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "hidden": true, - "fa_icon": "fas fa-question-circle" - }, - "name": { - "type": "string", - "description": "Workflow name.", - "fa_icon": "fas fa-fingerprint", - "hidden": true, - "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." - }, - "email": { - "type": "string", - "description": "Email address for completion summary.", - "fa_icon": "fas fa-envelope", - "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", - "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" - }, - "email_on_fail": { - "type": "string", - "description": "Email address for completion summary, only when pipeline fails.", - "fa_icon": "fas fa-exclamation-triangle", - "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", - "hidden": true, - "help_text": "This works exactly as with `--email`, except emails are only sent if the workflow is not successful." - }, - "plaintext_email": { - "type": "boolean", - "description": "Send plain-text email instead of HTML.", - "fa_icon": "fas fa-remove-format", - "hidden": true, - "help_text": "Set to receive plain-text e-mails instead of HTML formatted." + "default": "UMAP_Liger", + "description": "Preferred embedding for plots.", + "fa_icon": "fas fa-hand-pointer" }, - "monochrome_logs": { - "type": "boolean", - "description": "Do not use coloured log outputs.", - "fa_icon": "fas fa-palette", - "hidden": true, - "help_text": "Set to disable colourful command line output and live life in monochrome." 
+ "reddimplot_pointsize": { + "type": "number", + "description": "Point size for reduced dimension plots.", + "default": 0.1, + "help_text": "To improve visualization the point size should be adjusted according to the total number of cells plotted.", + "fa_icon": "fas fa-expand-alt" }, - "tracedir": { - "type": "string", - "description": "Directory to keep pipeline Nextflow logs and reports.", - "default": "${params.outdir}/pipeline_info", - "fa_icon": "fas fa-cogs", - "hidden": true + "reddimplot_alpha": { + "type": "number", + "description": "Alpha (transparency) value for reduced dimension plots.", + "default": 0.2, + "help_text": "To improve visualization the alpha (transparency) value should be adjusted according to the total number of cells plotted.", + "fa_icon": "fas fa-braille" } - } + }, + "fa_icon": "fas fa-chart-area", + "required": [ + "plotreddim_reduction_methods", + "reddimplot_pointsize", + "reddimplot_alpha" + ] }, "institutional_config_options": { "title": "Institutional config options", @@ -767,15 +1255,14 @@ "description": "Git commit id for Institutional configs.", "default": "master", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" + "fa_icon": "fas fa-users-cog" }, "custom_config_base": { "type": "string", "description": "Base directory for Institutional configs.", "default": "https://raw.githubusercontent.com/nf-core/configs/master", "hidden": true, - "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. 
If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", "fa_icon": "fas fa-users-cog" }, "hostnames": { @@ -784,6 +1271,12 @@ "hidden": true, "fa_icon": "fas fa-users-cog" }, + "config_profile_name": { + "type": "string", + "description": "Institutional config name.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, "config_profile_description": { "type": "string", "description": "Institutional config description.", @@ -813,7 +1306,7 @@ "properties": { "max_cpus": { "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", + "description": "Maximum number of CPUs that can be requested for any single job.", "default": 16, "fa_icon": "fas fa-microchip", "hidden": true, @@ -822,8 +1315,9 @@ "max_memory": { "type": "string", "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", + "default": "256.GB", "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", "hidden": true, "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" }, @@ -832,10 +1326,108 @@ "description": "Maximum amount of time that can be requested for any single job.", "default": "240.h", "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", "hidden": true, "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" } } + }, + "generic_options": { + "title": "Generic options", + "type": "object", + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "fa_icon": "fas fa-question-circle", + "hidden": true + }, + "publish_dir_mode": { + "type": "string", + "default": "copy", + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. 
See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", + "fa_icon": "fas fa-copy", + "enum": [ + "symlink", + "rellink", + "link", + "copy", + "copyNoFollow", + "move" + ], + "hidden": true + }, + "email_on_fail": { + "type": "string", + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.", + "hidden": true + }, + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "hidden": true + }, + "tracedir": { + "type": "string", + "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "hidden": true + }, + "validate_params": { + "type": "boolean", + "description": "Boolean whether to validate parameters against the schema at runtime", + "default": true, + "fa_icon": "fas fa-check-square", + "hidden": true + }, + "show_hidden_params": { + "type": "boolean", + "fa_icon": "far fa-eye-slash", + "description": "Show all params when using `--help`", + "hidden": true, + "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." + }, + "enable_conda": { + "type": "boolean", + "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.", + "hidden": true, + "fa_icon": "fas fa-bacon" + }, + "singularity_pull_docker_container": { + "type": "boolean", + "description": "Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.", + "hidden": true, + "fa_icon": "fas fa-toolbox", + "help_text": "This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues." 
+ }, + "email": { + "type": "string", + "description": "E-mail address for optional workflow completion notification.", + "hidden": true, + "fa_icon": "fas fa-envelope" + }, + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "hidden": true, + "fa_icon": "fas fa-envelope" + }, + "options": { + "type": "string", + "description": "NA", + "hidden": true, + "fa_icon": "fas fa-filter" + } + } } }, "allOf": [ @@ -878,14 +1470,14 @@ { "$ref": "#/definitions/general_plotting" }, - { - "$ref": "#/definitions/generic_options" - }, { "$ref": "#/definitions/institutional_config_options" }, { "$ref": "#/definitions/max_job_request_options" + }, + { + "$ref": "#/definitions/generic_options" } ] } \ No newline at end of file diff --git a/subworkflows/local/input_check.nf b/subworkflows/local/input_check.nf new file mode 100644 index 0000000..b664bc8 --- /dev/null +++ b/subworkflows/local/input_check.nf @@ -0,0 +1,42 @@ +// +// Check input samplesheet and get read channels +// + +params.options = [:] + +include { SAMPLESHEET_CHECK } from '../../modules/local/samplesheet_check' addParams( options: params.options ) + +workflow INPUT_CHECK { + take: + samplesheet // file: /path/to/samplesheet.csv + + main: + SAMPLESHEET_CHECK ( samplesheet ) + .splitCsv ( header:true, sep:',' ) + .map { create_fastq_channels(it) } + .set { reads } + + emit: + reads // channel: [ val(meta), [ reads ] ] +} + +// Function to get list of [ meta, [ fastq_1, fastq_2 ] ] +def create_fastq_channels(LinkedHashMap row) { + def meta = [:] + meta.id = row.sample + meta.single_end = row.single_end.toBoolean() + + def array = [] + if (!file(row.fastq_1).exists()) { + exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}" + } + if (meta.single_end) { + array = [ meta, [ file(row.fastq_1) ] ] + } else { + if (!file(row.fastq_2).exists()) { + exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}" + } + array = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ] + } + return array +} diff --git a/workflows/scflow.nf b/workflows/scflow.nf new file mode 100644 index 0000000..9f27efd --- /dev/null +++ b/workflows/scflow.nf @@ -0,0 +1,374 @@ +/* +======================================================================================== + VALIDATE INPUTS +======================================================================================== +*/ + +def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) + +// Validate input parameters +WorkflowScflow.initialise(params, log) + +// Check input path parameters to see if they exist +def checkPathParamList = [ params.input, params.manifest ] +for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true) } } + +// Check mandatory parameters +if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' 
} + +/* + * Create a channel for input read files + */ +if (params.manifest) { ch_manifest = file(params.manifest, checkIfExists: true) } +if (params.input) { ch_input = file(params.input, checkIfExists: true) } +if (params.input) { ch_input2 = file(params.input, checkIfExists: true) } // copy for qc +if (params.ctd_path) { ch_ctd_path = file(params.ctd_path, checkIfExists: true) } +if (params.celltype_mappings) { ch_celltype_mappings = file(params.celltype_mappings, checkIfExists: false) } +if (params.ensembl_mappings) { ch_ensembl_mappings = file(params.ensembl_mappings, checkIfExists: false) } +if (params.ensembl_mappings) { ch_ensembl_mappings2 = file(params.ensembl_mappings, checkIfExists: false) } +if (params.ensembl_mappings) { ch_ensembl_mappings3 = file(params.ensembl_mappings, checkIfExists: false) } +if (params.reddim_genes_yml) { ch_reddim_genes_yml = file(params.reddim_genes_yml, checkIfExists: true) } + +/* +======================================================================================== + CONFIG FILES +======================================================================================== +*/ + +/* +======================================================================================== + IMPORT LOCAL MODULES/SUBWORKFLOWS +======================================================================================== +*/ + +// Don't overwrite global params.modules, create a copy instead and use that within the main script. +def modules = params.modules.clone() + +def scflow_checkinputs_options = modules['scflow_checkinputs'] +scflow_checkinputs_options.args = '' + +def scflow_qc_options = modules['scflow_qc'] +scflow_qc_options.args = + "--key_colname ${params.qc_key_colname} \ + --factor_vars ${params.qc_factor_vars} \ + --min_library_size ${params.qc_min_library_size} \ + --max_library_size ${params.qc_max_library_size} \ + --min_features ${params.qc_min_features} \ + --max_features ${params.qc_max_features} \ + --max_mito ${params.qc_max_mito} \ + --min_ribo ${params.qc_min_ribo} \ + --max_ribo ${params.qc_max_ribo} \ + --min_counts ${params.qc_min_counts} \ + --min_cells ${params.qc_min_cells} \ + --drop_unmapped ${params.qc_drop_unmapped} \ + --drop_mito ${params.qc_drop_mito} \ + --drop_ribo ${params.qc_drop_ribo} \ + --nmads ${params.qc_nmads} \ + --find_singlets ${params.mult_find_singlets} \ + --singlets_method ${params.mult_singlets_method} \ + --vars_to_regress_out ${params.mult_vars_to_regress_out} \ + --pca_dims ${params.mult_pca_dims} \ + --var_features ${params.mult_var_features} \ + --doublet_rate ${params.mult_doublet_rate} \ + --dpk ${params.mult_dpk} \ + --pK ${params.mult_pK} \ + --find_cells ${params.amb_find_cells} \ + --lower ${params.amb_lower} \ + --retain ${params.amb_retain} \ + --alpha_cutoff ${params.amb_alpha_cutoff} \ + --niters ${params.amb_niters} \ + --expect_cells ${params.amb_expect_cells} \ + --species ${params.species} " + +def scflow_mergeqctables_options = modules['scflow_mergeqctables'] +scflow_mergeqctables_options.args = '' + +def scflow_merge_options = modules['scflow_merge'] +scflow_merge_options.args = + "--unique_id_var ${params.qc_key_colname} \ + --plot_vars ${params.merge_plot_vars} \ + --facet_vars ${params.merge_facet_vars} \ + --outlier_vars ${params.merge_outlier_vars} \ + --species ${params.species}" + +def scflow_integrate_options = modules['scflow_integrate'] +scflow_integrate_options.args = + "--method ${params.integ_method} \ + --unique_id_var ${params.integ_unique_id_var} \ + --take_gene_union 
${params.integ_take_gene_union} \ + --remove_missing ${params.integ_remove_missing} \ + --num_genes ${params.integ_num_genes} \ + --combine ${params.integ_combine} \ + --keep_unique ${params.integ_keep_unique} \ + --capitalize ${params.integ_capitalize} \ + --use_cols ${params.integ_use_cols} \ + --k ${params.integ_k} \ + --lambda ${params.integ_lambda} \ + --thresh ${params.integ_thresh} \ + --max_iters ${params.integ_max_iters} \ + --nrep ${params.integ_nrep} \ + --rand_seed ${params.integ_rand_seed} \ + --knn_k ${params.integ_knn_k} \ + --k2 ${params.integ_k2} \ + --prune_thresh ${params.integ_prune_thresh} \ + --ref_dataset ${params.integ_ref_dataset} \ + --min_cells ${params.integ_min_cells} \ + --quantiles ${params.integ_quantiles} \ + --nstart ${params.integ_nstart} \ + --resolution ${params.integ_resolution} \ + --dims_use ${params.integ_dims_use} \ + --dist_use ${params.integ_dist_use} \ + --center ${params.integ_center} \ + --small_clust_thresh ${params.integ_small_clust_thresh}" + +def scflow_reducedims_options = modules['scflow_reducedims'] +scflow_reducedims_options.args = + "--input_reduced_dim ${params.reddim_input_reduced_dim} \ + --reduction_methods ${params.reddim_reduction_methods} \ + --vars_to_regress_out ${params.reddim_vars_to_regress_out} \ + --pca_dims ${params.reddim_umap_pca_dims} \ + --n_neighbors ${params.reddim_umap_n_neighbors} \ + --n_components ${params.reddim_umap_n_components} \ + --init ${params.reddim_umap_init} \ + --metric ${params.reddim_umap_metric} \ + --n_epochs ${params.reddim_umap_n_epochs} \ + --learning_rate ${params.reddim_umap_learning_rate} \ + --min_dist ${params.reddim_umap_min_dist} \ + --spread ${params.reddim_umap_spread} \ + --set_op_mix_ratio ${params.reddim_umap_set_op_mix_ratio} \ + --local_connectivity ${params.reddim_umap_local_connectivity} \ + --repulsion_strength ${params.reddim_umap_repulsion_strength} \ + --negative_sample_rate ${params.reddim_umap_negative_sample_rate} \ + --fast_sgd ${params.reddim_umap_fast_sgd} \ + --dims ${params.reddim_tsne_dims} \ + --initial_dims ${params.reddim_tsne_initial_dims} \ + --perplexity ${params.reddim_tsne_perplexity} \ + --theta ${params.reddim_tsne_theta} \ + --stop_lying_iter ${params.reddim_tsne_stop_lying_iter} \ + --mom_switch_iter ${params.reddim_tsne_mom_switch_iter} \ + --max_iter ${params.reddim_tsne_max_iter} \ + --pca_center ${params.reddim_tsne_pca_center} \ + --pca_scale ${params.reddim_tsne_pca_scale} \ + --normalize ${params.reddim_tsne_normalize} \ + --momentum ${params.reddim_tsne_momentum} \ + --final_momentum ${params.reddim_tsne_final_momentum} \ + --eta ${params.reddim_tsne_eta} \ + --exaggeration_factor ${params.reddim_tsne_exaggeration_factor}" + +def scflow_cluster_options = modules['scflow_cluster'] +scflow_cluster_options.args = + "--cluster_method ${params.clust_cluster_method} \ + --reduction_method ${params.clust_reduction_method} \ + --res ${params.clust_res} \ + --k ${params.clust_k} \ + --louvain_iter ${params.clust_louvain_iter}" + +def scflow_reportintegrated_options = modules['scflow_reportintegrated'] +scflow_reportintegrated_options.args = + "--categorical_covariates ${params.integ_categorical_covariates} \ + --input_reduced_dim ${params.integ_input_reduced_dim} \ + --reddimplot_pointsize ${params.reddimplot_pointsize} \ + --reddimplot_alpha ${params.reddimplot_alpha}" + +def scflow_mapcelltypes_options = modules['scflow_mapcelltypes'] +scflow_mapcelltypes_options.args = + "--clusters_colname ${params.cta_clusters_colname} \ + --cells_to_sample 
${params.cta_cells_to_sample} \ + --species ${params.species} \ + --reddimplot_pointsize ${params.reddimplot_pointsize} \ + --reddimplot_alpha ${params.reddimplot_alpha}" + +def scflow_finalize_options = modules['scflow_finalize'] +scflow_finalize_options.args = + "--clusters_colname ${params.cta_clusters_colname} \ + --celltype_var ${params.cta_celltype_var} \ + --unique_id_var ${params.cta_unique_id_var} \ + --facet_vars ${params.cta_facet_vars} \ + --input_reduced_dim ${params.clust_reduction_method} \ + --metric_vars ${params.cta_metric_vars} \ + --top_n ${params.cta_top_n} \ + --reddimplot_pointsize ${params.reddimplot_pointsize} \ + --reddimplot_alpha ${params.reddimplot_alpha}" + +def scflow_dge_options = modules['scflow_dge'] +scflow_dge_options.args = + "--mast_method ${params.dge_mast_method} \ + --min_counts ${params.dge_min_counts} \ + --min_cells_pc ${params.dge_min_cells_pc} \ + --rescale_numerics ${params.dge_rescale_numerics} \ + --force_run ${params.dge_force_run} \ + --pseudobulk ${params.dge_pseudobulk} \ + --celltype_var ${params.dge_celltype_var} \ + --sample_var ${params.dge_sample_var} \ + --dependent_var ${params.dge_dependent_var} \ + --ref_class ${params.dge_ref_class} \ + --confounding_vars ${params.dge_confounding_vars} \ + --random_effects_var ${params.dge_random_effects_var} \ + --pval_cutoff ${params.dge_pval_cutoff} \ + --fc_threshold ${params.dge_fc_threshold} \ + --species ${params.species} \ + --max_cores ${params.dge_max_cores}" + +def scflow_plotreddimgenes_options = modules['scflow_plotreddimgenes'] +scflow_plotreddimgenes_options.args = + "--reduction_methods ${params.plotreddim_reduction_methods} \ + --reddimplot_pointsize ${params.reddimplot_pointsize} \ + --reddimplot_alpha ${params.reddimplot_alpha}" + +def scflow_ipa_options = modules['scflow_ipa'] +scflow_ipa_options.args = + "--enrichment_tool ${params.ipa_enrichment_tool} \ + --enrichment_method ${params.ipa_enrichment_method} \ + --enrichment_database ${params.ipa_enrichment_database}" + +def scflow_dirichlet_options = modules['scflow_dirichlet'] +scflow_dirichlet_options.args = + "--unique_id_var ${params.dirich_unique_id_var} \ + --celltype_var ${params.dirich_celltype_var} \ + --dependent_var ${params.dirich_dependent_var} \ + --ref_class ${params.dirich_ref_class} \ + --var_order ${params.dirich_var_order}" + +def get_software_versions = modules['get_software_versions'] +get_software_versions.args = '' + +include { SCFLOW_CHECKINPUTS } from '../modules/local/process/scflow/checkinputs' addParams( options: scflow_checkinputs_options ) +include { SCFLOW_QC } from '../modules/local/process/scflow/qc' addParams( options: scflow_qc_options ) +include { SCFLOW_MERGEQCTABLES } from '../modules/local/process/scflow/mergeqctables' addParams( options: scflow_mergeqctables_options ) +include { SCFLOW_MERGE } from '../modules/local/process/scflow/merge' addParams( options: scflow_merge_options ) +include { SCFLOW_INTEGRATE } from '../modules/local/process/scflow/integrate' addParams( options: scflow_integrate_options ) +include { SCFLOW_REDUCEDIMS } from '../modules/local/process/scflow/reducedims' addParams( options: scflow_reducedims_options ) +include { SCFLOW_CLUSTER } from '../modules/local/process/scflow/cluster' addParams( options: scflow_cluster_options ) +include { SCFLOW_REPORTINTEGRATED } from '../modules/local/process/scflow/reportintegrated' addParams( options: scflow_reportintegrated_options ) +include { SCFLOW_MAPCELLTYPES } from '../modules/local/process/scflow/mapcelltypes' 
addParams( options: scflow_mapcelltypes_options ) +include { SCFLOW_FINALIZE           } from '../modules/local/process/scflow/finalize'         addParams( options: scflow_finalize_options ) +include { SCFLOW_PLOTREDDIMGENES    } from '../modules/local/process/scflow/plotreddimgenes'  addParams( options: scflow_plotreddimgenes_options ) +include { SCFLOW_DGE                } from '../modules/local/process/scflow/dge'              addParams( options: scflow_dge_options ) +include { SCFLOW_IPA                } from '../modules/local/process/scflow/ipa'              addParams( options: scflow_ipa_options ) +include { SCFLOW_DIRICHLET          } from '../modules/local/process/scflow/dirichlet'        addParams( options: scflow_dirichlet_options ) +include { GET_SOFTWARE_VERSIONS     } from '../modules/local/get_software_versions'           addParams( options: [publish_files : ['tsv':'']] ) + + +// +// MODULE: Local to the pipeline +// + +// +// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules +// + + +/* +======================================================================================== +    IMPORT NF-CORE MODULES/SUBWORKFLOWS +======================================================================================== +*/ + +/* +======================================================================================== +    RUN MAIN WORKFLOW +======================================================================================== +*/ + +// Info required for completion email and summary; no MultiQC report is generated by this pipeline, so an empty list is passed +def multiqc_report = [] + +workflow SCFLOW { + +    main: +    SCFLOW_CHECKINPUTS ( +        ch_manifest, +        ch_input +    ) + +    SCFLOW_QC ( +        SCFLOW_CHECKINPUTS.out.checked_manifest.splitCsv( +            header:['key', 'filepath'], +            skip: 1, sep: '\t' +        ) +        .map { row -> tuple(row.key, row.filepath) }, +        ch_input2, +        ch_ensembl_mappings +    ) + +    SCFLOW_MERGEQCTABLES ( +        SCFLOW_QC.out.qc_summary.collect() +    ) + +    SCFLOW_MERGE ( +        SCFLOW_QC.out.qc_sce.collect(), +        ch_ensembl_mappings2 +    ) + +    SCFLOW_INTEGRATE ( +        SCFLOW_MERGE.out.merged_sce +    ) + +    SCFLOW_REDUCEDIMS ( +        SCFLOW_INTEGRATE.out.integrated_sce +    ) + +    SCFLOW_CLUSTER ( +        SCFLOW_REDUCEDIMS.out.reddim_sce +    ) + +    SCFLOW_REPORTINTEGRATED ( +        SCFLOW_CLUSTER.out.clustered_sce +    ) + +    SCFLOW_MAPCELLTYPES ( +        SCFLOW_CLUSTER.out.clustered_sce, +        ch_ctd_path +    ) + +    SCFLOW_FINALIZE ( +        SCFLOW_MAPCELLTYPES.out.celltype_mapped_sce, +        ch_celltype_mappings +    ) + +    SCFLOW_DGE ( +        SCFLOW_FINALIZE.out.final_sce, +        params.dge_de_method, +        SCFLOW_FINALIZE.out.celltypes.splitCsv ( +            header:['celltype', 'n_cells'], skip: 1, sep: '\t' +        ) +        .map { row -> tuple(row.celltype, row.n_cells) }, +        ch_ensembl_mappings3 +    ) + +    SCFLOW_IPA ( +        SCFLOW_DGE.out.de_table +    ) + +    SCFLOW_DIRICHLET ( +        SCFLOW_FINALIZE.out.final_sce +    ) + +    SCFLOW_PLOTREDDIMGENES ( +        SCFLOW_CLUSTER.out.clustered_sce, +        ch_reddim_genes_yml +    ) + +    GET_SOFTWARE_VERSIONS ( +    ) +} + + +/* +======================================================================================== +    COMPLETION EMAIL AND SUMMARY +======================================================================================== +*/ + +workflow.onComplete { +    if (params.email || params.email_on_fail) { +        NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) +    } +    NfcoreTemplate.summary(workflow, params, log) +} + +/* +======================================================================================== +    THE END +======================================================================================== +*/
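
Note on the `addParams( options: ... )` wiring above: each long `--flag ${params.*}` string assembled in `workflows/scflow.nf` is injected into the corresponding local module as `params.options.args`, following the nf-core DSL2 module pattern. The `modules/local/process/scflow/*` files are not part of this diff, so the process name and script file below are hypothetical; this is only a minimal sketch of how such a module consumes the string:

    // Hypothetical local module; the options map is injected via addParams()
    params.options = [:]

    process SCFLOW_EXAMPLE {
        input:
        path sce

        output:
        path 'example_output', emit: output

        script:
        // options.args carries the pre-assembled '--key value' CLI string
        def args = params.options.args ?: ''
        """
        scflow_example.r --sce_path ${sce} ${args}
        """
    }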
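Note on the new `INPUT_CHECK` subworkflow in `subworkflows/local/input_check.nf`: it emits a `reads` channel of `[ meta, [ reads ] ]` tuples built by `create_fastq_channels()`. A minimal usage sketch, assuming the caller sits under `workflows/` so the relative include path resolves, and that `params.input` points at a CSV samplesheet with `sample`, `fastq_1`, `fastq_2` and `single_end` columns (the columns the function reads):

    include { INPUT_CHECK } from '../subworkflows/local/input_check'

    workflow TEST_INPUT_CHECK {
        INPUT_CHECK ( file(params.input) )

        // Each element looks like:
        // [ [id:'SAMPLE1', single_end:false], [ 's1_R1.fastq.gz', 's1_R2.fastq.gz' ] ]
        INPUT_CHECK.out.reads.view()
    }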
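Note on the schema parameters: every analysis parameter defined in the `nextflow_schema.json` sections above (merge, integration, dimensionality reduction, clustering, cell-type annotation, DGE, IPA, Dirichlet, plotting) can be overridden without editing the pipeline, for example via a config file passed with `-c`. A hypothetical `custom.config` (parameter names are taken from the schema; the values are illustrative only):

    params {
        integ_k                 = 40     // more factors for datasets with more sub-structure
        reddim_umap_n_neighbors = 50
        reddim_umap_min_dist    = 0.3
        clust_res               = 0.01   // higher resolution detects more communities
        reddimplot_pointsize    = 0.05   // smaller points/alpha suit large cell counts
        reddimplot_alpha        = 0.1
    }

Run with, for example: `nextflow run nf-core/scflow -profile docker -c custom.config`.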
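Note on the new `pattern` fields added to `max_memory` and `max_time`: they validate the usual nf-core resource-cap strings at schema level. A config sketch with values that satisfy both regexes:

    params {
        max_cpus   = 8
        max_memory = '64.GB'   // matches ^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$
        max_time   = '48.h'    // matches ^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$
    }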