Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -55,6 +55,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#169](https://github.com/nf-core/seqinspector/pull/169) Rescue missing versions from PREPARE_GENOME subworkflow
- [#171](https://github.com/nf-core/seqinspector/pull/171) Rescue number of tasks in the pipeline level tests
- [#172](https://github.com/nf-core/seqinspector/pull/172) More complete conda environment for rundir parser
- [#173](https://github.com/nf-core/seqinspector/pull/173) Fix warning message for tag name collision

### `Changed`

@@ -72,6 +73,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#164](https://github.com/nf-core/seqinspector/pull/164) Refactor local subworkflow and pipeline tests
- [#168](https://github.com/nf-core/seqinspector/pull/168) Adhere to strict syntax
- [#169](https://github.com/nf-core/seqinspector/pull/169) Prepare release 1.0.0
- [#173](https://github.com/nf-core/seqinspector/pull/173) Improve documentation

### `Dependencies`

9 changes: 7 additions & 2 deletions README.md
@@ -26,6 +26,8 @@ It can perform subsampling, quality assessment, duplication level analysis, and
The pipeline generates detailed MultiQC reports with flexible output options, ranging from individual sample reports to project-wide summaries, making it particularly useful for sequencing core facilities and research groups with access to sequencing instruments.
If provided, nf-core/seqinspector can also parse statistics from an Illumina run folder directory into the final MultiQC reports.

### Compatibility between tools and data type

<!-- TODO: add a search tool that accepts a tree for `Compatibility with Data`. -->

| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
@@ -40,6 +42,8 @@ If provided, nf-core/seqinspector can also parse statistics from an Illumina run
| `QC` | [`Picard_collecthsmetrics`](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) | Collect alignment QC metrics of hybrid-selection data. | [RNA, DNA] | [Bwamem2, SAMtools, `--fasta`, `--run_picard_collecthsmetrics`, `--bait_intervals`, `--target_intervals` (`--ref_dict`)] | no |
| `Reporting` | [`MultiQC`](http://multiqc.info/) | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |

### Workflow diagram

<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/images/seqinspector_tubemap_V1.0_dark.png">
<source media="(prefers-color-scheme: light)" srcset="docs/images/seqinspector_tubemap_V1.0.png">
@@ -86,7 +90,9 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/seqinspector was originally written by [@agrima2010](https://github.com/agrima2010), [@Aratz](https://github.com/Aratz), [@FranBonath](https://github.com/FranBonath), [@kedhammar](https://github.com/kedhammar), and [@MatthiasZepper](https://github.com/MatthiasZepper) from the Swedish [@NationalGenomicsInfrastructure](https://github.com/NationalGenomicsInfrastructure/) and [Clinical Genomics Stockholm](https://clinical.scilifelab.se/).
nf-core/seqinspector was originally written by [@agrima2010](https://github.com/agrima2010), [@Aratz](https://github.com/Aratz), [@FranBonath](https://github.com/FranBonath), [@kedhammar](https://github.com/kedhammar), and [@MatthiasZepper](https://github.com/MatthiasZepper) from the Swedish [National Genomics Infrastructure](https://github.com/NationalGenomicsInfrastructure/) and [Clinical Genomics Stockholm](https://clinical.scilifelab.se/).

Maintenance is now led by Maxime U Garcia ([National Genomics Infrastructure](https://github.com/NationalGenomicsInfrastructure/)).

We thank the following people for their extensive assistance in the development of this pipeline:

@@ -100,7 +106,6 @@ We thank the following people for their extensive assistance in the development
- [@kjellinjonas](https://github.com/kjellinjonas)
- [@mahesh-panchal](https://github.com/mahesh-panchal)
- [@matrulda](https://github.com/matrulda)
- [@maxulysse](https://github.com/maxulysse)
- [@mirpedrol](https://github.com/mirpedrol)
- [@nggvs](https://github.com/nggvs)
- [@nkongenelly](https://github.com/nkongenelly)
39 changes: 25 additions & 14 deletions docs/usage.md
@@ -6,13 +6,18 @@

### General points

The nf-core/seqinspector pipeline is a general QC pipeline for sequencing data. The current version only supports data in fastq format.
The pipeline is meant to include a large number of QC tools to choose from, but not all of them may be relevant to your data. We therefore highly recommend familiarizing yourself with the available QC tools and excluding any you do not need with the `--skip_tools` command line parameter. For repeated use, we suggest creating a params file containing the `skip_tools` parameter (for details see the "Running the pipeline" section).
Be aware that some tools are skipped by default and will need to be included in the list of skipped tools when curating your own list. To identify defaults included or excluded please check out the overview table in the Introduction.
The nf-core/seqinspector pipeline is a general QC pipeline for sequencing data.
The current version only supports data in fastq format.
The pipeline is meant to include a large number of QC tools to choose from, but not all of them may be relevant to your data.
We therefore highly recommend familiarizing yourself with the available QC tools and excluding any you do not need with the `--skip_tools` command line parameter.
For repeated use, we suggest creating a params file containing the `skip_tools` parameter (for details see the "Running the pipeline" section).
Be aware that some tools are skipped by default and will need to be included in the list of skipped tools when curating your own list.
To see which tools are included or excluded by default, check [the compatibility between tools and data type table](../#compatibility-between-tools-and-data-type).

### What nf-core/seqinspector is not for

The results of the nf-core/seqinspector pipeline are not meant to be used for any downstream analysis, but are exclusively for QC purposes. Even tools that may be used in other pipelines as a starting point for analysis are run from a QC perspective, most likely with a downsampled input.
The results of the nf-core/seqinspector pipeline are not meant to be used for any downstream analysis, but are exclusively for QC purposes.
Even tools that may be used in other pipelines as a starting point for analysis are run from a QC perspective, most likely with a downsampled input.

## Samplesheet input

@@ -26,7 +31,7 @@ You will need to create a samplesheet with information about the samples/fastq f

The following simple run dir structure...

```
```bash
run_dir
├── sample1_lane1_group1_r1.fq.gz
├── sample2_lane1_group1_r1.fq.gz
@@ -42,7 +47,6 @@ sample1 path/to/run_dir/sample1_lane1_group1_r1.fq.gz path/to/run_dir pr
sample2 path/to/run_dir/sample2_lane1_group1_r1.fq.gz path/to/run_dir project1:group1
sample3 path/to/run_dir/sample3_lane2_group2_r1.fq.gz path/to/run_dir project1:group2
sample4 path/to/run_dir/sample4_lane2_group3_r1.fq.gz path/to/run_dir control

```

| Column | Description |
@@ -100,17 +104,28 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed. Both absolute numbers (e.g. 100) and relative numbers (e.g. 0.25) can be specified.
### Sample size selection

Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed.
Both absolute numbers (e.g. 100) and relative numbers (e.g. 0.25) can be specified.

```bash
nextflow run nf-core/seqinspector --input ./samplesheet.csv --outdir ./results --sample_size 1000000 -profile docker
```

### Hybrid-selection QC metrics

The pipeline supports hybrid-selection (HS) QC metrics collection.
Use `--run_picard_collecthsmetrics true` to run the QC tool [Picard CollectHsMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard).
This tool is otherwise not run by default.
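
As a minimal sketch, the HS metrics options could be kept in a params file instead of being passed on the command line. The parameter names are the ones listed for `Picard_collecthsmetrics` in the README compatibility table, plus `input` and `outdir` from the example command above. The file name `params.yaml` and every path are placeholders chosen for illustration, not files shipped with the pipeline.

```yaml
# params.yaml (hypothetical file name; all paths are placeholders)
input: "./samplesheet.csv"
outdir: "./results"

# Enable Picard CollectHsMetrics, which is not run by default
run_picard_collecthsmetrics: true

# Reference and interval files listed as dependencies in the README table
fasta: "/path/to/genome.fasta"
bait_intervals: "/path/to/baits.interval_list"
target_intervals: "/path/to/targets.interval_list"
```

Such a file would be passed with Nextflow's `-params-file` option, for example `nextflow run nf-core/seqinspector -profile docker -params-file params.yaml`.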

### Skipping tools

Some tools might not be compatible with your data. In this case you can skip them by providing a comma-separated list of tools to be skipped with the `--skip_tools` parameter.
Some tools might not be compatible with your data.
In this case you can skip them by providing a comma-separated list of tools to be skipped with the `--skip_tools` parameter.

In case you want to make this more permanent, it is recommended to specify this in a params file, or even in your own Nextflow configuration file. The Nextflow configuration file can also be used to customise tool arguments. See the official [Nextflow](https://www.nextflow.io/docs/latest/config.html) and [nf-core](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) documentation for further details.
The Nextflow configuration file can also be used to customise tool arguments.
See the official [Nextflow](https://www.nextflow.io/docs/latest/config.html) and [nf-core](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) documentation for further details.

### Updating the pipeline

@@ -170,7 +185,7 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- `apptainer`
- A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
- `wave`
- A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow ` 24.03.0-edge` or later).
- A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
- `conda`
- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.

@@ -229,7 +244,3 @@ We recommend adding the following line to your environment to limit this (typica
```bash
NXF_OPTS='-Xms1g -Xmx4g'
```

## Hybrid-selection QC metrics

The pipeline supports hybrid-selection (HS) QC metrics collection. Use `--run_picard_collecthsmetrics true` to run the QC tool [Picard CollectHsMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard). This tool is otherwise not run by default.
4 changes: 3 additions & 1 deletion nextflow.config
@@ -256,6 +256,7 @@ manifest {
affiliation: 'Pixelgen Technologies',
github: 'Aratz',
contribution: ['author'],
orcid: '0000-0003-2702-1103'
],
[
name: 'Alfred Kedhammar',
@@ -274,7 +275,7 @@ manifest {
name: 'Maxime U Garcia',
affiliation: 'National Genomics Infrastructure',
github: 'maxulysse',
contribution: ['maintainer'],
contribution: ['contributor', 'maintainer'],
orcid: '0000-0003-2827-9261',
],
[
@@ -307,6 +308,7 @@ manifest {
affiliation: 'National Bioinformatics Infrastructure Sweden',
github: 'mahesh-panchal',
contribution: ['contributor'],
orcid: '0000-0003-1675-0677'
],
[
name: 'Ramprasad Neethiraj',