fixed linting
JudithBernett committed Nov 15, 2024
1 parent 246d54d commit eed2178
Showing 15 changed files with 100 additions and 126 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -6,4 +6,4 @@ results/
testing/
testing*
*.pyc
*.idea/
null/
22 changes: 19 additions & 3 deletions .nf-core.yml
@@ -1,6 +1,19 @@
bump_version: null
lint: null
nf_core_version: 3.0.1
lint:
files_exist:
- conf/igenomes.config
- conf/igenomes_ignored.config
- assets/multiqc_config.yml
- conf/igenomes.config
- conf/igenomes_ignored.config
- assets/multiqc_config.yml
files_unchanged:
- .github/CONTRIBUTING.md
- assets/sendmail_template.txt
- .github/CONTRIBUTING.md
- assets/sendmail_template.txt
multiqc_config: false
nf_core_version: 3.0.2
org_path: null
repository_type: pipeline
template:
@@ -15,6 +28,9 @@ template:
name: drugresponseeval
org: nf-core
outdir: .
skip_features: null
skip_features:
- igenomes
- multiqc
- fastqc
version: 1.0dev
update: null
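
The `lint` block in this hunk appears to list each `files_exist` and `files_unchanged` entry twice. The diff does not show whether nf-core tools treats repeated entries any differently from single ones, so the following is only a minimal sketch of how such repeats could be detected and collapsed. The helper names `find_duplicates` and `deduplicate` are hypothetical, and the entries are copied into a plain Python dictionary rather than parsed from `.nf-core.yml`.

```python
from collections import Counter

# Entries copied by hand from the lint block in the .nf-core.yml hunk.
lint_config = {
    "files_exist": [
        "conf/igenomes.config",
        "conf/igenomes_ignored.config",
        "assets/multiqc_config.yml",
        "conf/igenomes.config",
        "conf/igenomes_ignored.config",
        "assets/multiqc_config.yml",
    ],
    "files_unchanged": [
        ".github/CONTRIBUTING.md",
        "assets/sendmail_template.txt",
        ".github/CONTRIBUTING.md",
        "assets/sendmail_template.txt",
    ],
}


def find_duplicates(entries):
    """Return the entries that occur more than once, in first-seen order."""
    counts = Counter(entries)
    return [entry for entry in dict.fromkeys(entries) if counts[entry] > 1]


def deduplicate(entries):
    """Drop repeated entries while keeping the original order."""
    return list(dict.fromkeys(entries))


for key, entries in lint_config.items():
    print(key, "duplicates:", find_duplicates(entries))
    print(key, "deduplicated:", deduplicate(entries))
```

`dict.fromkeys` preserves insertion order, so the deduplicated lists keep the order in which the entries first appear.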
19 changes: 5 additions & 14 deletions README.md
@@ -36,16 +36,6 @@ DrEval catalog, you can increase your work's exposure, reusability, and transfer

# ![DrEval_pipeline](assets/DrEval_pipeline_simplified.png)

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. The response data is loaded
2. All models are trained and evaluated in a cross-validation setting
3. For each CV split, the best hyperparameters are determined using a grid search per model
@@ -66,8 +56,6 @@ For baseline models, no randomization or robustness tests are performed.
Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/drugresponseeval \
-profile <docker/singularity/.../institute> \
@@ -95,10 +83,13 @@ Berlin).

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

## Contributions and Support

Contributors to nf-core/drugresponseeval and the drevalpy PyPI package:
- [Judith Bernett](https://github.com/JudithBernett) (TUM)
- [Pascal Iversen](https://github.com/PascalIversen) (FU Berlin)
- [Mario Picciani](https://github.com/picciama) (TUM)

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

For further information or help, don't hesitate to get in touch on the [Slack `#drugresponseeval` channel](https://nfcore.slack.com/channels/drugresponseeval) (you can join with [this invite](https://nf-co.re/join/slack)).
Binary file modified assets/nf-core-drugresponseeval_logo_light.png
2 changes: 0 additions & 2 deletions conf/base.config
@@ -10,7 +10,6 @@

process {

// TODO nf-core: Check the defaults for all processes
cpus = { 1 * task.attempt }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
@@ -24,7 +23,6 @@ process {
// These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
// If possible, it would be nice to keep the same label naming convention when
// adding in your local modules too.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_single {
cpus = { 1 }
2 changes: 1 addition & 1 deletion conf/test.config
@@ -22,7 +22,7 @@ params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

//TODO do this for the proper pipeline
// TODO nf-core: do this for the proper pipeline
// Input data
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
Binary file modified docs/images/nf-core-drugresponseeval_logo_dark.png
43 changes: 15 additions & 28 deletions docs/output.md
@@ -12,38 +12,25 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [FastQC](#fastqc) - Raw read QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
1. [Load response](#load-response) The response data is loaded
2. All models are trained and evaluated in a cross-validation setting
3. For each CV split, the best hyperparameters are determined using a grid search per model
4. The model is trained on the full training set (train & validation) with the best
hyperparameters to predict the test set
5. If randomization tests are enabled, the model is trained on the full training set with the best
hyperparameters to predict the randomized test set
6. If robustness tests are enabled, the model is trained N times on the full training set with the
best hyperparameters
7. Plots are created summarizing the results
8. [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### FastQC
### Load response
The response data is loaded into the pipeline. This step is necessary to provide the pipeline with the response data that will be used to train and evaluate the models.

<details markdown="1">
<summary>Output files</summary>

- `fastqc/`
- `*_fastqc.html`: FastQC report containing quality metrics.
- `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.

</details>

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

### MultiQC

<details markdown="1">
<summary>Output files</summary>

- `multiqc/`
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
- `multiqc_plots/`: directory containing static images from the report in various formats.

</details>
### Train and evaluate models

[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
All models are trained and evaluated in a cross-validation setting. The models are trained on the training set and evaluated on the validation set. The performance of the models is evaluated using various metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.

Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

### Pipeline information

61 changes: 12 additions & 49 deletions docs/usage.md
@@ -8,65 +8,26 @@

<!-- TODO nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. -->

## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.

```bash
--input '[path to samplesheet file]'
```

### Multiple runs of the same sample

The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
```

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
```

| Column | Description |
| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |

An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.

## Running the pipeline

The typical command for running the pipeline is as follows:

```bash
nextflow run nf-core/drugresponseeval --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
nextflow run nf-core/drugresponseeval \
-profile <docker/singularity/.../institute> \
--models <model1,model2,...> \
--baselines <baseline1,baseline2,...> \
--dataset_name <dataset_name> \
--path_data <path_data> \
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
This will launch the pipeline with the `docker/singularity/.../institute` configuration profile. See below for more information about profiles.

Note that the pipeline will create the following files in your working directory:

```bash
work # Directory containing the nextflow working files
<OUTDIR> # Finished results in specified location (defined with --outdir)
<OUTDIR> # Finished results in specified location (defined with --outdir), defaults to 'results'
.nextflow_log # Log file from Nextflow
# Other nextflow hidden files, eg. history of pipeline runs and old logs.
```
@@ -88,9 +49,11 @@ nextflow run nf-core/drugresponseeval -profile docker -params-file params.yaml
with:

```yaml title="params.yaml"
input: './samplesheet.csv'
models: 'ElasticNet'
baselines: 'NaivePredictor,NaiveCellLineMeanPredictor,NaiveDrugMeanPredictor'
outdir: './results/'
genome: 'GRCh37'
dataset_name: 'GDSC2'
path_data: '/path/to/data'
<...>
```
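
The parameters in this `params.yaml` correspond to the `--models`, `--baselines`, `--dataset_name` and `--path_data` flags of the command shown earlier in `docs/usage.md`. As a rough illustration of that correspondence, the sketch below assembles the equivalent command line from a dictionary without running anything. The helper `build_command` is hypothetical and not part of the pipeline or of Nextflow; the parameter values are the placeholders from the hunk above.

```python
import shlex


def build_command(params, profile="docker"):
    """Assemble the argument list for `nextflow run`; nothing is executed."""
    command = ["nextflow", "run", "nf-core/drugresponseeval", "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters use a double dash, Nextflow options a single one.
        command += [f"--{name}", str(value)]
    return command


# Placeholder values taken from the params.yaml hunk above.
params = {
    "models": "ElasticNet",
    "baselines": "NaivePredictor,NaiveCellLineMeanPredictor,NaiveDrugMeanPredictor",
    "outdir": "./results/",
    "dataset_name": "GDSC2",
    "path_data": "/path/to/data",
}

print(shlex.join(build_command(params)))
```

Building the command as a list and joining it with `shlex.join` avoids a dangling line-continuation backslash and quotes any value that contains spaces.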

9 changes: 6 additions & 3 deletions modules.json
@@ -3,21 +3,24 @@
"homePage": "https://github.com/nf-core/drugresponseeval",
"repos": {
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {}
},
"subworkflows": {
"nf-core": {
"utils_nextflow_pipeline": {
"branch": "master",
"git_sha": "9d05360da397692321d377b6102d2fb22507c6ef",
"git_sha": "56372688d8979092cafbe0c5c3895b491166ca1c",
"installed_by": ["subworkflows"]
},
"utils_nfcore_pipeline": {
"branch": "master",
"git_sha": "772684d9d66f37b650c8ba5146ac1ee3ecba2acb",
"git_sha": "1b6b9a3338d011367137808b49b923515080e3ba",
"installed_by": ["subworkflows"]
},
"utils_nfschema_plugin": {
"branch": "master",
"git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c",
"git_sha": "2fd2cd6d0e7b273747f32e465fdc6bcc3ae0814e",
"installed_by": ["subworkflows"]
}
}
12 changes: 12 additions & 0 deletions nextflow_schema.json
@@ -53,6 +53,18 @@
"fa_icon": "fas fa-envelope",
"help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.",
"pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
},
"input": {
"type": "string",
"format": "file-path",
"exists": true,
"schema": "assets/schema_input.json",
"mimetype": "text/csv",
"pattern": "^\\S+\\.csv$",
"description": "Unnecessary parameter for the pipeline, added to satisfy linting.",
"help_text": "Unnecessary parameter for the pipeline, added to satisfy linting.",
"fa_icon": "fas fa-file-csv",
"hidden": true
}
}
},
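
The `pattern` added for the hidden `input` parameter, `^\\S+\\.csv$` in JSON-escaped form, accepts any path without whitespace that ends in `.csv`. The sketch below decodes the pattern from a JSON fragment and tries it on a few paths. It assumes Python's `re` module behaves like the schema validator's regex engine for this simple pattern, which the diff does not confirm; the function `is_valid_input` and the example paths are hypothetical.

```python
import json
import re

# The pattern as written in nextflow_schema.json, with JSON-escaped backslashes.
schema_fragment = r'{"pattern": "^\\S+\\.csv$"}'
pattern = json.loads(schema_fragment)["pattern"]  # decodes to ^\S+\.csv$


def is_valid_input(path):
    """Check a candidate --input value against the schema pattern."""
    return re.search(pattern, path) is not None


for candidate in ["samplesheet.csv", "my samplesheet.csv", "samplesheet.tsv"]:
    print(candidate, is_valid_input(candidate))
```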
30 changes: 16 additions & 14 deletions subworkflows/nf-core/utils_nextflow_pipeline/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.


10 changes: 5 additions & 5 deletions subworkflows/nf-core/utils_nfcore_pipeline/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
