Revise handling of COSMIC mutational signatures (#17)
brendanreardon authored Nov 22, 2023
1 parent f8c4501 commit 37c7b04
Showing 37 changed files with 129 additions and 327 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM vanallenlab/almanac:base
FROM vanallenlab/miniconda:3.11

WORKDIR /

11 changes: 1 addition & 10 deletions README.md
@@ -9,7 +9,6 @@ Molecular Oncology Almanac is a clinical interpretation algorithm for cancer gen
- Identify overlap between somatic variants observed from both DNA and RNA, or any other source of validation sequencing.
- Identify somatic and germline variants that may be related to microsatellite stability.
- Calculate coding mutational burden and compare your patient to TCGA.
- Calculate contribution of known [COSMIC mutational signatures](https://cancer.sanger.ac.uk/signatures/signatures_v2/) with [deconstructsigs](https://github.com/raerose01/deconstructSigs).
- Identify genomic features that may be related to one another.
- Create portable web-based actionability reports, summarizing clinically relevant findings.

@@ -19,7 +18,7 @@ You can view additional documentation, including [descriptions of inputs](docs/d
The codebase is available for download through this GitHub repository, [Dockerhub](https://hub.docker.com/r/vanallenlab/moalmanac/), and [Terra](https://portal.firecloud.org/#methods/vanallenlab/moalmanac/2). The method can also be run on Terra, without having to use Terra, by using [our portal](https://portal.moalmanac.org/). **Accessing Molecular Oncology Almanac through GitHub will require building some of the [datasources](moalmanac/datasources/) but they are also contained in the Docker container**.

### Installation
Molecular Oncology Almanac is a Python application using Python 3.11 but also utilizes R to run [deconstructSigs](https://github.com/raerose01/deconstructSigs) as a subprocess. This application, datasources, and all dependencies are packaged on Docker and can be downloaded with the command
Molecular Oncology Almanac is a Python application using Python 3.11. This application, datasources, and all dependencies are packaged on Docker and can be downloaded with the command
```bash
docker pull vanallenlab/moalmanac
```
@@ -36,14 +35,6 @@ source activate moalmanac
pip install -r requirements.txt
```

You can install [deconstructSigs](https://github.com/raerose01/deconstructSigs) after [installing R](https://www.r-project.org/) with the following commands
```bash
Rscript -e 'install.packages("RCurl", repos = "http://cran.rstudio.com/")' \
&& Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite("BSgenome"); biocLite("BSgenome.Hsapiens.UCSC.hg19"); biocLite("GenomeInfoDb")' \
&& Rscript -e 'install.packages("reshape2", repos = "http://cran.rstudio.com/")' \
&& Rscript -e 'install.packages("deconstructSigs", repos = "http://cran.rstudio.com/")'
```

## Usage
Usage documentation can be found within the [moalmanac/](moalmanac) directory of this repository.

23 changes: 0 additions & 23 deletions base-image/Dockerfile

This file was deleted.

10 changes: 0 additions & 10 deletions base-image/README.md

This file was deleted.

20 changes: 19 additions & 1 deletion docs/description-of-inputs.md
@@ -16,6 +16,7 @@ Example inputs can be found in the [`example_data/`](/example_data/) folder, fou
- [Germline variants](#germline-variants)
- [Somatic variants from validation sequencing](#somatic-variants-from-validation-sequencing)
- [Microsatellite status](#microsatellite-status)
- [Mutational signatures](#mutational-signatures)
- [Purity](#purity)
- [Ploidy](#ploidy)
- [Whole genome doubling](#whole-genome-doubling)
@@ -124,7 +125,7 @@ This input is looking for an integer value.

The rows associated with _TP53_, _CDKN2A_, and _EGFR_ will be interpreted and scored by Molecular Oncology Almanac while _BRAF_ will be filtered.

### Required files
### Required fields
Required fields can be changed from their default expectations by editing the appropriate section of [colnames.ini](https://github.com/vanallenlab/moalmanac/blob/main/moalmanac/colnames.ini). Column names are **not** case-sensitive.
- `gene`, gene symbol associated with the copy number alteration
- `call`, copy number event of the gene. `Amplification` and `Deletion` are accepted and all other values will be filtered.
@@ -238,6 +239,23 @@ At least one of the following also must be included:

Microsatellite status is reported in the clinical actionability report.

## Mutational signatures
`--mutational_signatures` anticipates a tab delimited file which contains contributions to Single Base Substitution (SBS) Mutational Signatures from [COSMIC version 3.4](https://cancer.sanger.ac.uk/signatures/sbs/). The file should only contain signature contributions for the tumor sample being studied. We recommend generating SBS mutational signatures with [SigProfilerAssignment](https://github.com/AlexandrovLab/SigProfilerAssignment), and have prepared [a wrapper GitHub repository](https://github.com/vanallenlab/SigProfilerAssignment-wrapper) to run SigProfilerAssignment and format signature contributions as expected.

### Example
| signature | contribution |
|---|--------------|
| SBS1 | 0.03846154 |
| SBS2 | 0 |
| SBS3 | 0.8525641 |
| ... | ... |
| SBS95 | 0 |

### Required fields
The required fields for this file can be changed from their default expectations by editing the appropriate section of `colnames.ini`. Column names are **not** case sensitive.
- `signature`, labels for each of the 79 SBS mutational signatures included in COSMIC mutational signatures [version 3.4](https://cancer.sanger.ac.uk/signatures/sbs/)
- `contribution`, a float value between 0 and 1 for the row's associated signature weight. This column's values should sum to 1.
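
The requirements above can be checked before passing a file to `--mutational_signatures`. The following is a minimal sketch in Python, not part of Molecular Oncology Almanac: the function name `read_signature_contributions`, the `tolerance` parameter and its default, and the example values are all hypothetical. It assumes the default column names `signature` and `contribution` have not been changed in `colnames.ini`.

```python
import csv
import io
import math


def read_signature_contributions(handle, tolerance=1e-3):
    """Return {signature: contribution} from a tab-delimited table."""
    rows = csv.reader(handle, delimiter="\t")
    # Column names are not case-sensitive, so lowercase the header first.
    header = [name.strip().lower() for name in next(rows)]
    for required in ("signature", "contribution"):
        if required not in header:
            raise ValueError(f"missing required column: {required}")
    signature_idx = header.index("signature")
    contribution_idx = header.index("contribution")

    contributions = {}
    for row in rows:
        if not row:  # skip blank lines
            continue
        value = float(row[contribution_idx])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{row[signature_idx]}: {value} is outside [0, 1]")
        contributions[row[signature_idx]] = value

    # The tolerance is an assumption; the documentation only says "sum to 1".
    total = sum(contributions.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"contributions sum to {total}, expected 1")
    return contributions


# Made-up values for illustration only, not real signature weights.
example = io.StringIO("Signature\tContribution\nSBS1\t0.25\nSBS2\t0\nSBS3\t0.75\n")
print(read_signature_contributions(example))
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory `io.StringIO` object. The sketch does not check that all 79 signatures are present, since the text does not say whether a partial list is rejected.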

## Purity
`--purity` anticipates a float value between 0.0 and 1.0 for the reported tumor purity. This is just used for reporting in the clinical actionability report.

34 changes: 3 additions & 31 deletions docs/description-of-outputs.md
@@ -28,11 +28,6 @@ All outputs will be produced by Molecular Oncology Almanac, though some may not
* [Integrated summary](#integrated-summary)
* [Microsatellite Instability variants](#microsatellite-instability-variants)
* [Mutational burden](#mutational-burden)
* [Mutational signatures](#mutational-signatures)
* [Trinucleotide context counts](#trinucleotide-context-counts)
* [COSMIC signature (v2) weights](#cosmic-signature-v2-weights)
* [Trinucleotide context counts image](#trinucleotide-context-counts-image)
* [Trinucleotide context normalized counts image](#trinucleotide-context-normalized-counts-image)
* [Preclinical efficacy](#preclinical-efficacy)
* [Profile-to-cell line matchmaking](#profile-to-cell-line-matchmaking)
* [Report](#report)
@@ -63,7 +58,7 @@ Molecular Oncology Almanac standardizes primary descriptors for molecular featur
* Rearrangements: gene name, Molecular Oncology Almanac will process each partner in the fusion separately
* Microsatellite stability: microsatellite stability status (MSI-High or MSI-Low)
* Mutational burden: High Mutational Burden, if the mutational burden is deemed to be high
* Mutational signatures: the specific COSMIC (v2) mutational signature, formatted as "COSMIC Signature (number)"
* Mutational signatures: the specific COSMIC (v3.4) mutational signature, formatted as "COSMIC Signature (number)"
* Aneuploidy: Whole-genome doubling, this will only be populated if the `--wgd` value is passed to Molecular Oncology Almanac.
* `alteration_type` is a descriptor to provide more granular detail on the molecular event.
* Somatic variants: variant classification of the variant (Missense, Nonsense, etc.)
@@ -319,31 +314,8 @@ Molecular Oncology Almanac designates high mutational burden under two circumsta
- Mutations per Mb > 10
- At least a mutational burden of 80th percentile of TCGA tumor type, if matched, or TCGA generally, if not matched.

## Mutational signatures
Molecular Oncology Almanac runs [deconstructSigs](https://github.com/raerose01/deconstructSigs) as a subprocess based on the MAF file passed with the input argument `--snv_handle`, performing NMF against the 30 COSMIC v2 signatures.

### Trinucleotide context counts
Filename suffix: `.sigs.context.txt`

Trinucleotide context counts of observed somatic variants for all 96 bins are listed in this tab delimited file.

### COSMIC signature (v2) weights
Filename suffix: `.sigs.cosmic.txt`

Weights for the 30 COSMIC (v2) mutational signatures are listed in this tab delimited file. Thresholds for a signature to be considered present or not present by Molecular Oncology Almanac are specified in [config.ini](/moalmanac/config.ini) under the `[signatures]` heading.

### Trinucleotide context counts image
Filename suffix: `.sigs.tricontext.counts.png`

Trinucleotide context raw counts of observed somatic variants for all 96 bins are visualized in this png file.

### Trinucleotide context normalized counts image
Filename suffix: `.sigs.tricontext.normalized.png`

Trinucleotide context normalized counts of observed somatic variants for all 96 bins are visualized in this png file.

## Preclinical efficacy
Filename suffix: `.preclinical.efficacy.txt`
Filename suffix: `.preclinical_efficacy.txt`

Therapies listed in [actionable](#actionable) that have been evaluated on cancer cell lines through the Sanger Institute's GDSC are evaluated for efficacy in the presence and absence of the associated molecular feature. This is performed for relationships associated with therapeutic sensitivity. Columns include:
- `patient_id` (str) - the string associated with the given molecular profile (`--patient_id`)
@@ -396,7 +368,7 @@ Additional equivalent within a provided ontology or stronger matches from anothe

For molecular features associated with therapeutic sensitivity that have a therapy evaluated on cancer cell lines, a button `[Preclinical evidence]` will appear below the therapy and rationale which will open a modal to compare the sensitivity to the therapy of interest between mutant and wild type cell lines.

Molecular features which are biologically relevant are listed without clinical association. Molecular features will appear here if the associated gene is catalogued in the Molecular Oncology Almanac but under a different feature type, variants are associated with microsatellite stability, and all present COSMIC version 2 mutational signatures not associated with a clinical assertion are reported.
Molecular features which are biologically relevant are listed without clinical association. Molecular features will appear here if the associated gene is catalogued in the Molecular Oncology Almanac but under a different feature type, variants are associated with microsatellite stability, and all present COSMIC v3.4 mutational signatures not associated with a clinical assertion are reported.

The last section of the report, comparison of molecular profile to cancer cell lines, displays results from Molecular Oncology Almanac's patient-to-cell line matchmaking module. **This will not appear in the report if `--disable_matchmaking` is passed as an argument**. The 5 most similar cancer cell lines to the provided profile are listed, each with the cell line name, sensitive therapies from GDSC, and clinically relevant features present. Users can click `[More details]` under each cell line's name for more details about a given cell line: aliases, sensitive therapies, clinically relevant molecular features, all somatic variants, copy number alterations, and fusions occurring in cancer gene census genes, and the 10 therapies to which the cell line is most sensitive.
