Merge pull request #4 from opentargets/il-multiple-testing
P value multiple correction, permutation testing, and reason/phase association
ireneisdoomed authored Sep 13, 2023
2 parents 2ccce1a + 8a93040 commit 68bdd2a
Showing 7 changed files with 723 additions and 3 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -8,4 +8,8 @@ derby.log
.vscode/dryrun.log
.vscode/targets.log
.vscode/configurationCache.log
temp
temp
data/baseline_predictions_aggregations
data/predictions_aggregations_*
data/chembl*
reports/*
16 changes: 15 additions & 1 deletion analysis/README.md
@@ -6,7 +6,7 @@ This pipeline consists of the following steps:
- Enrichment analysis.
To launch the script as a Pyspark job on a cluster:
```bash
gcloud dataproc clusters create il-big-stop-reasons \
gcloud dataproc clusters create il-big-stop-reasons-2 \
--image-version=2.1 \
--project=open-targets-eu-dev \
--region=europe-west1 \
@@ -24,5 +24,19 @@ gcloud dataproc jobs submit pyspark \
--region=europe-west1 \
--project=open-targets-eu-dev \
analysis/python/enrichments.py -- config.yml --stratify-therapeutic-area non_oncology

# To determine a baseline between all sets of comparisons, run the analysis with a randomly generated set of associations
gcloud dataproc jobs submit pyspark \
--cluster=il-big-stop-reasons \
--files=config.yml \
--region=europe-west1 \
--project=open-targets-eu-dev

# Adjust the p values for multiple testing using the Benjamini-Hochberg procedure by providing the path to the file with the results of the enrichment analysis
gcloud dataproc jobs submit pyspark \
--cluster=il-big-stop-reasons \
--region=europe-west1 \
--project=open-targets-eu-dev \
analysis/python/multiple_testing_correction.py -- "gs://ot-team/irene/stop_reasons/predictions_aggregations_non_oncology"
```
- Visualization of the results.
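
For readers unfamiliar with the Benjamini-Hochberg procedure named in the README comment above, the following is a minimal pure-Python sketch of the adjustment. It is not the code in `analysis/python/multiple_testing_correction.py`, which this diff does not show; the function name `benjamini_hochberg` and the sample p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch only; not the implementation used by the pipeline.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the adjusted value of a larger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four enrichment tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by `m / rank`, where `m` is the number of tests and `rank` is its position in ascending order, and the running minimum keeps the adjusted values monotone. A test is then called significant when its adjusted value falls below the chosen false discovery rate.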
1 change: 0 additions & 1 deletion analysis/python/enrichments.py
@@ -2,7 +2,6 @@
from enum import Enum
from functools import reduce

import pandas as pd
import pyspark.sql.functions as F
import typer
from omegaconf import OmegaConf