Commit

Merge branch 'main' into do_clump_step
d0choa authored Nov 23, 2023
2 parents b4483c0 + 24addab commit 4ef0cd2
Showing 31 changed files with 641 additions and 222 deletions.
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -27,13 +27,13 @@ repos:
- id: python-check-blanket-noqa

- repo: https://github.com/hadialqattan/pycln
rev: v2.3.0
rev: v2.4.0
hooks:
- id: pycln
args: [--all]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.5
rev: v0.1.6
hooks:
- id: ruff

@@ -50,7 +50,7 @@ repos:
- id: black

- repo: https://github.com/alessandrojcm/commitlint-pre-commit-hook
rev: v9.8.0
rev: v9.10.0
hooks:
- id: commitlint
additional_dependencies: ["@commitlint/config-conventional"]
7 changes: 6 additions & 1 deletion config/datasets/gcp.yaml
@@ -15,10 +15,11 @@ catalog_associations: ${datasets.inputs}/v2d/gwas_catalog_v1.0.2-associations_e1
catalog_studies: ${datasets.inputs}/v2d/gwas-catalog-v1.0.3-studies-r2023-09-11.tsv
catalog_ancestries: ${datasets.inputs}/v2d/gwas-catalog-v1.0.3-ancestries-r2023-09-11.tsv
catalog_sumstats_lut: ${datasets.inputs}/v2d/harmonised_list-r2023-09-11.txt
finngen_phenotype_table_url: https://r9.finngen.fi/api/phenos
ukbiobank_manifest: gs://genetics-portal-input/ukb_phenotypes/neale2_saige_study_manifest.190430.tsv
l2g_gold_standard_curation: ${datasets.inputs}/l2g/gold_standard/curation.json
gene_interactions: ${datasets.inputs}/l2g/interaction # 23.09 data
finngen_phenotype_table_url: https://r9.finngen.fi/api/phenos
eqtl_catalogue_paths_imported: ${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv

# Output datasets
gene_index: ${datasets.outputs}/gene_index
@@ -27,6 +28,7 @@ variant_index: ${datasets.outputs}/variant_index
study_locus: ${datasets.outputs}/study_locus
study_index: ${datasets.outputs}/study_index
summary_statistics: ${datasets.outputs}/summary_statistics
credible_set: ${datasets.outputs}/credible_set
study_locus_overlap: ${datasets.outputs}/study_locus_overlap
colocalisation: ${datasets.outputs}/colocalisation
v2g: ${datasets.outputs}/v2g
@@ -37,8 +39,11 @@ finngen_study_index: ${datasets.study_index}/finngen
finngen_summary_stats: ${datasets.summary_statistics}/finngen
from_sumstats_study_locus: ${datasets.study_locus}/from_sumstats
ukbiobank_study_index: ${datasets.study_index}/ukbiobank
from_sumstats_pics: ${datasets.credible_set}/from_sumstats
l2g_model: ${datasets.outputs}/l2g_model
l2g_predictions: ${datasets.outputs}/l2g_predictions
eqtl_catalogue_study_index_out: ${datasets.outputs}/preprocess/eqtl_catalogue/study_index
eqtl_catalogue_summary_stats_out: ${datasets.outputs}/preprocess/eqtl_catalogue/summary_stats

# Constants
finngen_release_prefix: FINNGEN_R9
4 changes: 4 additions & 0 deletions config/step/eqtl_catalogue.yaml
@@ -0,0 +1,4 @@
_target_: otg.eqtl_catalogue.EqtlCatalogueStep
eqtl_catalogue_paths_imported: ${datasets.eqtl_catalogue_paths_imported}
eqtl_catalogue_study_index_out: ${datasets.eqtl_catalogue_study_index_out}
eqtl_catalogue_summary_stats_out: ${datasets.eqtl_catalogue_summary_stats_out}
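Note on the `${datasets.…}` references: the step file above does not hold paths itself; each value points at a key in `config/datasets/gcp.yaml`, which in turn points at `${datasets.inputs}` or `${datasets.outputs}`. The syntax looks like OmegaConf-style interpolation, and in the real project the configuration library presumably does the resolution. The following is only a minimal stdlib sketch of that two-level lookup. The bucket roots are placeholders (their real values are not shown in this diff), and `resolve` is a hypothetical helper, not a project function.

```python
import re

# Placeholder roots: the real values of datasets.inputs and datasets.outputs
# are not shown in this diff.
datasets = {
    "inputs": "gs://example-bucket/inputs",
    "outputs": "gs://example-bucket/outputs",
    "eqtl_catalogue_paths_imported": "${datasets.inputs}/preprocess/eqtl_catalogue/tabix_ftp_paths_imported.tsv",
    "eqtl_catalogue_study_index_out": "${datasets.outputs}/preprocess/eqtl_catalogue/study_index",
    "eqtl_catalogue_summary_stats_out": "${datasets.outputs}/preprocess/eqtl_catalogue/summary_stats",
}

REFERENCE = re.compile(r"\$\{datasets\.(\w+)\}")


def resolve(value: str, table: dict[str, str], max_depth: int = 10) -> str:
    """Expand ${datasets.KEY} references, following nested ones (hypothetical helper)."""
    for _ in range(max_depth):
        expanded = REFERENCE.sub(lambda match: table[match.group(1)], value)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError(f"unresolved or circular reference in {value!r}")


# Mirrors config/step/eqtl_catalogue.yaml without the _target_ key.
step_config = {
    "eqtl_catalogue_paths_imported": "${datasets.eqtl_catalogue_paths_imported}",
    "eqtl_catalogue_study_index_out": "${datasets.eqtl_catalogue_study_index_out}",
    "eqtl_catalogue_summary_stats_out": "${datasets.eqtl_catalogue_summary_stats_out}",
}

resolved = {key: resolve(value, datasets) for key, value in step_config.items()}
print(resolved["eqtl_catalogue_paths_imported"])
```

Keeping the paths in one datasets file means a step config only needs to change when a step gains or loses a parameter, not when a bucket layout changes.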
3 changes: 3 additions & 0 deletions config/step/pics.yaml
@@ -0,0 +1,3 @@
_target_: otg.pics.PICSStep
study_locus_ld_annotated_in: ${datasets.from_sumstats_study_locus}
picsed_study_locus_out: ${datasets.from_sumstats_pics}
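Note on `_target_`: the key names a dotted import path, and the remaining keys read as keyword arguments for that class. This matches the convention used by Hydra's `hydra.utils.instantiate`, which the project presumably relies on. As a rough stdlib illustration of the mechanism only, the sketch below resolves the dotted path with `importlib` and passes the other keys through. `instantiate` here is a hypothetical stand-in, and `types.SimpleNamespace` replaces `otg.pics.PICSStep` so the example runs without the project installed; the two path values are placeholders.

```python
import importlib
from typing import Any


def instantiate(config: dict[str, Any]) -> Any:
    """Import the class named by `_target_` and call it with the other keys.

    Hypothetical stand-in for the real configuration library's behaviour.
    """
    params = dict(config)  # copy so the caller's config is not mutated
    module_path, _, attribute = params.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attribute)
    return target(**params)


# Same shape as config/step/pics.yaml, with a stdlib class as the target
# and placeholder paths in place of the resolved ${datasets.…} values.
step = instantiate(
    {
        "_target_": "types.SimpleNamespace",
        "study_locus_ld_annotated_in": "gs://example-bucket/study_locus/from_sumstats",
        "picsed_study_locus_out": "gs://example-bucket/credible_set/from_sumstats",
    }
)
print(step.picsed_study_locus_out)
```

One consequence of this design is that the keys in a step YAML file have to match the constructor parameters of the target class exactly; a renamed parameter needs the same rename in `config/step/STEP.yaml`.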
23 changes: 11 additions & 12 deletions docs/development/contributing.md
@@ -29,25 +29,23 @@ All pipelines in this repository are intended to be run in Google Dataproc. Runn

In order to run the code:

1. Manually edit your local `workflow/dag.yaml` file and comment out the steps you do not want to run.
1. Manually edit your local `src/airflow/dags/*` file and comment out the steps you do not want to run.

2. Manually edit your local `pyproject.toml` file and modify the version of the code.
- This must be different from the version used by any other people working on the repository to avoid any deployment conflicts, so it's a good idea to use your name, for example: `1.2.3+jdoe`.
- You can also add a brief branch description, for example: `1.2.3+jdoe.myfeature`.
- Note that the version must comply with [PEP440 conventions](https://peps.python.org/pep-0440/#normalization), otherwise Poetry will not allow it to be deployed.
- Do not use underscores or hyphens in your version name. When building the WHL file, they will be automatically converted to dots, which means the file name will no longer match the version and the build will fail. Use dots instead.

3. Run `make build`.
3. Manually edit your local `src/airflow/dags/common_airflow.py` and set `OTG_VERSION` to the same version as you did in the previous step.

4. Run `make build`.
- This will create a bundle containing the necessary code, configuration and dependencies to run the ETL pipeline, and then upload this bundle to Google Cloud.
- A version specific subpath is used, so uploading the code will not affect any branches but your own.
- If there was already a code bundle uploaded with the same version number, it will be replaced.

4. Submit the Dataproc job with `poetry run python workflow/workflow_template.py`
- You will need to specify additional parameters, some are mandatory and some are optional. Run with `--help` to see usage.
- The script will provision the cluster and submit the job.
- The cluster will take a few minutes to get provisioned and running, during which the script will not output anything; this is normal.
- Once submitted, you can monitor the progress of your job on this page: https://console.cloud.google.com/dataproc/jobs?project=open-targets-genetics-dev.
- On completion (whether successful or a failure), the cluster will be automatically removed, so you don't have to worry about shutting it down to avoid incurring charges.
5. Open Airflow UI and run the DAG.
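The version rule in step 2 (dots only, no underscores or hyphens after the `+`) can be checked before running `make build`. The sketch below is an assumption-laden illustration, not part of the repository: `normalize_local_segment` and `is_build_safe` are hypothetical helpers that mimic only the PEP 440 normalization of separators in the local segment, and they do not validate the rest of the version string.

```python
import re


def normalize_local_segment(version: str) -> str:
    """Replace `-` and `_` with `.` in the part after `+`, as PEP 440 normalization does."""
    public, plus, local = version.partition("+")
    return public + plus + re.sub(r"[-_]", ".", local)


def is_build_safe(version: str) -> bool:
    """True when normalization leaves the version unchanged, so file name and version agree."""
    return normalize_local_segment(version) == version


for candidate in ("1.2.3+jdoe.myfeature", "1.2.3+jdoe_my-feature"):
    print(candidate, is_build_safe(candidate), normalize_local_segment(candidate))
```

A version that fails this check is the case the guide warns about: the built file would carry the normalized name while the configured version keeps the original spelling.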


## Contributing checklist
When making changes, and especially when implementing a new module or feature, it's essential to ensure that all relevant sections of the code base are modified.
@@ -57,19 +55,20 @@ When making changes, and especially when implementing a new module or feature, i
- [ ] Update the documentation and check it with `make build-documentation`. This will start a local server to browse it (URL will be printed, usually `http://127.0.0.1:8000/`)

For more details on each of these steps, see the sections below.

### Documentation
* If during development you had a question which wasn't covered in the documentation, and someone explained it to you, add it to the documentation. The same applies if you encountered any instructions in the documentation which were obsolete or incorrect.
* Documentation autogeneration expressions start with `:::`. They will automatically generate sections of the documentation based on class and method docstrings. Be sure to update them for:
+ Dataset definitions in `docs/reference/dataset` (example: `docs/reference/dataset/study_index/study_index_finngen.md`)
+ Step definition in `docs/reference/step` (example: `docs/reference/step/finngen.md`)
+ Dataset definitions in `docs/python_api/datasource/STEP` (example: `docs/python_api/datasource/finngen/study_index.md`)
+ Step definition in `docs/python_api/step/STEP.md` (example: `docs/python_api/step/finngen.md`)
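To see which objects a documentation page pulls in through `:::` expressions, the directives can be listed with a short script. This is a hedged sketch using only the standard library: the regular expression is an assumption about the simple one-line form used in this commit (for example `::: otg.pics.PICSStep`), and it ignores any options a directive might carry.

```python
import re

# Matches a line of the form "::: dotted.path.To.Object" (simple form only).
DIRECTIVE = re.compile(r"^:::\s+([\w.]+)\s*$", re.MULTILINE)

# Content of docs/python_api/step/pics.md as added in this commit.
page = """---
title: PICS
---
::: otg.pics.PICSStep
"""

targets = DIRECTIVE.findall(page)
print(targets)
```

Comparing such a list against the classes a change adds or renames is one way to spot a documentation page that still points at an old import path.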

### Configuration
* Input and output paths in `config/datasets/gcp.yaml`
* Step configuration in `config/step/STEP.yaml` (example: `config/step/finngen.yaml`)

### Classes
* Dataset class in `src/org/dataset/` (example: `src/otg/dataset/study_index.py` → `StudyIndexFinnGen`)
* Step main running class in `src/org/STEP.py` (example: `src/org/finngen.py`)
* Dataset class in `src/otg/datasource/STEP` (example: `src/otg/datasource/finngen/study_index.py` → `FinnGenStudyIndex`)
* Step main running class in `src/otg/STEP.py` (example: `src/otg/finngen.py`)

### Tests
* Test study fixture in `tests/conftest.py` (example: `mock_study_index_finngen` in that module)
4 changes: 4 additions & 0 deletions docs/python_api/datasource/eqtl_catalogue/study_index.md
@@ -0,0 +1,4 @@
---
title: Study Index
---
::: otg.datasource.eqtl_catalogue.study_index.EqtlCatalogueStudyIndex
4 changes: 4 additions & 0 deletions docs/python_api/datasource/eqtl_catalogue/summary_stats.md
@@ -0,0 +1,4 @@
---
title: Summary Stats
---
::: otg.datasource.eqtl_catalogue.summary_stats.EqtlCatalogueSummaryStats
4 changes: 4 additions & 0 deletions docs/python_api/step/eqtl_catalogue.md
@@ -0,0 +1,4 @@
---
title: eQTL Catalogue
---
::: otg.eqtl_catalogue.EqtlCatalogueStep
4 changes: 4 additions & 0 deletions docs/python_api/step/pics.md
@@ -0,0 +1,4 @@
---
title: PICS
---
::: otg.pics.PICSStep