feat(ena-submission): Create ena samples #2312

Closed
wants to merge 10 commits into from
37 changes: 37 additions & 0 deletions .github/workflows/ena-submission-tests.yaml
@@ -0,0 +1,37 @@
name: ena-submission-tests
on:
  # test
  pull_request:
    paths:
      - "ena-submission/**"
      - ".github/workflows/ena-submission-tests.yaml"
  push:
    branches:
      - main
  workflow_dispatch:
concurrency:
  group: ci-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}-ena-submission-tests
  cancel-in-progress: true
jobs:
  unitTests:
    name: Unit Tests
    runs-on: codebuild-loculus-ci-${{ github.run_id }}-${{ github.run_attempt }}
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Set up micromamba
        uses: mamba-org/setup-micromamba@v1
        with:
          environment-file: ena-submission/environment.yml
          micromamba-version: 'latest'
          init-shell: >-
            bash
            powershell
          cache-environment: true
          post-cleanup: 'all'
      - name: Run tests
        run: |
          micromamba activate loculus-ena-submission
          python3 scripts/test_ena_submission.py
        shell: micromamba-shell {0}
        working-directory: ena-submission
2 changes: 1 addition & 1 deletion ena-submission/ENA_submission.md
@@ -35,7 +35,7 @@ We require the following components:

- Analysis: An analysis contains secondary analysis results derived from sequence reads (e.g. a genome assembly).

At the time of writing (October 2023), in contrast to ENA, Pathoplexus has no hierarchy of study/sample/sequence: every sequence is its own study and sample. Therefore we need to figure out how to map sequences to projects: each submitter could have exactly _one_ study per organism (this is the approach we are currently taking), or each sequence could be associated with its own study.
At the time of writing (October 2023), in contrast to ENA, Pathoplexus has no hierarchy of study/sample/sequence: every sequence is its own study and sample. Thus, each sequence will have to be submitted to ENA as its own study and sample. Alternatively, each submitter could have exactly _one_ study per organism (this is the approach we are currently taking).
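
The "one study per submitter per organism" approach can be pictured as a simple grouping step. The sketch below is only an illustration: the record keys `submitter`, `organism` and `accession`, the accession values, and the organism key `cchf` are hypothetical and not taken from the Loculus schema.

```python
from collections import defaultdict


def assign_studies(sequences):
    """Group sequence accessions by (submitter, organism).

    Each key stands for one ENA study; every sequence listed under it
    would still be submitted as its own sample.
    """
    studies = defaultdict(list)
    for seq in sequences:
        key = (seq["submitter"], seq["organism"])
        studies[key].append(seq["accession"])
    return dict(studies)


# Hypothetical input records, for illustration only.
sequences = [
    {"accession": "LOC_0001", "submitter": "lab-a", "organism": "cchf"},
    {"accession": "LOC_0002", "submitter": "lab-a", "organism": "cchf"},
    {"accession": "LOC_0003", "submitter": "lab-a", "organism": "ebola-sudan"},
    {"accession": "LOC_0004", "submitter": "lab-b", "organism": "cchf"},
]
studies = assign_studies(sequences)
```

With this grouping, a submitter who uploads sequences for two organisms ends up with two studies, and the number of studies grows with submitters and organisms rather than with sequences.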

### Mapping sequences and studies

38 changes: 26 additions & 12 deletions ena-submission/README.md
@@ -1,11 +1,13 @@
## ENA Submission
# ENA Submission

### Developing Locally
## Developing Locally

The ENA submission pod creates a new schema in the loculus DB, this is managed by flyway. This means to develop locally you will have to start the postgres DB locally e.g. by using the ../deploy.py script or using
### Database

The ENA submission service creates a new schema in the Loculus Postgres DB, managed by Flyway. To develop locally, start the Postgres DB locally, e.g. by using the `../deploy.py` script or by running:

```sh
docker run -d \
docker run -d \
--name loculus_postgres \
-e POSTGRES_DB=loculus \
-e POSTGRES_USER=postgres \
@@ -14,34 +16,46 @@ The ENA submission pod creates a new schema in the loculus DB, this is managed b
postgres:latest
```

In our kubernetes pod we run flyway in a docker container, however when running locally it is best to [download the flyway CLI](https://documentation.red-gate.com/fd/command-line-184127404.html).
### Install and run flyway

You can then run flyway using the
In our Kubernetes pod we run Flyway in a Docker container; when running locally, however, it is best to [download the flyway CLI](https://documentation.red-gate.com/fd/command-line-184127404.html) (or `brew install flyway` on macOS).

```
flyway -user=postgres -password=unsecure -url=jdbc:postgresql://127.0.0.1:5432/loculus -schemas=ena-submission -locations=filesystem:./sql migrate
You can then create the schema using the following command:

```sh
flyway -user=postgres -password=unsecure -url=jdbc:postgresql://127.0.0.1:5432/loculus -schemas=ena-submission -locations=filesystem:./flyway/sql migrate
```

If you want to test the Docker image locally, it can be built and run using the commands:

```
```sh
docker build -t ena-submission-flyway .
docker run -it -e FLYWAY_URL=jdbc:postgresql://127.0.0.1:5432/loculus -e FLYWAY_USER=postgres -e FLYWAY_PASSWORD=unsecure ena-submission-flyway flyway migrate
```

### Setting up micromamba environment

<details>

<summary> Setting up micromamba </summary>

The rest of the ena-submission pod uses micromamba:

```bash
```sh
brew install micromamba
micromamba shell init --shell zsh --root-prefix=~/micromamba
source ~/.zshrc
```

</details>

Then create and activate the `loculus-ena-submission` environment:

```bash
micromamba create -f environment.yml --platform osx-64 --rc-file .mambarc
```sh
micromamba create -f environment.yml --rc-file .mambarc
micromamba activate loculus-ena-submission
```

### Running snakemake

Then run snakemake using `snakemake` or `snakemake {rule}`.
44 changes: 40 additions & 4 deletions ena-submission/Snakefile
@@ -18,19 +18,22 @@ with open("results/config.yaml", "w") as f:
f.write(yaml.dump(config))

LOG_LEVEL = config.get("log_level", "INFO")
ORGANISMS = config['organisms'].keys()
ORGANISMS = config["organisms"].keys()


rule submit_all_external_metadata:
input:
expand("results/submitted_{organism}.json", organism=ORGANISMS)

expand("results/submitted_{organism}.json", organism=ORGANISMS),


rule submit_external_metadata:
input:
script="scripts/call_loculus.py",
# Where does the metadata come from?
metadata="results/external_metadata_{organism}.ndjson",
config="results/config.yaml",
output:
submitted="results/submitted_{organism}.json"
submitted="results/submitted_{organism}.json",
params:
log_level=LOG_LEVEL,
shell:
@@ -63,6 +66,7 @@ rule get_ena_submission_list:
--log-level {params.log_level} \
"""


rule trigger_submission_to_ena:
input:
script="scripts/trigger_submission_to_ena.py",
@@ -78,6 +82,7 @@ rule trigger_submission_to_ena:
--log-level {params.log_level} \
"""


rule trigger_submission_to_ena_from_file: # for testing
input:
script="scripts/trigger_submission_to_ena.py",
@@ -93,4 +98,35 @@ rule trigger_submission_to_ena_from_file: # for testing
--config-file {input.config} \
--input-file {input.input_file} \
--log-level {params.log_level} \
"""


rule create_project:
    input:
        script="scripts/create_project.py",
        config="results/config.yaml",
    output:
        project_created=touch("results/project_created"),
    params:
        log_level=LOG_LEVEL,
    shell:
        """
        python {input.script} \
            --config-file {input.config} \
            --log-level {params.log_level} \
        """


rule create_sample:
    input:
        script="scripts/create_sample.py",
        config="results/config.yaml",
    output:
        sample_created=touch("results/sample_created"),
    params:
        log_level=LOG_LEVEL,
    shell:
        """
        python {input.script} \
            --config-file {input.config} \
            --log-level {params.log_level} \
        """
2 changes: 2 additions & 0 deletions ena-submission/config/config.yaml
@@ -13,6 +13,7 @@ organisms:
- M
- S
taxon_id: 3052518
scientific_name: "Orthonairovirus haemorrhagiae"
organismName: "Crimean-Congo Hemorrhagic Fever Virus"
externalMetadata:
- externalMetadataUpdater: ena
@@ -78,6 +79,7 @@ organisms:
ebola-sudan:
ingest:
taxon_id: 3052460
scientific_name: "Orthoebolavirus sudanense"
organismName: "Ebola Sudan"
externalMetadata:
- externalMetadataUpdater: ena
50 changes: 49 additions & 1 deletion ena-submission/config/defaults.yaml
@@ -2,6 +2,54 @@ username: external_metadata_updater
password: external_metadata_updater
keycloak_client_id: backend-client
ingest_pipeline_submitter: insdc_ingest_user
db_name: Loculus
unique_project_suffix: Loculus
ena_submission_username: fake-user
ena_submission_password: fake-password
ena_submission_url: https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit # TODO(https://github.com/loculus-project/loculus/issues/2425): update in production
github_username: fake_username
github_pat: fake_pat
github_url: https://api.github.com/repos/pathoplexus/ena-submission/contents/test/approved_ena_submission_list.json?ref=main
github_url: https://api.github.com/repos/pathoplexus/ena-submission/contents/test/approved_ena_submission_list.json?ref=main # TODO(https://github.com/loculus-project/loculus/issues/2425): update in production
metadata_mapping:
  'subject exposure':
    loculus_fields: [exposureEvent]
  'type exposure':
    loculus_fields: [exposureEvent]
  hospitalisation:
    loculus_fields: [hostHealthState]
    function: match
    args: [Hospital]
  'illness symptoms':
    loculus_fields: [signsAndSymptoms]
  'collection date':
    loculus_fields: [sampleCollectionDate]
  'geographic location (country and/or sea)':
    loculus_fields: [geoLocCountry]
  'geographic location (region and locality)':
    loculus_fields: [geoLocAdmin1]
  'sample capture status':
    loculus_fields: [purposeOfSampling]
  'host disease outcome':
    loculus_fields: [hostHealthOutcome]
  'host common name':
    loculus_fields: [hostNameCommon]
  'host age':
    loculus_fields: [hostAge]
  'host health state':
    loculus_fields: [hostHealthState]
  'host sex':
    loculus_fields: [hostGender]
  'host scientific name':
    loculus_fields: [hostNameScientific]
  'isolate':
    loculus_fields: [specimenCollectorSampleId]
  'collecting institution':
    loculus_fields: [sequencedByOrganization, authorAffiliations]
  'receipt date':
    loculus_fields: [sampleReceivedDate]
  'isolation source host-associated':
    loculus_fields: [anatomicalMaterial, anatomicalPart, bodyProduct]
  'isolation source non-host-associated':
    loculus_fields: [environmentalSite, environmentalMaterial]
  'authors':
    loculus_fields: [authors]
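
The `metadata_mapping` block keys each ENA sample attribute to one or more Loculus metadata fields, optionally with a `function` and `args`. How the submission scripts interpret it is not visible in this diff, so the sketch below is one plausible reading and not the actual implementation. It assumes that several non-empty source fields are joined with `"; "` and that `function: match` yields `"true"` when any pattern in `args` occurs in a source value; both behaviours, the function name `map_to_ena_attributes` and the sample record are assumptions.

```python
import re

# A three-entry excerpt of the mapping from defaults.yaml.
METADATA_MAPPING = {
    "collection date": {"loculus_fields": ["sampleCollectionDate"]},
    "hospitalisation": {
        "loculus_fields": ["hostHealthState"],
        "function": "match",
        "args": ["Hospital"],
    },
    "collecting institution": {
        "loculus_fields": ["sequencedByOrganization", "authorAffiliations"],
    },
}


def map_to_ena_attributes(record, mapping=METADATA_MAPPING):
    """Translate one Loculus metadata record into ENA attribute/value pairs."""
    attributes = {}
    for ena_field, rule in mapping.items():
        # Collect the non-empty source values in the order the mapping lists them.
        values = [str(record[f]) for f in rule["loculus_fields"] if record.get(f)]
        if not values:
            continue  # No source value, so the attribute is omitted.
        if rule.get("function") == "match":
            # Assumed semantics: case-insensitive search for any pattern in any value.
            matched = any(
                re.search(pattern, value, re.IGNORECASE)
                for pattern in rule["args"]
                for value in values
            )
            attributes[ena_field] = "true" if matched else "false"
        else:
            # Assumed semantics: several source fields are joined into one value.
            attributes[ena_field] = "; ".join(values)
    return attributes


# Hypothetical Loculus record, for illustration only.
record = {
    "sampleCollectionDate": "2023-10-01",
    "hostHealthState": "Hospitalized",
    "sequencedByOrganization": "Example Lab",
    "authorAffiliations": None,
}
attributes = map_to_ena_attributes(record)
```

Note that `hostHealthState` feeds two ENA attributes in the full mapping (`hospitalisation` and `'host health state'`), which is why the mapping is keyed by the ENA attribute name and not by the Loculus field.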
1 change: 1 addition & 0 deletions ena-submission/environment.yml
@@ -15,3 +15,4 @@ dependencies:
- unzip
- psycopg2
- slack_sdk
- xmltodict
2 changes: 2 additions & 0 deletions ena-submission/flyway/sql/V1.1__add_center_name.sql
@@ -0,0 +1,2 @@
ALTER TABLE submission_table ADD center_name text;
ALTER TABLE project_table ADD center_name text;