arc42 ena deposition risk and technical debt
fengelniederhammer committed Nov 6, 2024
1 parent 47ab988 commit 3f8b8cb
Showing 3 changed files with 19 additions and 1 deletion.
16 changes: 16 additions & 0 deletions architecture_docs/11_risks_and_technical_debt.md
@@ -14,3 +14,19 @@ It is also (as of now) mostly undocumented.
Some parts of the configuration are redundant and could be simplified.
Also, the Helm chart contains a lot of default values
that are not suitable for general Loculus instances and will result in unexpected behavior if not overwritten.

## ENA Deposition

The ENA deposition service was written to sync sequences that have been uploaded to Loculus back to INSDC (ENA in this case).
When an ingest service is running for the same organism,
there is a risk of uploading sequences to ENA that were previously downloaded from NCBI.
There is also a risk of uploading the same sequence twice (e.g. once as the original version and once as a revised version).

To prevent this, the ENA deposition and the ingest service were given direct database access:
1. The ENA deposition service accesses a separate schema in the same database as the backend,
where it duplicates the data from the backend to keep track of which sequences have already been uploaded.
2. The ingest service accesses the same schema as the deposition service to check which ingested sequences have been uploaded by Loculus.

A solution to the first problem would be to adapt the backend such that it can track which sequences have been uploaded to ENA.
A solution to the second problem could be merging the ENA deposition and the ingest service into a single service.
Neither service should need direct database access.
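
As a rough illustration of the first idea, the following Python sketch shows how upload tracking could look if it lived in one place instead of in a duplicated schema. It is not taken from the Loculus code base: `SequenceEntry`, `DepositionTracker`, the `from_ingest` flag and the in-memory dictionary are all hypothetical names chosen for this example, and a real backend would persist this state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SequenceEntry:
    """Hypothetical view of a released sequence entry."""
    accession: str
    version: int
    from_ingest: bool  # assumption: True if the entry was downloaded from NCBI


@dataclass
class DepositionTracker:
    """Hypothetical tracker for what has already been sent to ENA."""
    # Maps accession -> version that was uploaded to ENA.
    uploaded: dict[str, int] = field(default_factory=dict)

    def should_upload(self, entry: SequenceEntry) -> bool:
        # Never send ingested (NCBI) sequences back to INSDC.
        if entry.from_ingest:
            return False
        # Never submit an accession as a new sequence twice,
        # even if a revised version exists.
        return entry.accession not in self.uploaded

    def mark_uploaded(self, entry: SequenceEntry) -> None:
        self.uploaded[entry.accession] = entry.version


tracker = DepositionTracker()
original = SequenceEntry("LOC_1", 1, from_ingest=False)
if tracker.should_upload(original):
    tracker.mark_uploaded(original)
```

With such a check exposed by the backend, both the deposition and the ingest service could ask it through the regular API rather than reading the database directly. How revisions of already deposited sequences should be propagated to ENA is left open here.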
2 changes: 2 additions & 0 deletions architecture_docs/plantuml/05_level_2_backend.puml
@@ -29,9 +29,11 @@ backend --> db
prepro --> backend : Fetch unprocessed data /\nSubmit processed data

ingest -up-> backend
ingest -up-> db
ingest --> ncbi : Download data

deposition -up-> backend
deposition -up-> db
deposition --> ena : Upload data

@enduml