From ebcb2676611097c5256707508555833a79b95b32 Mon Sep 17 00:00:00 2001
From: Stef Piatek <s.piatek@ucl.ac.uk>
Date: Thu, 12 Dec 2024 14:45:42 +0000
Subject: [PATCH 1/4] Start documenting design choices

---
 .../0001-multiservice-architecture.md         | 28 ++++++++
 .../0002-message-processing.md                | 47 ++++++++++++
 .../decision-record/0003-dicom-processing.md  | 44 ++++++++++++
 .../0004-multiple-project-configuration.md    | 65 +++++++++++++++++
 .../0005-project-based-study-routing.md       | 50 +++++++++++++
 docs/design/decision-record/index.md          | 39 ++++++++++
 docs/design/decision-record/template.md       | 72 +++++++++++++++++++
 7 files changed, 345 insertions(+)
 create mode 100644 docs/design/decision-record/0001-multiservice-architecture.md
 create mode 100644 docs/design/decision-record/0002-message-processing.md
 create mode 100644 docs/design/decision-record/0003-dicom-processing.md
 create mode 100644 docs/design/decision-record/0004-multiple-project-configuration.md
 create mode 100644 docs/design/decision-record/0005-project-based-study-routing.md
 create mode 100644 docs/design/decision-record/index.md
 create mode 100644 docs/design/decision-record/template.md

diff --git a/docs/design/decision-record/0001-multiservice-architecture.md b/docs/design/decision-record/0001-multiservice-architecture.md
new file mode 100644
index 000000000..c879f8153
--- /dev/null
+++ b/docs/design/decision-record/0001-multiservice-architecture.md
@@ -0,0 +1,28 @@
+# Multi-service Architecture
+
+* Status: accepted
+* Deciders: Original PIXL team
+* Date: 2023-11-01 (retrospectively)
+
+## Context and Problem Statement
+
+We want a software solution that has distinct functionality. How do we structure those services.
+
+## Decision Drivers <!-- optional -->
+
+* Will need to allow for concurrent processing of tasks
+* Lingua franca of the team is python
+
+## Considered Options
+
+* Multi-service architecture
+* Monolith architecture
+
+## Decision Outcome
+
+Chosen option: "Multi-service architecture", because it allows us to:
+
+- Break up our code into logical packages and services
+- Get around a single python process being blocked by the global interpreter lock, as each service runs in its own process
+- Forces us to consider where code should go, and restrict services from using other services code
+- Works well with UCLH's restriction to deploy services using docker and docker compose, open to extend into kubernetes should we want that
diff --git a/docs/design/decision-record/0002-message-processing.md b/docs/design/decision-record/0002-message-processing.md
new file mode 100644
index 000000000..03601aaa5
--- /dev/null
+++ b/docs/design/decision-record/0002-message-processing.md
@@ -0,0 +1,47 @@
+# Message-based Processing of Images
+
+* Status: accepted
+* Deciders: Original PIXL team, most recent changes: Stef Piatek & Paul Smith
+* Date: 2024-12-12
+
+## Context and Problem Statement
+
+- We need a way to buffer the messages awaiting processing. 
+  We expect hundreds to tens of thousands of imaging studies to be requested per project, 
+  and want to find each study in the source systems individually. 
+
+
+## Decision Drivers <!-- optional -->
+
+- Be able to process multiple research projects at the same time
+- Should be persistent if services are taken down
+- Allow for a secondary DICOM source to be used if study isn't found in the primary
+- Limit the total number of images that are being processed for a given source system
+- Studies that have already been successfully exported for a project should not get processed again
+
+## Considered Options
+
+* Use of database alone to schedule run, with the CLI driving a long-running job to process all studies in research project
+* Use of queues to buffer requests that the `imaging-api` processes, database to track the status of exported studies
+
+## Decision Outcome
+
+Chosen option: `Queue for buffering requests, database to track status of export`, 
+because it fulfills all requirements and allows us to invest in use of generic technologies.
+
+## Pros and Cons of the Options <!-- optional -->
+
+### Database alone
+
+
+* Good, simple to set up and use. Single solution for buffering requests and tracking status
+* Bad, bespoke solution where we could use a technology that can be transferrable to other situations 
+* Bad, complex setup to limit total queries per source system
+
+### Queue with database for status
+
+
+* Good, fulfills all requirements fairly easily, creating a queue for primary and another for secondary imaging sources
+* Good, because we have previously invested in rabbitmq as a technology
+* Bad, extra services to manage and extra development
+* Bad, because original implementation was broken and required effort to fix. Though we learned more about the libraries we're using.
diff --git a/docs/design/decision-record/0003-dicom-processing.md b/docs/design/decision-record/0003-dicom-processing.md
new file mode 100644
index 000000000..b70bcd71c
--- /dev/null
+++ b/docs/design/decision-record/0003-dicom-processing.md
@@ -0,0 +1,44 @@
+# DICOM server and processing
+
+* Status: accepted
+* Deciders: Original PIXL team
+* Date: 2023-11-01 (retrospectively)
+
+## Context and Problem Statement
+
+We need to have a DICOM server to query DICOM images, store them, anonymise them and export them.
+
+
+## Decision Drivers <!-- optional -->
+
+* Will need to have a robust DICOM server that has been in use
+* Keep original studies in a cache locally to reduce use of clinical imaging systems if failures in anonymisation or export
+* The team's lingua franca is Python
+* Per-project anonymisation profiles and custom hashing of fields will require plugins to be written for anonymisation
+* UCLH infrastructure allows for running docker, but we don't have admin accounts and cannot install software directly onto the machines.
+
+## Considered Options
+
+* XNAT server
+* Orthanc server
+
+## Decision Outcome
+
+Chosen option: `Orthanc`, 
+because it's relatively lightweight, under development and allows for python-based extensions to be written. 
+
+## Pros and Cons of the Options <!-- optional -->
+
+### XNAT
+
+* Good, ARC has a history of using this in the medical imaging subgroup.
+* Good, widely regarded
+* Bad, heavyweight and has many more features than we need to use. May take longer to learn and deploy
+* Bad, does not allow for python-based plugins to be used for anonymisation without getting into running docker in docker
+
+### Orthanc
+
+* Good, has been battle tested
+* Good, has DICOMWeb plugin to allow for export via that modality
+* Good, allows for python-based plugins and running in docker
+* Bad, no previous usage within ARC. Will be teething problems
diff --git a/docs/design/decision-record/0004-multiple-project-configuration.md b/docs/design/decision-record/0004-multiple-project-configuration.md
new file mode 100644
index 000000000..5c4f2647e
--- /dev/null
+++ b/docs/design/decision-record/0004-multiple-project-configuration.md
@@ -0,0 +1,65 @@
+# Multiple-project configuration
+
+* Status: accepted
+* Deciders: Milan Malfait, Peter Tsrunchev, Jeremy Stein,  Stef Piatek
+* Date: 2024-03-05
+
+Technical Story: [PIXL can take multiple projects](https://github.com/SAFEHR-data/PIXL/issues/330)
+
+## Context and Problem Statement
+
+Each project should be able to define its own anonymisation profile and export destinations.
+
+## Decision Drivers <!-- optional -->
+
+* When hashing a given field for an imaging study for two different projects, each project's hash should be different.
+  This it to avoid inappropriate linkage of data between research projects. 
+* Secure storage of secrets, especially for connection to destinations and per-project hashing salts
+
+## Considered Options
+
+* file-based: env files for docker or structured files for secrets 
+* self-hosted secret storage service
+* azure keyvault
+
+## Decision Outcome
+
+Chosen option: "azure keyvault", because its low hassle and gives us good control of secret access.
+
+### Positive Consequences <!-- optional -->
+
+* Single place for secrets, which multiple deployments can access
+* Can have secrets which expire, so even if compromised we limit the amount of secret leakage possible
+* Per-project salts stored in a separate keyvault than the export endpoints
+* Only need to define the destination type, with the keyvalut defining all the connection details
+
+### Negative Consequences <!-- optional -->
+
+* Requires other implementations to set up their own azure storage accounts or develop new secret management
+* Developers also have to update a `.env` file for running the system test
+* Slight increase in cost, can be slightly offset by caching credentials 
+
+## Pros and Cons of the Options <!-- optional -->
+
+### File-based
+
+* Good, simple to do
+* Bad, will keep on expanding as time goes on which can be a pain to maintain
+* Bad, no access control beyond unix permissions
+
+### Self-hosted secret storage
+
+* Good, fine-grained access control possible
+* Good, free in terms of upfront cost
+* Bad, another service to maintain - residual costs
+
+### Azure keyvault
+
+[example | description | pointer to more information | …] <!-- optional -->
+
+* Good, fine-grained access control possible
+* Bad, slight increase in cost
+
+## Links <!-- optional -->
+
+* Routing of studies based on projects in [ADR-0005](0005-project-based-study-routing.md) 
diff --git a/docs/design/decision-record/0005-project-based-study-routing.md b/docs/design/decision-record/0005-project-based-study-routing.md
new file mode 100644
index 000000000..3b2686ead
--- /dev/null
+++ b/docs/design/decision-record/0005-project-based-study-routing.md
@@ -0,0 +1,50 @@
+# Project-based study routing
+
+* Status: accepted
+* Deciders: Stef Piatek, Paul Smith
+* Date: 2024-11-27
+
+Technical Story: [PIXL can take multiple projects](https://github.com/SAFEHR-data/PIXL/issues/330)
+
+## Context and Problem Statement
+
+Each study sent to `orthanc-anon` needs to be de-identified using the project-specific configuration. 
+We need a way to pass along this information along with the DICOM file.
+
+## Considered Options
+
+* DICOM tag: Adding a custom DICOM tag for the project name
+* custom REST API endpoint: Creating a custom REST API endpoint for `orthanc-anon` to pull the data from `orthanc-raw`
+
+## Decision Outcome
+
+Chosen option: "custom REST API endpoint", because the project tag updating was causing `orthanc-raw` to crash,
+and we were no longer using study stability to control export.
+
+## Pros and Cons of the Options <!-- optional -->
+
+### DICOM tag
+
+Add private creator group to instances as they arrive, and a dummy value in the custom tag. 
+Once the study has been pulled from the DICOM source, update the tag with the filename stem of the project config.
+
+* Good, because you can see which study was most recently exported
+* Bad, `orthanc-raw` started crashing when updating the DICOM tag via the orthanc API on large studies
+* Bad, because we were having to update the DICOM tag without changing the instance UIDs. 
+  So that we can check for missing instances for studies which already exist in `orthanc-raw`
+* Bad, studies appeared where instances didn't have their custom DICOM tag updates
+* Bad, could have a race condition where the same study is trying to be exported by two projects
+
+### Custom REST API endpoint
+
+Once the study has been pulled from the DICOM source, send a REST request to `orthanc-anon` with the study UID and project.
+`orthanc-anon` then adds this job to a threadpool and returns a 200 status to the client. 
+
+* Good, keeping original instances unaltered
+* Good, thread-pooling allows for faster de-identification
+* Good, simpler flow
+* Bad, we can't alter the queue of de-identification jobs short of taking down `orthanc-anon`
+
+## Links <!-- optional -->
+
+* Related to multiple project configuration [ADR-0004](0004-multiple-project-configuration.md)
diff --git a/docs/design/decision-record/index.md b/docs/design/decision-record/index.md
new file mode 100644
index 000000000..57caedd91
--- /dev/null
+++ b/docs/design/decision-record/index.md
@@ -0,0 +1,39 @@
+# Architectural Decision Log
+
+This log lists the architectural decisions for PIXL.
+
+Using the [Markdown Architectural Decision Records](https://adr.github.io/madr/)
+
+Regenerate the content by using `adr-log -i` from this directory. 
+You can install it via `npm install -g adr-log`, removing the template
+
+<!-- adrlog -->
+
+* [ADR-0001](0001-multiservice-architecture.md) - Multi-service Architecture
+* [ADR-0002](0002-message-processing.md) - Message-based Processing of Images
+* [ADR-0003](0003-dicom-processing.md) - DICOM server and processing
+* [ADR-0004](0004-multiple-project-configuration.md) - Multiple-project configuration
+* [ADR-0005](0005-project-based-study-routing.md) - Project-based study routing
+
+<!-- adrlogstop -->
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/design/decision-record/template.md b/docs/design/decision-record/template.md
new file mode 100644
index 000000000..b779044d8
--- /dev/null
+++ b/docs/design/decision-record/template.md
@@ -0,0 +1,72 @@
+# [Template: short title of solved problem and solution]
+
+* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0105](0105-example.md)] <!-- optional -->
+* Deciders: [list everyone involved in the decision] <!-- optional -->
+* Date: [YYYY-MM-DD when the decision was last updated] <!-- optional -->
+
+Technical Story: [description | ticket/issue URL] <!-- optional -->
+
+## Context and Problem Statement
+
+[Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.]
+
+## Decision Drivers <!-- optional -->
+
+* [driver 1, e.g., a force, facing concern, …]
+* [driver 2, e.g., a force, facing concern, …]
+* … <!-- numbers of drivers can vary -->
+
+## Considered Options
+
+* [option 1]
+* [option 2]
+* [option 3]
+* … <!-- numbers of options can vary -->
+
+## Decision Outcome
+
+Chosen option: "[option 1]", because [justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force force | … | comes out best (see below)].
+
+### Positive Consequences <!-- optional -->
+
+* [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
+* …
+
+### Negative Consequences <!-- optional -->
+
+* [e.g., compromising quality attribute, follow-up decisions required, …]
+* …
+
+## Pros and Cons of the Options <!-- optional -->
+
+### [option 1]
+
+[example | description | pointer to more information | …] <!-- optional -->
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* … <!-- numbers of pros and cons can vary -->
+
+### [option 2]
+
+[example | description | pointer to more information | …] <!-- optional -->
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* … <!-- numbers of pros and cons can vary -->
+
+### [option 3]
+
+[example | description | pointer to more information | …] <!-- optional -->
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* … <!-- numbers of pros and cons can vary -->
+
+## Links <!-- optional -->
+
+* [Link type] [Link to ADR] <!-- example: Refined by [ADR-0005](0005-example.md) -->
+* … <!-- numbers of links can vary -->

From b60dbbe6456bc909485837e093cd735e776df737 Mon Sep 17 00:00:00 2001
From: Stef Piatek <s.piatek@ucl.ac.uk>
Date: Thu, 12 Dec 2024 14:50:34 +0000
Subject: [PATCH 2/4] Be explicit about secrets for multiple projects

---
 .../decision-record/0004-multiple-project-configuration.md   | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/design/decision-record/0004-multiple-project-configuration.md b/docs/design/decision-record/0004-multiple-project-configuration.md
index 5c4f2647e..5b7510e7a 100644
--- a/docs/design/decision-record/0004-multiple-project-configuration.md
+++ b/docs/design/decision-record/0004-multiple-project-configuration.md
@@ -8,7 +8,10 @@ Technical Story: [PIXL can take multiple projects](https://github.com/SAFEHR-dat
 
 ## Context and Problem Statement
 
-Each project should be able to define its own anonymisation profile and export destinations.
+Each project should be able to define its own anonymisation profile and export destinations. 
+How can we store the secrets.
+
+![pixl-multi-project-config.png](../diagrams/pixl-multi-project-config.png)
 
 ## Decision Drivers <!-- optional -->
 

From c6cc969b51f7ab76e99a7488cac0493d27787584 Mon Sep 17 00:00:00 2001
From: Stef Piatek <s.piatek@ucl.ac.uk>
Date: Thu, 12 Dec 2024 14:50:47 +0000
Subject: [PATCH 3/4] Add design choices index

---
 index.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 index.md

diff --git a/index.md b/index.md
new file mode 100644
index 000000000..badac46d3
--- /dev/null
+++ b/index.md
@@ -0,0 +1,43 @@
+
+
+<!-- adrlog -->
+
+* [ADR-1](cli/README.md) - PIXL Driver + Command line interface
+* [ADR-2](cli/tests/README.md) - 
+* [ADR-3](CODE_OF_CONDUCT.md) - Contributor Covenant Code of Conduct
+* [ADR-4](CONTRIBUTING.md) - Contributing to `PIXL`.
+* [ADR-5](docs/design/bigger_picture.md) - The bigger picture
+* [ADR-0001](docs/design/decision-record/0001-multiservice-architecture.md) - Multi-service Architecture
+* [ADR-0002](docs/design/decision-record/0002-message-processing.md) - Message-based Processing of Images
+* [ADR-0003](docs/design/decision-record/0003-dicom-processing.md) - DICOM server and processing
+* [ADR-0004](docs/design/decision-record/0004-secrets-storage.md) - Secrets Storage
+* [ADR-6](docs/design/decision-record/index.md) - Architectural Decision Log
+* [ADR-7](docs/design/decision-record/template.md) - [short title of solved problem and solution]
+* [ADR-8](docs/file_types/parquet_files.md) - Parquet files you might encounter throughout PIXL
+* [ADR-9](docs/services/ftp-server.md) - FTPS server
+* [ADR-10](docs/services/pixl_database.md) - The PIXL database
+* [ADR-11](docs/setup/azure-keyvault.md) - Azure Keyvault setup
+* [ADR-12](docs/setup/developer.md) - Developer setup
+* [ADR-13](docs/setup/uclh-infrastructure-setup.md) - UCLH Infrastructure setup instructions
+* [ADR-14](hasher/README.md) - Hasher API
+* [ADR-15](orthanc/orthanc-anon/docs/DicomServiceViaAAD.md) - Retrieving an access token for the DICOM service using and Azure AD application
+* [ADR-16](orthanc/orthanc-anon/README.md) - Orthanc Anon
+* [ADR-17](orthanc/orthanc-raw/README.md) - Orthanc Raw
+* [ADR-18](orthanc/README.md) - ORTHANC instances
+* [ADR-19](pixl_core/README.md) - Core
+* [ADR-20](pixl_dcmd/README.md) - PIXL DICOM de-identifier
+* [ADR-21](pixl_export/README.md) - PIXL Export API
+* [ADR-22](pixl_imaging/alembic/README.md) - Alembic configuration
+* [ADR-23](pixl_imaging/README.md) - PIXL Imaging API
+* [ADR-24](postgres/README.md) - PIXL pipeline database
+* [ADR-25](pytest-pixl/README.md) - PIXL pytest plugin
+* [ADR-26](README.md) - [pixl-ci](https://github.com/SAFEHR-data/PIXL/actions/workflows/main.yml/badge.svg)](https://github.com/SAFEHR-data/PIXL/actions/workflows/main.yml)
+* [ADR-27](test/README.md) - PIXL System Tests
+
+<!-- adrlogstop -->
+
+
+
+
+
+

From c5c8046f16411a45a024612e4aca410d9cea09ce Mon Sep 17 00:00:00 2001
From: Stef Piatek <s.piatek@ucl.ac.uk>
Date: Mon, 23 Dec 2024 10:56:37 +0000
Subject: [PATCH 4/4] Add data export decision

---
 .../decision-record/0006-data-export.md       | 45 +++++++++++++++++++
 docs/design/decision-record/index.md          | 22 +--------
 2 files changed, 46 insertions(+), 21 deletions(-)
 create mode 100644 docs/design/decision-record/0006-data-export.md

diff --git a/docs/design/decision-record/0006-data-export.md b/docs/design/decision-record/0006-data-export.md
new file mode 100644
index 000000000..127714c01
--- /dev/null
+++ b/docs/design/decision-record/0006-data-export.md
@@ -0,0 +1,45 @@
+# Export of parquet files and DICOM data
+
+* Status: accepted
+* Deciders: Haroon Chughtai, Jeremy Stein, Milan Malfait, Ruaridh Gollifer, Stef Piatek
+* Date: 2024-02-26
+
+## Context and Problem Statement
+
+The pipeline needs to be able to export DICOM images and structured data files to different endpoints. 
+
+## Decision Drivers <!-- optional -->
+
+* We expect that some projects will have more data than we can store locally. Will need a rolling export of images
+* We will need to be able to export images and structured data via FTPS in an automated fashion
+* We will need to be able to export images via DICOMWeb
+
+
+## Considered Options
+
+* Shared python library for exporting of data, used in `orthanc-anon` and the `pixl` CLI.
+* `export-api` service, which can export both DICOM and structured data files. 
+
+## Decision Outcome
+
+Chosen options: "`export-api` service", for clear separation of responsibilities.
+
+## Pros and Cons of the Options <!-- optional -->
+
+### Shared python library
+
+Add private creator group to instances as they arrive, and a dummy value in the custom tag.
+Once the study has been pulled from the DICOM source, update the tag with the filename stem of the project config.
+
+* Good, one less service to maintain
+* Good, export via DICOMWeb is using the orthanc API already
+* Bad, duplication of implementation for export
+* Bad, duplication of areas where secrets are used
+
+### `export-api` service
+
+Instead of shared library the code would be in the service alone. 
+
+* Good, single service that will access all secrets and orchestrate exports
+* Good, allows caching of export secrets in a long-running service
+* Bad, would require extra code for interacting with the service from the CLI for parquet export
diff --git a/docs/design/decision-record/index.md b/docs/design/decision-record/index.md
index 57caedd91..ef2f7031f 100644
--- a/docs/design/decision-record/index.md
+++ b/docs/design/decision-record/index.md
@@ -14,26 +14,6 @@ You can install it via `npm install -g adr-log`, removing the template
 * [ADR-0003](0003-dicom-processing.md) - DICOM server and processing
 * [ADR-0004](0004-multiple-project-configuration.md) - Multiple-project configuration
 * [ADR-0005](0005-project-based-study-routing.md) - Project-based study routing
+* [ADR-0006](0006-data-export.md) - Export of parquet files and DICOM data
 
 <!-- adrlogstop -->
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-