diff --git a/README.md b/README.md
index 0dd1a786d..abc276d17 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
# PIXL
+
PIXL Image eXtraction Laboratory
`PIXL` is a system for extracting, linking and de-identifying DICOM imaging data, structured EHR data and free-text data from radiology reports at UCLH.
@@ -8,62 +9,173 @@ PIXL is intended run on one of the [GAE](https://github.com/UCLH-Foundry/Book-of
several services orchestrated by [Docker Compose](https://docs.docker.com/compose/).
## Services
+
### [PIXL CLI](./cli/README.md)
+
Primary interface to the PIXL system.
+
### [Hasher API](./hasher/README.md)
+
HTTP API to securely hash an identifier using a key stored in Azure Key Vault.
+
### [Orthanc Raw](./orthanc/orthanc-raw/README.md)
+
+A DICOM node which receives images from the upstream hospital systems and acts as a cache for PIXL.
+
### [Orthanc Anon](./orthanc/orthanc-anon/README.md)
+
+A DICOM node which wraps our de-identification and cloud transfer components.
+
### PostgreSQL
+
RDBMS which stores DICOM metadata, application data and anonymised patient record data.
+
### [Electronic Health Record Extractor](./pixl_ehr/README.md)
-HTTP API to process messages from the `ehr` queue and populate raw and anon tables in the PIXL postgres instance.
+
+HTTP API to process messages from the `ehr` queue and populate raw and anon tables in the PIXL postgres instance.
+
### [PACS Image Extractor](./pixl_pacs/README.md)
-HTTP API to process messages from the `pacs` queue and populate the raw orthanc instance with images from PACS/VNA.
+
+HTTP API to process messages from the `pacs` queue and populate the raw orthanc instance with images from PACS/VNA.
## Setup
-### 0. Choose deployment environment
+### 0. UCLH infrastructure setup
+
+#### Install a shared miniforge installation if it doesn't exist
+
+Follow the suggestion for installing a central [miniforge](https://github.com/conda-forge/miniforge)
+installation so that all users can run a modern Python without needing admin permissions.
+
+```shell
+# Create directory with correct structure (only if it doesn't exist yet)
+mkdir /gae/miniforge3
+chgrp -R docker /gae/miniforge3
+chmod -R g+rwxs /gae/miniforge3 # inherit group when new directories or files are created
+setfacl -R -m d:g::rwX /gae/miniforge3
+# Install miniforge
+wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
+bash Miniforge3-$(uname)-$(uname -m).sh -p /gae/miniforge3
+# use the full path: conda is not yet on your PATH at this point
+/gae/miniforge3/bin/conda update -n base -c conda-forge conda
+/gae/miniforge3/bin/conda create -n pixl_dev python=3.10.*
+```
+
+The directory should now have these permissions:
+
+```shell
+> ls -lah /gae/miniforge3/
+total 88K
+drwxrws---+ 19 jstein01 docker 4.0K Nov 28 12:27 .
+drwxrwx---. 18 root docker 4.0K Dec 1 19:35 ..
+drwxrws---+ 2 jstein01 docker 8.0K Nov 28 12:27 bin
+drwxrws---+ 2 jstein01 docker 30 Nov 28 11:49 compiler_compat
+drwxrws---+ 2 jstein01 docker 32 Nov 28 11:49 condabin
+drwxrws---+ 2 jstein01 docker 8.0K Nov 28 12:27 conda-meta
+-rw-rws---. 1 jstein01 docker 24 Nov 28 11:49 .condarc
+...
+```
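+
+To double-check that the group and default ACL were applied, `getfacl` (from the standard `acl` package) can be used:
+
+```shell
+# default ACL entries should grant the group rwX on newly created files and directories
+getfacl /gae/miniforge3 | grep default
+```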
+
+
+#### If you haven't just installed miniforge yourself, update your configuration
+
+Edit `~/.bash_profile` to add `/gae/miniforge3/bin` to your `PATH`, for example:
+
+```shell
+PATH=$PATH:$HOME/.local/bin:$HOME/bin:/gae/miniforge3/bin
+```
+
+Run the updated profile (or reconnect to the GAE) so that conda is in your `PATH`:
+
+```shell
+source ~/.bash_profile
+```
+
+Initialise conda:
+
+```shell
+conda init bash
+```
+
+Run the updated profile again (or reconnect to the GAE) so that the conda initialisation takes effect:
+
+```shell
+source ~/.bash_profile
+```
+
+Activate the existing `pixl_dev` environment:
+
+```shell
+conda activate pixl_dev
+```
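+
+If activation worked, the interpreter should resolve to the shared installation (illustrative output):
+
+```shell
+which python      # expect /gae/miniforge3/envs/pixl_dev/bin/python
+python --version  # expect Python 3.10.x
+```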
+
+
+#### Create an instance on the GAE if it doesn't already exist
+
+Select a place for the deployment. On UCLH infrastructure this will be under `/gae`, for example `/gae/pixl_dev`.
+
+```shell
+mkdir /gae/pixl_dev
+chgrp -R docker /gae/pixl_dev
+chmod -R g+rwxs /gae/pixl_dev # inherit group when new directories or files are created
+setfacl -R -m d:g::rwX /gae/pixl_dev
+# now clone the repository or copy an existing deployment
+```
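+
+For example, to clone into the new directory (the repository URL is illustrative; substitute the actual location of the PIXL repository):
+
+```shell
+git clone https://github.com/UCLH-Foundry/PIXL.git /gae/pixl_dev
+```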
+
+
+### 1. Choose deployment environment
+
This is one of `dev|test|staging|prod` and referred to as `<environment>` in the docs.
-### 1. Initialise environment configuration
+### 2. Initialise environment configuration
+
Create a local `.env` and `pixl_config.yml` file in the _PIXL_ directory:
+
```bash
cp .env.sample .env && cp pixl_config.yml.sample pixl_config.yml
```
+
Add the missing configuration values to the new files:
#### Environment
+
Set `ENV` to `<environment>`.
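+
+For example, for a development deployment:
+
+```bash
+# .env
+ENV=dev
+```
+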
#### Credentials
-- `EMAP_DB_`*
-UDS credentials are only required for `prod` or `staging` deployments of when working on the EHR & report retriever component.
-You can leave them blank for other dev work.
-- `PIXL_DB_`*
-These are credentials for the containerised PostgreSQL service and are set in the official PostgreSQL image.
+
+- `EMAP_DB_`*
+UDS credentials are only required for `prod` or `staging` deployments, or when working on the EHR & report retriever component.
+You can leave them blank for other dev work.
+- `PIXL_DB_`*
+These are credentials for the containerised PostgreSQL service and are set in the official PostgreSQL image.
Use a strong password for `prod` deployment but the only requirement for other environments is consistency as several services interact with the database.
- `PIXL_EHR_API_AZ_`*
These credentials are used for uploading a PIXL database to Azure blob storage. They should be for a service principal that has `Storage Blob Data Contributor`
on the target storage account. The storage account must also allow network access from the PIXL host machine.
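+
+If you need to create such a service principal, the Azure CLI supports this directly. A sketch, where the principal name and scope are placeholders for your own subscription, resource group and storage account:
+
+```bash
+az ad sp create-for-rbac \
+  --name pixl-ehr-upload \
+  --role "Storage Blob Data Contributor" \
+  --scopes "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
+```
+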
#### Ports
-Most services need to expose ports that must be mapped to ports on the host. The host port is specified in `.env`
-Ports need to be configured such that they don't clash with any other application running on that GAE.
+
+Most services need to expose ports that must be mapped to ports on the host. The host port is specified in `.env`.
+Ports need to be configured so that they don't clash with any other application running on that GAE.
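+
+Before picking host ports, it can help to check what is already listening on the GAE (standard Linux tooling, not PIXL-specific):
+
+```bash
+# list TCP ports currently in use on the host
+ss -tln
+```
+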
## Run
### Start
+
From the _PIXL_ directory:
+
```bash
bin/pixldc pixl_dev up
```
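+
+Once up, the services run as Docker containers. Assuming the containers are named with a `pixl` prefix, a quick health check is:
+
+```bash
+docker ps --filter "name=pixl"
+```
+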
### Stop
+
From the _PIXL_ directory:
+
```bash
bin/pixldc pixl_dev down
```
@@ -71,29 +183,31 @@ bin/pixldc pixl_dev down
## Analysis
The number of DICOM instances in the raw Orthanc instance can be accessed from
-`http://:/ui/app/#/settings` and similarly with
+`http://<pixl_host>:<ORTHANC_RAW_WEB_PORT>/ui/app/#/settings` and similarly with
the Orthanc Anon instance, where `pixl_host` is the host of the PIXL services
and `ORTHANC_RAW_WEB_PORT` is defined in `.env`.
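+
+The same counts are also exposed by Orthanc's built-in REST API, for example (using the Orthanc credentials configured in `.env`):
+
+```bash
+curl -u "<username>:<password>" "http://<pixl_host>:<ORTHANC_RAW_WEB_PORT>/statistics"
+# returns JSON including "CountInstances", "CountSeries" and "CountStudies"
+```
+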
-The number of reports and EHR can be interrogated by connecting to the PIXL
-database with a database client (e.g. [DBeaver](https://dbeaver.io/)), using
-the connection parameters defined in `.env`. For example, to find the number of
+The number of reports and EHR can be interrogated by connecting to the PIXL
+database with a database client (e.g. [DBeaver](https://dbeaver.io/)), using
+the connection parameters defined in `.env`. For example, to find the number of
non-null reports
```sql
select count(*) from emap_data.ehr_anon where xray_report is not null;
```
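+
+The same connection parameters work from the command line, if you prefer `psql` to a GUI client (placeholders correspond to the `PIXL_DB_`* values in `.env`):
+
+```bash
+psql -h <pixl_host> -p <port> -U <username> <database>
+```
+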
-
## Develop
-See each service's README for instructions for individual developing and testing instructions.
-For Python development we use [isort](https://github.com/PyCQA/isort) and [black](https://black.readthedocs.io/en/stable/index.html) alongside [pytest](https://www.pytest.org/).
-There is support (sometimes through plugins) for these tools in most IDEs & editors.
+
+See each service's README for individual development and testing instructions.
+For Python development we use [ruff](https://docs.astral.sh/ruff/) alongside [pytest](https://www.pytest.org/).
+There is support (sometimes through plugins) for these tools in most IDEs & editors.
Before raising a PR, **run the full test suite** from the _PIXL_ directory with
+
```bash
bin/run-all-tests.sh
```
-and not just the component you have been working on as this will help us catch unintentional regressions without spending GH actions minutes :-)
+
+and not just the component you have been working on, as this will help us catch unintentional regressions without spending GH Actions minutes :-)
We run [pre-commit](https://pre-commit.com/) as part of the GitHub Actions CI. To install and run it locally, do:
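+
+```bash
+# standard pre-commit workflow (see https://pre-commit.com/)
+pip install pre-commit
+pre-commit install         # install the git hook
+pre-commit run --all-files
+```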