This project automates end-to-end water quality compliance checks using data from the UK Environment Agency Hydrology API (environment.data.gov.uk).
Compliance is evaluated against the following criteria:
| Determinand | Unit | Compliance Criteria |
|---|---|---|
| Dissolved Oxygen | mg/L | > 4 |
| pH | — | Between 6.5 and 8.5 |
| Ammonium | mg/L | < 0.5 |
| Turbidity | NTU | < 25 |
| Temperature | °C | < 20 |
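The criteria above map naturally onto simple predicates. A minimal sketch of how they might be expressed in code (the actual rule definitions live in `src/compliance_checks.py` and may differ in naming and structure):

```python
# Hypothetical encoding of the compliance table; determinand keys and the
# check_reading helper are illustrative, not the repository's real API.
COMPLIANCE_RULES = {
    "dissolved-oxygen": lambda v: v > 4,        # mg/L
    "ph": lambda v: 6.5 <= v <= 8.5,            # unitless
    "ammonium": lambda v: v < 0.5,              # mg/L
    "turbidity": lambda v: v < 25,              # NTU
    "temperature": lambda v: v < 20,            # °C
}

def check_reading(determinand: str, value: float) -> bool:
    """Return True when a single reading satisfies its compliance rule."""
    rule = COMPLIANCE_RULES.get(determinand)
    if rule is None:
        raise KeyError(f"No compliance rule for {determinand!r}")
    return rule(value)
```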
The project follows a four-step workflow and uses a medallion data architecture (bronze, silver, and gold layers) for data storage and processing.
- `00_data_ingestion.ipynb`: Retrieves data from the Environment Agency API, focusing on active monitoring stations that publish the required water properties within the 2020–2025 period. The retrieved data is stored in the bronze layer.
- `01_EDA.ipynb`: Performs exploratory data analysis (EDA) on a selected station to design data preparation steps such as resampling and data quality checks. Processed data is saved to the silver layer.
- `02_compliance_check.ipynb`: Executes compliance checks using the silver-layer dataset and generates an Excel report. The final processed data is stored in the gold layer, which can also feed dashboard visualisations.
- `streamlit_app.py`: Provides an interactive Streamlit interface that automates the above steps end to end. Users can download data, perform real-time compliance checks, and export Excel reports.
- `docker-compose.yml`: Enables running the entire project in a containerised environment.
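The ingestion step pages through the EA Hydrology API. A minimal sketch of such a paginated fetch, with the HTTP call injectable for testing; the `_limit`/`_offset` parameter names and the `items` response key follow common EA API conventions but should be verified against the API documentation:

```python
def fetch_all(url, params=None, page_size=500, fetch=None):
    """Collect every item from a paginated API endpoint.

    `fetch` is a callable (url, params) -> list of items; by default it
    issues an HTTP GET with the third-party `requests` library.
    """
    if fetch is None:
        import requests  # third-party; pip install requests

        def fetch(u, p):
            resp = requests.get(u, params=p, timeout=30)
            resp.raise_for_status()
            return resp.json().get("items", [])

    items, offset = [], 0
    while True:
        page = fetch(url, {**(params or {}), "_limit": page_size, "_offset": offset})
        items.extend(page)
        if len(page) < page_size:  # a short page means the last page was reached
            break
        offset += page_size
    return items
```

Injecting `fetch` keeps the paging logic unit-testable without network access.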
```
.
|-- streamlit_app.py              # Streamlit UI for data download and compliance checks
|-- src/
|   |-- app_services.py           # File management, caching, ingest and preprocessing helpers
|   |-- compliance_checks.py      # Compliance rule definitions and evaluation logic
|   |-- config.py                 # API endpoints and colour configuration
|   |-- station_catalog.py        # Station catalogue discovery using the EA Hydrology API
|   |-- visualisations.py         # Plotly charts for data quality and compliance
|   `-- water_quality.py          # API ingestion, resampling, and data quality tagging
|-- data/                         # Bronze, silver, and gold outputs (generated at runtime)
|-- assets/                       # UI assets (e.g. logo)
|-- 00_data_ingestion.ipynb       # Notebook: API ingestion prototype
|-- 01_EDA.ipynb                  # Notebook: exploratory data analysis
|-- 02_compliance_check.ipynb     # Notebook: compliance rule development
|-- requirements.txt
|-- Dockerfile
`-- docker-compose.yml
```
- Python 3.11 or later
- pip for dependency management
- Optional: Docker and Docker Compose for containerised deployment
1. Create and activate a virtual environment.

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   ```

2. Install dependencies.

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

3. Launch the Streamlit app.

   ```bash
   streamlit run streamlit_app.py
   ```

Streamlit runs on http://localhost:8501. The first download may take several minutes because the API is paginated.
Build and run with Compose:

```bash
docker compose up --build
```

The app runs on http://localhost:8501. The `data/` directory is mounted to persist downloads between sessions.
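A minimal Compose file consistent with the description above might look like the following; the service name and container-side mount path are assumptions, and the repository's actual `docker-compose.yml` may differ:

```yaml
# Sketch only: "app" and /app/data are assumed names, not taken from the repo.
services:
  app:
    build: .
    ports:
      - "8501:8501"        # Streamlit's default port
    volumes:
      - ./data:/app/data   # persist downloads between sessions
```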
- `data/meta/`: caches the station catalogue (`stations_water_quality.parquet`).
- `data/bronze/`: stores raw readings per station (`*_readings.parquet`).
- `data/silver/`: contains hourly, data-quality-tagged datasets (`*_readings.parquet`).
- `data/gold/`: stores compliance outputs (`*_compliance_results.parquet`) and generates Excel reports on demand.
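Under these layer conventions, the silver-to-gold step can be sketched as follows; the column names `determinand` and `value` are assumptions for illustration, not necessarily the repository's schema:

```python
# Hypothetical silver -> gold step: flag each hourly reading against its rule
# and persist the result as a compliance dataset.
import pandas as pd

RULES = {
    "dissolved-oxygen": lambda v: v > 4,
    "ph": lambda v: 6.5 <= v <= 8.5,
    "ammonium": lambda v: v < 0.5,
    "turbidity": lambda v: v < 25,
    "temperature": lambda v: v < 20,
}

def build_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Add a `compliant` column; unknown determinands are left as None."""
    out = silver.copy()
    out["compliant"] = [
        RULES[d](v) if d in RULES else None
        for d, v in zip(out["determinand"], out["value"])
    ]
    return out

# Usage, following the layer layout above (station name is a placeholder):
# silver = pd.read_parquet("data/silver/station_readings.parquet")
# build_gold(silver).to_parquet("data/gold/station_compliance_results.parquet")
```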
These folders are created automatically; ensure the repository's `data/` directory is writable.
