Anishrkhadka/HydroComply

This project automates end-to-end water quality compliance checks using data from the UK Environment Agency Hydrology API (environment.data.gov.uk).
Compliance is evaluated against the following criteria:

Determinand      | Unit       | Compliance Criteria
Dissolved Oxygen | mg/L       | > 4
pH               | (unitless) | between 6.5 and 8.5
Ammonium         | mg/L       | < 0.5
Turbidity        | NTU        | < 25
Temperature      | °C         | < 20
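These thresholds can be expressed as a small rule table in code. The following is a minimal sketch, not the project's implementation (the actual rules live in src/compliance_checks.py); the names LIMITS and is_compliant are hypothetical, and boundary readings are treated as compliant here for simplicity:

```python
# Hypothetical sketch of the compliance rules from the table above.
# Each determinand maps to a (lower, upper) bound; None means unbounded.
LIMITS = {
    "dissolved_oxygen": (4.0, None),   # mg/L, must be > 4
    "ph": (6.5, 8.5),                  # unitless, between 6.5 and 8.5
    "ammonium": (None, 0.5),           # mg/L, must be < 0.5
    "turbidity": (None, 25.0),         # NTU, must be < 25
    "temperature": (None, 20.0),       # °C, must be < 20
}

def is_compliant(determinand: str, value: float) -> bool:
    """Return True if a reading satisfies the rule for its determinand."""
    lower, upper = LIMITS[determinand]
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```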

Demo

The project follows a four-step workflow and uses a medallion data architecture (bronze, silver, and gold layers) for data storage and processing.

  1. 00_data_ingestion.ipynb
    Retrieves data from the Environment Agency API. It focuses on active monitoring stations that published the required water-quality properties during the 2020–2025 period. The retrieved data is stored in the bronze layer.

  2. 01_EDA.ipynb
    Performs exploratory data analysis (EDA) on a selected station to design data preparation steps such as resampling and data quality checks. Processed data is saved to the silver layer.

  3. 02_compliance_check.ipynb
    Executes compliance checks using the silver-layer dataset and generates an Excel report. The final processed data is stored in the gold layer, which can also be used for dashboard visualisations.

  4. streamlit_app.py
    Provides an interactive Streamlit interface that automates the above steps end to end. Users can download data, perform real-time compliance checks, and export Excel reports.

  5. docker-compose.yml
    Enables running the entire project in a containerised environment.
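The paginated ingestion in step 1 can be sketched roughly as follows. This is an illustration only: the endpoint path and query parameters (observedProperty, _limit, _offset) are assumptions based on the EA Hydrology API's conventions and should be verified against environment.data.gov.uk, and the helper names are hypothetical:

```python
BASE = "https://environment.data.gov.uk/hydrology/id"  # EA Hydrology API root

def station_query(observed_property: str, limit: int = 500, offset: int = 0) -> tuple[str, dict]:
    """Build a (url, params) pair for one page of the station listing.

    Parameter names here are assumptions and should be checked against
    the API documentation before use.
    """
    url = f"{BASE}/stations"
    params = {"observedProperty": observed_property, "_limit": limit, "_offset": offset}
    return url, params

def fetch_all_stations(observed_property: str, page_size: int = 500) -> list:
    """Page through the station listing until a short page signals the end."""
    import requests  # imported lazily so the URL helper stays dependency-free

    offset, items = 0, []
    while True:
        url, params = station_query(observed_property, page_size, offset)
        page = requests.get(url, params=params, timeout=30).json().get("items", [])
        items.extend(page)
        if len(page) < page_size:
            return items
        offset += page_size
```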

Repository Structure

.
|-- streamlit_app.py          # Streamlit UI for data download and compliance checks
|-- src/
|   |-- app_services.py       # File management, caching, ingest and preprocessing helpers
|   |-- compliance_checks.py  # Compliance rule definitions and evaluation logic
|   |-- config.py             # API endpoints and colour configuration
|   |-- station_catalog.py    # Station catalogue discovery using the EA Hydrology API
|   |-- visualisations.py     # Plotly charts for data quality and compliance
|   `-- water_quality.py      # API ingestion, resampling, and data quality tagging
|-- data/                     # Bronze, silver, and gold outputs (generated at runtime)
|-- assets/                   # UI assets (e.g. logo)
|-- 00_data_ingestion.ipynb   # Notebook: API ingestion prototype
|-- 01_EDA.ipynb              # Notebook: exploratory data analysis
|-- 02_compliance_check.ipynb # Notebook: compliance rule development
|-- requirements.txt
|-- Dockerfile
`-- docker-compose.yml

Getting Started

Prerequisites

  • Python 3.11 or later
  • pip for dependency management
  • Optional: Docker and Docker Compose for containerised deployment

Local Setup

  1. Create and activate a virtual environment.

    python -m venv .venv
    source .venv/bin/activate
  2. Install dependencies.

    pip install --upgrade pip
    pip install -r requirements.txt
  3. Launch the Streamlit app.

    streamlit run streamlit_app.py
  4. Streamlit runs on http://localhost:8501. The first download may take several minutes because the API is paginated.

Docker

  1. Build and run with Compose.

    docker compose up --build
  2. The app runs on http://localhost:8501. The data/ directory is mounted to persist downloads between sessions.
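For reference, a compose file for this layout might look roughly like the following. This is a sketch, not the repository's actual docker-compose.yml; the service name and container mount path are assumptions:

```yaml
services:
  app:
    build: .
    ports:
      - "8501:8501"          # Streamlit's default port
    volumes:
      - ./data:/app/data     # persist bronze/silver/gold outputs between runs
```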

Outputs and Data Layout

  • data/meta/ — caches the station catalogue (stations_water_quality.parquet).
  • data/bronze/ — stores raw readings per station (*_readings.parquet).
  • data/silver/ — contains hourly, data-quality-tagged datasets (*_readings.parquet).
  • data/gold/ — stores compliance outputs (*_compliance_results.parquet) and generates Excel reports on demand.

These folders are created automatically. Ensure the process running the app has write access to the data/ directory.
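The bronze-to-silver step (hourly resampling plus data-quality tagging) could be sketched with pandas as below. The column names "dateTime" and "value" and the tag values are assumptions for illustration; the real silver-layer schema is defined in src/water_quality.py:

```python
import pandas as pd

def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Resample raw readings onto an hourly grid and tag data quality.

    Assumes hypothetical columns "dateTime" (timestamp) and "value" (reading);
    hours with no underlying readings are tagged "missing" so gaps stay
    visible downstream.
    """
    df = bronze.set_index(pd.to_datetime(bronze["dateTime"]))
    hourly = df["value"].resample("1h").mean().to_frame()
    hourly["quality"] = hourly["value"].notna().map({True: "ok", False: "missing"})
    return hourly
```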

About

Automated water quality compliance checks using Environment Agency hydrology data, with an end-to-end workflow, medallion architecture, and Streamlit interface.