AZR-inspired Energy Forecasting & Anomaly Detection

License: MIT | Python 3.11+

A machine learning system that adapts the propose→solve→verify self-play paradigm from Absolute Zero Reasoner (AZR) (arXiv:2505.03335) to time series forecasting and anomaly detection in energy consumption data.

Project Vision

This Final Year Project explores how self-play reinforcement learning can enhance time series forecasting by training models to propose challenging scenarios, solve them accurately, and verify solutions through realistic constraints. We focus on household energy consumption prediction with validation against real distribution network feeders.

Key Innovation: Unlike traditional supervised learning on historical data, our approach generates synthetic scenarios that stress-test model capabilities while maintaining physical plausibility through verifiable reward signals.
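
To make the propose→solve→verify loop concrete, here is a minimal, self-contained sketch. The class names (ScenarioProposer, ForecastSolver, ConstraintVerifier), the spike-injection proposer, the persistence solver, and the reward shaping are all illustrative assumptions and do not reflect the actual fyp package API.

# Minimal, illustrative sketch of the propose -> solve -> verify cycle.
# All names here (ScenarioProposer, ForecastSolver, ConstraintVerifier)
# are hypothetical and do NOT mirror the fyp package API.
import numpy as np

rng = np.random.default_rng(0)

class ScenarioProposer:
    """Perturbs a historical window to create a harder, but plausible, scenario."""
    def propose(self, window: np.ndarray) -> np.ndarray:
        spike = rng.uniform(1.2, 2.0)            # inject a demand spike
        idx = rng.integers(0, len(window))       # spike onset
        scenario = window.copy()
        scenario[idx:] *= spike
        return scenario

class ForecastSolver:
    """Stands in for the forecasting model (e.g. PatchTST / N-BEATS)."""
    def solve(self, scenario: np.ndarray, horizon: int = 4) -> np.ndarray:
        # Naive persistence forecast as a placeholder for a trained model.
        return np.repeat(scenario[-1], horizon)

class ConstraintVerifier:
    """Rejects scenarios/forecasts that violate simple physical limits."""
    def __init__(self, max_kw: float = 10.0):
        self.max_kw = max_kw
    def reward(self, scenario: np.ndarray, forecast: np.ndarray) -> float:
        if scenario.max() > self.max_kw or (forecast < 0).any():
            return -1.0                           # physically implausible
        return 1.0 / (1.0 + abs(forecast.mean() - scenario[-4:].mean()))

window = rng.uniform(0.1, 2.0, size=48)           # 24 h of 30-min readings
proposer, solver, verifier = ScenarioProposer(), ForecastSolver(), ConstraintVerifier()
for step in range(3):
    scenario = proposer.propose(window)
    forecast = solver.solve(scenario)
    print(f"step {step}: reward = {verifier.reward(scenario, forecast):.3f}")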

Data Flow Architecture

graph TB
    subgraph "Raw Data Sources"
        A[UK-DALE<br/>Household Energy]
        B[London Smart Meters<br/>LCL Dataset]
        C[SSEN LV Feeder<br/>Distribution Network]
    end

    subgraph "Processing Pipeline"
        D[Data Harmonization<br/>30-min resolution]
        E[Feature Engineering<br/>Weather, Calendar, Lags]
    end

    subgraph "Self-Play Training"
        F[Proposer<br/>Scenario Generation]
        G[Solver<br/>TS Forecasting Model]
        H[Verifier<br/>Constraint Validation]
    end

    subgraph "Validation & Evaluation"
        I[Pseudo-Feeder<br/>Aggregation]
        J[Distributional<br/>Comparison]
        K[Anomaly Case<br/>Studies]
    end

    A --> D
    B --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> F
    E --> I
    C --> J
    I --> J
    J --> K
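To ground the "Data Harmonization" and "Feature Engineering" stages above, the snippet below resamples a synthetic meter series to 30-minute resolution and derives calendar and lag features. The column names and lag choices are assumptions for illustration, not the pipeline's real schema.

# Illustrative sketch of the harmonisation + feature-engineering stages.
# Column names and lag choices are assumptions, not the real pipeline schema.
import numpy as np
import pandas as pd

# Fake 1-minute meter readings covering two days.
idx = pd.date_range("2024-01-01", periods=2 * 24 * 60, freq="1min")
raw = pd.DataFrame({"kw": np.random.default_rng(0).uniform(0.05, 3.0, len(idx))}, index=idx)

# Harmonise to the common 30-minute resolution used across datasets.
half_hourly = raw["kw"].resample("30min").mean().to_frame("kw")

# Calendar features.
half_hourly["hour"] = half_hourly.index.hour
half_hourly["dayofweek"] = half_hourly.index.dayofweek
half_hourly["is_weekend"] = (half_hourly["dayofweek"] >= 5).astype(int)

# Lag features: previous half-hour and the same slot one day earlier (48 steps).
half_hourly["lag_1"] = half_hourly["kw"].shift(1)
half_hourly["lag_48"] = half_hourly["kw"].shift(48)

features = half_hourly.dropna()
print(features.head())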

Unique Selling Points

  • Real-World Grid Validation: Actual SSEN distribution network data (100K consumption readings from operational feeders)
  • Latest Architectures: PatchTST and N-BEATS variants with uncertainty quantification
  • Verifiable Rewards: Physics-based constraints ensure realistic scenario generation
  • Multi-Scale Validation: Household-level accuracy with distribution-feeder-level realism checks
  • Production MLOps: DVC data versioning, MLflow experiment tracking, comprehensive CI/CD
  • Uncertainty Quantification: Quantile regression heads and Monte Carlo dropout (a pinball-loss sketch follows this list)
  • Open Science: Reproducible experiments with clear data governance
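
To make the quantile-regression item concrete, here is the standard pinball (quantile) loss in plain NumPy; how it is attached to the model heads in this project is an implementation detail and is not shown here.

# Standard pinball (quantile) loss, the training objective behind
# quantile-regression forecast heads. Pure NumPy; model wiring not shown.
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Average pinball loss for quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

y_true = np.array([1.2, 0.8, 1.5, 2.0])   # observed kW
y_p10  = np.array([0.9, 0.6, 1.1, 1.4])   # 10th-percentile forecast
y_p90  = np.array([1.8, 1.3, 2.2, 2.9])   # 90th-percentile forecast

print("P10 loss:", pinball_loss(y_true, y_p10, 0.1))
print("P90 loss:", pinball_loss(y_true, y_p90, 0.9))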

Project Status

Current Phase: Data Ingestion & Exploration
Next Milestone: Self-Play Prototype Implementation

Completed
  • Data infrastructure (DVC)
  • Datasets acquired (15.2 GB)
  • Ingestion pipeline built
  • Baseline models implemented
  • Testing framework

In Progress
  • Full dataset ingestion
  • Exploratory analysis
  • Anomaly strategy defined
  • SSEN constraint extraction

Upcoming
  • Self-play architecture
  • Proposer/Verifier agents
  • Model training
  • Evaluation & writing

Datasets Overview

| Dataset                   | Size    | Records                         | Households/Feeders                   | Purpose                  |
| ------------------------- | ------- | ------------------------------- | ------------------------------------ | ------------------------ |
| LCL (London Smart Meters) | 8.54 GB | ~167M readings                  | 5,567 households                     | Training & validation    |
| UK-DALE                   | 6.33 GB | ~114M readings                  | 5 houses                             | Appliance-level analysis |
| SSEN (LV Feeder Data)     | 37 MB   | 100K metadata + 100K consumption | 100K feeders (28 with time-series)  | Real-world validation    |
| Total                     | ~15 GB  | ~281M readings                  | 5,572+ entities                      |                          |

All datasets tracked with DVC. SSEN provides actual operational grid data for validating pseudo-feeder realism. See data/README_raw.md for access instructions.

Quick Start

Prerequisites

  • Python 3.11+
  • Poetry for dependency management
  • Git with LFS support

Installation

# Clone the repository
git clone https://github.com/vatsalmehta/FYP-Predictive_Anomaly_Detection.git
cd FYP-Predictive_Anomaly_Detection

# Install dependencies
poetry install

# Activate virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install

# Pull data if remote configured (optional)
# dvc pull

# Run smoke tests
pytest tests/

# Verify pipeline (placeholder stages)
dvc repro

Data Onboarding

This project uses DVC (Data Version Control) to manage large datasets while keeping Git repositories lightweight.

For Quick Testing/CI

# Use built-in synthetic samples (already available)
ls data/samples/
# → lcl_sample.csv, ukdale_sample.csv, ssen_sample.csv

For Full Development

# 1. Download datasets (see docs/download_links.md for sources)
#    Place in: data/raw/ukdale/, data/raw/lcl/, data/raw/ssen/

# 2. Track with DVC
dvc add data/raw/ukdale
dvc add data/raw/lcl
dvc add data/raw/ssen

# 3. Commit pointers (not data!) to Git
git add data/raw/*.dvc dvc.lock
git commit -m "DVC: track raw datasets via pointers"

# 4. Optional: Set up remote storage for team sharing
dvc remote add -d myremote s3://my-bucket/fyp-data/
dvc push

Dataset Locations:

  • data/raw/ukdale/ → UK-DALE household consumption
  • data/raw/lcl/ → London Smart Meters data
  • data/raw/ssen/ → SSEN distribution feeder data
  • data/samples/ → Tiny synthetic samples for demos/CI

Resources:

Data Ingestion

# Quick test with samples (no downloads needed)
python -m fyp.ingestion.cli lcl --use-samples
python -m fyp.ingestion.cli ukdale --use-samples
python -m fyp.ingestion.cli ssen --use-samples

# Full ingestion (requires raw data)
python -m fyp.ingestion.cli lcl
python -m fyp.ingestion.cli ukdale --downsample-30min
python -m fyp.ingestion.cli ssen  # Uses CKAN API

Baseline Models

# Quick forecasting baselines on samples
python -m fyp.runner forecast --dataset lcl --use-samples

# Anomaly detection baselines
python -m fyp.runner anomaly --dataset ukdale --use-samples

# Full evaluation with custom horizon
python -m fyp.runner forecast --dataset ssen --horizon 96

# Modern neural models with uncertainty quantification
python -m fyp.runner forecast --dataset lcl --model-type patchtst --use-samples
python -m fyp.runner anomaly --dataset ukdale --model-type autoencoder --use-samples

# Note: Use canonical import path fyp.anomaly.autoencoder
# (old path fyp.models.autoencoder still works but deprecated)
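
The baselines above produce forecasts and anomaly scores through the fyp package. Purely as an illustration of one common scoring idea (not necessarily what the package implements), the snippet below flags readings whose forecast residual is an outlier under a robust z-score.

# Illustration of residual-based anomaly scoring (not the fyp implementation):
# flag readings whose forecast error is an outlier under a robust z-score.
import numpy as np

rng = np.random.default_rng(1)
actual = rng.normal(1.0, 0.1, 200)
actual[120] += 2.5                       # injected anomaly
forecast = np.full_like(actual, 1.0)     # stand-in for a model forecast

residual = actual - forecast
mad = np.median(np.abs(residual - np.median(residual)))
robust_z = 0.6745 * (residual - np.median(residual)) / mad

anomalies = np.flatnonzero(np.abs(robust_z) > 3.5)
print("Flagged indices:", anomalies)     # should include index 120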

Running Locally

# Check code quality
pre-commit run --all-files

# Run full test suite
pytest tests/ -v

# Check pipeline status
dvc status

# View experiment tracking (when available)
mlflow ui

Project Structure

├── .github/           # GitHub workflows and issue templates
├── docs/              # Comprehensive documentation
├── notebooks/         # Jupyter notebooks for exploration
├── src/fyp/          # Main package source code
├── tests/            # Test suite
├── data/             # Data directories (DVC tracked)
│   ├── raw/          # Original datasets (gitignored)
│   ├── processed/    # Cleaned and transformed data
│   └── derived/      # Model outputs and artifacts
└── dvc.yaml          # DVC pipeline definition

Known Issues & Limitations

Data Limitations

  1. No Ground-Truth Anomaly Labels: Datasets lack labeled anomalies. We address this through:

    • Physics-based constraints from SSEN
    • Self-play learning without labels
    • Synthetic test set for quantitative evaluation
  2. SSEN Time-Series Data: We currently hold feeder metadata only; obtaining time-series consumption requires one of:

    • Research partnership agreement, OR
    • API access (pending), OR
    • Pseudo-feeder generation from LCL aggregations (our approach; see the sketch after this list)
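
The sketch below illustrates the pseudo-feeder idea under simple assumptions (synthetic household profiles and an assumed group size of 40 households per feeder); it is not the project's implementation.

# Illustrative pseudo-feeder sketch with assumed sizes, not the fyp code:
# sum a random group of households into a synthetic feeder, then compare its
# load distribution against a reference feeder profile.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n_households, n_steps = 200, 48 * 7             # one week of 30-min readings
households = rng.gamma(shape=2.0, scale=0.3, size=(n_households, n_steps))

# Aggregate an assumed group of 40 households into one pseudo-feeder.
members = rng.choice(n_households, size=40, replace=False)
pseudo_feeder = households[members].sum(axis=0)

# Stand-in for a real SSEN feeder profile.
reference_feeder = rng.gamma(shape=2.0, scale=0.3, size=(40, n_steps)).sum(axis=0)

print("Wasserstein distance:", wasserstein_distance(pseudo_feeder, reference_feeder))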

Technical Constraints

  1. Large Dataset Processing: The LCL CSV (8.5 GB) requires the following (see the conversion sketch after this list):

    • Chunked reading for memory efficiency
    • Parquet conversion for fast queries
    • Current implementation tested on 16GB+ RAM
  2. HDF5 Dependencies: UK-DALE ships as HDF5 files, so ingestion requires the h5py library and careful HDF5 handling
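
A minimal sketch of the chunked CSV-to-Parquet conversion in item 1, assuming hypothetical file paths and column names (the real ingestion CLI handles this, along with the UK-DALE HDF5 files, internally):

# Sketch of chunked CSV -> Parquet conversion (item 1 above). Paths and column
# names are hypothetical; the real ingestion CLI does this internally.
# to_parquet requires pyarrow (or fastparquet).
import pandas as pd

csv_path = "data/raw/lcl/lcl.csv"          # hypothetical location
reader = pd.read_csv(
    csv_path,
    parse_dates=["DateTime"],              # assumed timestamp column
    chunksize=1_000_000,                   # ~1M rows at a time bounds memory use
)
for i, chunk in enumerate(reader):
    chunk.to_parquet(f"data/processed/lcl_part_{i:04d}.parquet", index=False)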

Scope Decisions

  1. Focus on Novelty Over SOTA: This project prioritizes:
    • Novel self-play approach to unsupervised anomaly detection
    • Physics-informed verification using real network constraints
    • Demonstrating feasibility of label-free learning
    • Explicitly out of scope: matching state-of-the-art forecasting accuracy

These are deliberate scope decisions, not defects. See docs/anomaly_strategy.md for our approach.

Ethics & Privacy

  • No PII Joins: Personally identifiable information is never linked across datasets
  • SSEN Validation Only: Distribution network data used solely for external validation
  • Anonymized Analysis: All household-level analysis maintains user anonymity
  • Data Minimization: Only essential features extracted for modeling purposes
  • Transparent Methods: All processing steps documented and reproducible

Documentation

Contributing

We welcome contributions! Please see our Contributing Guide for details on:

  • Development workflow and branch management
  • Code style and testing requirements
  • Experiment tracking best practices

Please read our Code of Conduct before participating.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this work in your research, please cite:

@software{fyp_energy_forecasting,
  title = {AZR-inspired Energy Forecasting & Anomaly Detection},
  author = {Your Name},
  year = {2025},
  url = {https://github.com/vatsalmehta2001/FYP-Predictive_Anomaly_Detection}
}

See CITATION.cff for complete citation metadata.

Related Work
