# feat: modular pipeline — registries, hook-based trainer, and uncertainty quantification (#8)

**Status:** Open. samrat-rm wants to merge 47 commits into `Orion-AI-Lab:main`.
## Summary

This PR started as a structural refactor but grew into a full pipeline overhaul, landing real new capabilities alongside the modularization.

**Original goal:** restructure the codebase into a clean, extensible pipeline with no functional changes.

What actually landed:
### Structural Refactor

- `BaseModel` abstract class with decorator-based `MODEL_REGISTRY`
- `loss_registry` for loss functions
- `PipelineConfig` dataclass with `from_json()`
- `pipeline/runner.py` as a clean entry point

### Hook-Based Training Architecture

Introduced a `TrainingHook` extension system — the core loop never changes; features plug in as hooks (`on_batch_end`, `on_epoch_end`):

- `UncertaintyHook` — tracks mean softmax entropy per epoch during training
- `EntropyRegHook` — adds multi-scale entropy regularization per batch
- `AttributionHook` — placeholder registered for explainability (GradCAM)
### Uncertainty Quantification

Mean softmax entropy is tracked in two places intentionally — training (`UncertaintyHook`) and validation (`validate_all`) — and reported alongside Pixel Accuracy and mIoU. Decreasing validation uncertainty over epochs signals the model gaining confidence; sustained high values indicate the model is struggling.
### Entropy Regularization

Without `EntropyRegHook`, models trained on imbalanced cloud datasets collapse toward the dominant class (clear sky), inflating accuracy while missing thin clouds. The entropy penalty keeps the model from defaulting to the easy answer.

### Explainability (`AttributionHook`)

Keeps attribution logic decoupled from the training loop. Planned methods: GradCAM++ for spatial activation maps and DeepSHAP for per-feature (band, DEM, weather) contribution scores.
## What Changed

### 1. Project Structure

Reorganized the project into dedicated modules with clear responsibilities:
Split `common_metrics.py` into:

- `loss.py` — loss function definitions
- `validation.py` — validation loop logic
- `metrics.py` — metric computation

Added `__init__.py` across all modules for reliable package imports.

### 2. Model Registry (`model_builder/`)

- `BaseModel` (ABC) — enforces a common interface on all models: `forward()`, `from_config()`, `name`
- `@register_model` decorator — register a new model with one line, no changes to core logic
- `get_model(name, config)` — single entry point for model instantiation
- All models migrated to inherit `BaseModel`: `UnetModel`, `SegFormerModel`, `DeepLabV3Model`, `SwinUnetModel`, `SiameseUNet`, `SwinCloud`, `CDnetV2`, `HRCloudNet`, `BAM-CD`
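The decorator-based registry described above can be sketched in plain Python. This is an illustrative assumption of the shape, not the PR's actual code; the names `MODEL_REGISTRY`, `register_model`, `BaseModel`, and `get_model` come from the description, while the `UnetModel` body is a stub:

```python
from abc import ABC, abstractmethod

MODEL_REGISTRY: dict = {}

def register_model(name: str):
    """Class decorator: file a model class in the registry under `name`."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

class BaseModel(ABC):
    """Common interface every model must implement."""

    @abstractmethod
    def forward(self, x): ...

    @classmethod
    def from_config(cls, config: dict):
        # Default construction: pass config keys straight to the constructor.
        return cls(**config)

def get_model(name: str, config: dict) -> "BaseModel":
    """Single entry point: look up the class and build it from config."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model '{name}'. Registered: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name].from_config(config)

@register_model("unet")
class UnetModel(BaseModel):
    """Stub model: registration costs one decorator line, no core changes."""
    def __init__(self, channels: int = 3):
        self.channels = channels
    def forward(self, x):
        return x  # placeholder

model = get_model("unet", {"channels": 4})
```

Registering a new model touches no pipeline code: add a class, decorate it, and `get_model` can build it by name.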
### 3. Loss Registry (`training/`)

- `@register_loss` decorator — mirrors the model registry pattern
- `loss.py` — loss definitions
- `loss_builders.py` — instantiation logic
- `get_loss(name, class_counts, device)` — single entry point for loss creation

### 4. Hook-Based Trainer (`training/`)

Introduces an extensible training architecture — new behaviors (regularization, logging, attribution) attach as hooks without touching the core training loop.
Core abstractions:

- `TrainingHook` (ABC)
  - `on_batch_end()` → returns an optional auxiliary loss tensor
  - `on_epoch_end()` → handles logging and scheduling
- `BaseTrainer` (ABC) — accumulates hook losses
- `CloudTrainer` (concrete) — handles dataloaders, dual-encoder models, CDnetV2 auxiliary outputs, early stopping, W&B logging, seeding
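A minimal sketch of the hook contract and of how a trainer might fold hook losses into the batch total. Plain floats stand in for loss tensors, and `ConstantPenaltyHook` / `accumulate_batch_loss` are hypothetical names for illustration:

```python
from abc import ABC

class TrainingHook(ABC):
    """Extension point: features observe training without changing the loop."""

    def on_batch_end(self, outputs, targets):
        """Return an optional auxiliary loss; None means no contribution."""
        return None

    def on_epoch_end(self, epoch: int, metrics: dict) -> None:
        """Handle logging and scheduling at epoch boundaries."""

class ConstantPenaltyHook(TrainingHook):
    """Toy hook contributing a fixed auxiliary loss every batch."""

    def __init__(self, value: float):
        self.value = value

    def on_batch_end(self, outputs, targets):
        return self.value

def accumulate_batch_loss(base_loss, hooks, outputs=None, targets=None):
    """Trainer side: sum each hook's auxiliary loss into the batch total."""
    total = base_loss
    for hook in hooks:
        aux = hook.on_batch_end(outputs, targets)
        if aux is not None:
            total = total + aux
    return total
```

With `ConstantPenaltyHook(0.05)` attached, a base loss of 1.0 becomes 1.05; hooks that return `None` leave the total untouched, which is how a purely observational hook can stay out of the gradient path.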
Built-in hooks:

- `EntropyRegHook` — entropy-based regularization loss
- `UncertaintyHook` — logs mean softmax entropy per epoch
- `AttributionHook` — placeholder for Grad-CAM++ / DeepSHAP attribution
### 5. `EntropyRegHook` — Entropy Regularization to Mitigate Unimodal Dominance

Implements multi-scale functional entropy regularization per batch, adopted directly from Section IV of the paper. It addresses the unimodal dominance problem, where a model collapses to predicting one class (e.g. always "cloudy") because doing so minimizes loss without learning class boundaries. The hook penalizes over-confident predictions by adding `-λ · H(p)` to the total loss each batch, forcing the model to maintain spread across classes throughout training. `λ` is configurable via `lambda_reg` in the config.
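The `-λ · H(p)` term can be illustrated in plain Python. Lists stand in for probability tensors, `regularized_loss` is a hypothetical helper, and the real hook operates on multi-scale softmax outputs rather than a single distribution:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum(p_i * log p_i), natural log."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def regularized_loss(base_loss: float, probs, lambda_reg: float) -> float:
    """Batch loss with the entropy penalty: loss - lambda_reg * mean H(p).

    Subtracting lambda_reg * H(p) means minimizing the total rewards
    higher-entropy (less over-confident) predictions, which counters
    collapse onto the dominant class.
    """
    mean_h = sum(entropy(p) for p in probs) / len(probs)
    return base_loss - lambda_reg * mean_h

# A peaked distribution has lower entropy than a spread one,
# so it earns a smaller entropy reward:
low_h = entropy([0.98, 0.01, 0.01])
high_h = entropy([0.4, 0.3, 0.3])
```

Larger `lambda_reg` pushes harder against confident predictions; the config key lets that tradeoff be tuned per run.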
### 6. `UncertaintyHook` — Mean Entropy Uncertainty Logged Per Epoch

Tracks mean softmax entropy across the full validation set after each epoch, computed inside `validate_all()` and reported alongside IoU and loss. Decreasing uncertainty over epochs indicates a model gaining confidence on unseen data; sustained high values signal the model is struggling — a diagnostic beyond accuracy alone.
Mean softmax entropy is tracked in two separate places intentionally:

- **Training** (`UncertaintyHook.on_epoch_end`) — accumulates per-batch entropy during the forward pass and averages it at epoch end. Stored as `mean_uncertainty` in the training metrics dict.
- **Validation** (`validate_all`) — the same computation runs inline over the full validation set and is reported on the same line as Pixel Accuracy and mIoU.
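The accumulate-then-average pattern on the training side can be sketched as follows, with simplified signatures (the real hook sees model outputs and an epoch index; here lists of probability distributions stand in for softmax tensors):

```python
import math

def batch_mean_entropy(prob_batch):
    """Mean softmax entropy over one batch of probability distributions."""
    per_item = [
        -sum(p * math.log(p) for p in dist if p > 0.0) for dist in prob_batch
    ]
    return sum(per_item) / len(per_item)

class UncertaintyHook:
    """Accumulates per-batch mean entropy, then averages it at epoch end."""

    def __init__(self):
        self._total = 0.0
        self._batches = 0

    def on_batch_end(self, prob_batch):
        self._total += batch_mean_entropy(prob_batch)
        self._batches += 1
        return None  # observational only: contributes no auxiliary loss

    def on_epoch_end(self, metrics: dict):
        metrics["mean_uncertainty"] = self._total / max(self._batches, 1)
        self._total, self._batches = 0.0, 0  # reset for the next epoch
```

Resetting the accumulators at epoch end keeps each epoch's `mean_uncertainty` independent, so the logged series is directly comparable across epochs.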
### 7. Pipeline Entry Point (`pipeline/`)

Provides a unified entry point to construct and validate the full training pipeline with minimal setup.
Core components:

- `runner.py` — `build_pipeline(params_dict)` → wires registry, optimizer, loss functions, and hooks into a `CloudTrainer` in a single call
- `smoke_test.py` — end-to-end wiring check on CPU
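Roughly, the wiring `build_pipeline()` performs might look like this sketch, with toy registries and a stand-in `CloudTrainer` (every body here is an assumption for illustration, not the PR's implementation):

```python
# Toy registries stand in for the real model and loss registries.
MODELS = {"unet": lambda cfg: ("unet-model", cfg)}
LOSSES = {"ce": lambda: "cross-entropy"}

class CloudTrainer:
    """Stand-in trainer that simply holds the wired-together pieces."""
    def __init__(self, model, loss_fn, hooks):
        self.model = model
        self.loss_fn = loss_fn
        self.hooks = hooks

def build_pipeline(params: dict) -> CloudTrainer:
    """One call: params dict -> registry lookups -> trainer with hooks."""
    model = MODELS[params["model"]](params.get("model_config", {}))
    loss_fn = LOSSES[params["loss"]]()
    hooks = list(params.get("hooks", []))
    return CloudTrainer(model, loss_fn, hooks)

trainer = build_pipeline({"model": "unet", "loss": "ce", "hooks": []})
```

Because the lookups go through registries, a smoke test only needs a params dict and dummy data to exercise the full wiring, which is exactly what `smoke_test.py` relies on.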
### 8. Typed Config (`configs/`)

Introduces a structured, validated configuration system to replace untyped parameter dictionaries.

Core components:

- `PipelineConfig` (dataclass)
  - `__post_init__()` → validates fields such as `lambda_reg`
  - `from_json(path, key)` → loads config from JSON
  - `to_dict()` → converts back to a `params_dict` for compatibility with existing pipeline components
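A minimal sketch of such a validated config. The fields (`model`, `loss`, `lambda_reg`) are illustrative assumptions; only the method names match the description:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PipelineConfig:
    """Typed replacement for an untyped params dict (illustrative fields)."""
    model: str
    loss: str
    lambda_reg: float = 0.01

    def __post_init__(self):
        # Validation runs at construction time, so bad configs fail fast.
        if self.lambda_reg < 0:
            raise ValueError(f"lambda_reg must be >= 0, got {self.lambda_reg}")

    @classmethod
    def from_json(cls, path: str, key: str) -> "PipelineConfig":
        """Load one named config section from a JSON file."""
        with open(path) as f:
            return cls(**json.load(f)[key])

    def to_dict(self) -> dict:
        # Back-compat bridge for components still expecting a params_dict.
        return asdict(self)
```

Fail-fast validation in `__post_init__` is the main win over a raw dict: a typo'd or out-of-range value raises immediately instead of surfacing mid-training.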
### 9. Dependencies

Added `requirements.txt` for one-step environment setup.

## What Does NOT Change
Ensures backward compatibility and stability across the existing training ecosystem.

Unaffected components:

- `fine_tune_models.py`
- `train_model()` interface

### Backward Compatibility

`train_model()` is deprecated but retained — still used by `fine_tune_models.py`. Migration to the hook-based trainer is tracked in the next steps.
## Verified Working

The full pipeline wiring — model instantiation via registry, loss construction, hook accumulation, and forward + backward pass — has been validated end-to-end on CPU, with no dataset or GPU required.

Smoke test (`pipeline/smoke_test.py`) confirms:

- `build_pipeline()` correctly wires registry → optimizer → loss → hooks → `CloudTrainer`
- `on_batch_end()` hook losses accumulate into `total_loss` without errors

Model test flow (`evaluation/model_test.py`):

- Post-refactor checkpoint loading verified with key remapping for `BaseModel` wrapper compatibility (see commit `e6c74fb`)
- Evaluation metrics (accuracy, mIoU, precision, recall, F1) confirmed working through the refactored `evaluation/` module
- Single-pass entropy uncertainty metric (from PR #4, "Compute and print mean single-pass uncertainty metrics during model test") confirmed compatible with the refactored evaluation path
## Next Steps

- Pipeline parameters (model name, loss, hooks, data paths, etc.)
- `run_pipeline(config)` that wires model registry → loss registry → trainer → hooks
- `AttributionHook` with Grad-CAM++ (spatial inputs) and DeepSHAP (scalar/meteorological inputs)
- Config declares data format; registry maps to the right `Dataset` class
- `fine_tune_models.py` — remove the deprecated `train_model()` path
- `requirements.txt` and `README.md` updates

## AI Disclosure
AI assistance (Claude) was used in parts of this PR in the following ways:

- Discussing architecture and tradeoffs before settling on an approach (e.g., `BaseModel` ABC structure, registry decorator pattern, hook interface skeleton)

All AI-generated code and ideas were manually reviewed, tested, and validated before being committed. Final implementation decisions and code ownership rest entirely with the contributor.