High‑level framework for training and evaluating neural network architectures. The goal is to make training both classification and regression networks fast, easy, and consistent, while keeping the experimentation loop reproducible, observable and extensible.
- **Configuration‑First Design**
  - Each experiment is defined by a config module in `nnlibrary/configs/` (model, data paths, dataloader params, optimizer/scheduler, hooks, metrics).
  - Each config usually imports a default config (`nnlibrary/configs/__default__.py`), which holds the less frequently changed settings.
  - This is done because every config must define all arguments, but most settings rarely need to change on a per‑config basis. A minimal config sketch follows this overview list.
- **Datasets & I/O**
  - Dataset configuration (paths, batch sizes, shuffle flags, optional transforms) is declared in the experiment config modules under `nnlibrary/configs/` (see the `dataset` entry and its `train`/`val`/`test` `DataLoaderConfig` entries).
  - Concrete dataset classes live in `nnlibrary/datasets/` (e.g. `MpcDatasetHDF5` from `nnlibrary/datasets/mpc_ahu.py`). Each dataset should at least implement `__len__` and `__getitem__`, returning `(inputs, targets)`.
  - To use a different dataset, add a new Dataset class in `nnlibrary/datasets/`, import it in `nnlibrary/datasets/__init__.py`, and then reference it by name in the config's dataset section.
- **Models**
  - Defined in `nnlibrary/models/` (e.g. `mlp.py`, `cnn.py`, `transformer.py`).
  - Available architectures: MLP (`HVACModeMLP`), TCN (`TCN`, `TCNRegression`), Transformer (`TransformerRegression`, `TransformerRegressionOptimized`, `TransformerClassificationOptimized`).
  - New models can be defined, but must be imported in `nnlibrary/models/__init__.py` to be used.
- **Training Engine**
  - A central `Trainer` (in `nnlibrary/engines/train.py`) orchestrates the dataloaders, model, optimizer/scheduler, AMP autocast (with `GradScaler` for float16), gradient clipping, and hooks.
  - Direct use of the `Trainer` is usually not recommended; prefer the training script `scripts/train.py`.
  - Metrics and state are exposed through a shared `info` dict so hooks remain decoupled.
- **Hooks (Lifecycle Extensions)**
  - A hook framework allows interacting with the different stages of the training process; see `nnlibrary/engines/hooks.py`.
  - The base hooks implement validation / test evaluation, checkpointing, timing instrumentation, logging to Weights & Biases (W&B) and TensorBoard, and post‑test plotting.
  - Adding custom logic is easy: inherit from `Hookbase` and register the hook in the config's hook list.
  - If you want to keep the default hooks in your custom config, append to the existing list: `hooks.append(CustomHookName)`
- **Evaluation Abstractions**
  - `ClassificationEvaluator` and `RegressionEvaluator` return consistent metric dicts (plus optional detailed artifacts such as confusion matrices or prediction scatter/sequence plots).
- **Observability & Reproducibility**
  - W&B run grouping by dataset/model; TensorBoard summaries under `exp/<dataset>/<model>/tensorboard`.
  - Deterministic config capture via the run config; model checkpoints (`model_last.pth`, `model_best.pth`).
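
As referenced in the overview above, here is a minimal sketch of what a custom config module might look like. The wildcard import of the defaults and the exact attribute set are assumptions for illustration; the setting names shown (`num_epochs`, `lr`, `train_batch_size`, `seed`, `enable_wandb`, `hooks`) appear elsewhere in this README, but an existing config such as `nnlibrary/configs/TCN-reg.py` is the authoritative reference.

```python
# nnlibrary/configs/my-experiment.py  (hypothetical example)
# Start from the defaults so only the settings that differ need to be set here.
# NOTE: the wildcard-import pattern is an assumption; mirror what an existing
# config such as TCN-reg.py actually does.
from nnlibrary.configs.__default__ import *  # noqa: F401,F403

# Frequently tuned training settings (names match the sweep example below).
num_epochs = 20
lr = 1e-3
train_batch_size = 512

# Reproducibility / logging toggles mentioned in the sweep requirements.
seed = 42
enable_wandb = True

# Keep the default hooks and append a custom one if needed:
# hooks.append(CustomHookName)
```
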
Repository layout:

```text
nnlibrary/
  configs/                # Experiment & model/training hyper‑parameter configs
  datasets/               # Dataset classes containing sample loading logic
  engines/                # Training engine, hooks, and evaluation classes
  models/                 # Neural network architectures (MLP, TCN, Transformer)
  utils/                  # Custom losses, schedulers, transforms, misc
scripts/
  train.py                # Entry point: dynamic config loading & training
  sweep.py                # Script for running WandB hyperparameter sweeps
  eval_visualization.py   # Post‑training evaluation and visualization
  export_onnx_model.py    # Export trained model checkpoints to ONNX format
exp/                      # Auto‑generated experiment artifacts (checkpoints, figures, logs)
data/                     # !! Here you should put the datasets !!
dumpster/                 # Scratch/experimental scripts (not part of main workflow)
environment.yml           # Conda environment definition
.secrets/
  wandb                   # !! File containing your WandB API key !!
```
Create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate pytorch
```

Add your WandB API key in a file under the `.secrets` directory:

```bash
mkdir .secrets
echo '<wandb-api-key>' > .secrets/wandb
```

Datasets are expected under `data/` with the following layout:

```text
data/
  <dataset_range>/
    dataset-classification/  or  dataset-regression/
      train.h5
      val.h5
      test.h5
      stats/
        metadata.json
        (optional) feature_means.npy / feature_stds.npy / target_mean.npy / target_std.npy
```
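
Given this layout, a new dataset class follows the pattern below. This is an illustrative sketch only, not the actual `MpcDatasetHDF5` implementation; the HDF5 key names (`inputs`, `targets`) and the constructor signature are assumptions.

```python
# nnlibrary/datasets/my_dataset.py  (hypothetical example)
import h5py
import torch
from torch.utils.data import Dataset


class MyHDF5Dataset(Dataset):
    """Minimal sketch of a dataset reading one split file, e.g. data/.../train.h5.

    The key names "inputs" and "targets" are placeholders; use whatever keys
    your HDF5 files actually contain.
    """

    def __init__(self, h5_path: str):
        # Loading everything into memory keeps the example simple; large files
        # would typically be read lazily inside __getitem__ instead.
        with h5py.File(h5_path, "r") as f:
            self.inputs = torch.from_numpy(f["inputs"][:]).float()
            self.targets = torch.from_numpy(f["targets"][:]).float()

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Returns (inputs, targets), as expected by the training engine.
        return self.inputs[idx], self.targets[idx]
```

Remember to import the new class in `nnlibrary/datasets/__init__.py` so it can be referenced from a config.
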
To train a model, use an existing config name (e.g. `hvac-mlp-cls`, `TCN-reg`, `transformer-reg`). The train script resolves short names, dotted paths, or file paths.

To run the config named `TCN-reg`, residing at `nnlibrary/configs/TCN-reg.py`:

```bash
python scripts/train.py -n TCN-reg                        # Shorthand
python scripts/train.py -n nnlibrary.configs.TCN-reg      # Using module path
python scripts/train.py -n nnlibrary/configs/TCN-reg.py   # Using relative or absolute path
```

Optional flags:

- `--logging` — Enable logger output
- `--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}` — Set logging verbosity (default: INFO)
Outputs (example):

```text
exp/<dataset>/<model_name>/<run_name>/
  model/
    model_last.pth
    model_best.pth
  tensorboard/
  figures/
  wandb/
```
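
To inspect a saved checkpoint, a quick sketch (the exact contents of the `.pth` files depend on the checkpoint hook, so they may be a plain `state_dict` or a wrapper dict):

```python
import torch

# Load on CPU so no GPU is needed just to look inside the file.
ckpt = torch.load(
    "exp/<dataset>/<model_name>/<run_name>/model/model_best.pth",
    map_location="cpu",
)

# Either parameter names (plain state_dict) or wrapper keys such as
# "model" / "optimizer" / "epoch", depending on how the hook saves it.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```
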
The framework supports WandB-powered hyperparameter sweeps. To use sweeps:
- Define a sweep configuration in your config file:

  ```python
  # At the end of your config (e.g., TCN-reg.py)
  sweep_configuration = {
      "name": "TCN-reg-sweep",
      "method": "grid",  # or "random", "bayes"
      "metric": {"goal": "minimize", "name": "loss"},
      "parameters": {
          "num_epochs": {"values": [5, 10, 15, 20, 30]},
          "lr": {"values": [1e-2, 1e-3, 1e-4]},
          "train_batch_size": {"values": [256, 512, 1024, 2048]},
      },
  }
  ```

- Run the sweep:

  ```bash
  python scripts/sweep.py -n TCN-reg
  python scripts/sweep.py -n transformer-reg --logging --log-level DEBUG
  ```

Requirements:

- WandB must be enabled in the config (`enable_wandb = True`)
- A `sweep_configuration` dict must be defined in the config
- Setting a `seed` is recommended for reproducibility across sweep runs

Notes:
- Sweep results are saved under `exp/<dataset>/<model>/sweeps/<sweep_name>/`
- The sweep overrides config values (e.g., `lr`, `num_epochs`) with values from the sweep search space
- Logged metrics depend on the task: regression logs `loss`, `mae`, `rmse`, `r2_score`; classification logs `loss`, `avg_sample_accuracy`, `avg_class_accuracy`

Visualize predictions from a trained model checkpoint:

```bash
python scripts/eval_visualization.py -n <config_name> -r <run_name> [--split train|val|test] [--interactive]
```

Example:

```bash
python scripts/eval_visualization.py -n TCN-reg -r witty-salamander-4 --split test
python scripts/eval_visualization.py -n transformer-reg -r clever-fox-2 --interactive
```

Export a trained checkpoint to ONNX format for deployment:

```bash
python scripts/export_onnx_model.py -n <config_name> -r <run_name> [--dynamic-batch] [--dynamic-seq] [--opset <VERSION_NUM>]
```

The exported ONNX model is saved under `exp/<dataset>/<model_name>/<run_name>/onnx/`.

Example:

```bash
python scripts/export_onnx_model.py -n TCN-reg -r witty-salamander-4
python scripts/export_onnx_model.py -n transformer-reg -r clever-fox-2 --dynamic-batch --opset 17
python scripts/export_onnx_model.py -n transformer-cls -r ugly-fork-42 --dynamic-batch --dynamic-seq
```

Setting the `--dynamic-batch` flag is recommended, since the batch size in deployment usually differs from the batch size used during training. Use `--dynamic-seq` only if you know the architecture supports a dynamic sequence length.
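
A quick smoke test of an exported model with ONNX Runtime (a sketch; the exported file name and the input name/shape depend on the model and the export script, so query them from the session as shown):

```python
import numpy as np
import onnxruntime as ort

# Path of an exported model; the file name here is illustrative.
sess = ort.InferenceSession(
    "exp/<dataset>/<model_name>/<run_name>/onnx/model.onnx",
    providers=["CPUExecutionProvider"],
)

# Inspect the expected input instead of hard-coding names/shapes.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Build a dummy tensor matching the printed signature (dynamic dims -> 1).
dummy = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```
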
Monitor runs with:

- TensorBoard: `tensorboard --logdir exp/<dataset>/<model_name>/<run_name>/tensorboard`
- Weights & Biases: open the run page (if enabled in the config).
Common extension points:

| Task | Where |
|---|---|
| New model | Implement in `nnlibrary/models/`, import it in `nnlibrary/models/__init__.py`, then reference it in your config |
| Custom hook | Subclass `Hookbase` (from `nnlibrary/engines/hooks.py`), add it to the config's hooks list |
| New loss or scheduler | Add to `nnlibrary/utils/loss.py` or `nnlibrary/utils/schedulers.py` |
| New transform | Add to `nnlibrary/utils/transforms.py` (e.g. target normalization/standardization) |
| New dataset | Implement in `nnlibrary/datasets/`, import in `nnlibrary/datasets/__init__.py`, then reference in config |
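
As a sketch of the "Custom hook" row above: the lifecycle method name used below (`after_epoch`) and its signature are assumptions for illustration; the actual hook interface is defined by `Hookbase` in `nnlibrary/engines/hooks.py`, so mirror whatever methods the built-in hooks override.

```python
# A hypothetical custom hook; the method name and signature are illustrative only.
from nnlibrary.engines.hooks import Hookbase


class PrintLossHook(Hookbase):
    """Prints the running loss from the shared `info` dict once per epoch."""

    def after_epoch(self, info):  # assumed signature; mirror the built-in hooks
        loss = info.get("loss")
        if loss is not None:
            print(f"[PrintLossHook] epoch loss: {loss:.4f}")


# In your config, keep the default hooks and append the custom one:
# hooks.append(PrintLossHook)
```
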
The `Trainer` can also be used programmatically:

```python
from nnlibrary.engines import Trainer
import nnlibrary.configs.HVACModeMLP as cfg

trainer = Trainer(cfg=cfg)
trainer.train()  # trains, validates (if enabled), checkpoints
```

Design principles:

- Single source of truth: configs hold the knobs; trainer/hook code stays lean.
- Pluggable lifecycle: hooks avoid a subclass explosion on the trainer.
- Evaluation symmetry: the same evaluators are used for validation & test, with a consistent metric dict.
- Traceability: run artifacts are self-contained under `exp/`.
- Modularization allows for easier debugging.