
Syntonic Optimizer

A theory-derived optimizer that matches Adam with zero hyperparameter tuning.

The Syntonic optimizer dynamically infers the optimal learning rate timescale from gradient statistics using the scaling law:

$$\tau^* = \kappa \sqrt{\frac{\sigma^2}{\lambda}}$$

where σ² is the gradient variance, λ is the innovation rate (how fast the gradient landscape is changing), and κ = 1.0 is theory-determined — not tuned.

Adam's fixed EMA constants (β₁=0.9, β₂=0.999) accidentally encode temporal scales that happen to match typical training regimes. The Syntonic optimizer makes this structure explicit by computing τ* dynamically.
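To see the timescales hidden in Adam's constants: an exponential moving average with decay β averages over roughly τ = 1/(1 − β) steps. A minimal sketch:

```python
def ema_timescale(beta: float) -> float:
    """Effective averaging window (in steps) of an EMA with decay beta."""
    return 1.0 / (1.0 - beta)

print(ema_timescale(0.9))    # beta_1 -> ~10-step momentum timescale
print(ema_timescale(0.999))  # beta_2 -> ~1000-step variance timescale
```

These fixed windows (~10 and ~1000 steps) are what the Syntonic optimizer replaces with a τ* inferred from the data.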


Results

CIFAR-10 — Multi-Regime Shift (10 classes)

| Phase | Adam | Syntonic | Δ |
|---|---|---|---|
| Baseline (bs=128) | 80.48% | 82.37% | +1.89% |
| Large Batch (bs=512) | 86.86% | 84.38% | −2.48% |
| Gradient Noise (σ=0.1) | 82.32% | 82.30% | −0.02% |
| Label Noise (20%) | 85.13% | 84.95% | −0.18% |
| Recovery (bs=64) | 86.70% | 87.04% | +0.34% |

(Figure: CIFAR-10 multi-regime results)

CIFAR-100 — Multi-Regime Shift (100 classes)

| Metric | Adam | Syntonic | Δ |
|---|---|---|---|
| Final accuracy | 62.56% | 61.79% | −0.77% |

(Figure: CIFAR-100 multi-regime results)

Summary

| Benchmark | Adam | Syntonic | Δ | Tuning required |
|---|---|---|---|---|
| CIFAR-10 (10 classes) | 86.70% | 87.04% | +0.34% | None |
| CIFAR-100 (100 classes) | 62.56% | 61.79% | −0.77% | None |

Both optimizers use identical conditions: same seed (42), same architecture, same data augmentation, same weight decay (1e-4), same gradient clipping (1.0). Neither optimizer is retuned across the five training phases.


Key Insight

Look at the bottom-left panel in the figures above: τ* Dynamic Adaptation.

Adam's temporal scale (dashed line, τ₁ ≈ 10) is fixed. The Syntonic optimizer adjusts τ* in response to regime changes — it rises during the gradient-noise and label-noise phases, then recovers. This is adaptation by inference rather than by coincidence.

The bottom-right panel (Effective Learning Rate) shows the consequence: Adam stays flat at 10⁻³ regardless of regime. The Syntonic optimizer modulates its learning rate based on the local σ²/λ ratio.
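The modulation described above follows directly from the update rule (lr_eff = base_lr × κ / τ*). A hedged numeric sketch, using the clamp bounds from the constructor below:

```python
import math

def tau_star(sigma2, lam, kappa=1.0, tau_min=1.0, tau_max=500.0):
    # tau* = kappa * sqrt(sigma^2 / lambda), clamped to [tau_min, tau_max]
    return min(max(kappa * math.sqrt(sigma2 / lam), tau_min), tau_max)

def effective_lr(base_lr, sigma2, lam, kappa=1.0):
    # lr_eff = base_lr * kappa / tau*, per the update rule in the README
    return base_lr * kappa / tau_star(sigma2, lam, kappa)

# Quiet regime: low variance, fast innovation -> small tau*, full learning rate
print(effective_lr(1e-3, sigma2=0.01, lam=0.01))  # tau* = 1   -> 1e-3
# Noisy regime: high variance, slow innovation -> large tau*, damped rate
print(effective_lr(1e-3, sigma2=1.0, lam=1e-4))   # tau* = 100 -> 1e-5
```

The two calls illustrate why the effective learning rate falls when gradients get noisy relative to how fast the landscape is changing.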


Quick Start

Both notebooks run directly in Google Colab (free GPU):

| Notebook | Colab |
|---|---|
| CIFAR-10 Multi-Regime | Open In Colab |
| CIFAR-100 Multi-Regime | Open In Colab |

Expected runtime: 30 minutes per notebook on Colab free tier (T4 GPU).


The Optimizer (complete code)

```python
import torch


class SyntonicV4(torch.optim.Optimizer):
    """
    Syntonic V4 — Theory-derived adaptive optimizer.

    Update:  p -= (base_lr × κ / τ*) ⊙ (m̂ / (√v̂ + ε))
    Tempo:   τ* = κ √(σ²/λ)  [per-element, clamped]

    κ = 1.0 is the theoretical value (not tuned).
    """
    def __init__(self, params, base_lr=0.001, kappa=1.0,
                 tau_min=1.0, tau_max=500.0,
                 beta_m=0.9, beta_v=0.999,
                 beta_sigma=0.999, beta_lambda=0.99,
                 eps=1e-8, weight_decay=0.0):
        # ... (see notebooks for full implementation)
```

The full implementation is ~80 lines of PyTorch. See the notebooks for the complete, runnable code.
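The core of any such implementation is the per-element estimation of σ² and λ. The following is a hypothetical sketch of that step only — the names (`update_tau`, `state` keys) and the unit initialization of the statistics are assumptions, not the repo's actual code; see the notebooks for the real implementation:

```python
import torch

def update_tau(grad, state, beta_sigma=0.999, beta_lambda=0.99,
               kappa=1.0, tau_min=1.0, tau_max=500.0, eps=1e-12):
    """Sketch: estimate sigma^2 (EMA of squared deviation from the gradient
    mean) and lambda (EMA of squared successive gradient differences),
    then return the clamped per-element tau*."""
    mean = state.setdefault("mean", torch.zeros_like(grad))
    sigma2 = state.setdefault("sigma2", torch.ones_like(grad))  # assumed init
    lam = state.setdefault("lam", torch.ones_like(grad))        # assumed init
    prev = state.setdefault("prev", torch.zeros_like(grad))

    mean.mul_(beta_sigma).add_(grad, alpha=1 - beta_sigma)
    sigma2.mul_(beta_sigma).add_((grad - mean) ** 2, alpha=1 - beta_sigma)
    lam.mul_(beta_lambda).add_((grad - prev) ** 2, alpha=1 - beta_lambda)
    prev.copy_(grad)

    # tau* = kappa * sqrt(sigma^2 / lambda), clamped per element
    tau = kappa * torch.sqrt(sigma2 / (lam + eps))
    return tau.clamp_(tau_min, tau_max)
```

The clamp to [tau_min, tau_max] mirrors the constructor arguments above and keeps the effective learning rate bounded early in training, before the statistics have converged.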


Five-Phase Protocol

Both optimizers are configured once at epoch 1 and never retuned:

| Phase | Epochs | Perturbation | Rationale |
|---|---|---|---|
| 1 | 1–10 | Batch size 128 (baseline) | Establish convergence |
| 2 | 11–20 | Batch size 512 | Variance landscape shift |
| 3 | 21–30 | Gradient noise σ=0.1 | Stochastic perturbation |
| 4 | 31–40 | 20% label corruption | Task non-stationarity |
| 5 | 41–50 | Batch size 64, clean | Re-adaptation |

This protocol tests whether the optimizer adapts to regime shifts without human intervention.
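The phase-3 perturbation, for example, can be reproduced with a small hook applied after `backward()` and before `optimizer.step()`. This is a sketch of one reasonable implementation, not necessarily the notebooks' exact code:

```python
import torch

def add_gradient_noise(model, sigma=0.1):
    """Phase-3 perturbation (sketch): add i.i.d. Gaussian noise with
    standard deviation sigma to every parameter gradient in place."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.add_(torch.randn_like(p.grad) * sigma)
```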


Theory

The Syntony Principle proposes that the optimal integration time for any adaptive system operating under uncertainty follows:

$$\tau^* = \kappa \sqrt{\frac{\sigma^2}{\lambda}}$$

This is the geometric mean between the noise timescale and the innovation timescale. The result emerges independently from 10 mathematical derivations (Kalman-Riccati, bias-variance, dimensional analysis, information theory, optimal stopping, H∞ control, Cramér-Rao bounds, Allan variance, multi-agent consensus, and the UTAE axiomatic framework).
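One of the simplest routes to the square root is a bias-variance sketch (hedged: this assumes the target drifts as a random walk with variance rate λ while each gradient sample carries noise variance σ²). Averaging over a window τ leaves estimation noise of order σ²/τ and staleness error of order λτ:

$$\mathrm{MSE}(\tau) \approx \frac{\sigma^2}{\tau} + \lambda\,\tau, \qquad \frac{d}{d\tau}\,\mathrm{MSE}(\tau) = -\frac{\sigma^2}{\tau^2} + \lambda = 0 \;\Rightarrow\; \tau^* = \sqrt{\frac{\sigma^2}{\lambda}},$$

which recovers the scaling law above with κ = 1.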

Full theoretical framework: Bronsard, J.-P. (2025). The Syntony Principle: A Structural Scaling Law for Adaptive Systems — V4.1 Canonical Edition. Zenodo. DOI: 10.5281/zenodo.17254395

Deep learning validation paper: DOI: 10.5281/zenodo.18527033


What's Next

ImageNet-scale validation (ResNet-50, ~25M parameters) is the natural next step. The key question: does the square-root scaling τ* = κ√(σ²/λ) persist in high-dimensional loss landscapes?

If you have access to multi-GPU compute and are interested in collaborating on this, please reach out.


Reproducibility

  • Seed: 42 (fixed for all experiments)
  • Framework: PyTorch
  • Hardware: Google Colab free tier (T4 GPU)
  • All results reproducible by running the notebooks end-to-end

Citation

```bibtex
@software{bronsard2025syntonic,
  author = {Bronsard, Jean-Pierre},
  title = {Syntonic Optimizer: Theory-Derived Adaptive Learning Rate from the Syntony Principle},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18527033},
  url = {https://doi.org/10.5281/zenodo.18527033}
}
```

License

CC BY 4.0 — You are free to use, share, and adapt this work with attribution.

Author

Jean-Pierre Bronsard
SyntonicAI Recherche — Montréal, QC, Canada
ORCID: 0009-0008-6639-7553
