
Sequential Group Composition

A Window into the Mechanics of Deep Learning

arXiv · CI · License: MIT

Paper · PDF · Install · Usage · Citation

Giovanni Luca Marchetti* · Daniel Kunin* · Adele Myers · Francisco Acosta · Nina Miolane


How do neural networks trained on sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation?

We introduce the sequential group composition task: networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. We prove that two-layer networks learn this task one irreducible representation at a time, in an order determined by the Fourier statistics of the encoding -- producing a characteristic staircase in the training loss.

Staircase learning across five finite groups

Two-layer networks learn group composition one irreducible representation at a time. Top: power spectrum of the learned function over training. Bottom: training loss showing a characteristic staircase. Each column is a different finite group.
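
To make the task concrete, here is a minimal NumPy sketch of the data for the cyclic case $C_n$: each group element g is encoded by cyclically rolling a fixed template vector, and the target is the encoding of the cumulative product (addition mod n). The function name and the roll-based encoding are illustrative, not the repository's API.

```python
import numpy as np

def make_composition_dataset(n=10, k=2, num_samples=1000, seed=0):
    """Sequences of k elements of C_n with their cumulative product as target."""
    rng = np.random.default_rng(seed)
    template = rng.standard_normal(n)                  # real-vector encoding: g -> roll(template, g)
    seqs = rng.integers(0, n, size=(num_samples, k))   # k group elements per sample
    targets = seqs.sum(axis=1) % n                     # composition in C_n is addition mod n
    X = np.stack([np.stack([np.roll(template, g) for g in seq]) for seq in seqs])  # (N, k, n)
    y = np.stack([np.roll(template, t) for t in targets])                          # (N, n)
    return X, y
```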


Installation

Prerequisites

  • Conda (Miniconda or Anaconda)
  • gfortran (Linux only, required by some numerical dependencies)

Setup

# Linux only: install gfortran
sudo apt install -y gfortran

# Create and activate the conda environment
conda env create -f conda.yaml
conda activate group-agf

# Install all Python dependencies (pinned versions from poetry.lock)
poetry install

Usage

Single Run

Train a model on a specific group:

python src/main.py --config src/config_d3.yaml

Results (loss curves, predictions, power spectra) are saved to a timestamped directory under runs/.

Supported Groups

The repository includes preconfigured experiments for five groups:

| Group | Config | Order | Architecture |
|---|---|---|---|
| Cyclic $C_{10}$ | src/config_c10.yaml | 10 | QuadraticRNN |
| Product $C_4 \times C_4$ | src/config_c4x4.yaml | 16 | QuadraticRNN |
| Dihedral $D_3$ | src/config_d3.yaml | 6 | TwoLayerNet |
| Octahedral $O$ (rotation group) | src/config_octahedral.yaml | 24 | TwoLayerNet |
| Icosahedral $A_5$ | src/config_a5.yaml | 60 | TwoLayerNet |

Parameter Sweeps

Run experiments across multiple configurations and random seeds:

python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml

Multi-GPU support:

# Auto-detect and use all available GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus auto

# Use specific GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus 0,1,2,3

Sweep results are saved to sweeps/{sweep_name}_{timestamp}/ with per-seed results and aggregated summaries.

Configuration

Key parameters in the YAML config files:

| Parameter | Options | Description |
|---|---|---|
| data.group_name | cn, cnxcn, dihedral, octahedral, A5 | Group to learn |
| data.k | integer | Number of elements to compose |
| data.template_type | mnist, fourier, gaussian, onehot, custom_fourier | Template generation method |
| model.model_type | QuadraticRNN, SequentialMLP, TwoLayerNet | Architecture |
| model.hidden_dim | integer | Hidden layer size |
| model.init_scale | float | Weight initialization scale |
| training.optimizer | auto, adam, per_neuron, hybrid | Optimizer (auto recommended) |
| training.learning_rate | float | Base learning rate |
| training.mode | online, offline | Training mode |
| training.epochs | integer | Number of epochs (offline mode) |

Example config -- D3 with custom Fourier template:
data:
  group_name: dihedral
  group_n: 3
  k: 2
  template_type: custom_fourier
  powers: [0.0, 30.0, 3000.0]

model:
  model_type: TwoLayerNet
  hidden_dim: 180
  init_scale: 0.001

training:
  optimizer: per_neuron
  learning_rate: 0.01
  mode: offline
  epochs: 2000

Repository Structure

group-agf/
├── src/                          # Source code
│   ├── main.py                   # Training entry point (CLI)
│   ├── model.py                  # TwoLayerNet, QuadraticRNN, SequentialMLP
│   ├── optimizer.py              # PerNeuronScaledSGD, HybridRNNOptimizer
│   ├── dataset.py                # Dataset generation and loading
│   ├── template.py               # Template construction functions
│   ├── fourier.py                # Group Fourier transforms
│   ├── power.py                  # Power spectrum computation
│   ├── viz.py                    # Plotting and visualization
│   ├── train.py                  # Training loops (offline and online)
│   ├── run_sweep.py              # Parameter sweep runner
│   └── config_*.yaml             # Group-specific configurations
├── test/                         # Unit and integration tests
├── notebooks/                    # Jupyter notebooks for exploration
├── pyproject.toml                # Project metadata and dependencies
├── poetry.lock                   # Pinned dependency versions
└── conda.yaml                    # Conda environment specification
Module details

model.py -- Neural Network Architectures

| Model | Description | Input |
|---|---|---|
| TwoLayerNet | Two-layer feedforward network with configurable nonlinearity (square, relu, tanh, gelu) | Flattened pair of encoded elements, (N, 2 * group_size) |
| QuadraticRNN | Recurrent network with recurrence h_t = (W_mix h_{t-1} + W_drive x_t)^2, sketched below | Sequence, (N, k, p) |
| SequentialMLP | Feedforward MLP with k-th power activation; permutation-invariant, hence suited to commutative groups | Sequence, (N, k, p) |
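
For concreteness, here is a minimal PyTorch sketch of the QuadraticRNN recurrence from the table; the layer names, zero initial state, and linear readout are assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class QuadraticRNNSketch(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.w_mix = nn.Linear(hidden_dim, hidden_dim, bias=False)    # W_mix
        self.w_drive = nn.Linear(input_dim, hidden_dim, bias=False)   # W_drive
        self.readout = nn.Linear(hidden_dim, input_dim, bias=False)   # assumed linear readout

    def forward(self, x):                                # x: (N, k, p)
        h = x.new_zeros(x.shape[0], self.w_mix.in_features)
        for t in range(x.shape[1]):
            h = (self.w_mix(h) + self.w_drive(x[:, t])) ** 2   # h_t = (W_mix h_{t-1} + W_drive x_t)^2
        return self.readout(h)
```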

optimizer.py -- Custom Optimizers

| Optimizer | Description | Recommended for |
|---|---|---|
| PerNeuronScaledSGD | SGD with per-neuron learning-rate scaling that exploits model homogeneity (see the sketch below) | SequentialMLP, TwoLayerNet |
| HybridRNNOptimizer | Scaled SGD for the MLP weights plus Adam for the recurrent weights | QuadraticRNN |
| Adam (PyTorch built-in) | Standard Adam | QuadraticRNN |
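
The homogeneity idea behind PerNeuronScaledSGD can be sketched as follows. The specific rule shown (scaling each row's step by that row's weight norm) is an assumption chosen to illustrate per-neuron scaling, not necessarily the rule the repository uses.

```python
import torch

class PerNeuronScaledSGDSketch(torch.optim.Optimizer):
    """Illustrative per-neuron scaled SGD; the exact scaling rule may differ."""
    def __init__(self, params, lr=0.01, eps=1e-8):
        super().__init__(params, dict(lr=lr, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                if p.dim() > 1:  # treat dim 0 as the neuron index
                    scale = p.norm(dim=tuple(range(1, p.dim())), keepdim=True)
                else:
                    scale = p.abs()
                p.add_(p.grad * (scale + group["eps"]), alpha=-group["lr"])
```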

dataset.py -- Data Generation

  • Online datasets: OnlineModularAdditionDataset1D, OnlineModularAdditionDataset2D -- generate samples on-the-fly (GPU-accelerated)
  • Offline builders: build_modular_addition_sequence_dataset_1d, _2d, _D3, _generic
  • Group datasets: cn_dataset, cnxcn_dataset, group_dataset -- full group multiplication tables for TwoLayerNet (sketched below)
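
As a sketch of the multiplication-table idea for TwoLayerNet, the following hypothetical helper (not the repository's cn_dataset) enumerates all pairs of C_n and encodes them by rolling a template:

```python
import numpy as np

def cn_pair_dataset(template):
    """All ordered pairs (g, h) of C_n, target g + h mod n, roll-encoded."""
    n = len(template)
    enc = np.stack([np.roll(template, g) for g in range(n)])        # (n, n) element encodings
    g, h = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    X = np.concatenate([enc[g.ravel()], enc[h.ravel()]], axis=1)    # (n*n, 2*n), matches TwoLayerNet input
    y = enc[(g.ravel() + h.ravel()) % n]                            # (n*n, n)
    return X, y
```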

template.py -- Template Construction

  • Group templates: one_hot, fixed_cn, fixed_cnxcn, fixed_group
  • 1D synthetic: fourier_1d, gaussian_1d, onehot_1d
  • 2D synthetic: gaussian_mixture_2d, unique_freqs_2d, fixed_2d, hexagon_tie_2d, ring_isotropic_2d, gaussian_2d
  • MNIST-based: mnist, mnist_1d, mnist_2d
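
The custom_fourier template type (see the powers field in the example config above) prescribes how much power each frequency carries. A hypothetical 1D version, assuming powers indexes the nonzero frequencies in order:

```python
import numpy as np

def custom_fourier_template_1d(n, powers, seed=0):
    """Length-n signal with |FFT coefficient|^2 proportional to `powers`."""
    rng = np.random.default_rng(seed)
    coefs = np.zeros(n // 2 + 1, dtype=complex)
    for freq, power in enumerate(powers, start=1):       # assumption: frequencies 1, 2, ...
        coefs[freq] = np.sqrt(power) * np.exp(1j * rng.uniform(0, 2 * np.pi))
    return np.fft.irfft(coefs, n)
```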

fourier.py -- Group Fourier Transforms

  • group_fourier(group, template) -- Fourier coefficients via irreducible representations
  • group_fourier_inverse(group, fourier_coefs) -- reconstruct template from Fourier coefficients
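
A hypothetical round trip using the two functions above; the escnn group construction and the exact coefficient format returned by group_fourier are assumptions:

```python
import numpy as np
from escnn.group import cyclic_group
from fourier import group_fourier, group_fourier_inverse  # i.e. src/fourier.py

G = cyclic_group(10)                                   # C_10 as an escnn group
template = np.random.default_rng(0).standard_normal(G.order())
coefs = group_fourier(G, template)                     # one block per irreducible representation
recon = group_fourier_inverse(G, coefs)
print(np.allclose(recon, template))                    # expect True if the round trip is exact
```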

power.py -- Power Spectrum Analysis

  • GroupPower -- power spectrum of a template over any escnn group
  • CyclicPower -- specialized for cyclic groups via FFT
  • model_power_over_time -- track how the model's learned power spectrum evolves during training
  • theoretical_loss_levels_1d, _2d -- predict staircase loss plateaus from template power
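
For cyclic groups the irreducible representations are exactly the Fourier modes, which is what CyclicPower exploits. A minimal stand-alone version of that computation (not the class itself):

```python
import numpy as np

def cyclic_power_spectrum(template):
    """Power carried by each Fourier mode of a template over C_n."""
    coefs = np.fft.fft(template) / len(template)   # normalized DFT coefficients
    return np.abs(coefs) ** 2                      # one power value per frequency / irrep
```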

viz.py -- Visualization

Plotting functions for training analysis: plot_train_loss_with_theory, plot_predictions_1d, plot_predictions_2d, plot_predictions_group, plot_power_1d, plot_power_group, plot_wmix_structure, plot_irreps, and more.

train.py -- Training Loops

  • train(model, loader, criterion, optimizer, ...) -- epoch-based offline training
  • train_online(model, loader, criterion, optimizer, ...) -- step-based online training
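
A bare-bones epoch-based loop in the spirit of train(); the argument names mirror the listing above, but the body is a sketch rather than the repository's code:

```python
import torch

def train_sketch(model, loader, criterion, optimizer, epochs=100):
    """Offline training; returns the mean loss per epoch."""
    losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item() * x.shape[0]
            count += x.shape[0]
        losses.append(total / count)
    return losses
```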

Testing

# All tests (unit + integration)
pytest test/ -v

# Notebook tests only (requires jupyter/nbconvert)
NOTEBOOK_TEST_MODE=1 pytest test/test_notebooks.py -v

Development

# Install pre-commit hooks
pre-commit install

# Run linting
ruff check .
ruff format --check .

Citation

If you find this work useful, please cite:

@article{marchetti2026sequential,
  title   = {Sequential Group Composition: A Window into the Mechanics of Deep Learning},
  author  = {Marchetti, Giovanni Luca and Kunin, Daniel and Myers, Adele and Acosta, Francisco and Miolane, Nina},
  journal = {arXiv preprint arXiv:2602.03655},
  year    = {2026}
}

License

This project is licensed under the MIT License. See LICENSE for details.

About

Group Alternating Gradient Flows
