Paper • PDF • Install • Usage • Citation
Giovanni Luca Marchetti* · Daniel Kunin* · Adele Myers · Francisco Acosta · Nina Miolane
How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation?
We introduce the sequential group composition task: networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. We prove that two-layer networks learn this task one irreducible representation at a time, in an order determined by the Fourier statistics of the encoding -- producing a characteristic staircase in the training loss.
Two-layer networks learn group composition one irreducible representation at a time. Top: power spectrum of the learned function over training. Bottom: training loss showing a characteristic staircase. Each column is a different finite group.
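For concreteness, here is a minimal sketch of the task for the cyclic group C_n (an illustration of the setup, not the repository's data pipeline; all names here are hypothetical): each group element is encoded by a fixed template vector, and the target for a length-k sequence is the template of the cumulative product.

```python
import numpy as np

def cyclic_composition_batch(n, k, batch_size, rng):
    """Sequential composition task over C_n: the input is a sequence of k
    encoded elements; the target encodes their cumulative sum mod n."""
    templates = rng.standard_normal((n, n))       # one encoding vector per element
    g = rng.integers(0, n, size=(batch_size, k))  # random group elements
    x = templates[g]                              # inputs:  (batch_size, k, n)
    y = templates[g.sum(axis=1) % n]              # targets: (batch_size, n)
    return x, y

rng = np.random.default_rng(0)
x, y = cyclic_composition_batch(n=10, k=2, batch_size=32, rng=rng)
```

The Fourier statistics of the `templates` matrix are what, per the paper, determine the order in which the irreducible representations are learned.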
- Conda (Miniconda or Anaconda)
- `gfortran` (Linux only, required by some numerical dependencies)
```bash
# Linux only: install gfortran
sudo apt install -y gfortran

# Create and activate the conda environment
conda env create -f conda.yaml
conda activate group-agf

# Install all Python dependencies (pinned versions from poetry.lock)
poetry install
```

Train a model on a specific group:

```bash
python src/main.py --config src/config_d3.yaml
```

Results (loss curves, predictions, power spectra) are saved to a timestamped directory under `runs/`.
The repository includes preconfigured experiments for five groups:
| Group | Config | Order | Architecture |
|---|---|---|---|
| Cyclic | `src/config_c10.yaml` | 10 | QuadraticRNN |
| Product | `src/config_c4x4.yaml` | 16 | QuadraticRNN |
| Dihedral | `src/config_d3.yaml` | 6 | TwoLayerNet |
| Octahedral | `src/config_octahedral.yaml` | 24 | TwoLayerNet |
| Icosahedral | `src/config_a5.yaml` | 60 | TwoLayerNet |
Run experiments across multiple configurations and random seeds:
```bash
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml
```

Multi-GPU support:

```bash
# Auto-detect and use all available GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus auto

# Use specific GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus 0,1,2,3
```

Sweep results are saved to `sweeps/{sweep_name}_{timestamp}/` with per-seed results and aggregated summaries.
Key parameters in the YAML config files:
| Parameter | Options | Description |
|---|---|---|
| `data.group_name` | `cn`, `cnxcn`, `dihedral`, `octahedral`, `A5` | Group to learn |
| `data.k` | integer | Number of elements to compose |
| `data.template_type` | `mnist`, `fourier`, `gaussian`, `onehot`, `custom_fourier` | Template generation method |
| `model.model_type` | `QuadraticRNN`, `SequentialMLP`, `TwoLayerNet` | Architecture |
| `model.hidden_dim` | integer | Hidden layer size |
| `model.init_scale` | float | Weight initialization scale |
| `training.optimizer` | `auto`, `adam`, `per_neuron`, `hybrid` | Optimizer (`auto` recommended) |
| `training.learning_rate` | float | Base learning rate |
| `training.mode` | `online`, `offline` | Training mode |
| `training.epochs` | integer | Number of epochs (offline mode) |
Example config -- D3 with custom Fourier template
```yaml
data:
  group_name: dihedral
  group_n: 3
  k: 2
  template_type: custom_fourier
  powers: [0.0, 30.0, 3000.0]
model:
  model_type: TwoLayerNet
  hidden_dim: 180
  init_scale: 0.001
training:
  optimizer: per_neuron
  learning_rate: 0.01
  mode: offline
  epochs: 2000
```

Repository structure:

```
group-agf/
├── src/                 # Source code
│   ├── main.py          # Training entry point (CLI)
│   ├── model.py         # TwoLayerNet, QuadraticRNN, SequentialMLP
│   ├── optimizer.py     # PerNeuronScaledSGD, HybridRNNOptimizer
│   ├── dataset.py       # Dataset generation and loading
│   ├── template.py      # Template construction functions
│   ├── fourier.py       # Group Fourier transforms
│   ├── power.py         # Power spectrum computation
│   ├── viz.py           # Plotting and visualization
│   ├── train.py         # Training loops (offline and online)
│   ├── run_sweep.py     # Parameter sweep runner
│   └── config_*.yaml    # Group-specific configurations
├── test/                # Unit and integration tests
├── notebooks/           # Jupyter notebooks for exploration
├── pyproject.toml       # Project metadata and dependencies
├── poetry.lock          # Pinned dependency versions
└── conda.yaml           # Conda environment specification
```
Module details
| Model | Description | Input |
|---|---|---|
| `TwoLayerNet` | Two-layer feedforward network with configurable nonlinearity (`square`, `relu`, `tanh`, `gelu`) | Flattened binary pair `(N, 2 * group_size)` |
| `QuadraticRNN` | Recurrent network: `h_t = (W_mix h_{t-1} + W_drive x_t)^2` | Sequence `(N, k, p)` |
| `SequentialMLP` | Feedforward MLP with k-th power activation, permutation-invariant for commutative groups | Sequence `(N, k, p)` |
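As a minimal sketch of the `QuadraticRNN` recurrence above (illustrative PyTorch, not the repository's `model.py`; the readout layer mapping the final hidden state back to the template space is omitted):

```python
import torch
import torch.nn as nn

class QuadraticRNNSketch(nn.Module):
    """Recurrence h_t = (W_mix h_{t-1} + W_drive x_t)^2, applied over a
    sequence of k group elements encoded in R^p."""

    def __init__(self, p: int, hidden_dim: int):
        super().__init__()
        self.w_mix = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_drive = nn.Linear(p, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, k, _ = x.shape  # x: (N, k, p)
        h = x.new_zeros(n, self.w_mix.in_features)
        for t in range(k):
            h = (self.w_mix(h) + self.w_drive(x[:, t])) ** 2
        return h  # final hidden state; a readout would map this back to R^p
```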
| Optimizer | Description | Recommended for |
|---|---|---|
| PerNeuronScaledSGD | SGD with per-neuron learning rate scaling exploiting model homogeneity | SequentialMLP, TwoLayerNet |
| HybridRNNOptimizer | Scaled SGD for MLP weights + Adam for recurrent weights | QuadraticRNN |
| Adam (PyTorch built-in) | Standard Adam | QuadraticRNN |
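The per-neuron scaling idea can be sketched as follows. This illustrates the general technique of exploiting the homogeneity of a two-layer model; the particular rule shown (step size proportional to each neuron's combined weight norm) is an assumption, not necessarily the rule used by `PerNeuronScaledSGD`:

```python
import torch

def per_neuron_scaled_sgd_step(w_in: torch.Tensor, w_out: torch.Tensor, lr: float):
    """One SGD step where each hidden neuron's step size is scaled by that
    neuron's weight norm (hypothetical rule). Expects .grad to be populated.

    w_in:  (hidden_dim, p)      input weights of the hidden layer
    w_out: (p_out, hidden_dim)  output weights of the hidden layer
    """
    with torch.no_grad():
        # Per-neuron scale: squared norm of incoming plus outgoing weights.
        scale = w_in.norm(dim=1) ** 2 + w_out.norm(dim=0) ** 2  # (hidden_dim,)
        w_in -= lr * scale.unsqueeze(1) * w_in.grad
        w_out -= lr * scale.unsqueeze(0) * w_out.grad
```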
- Online datasets: `OnlineModularAdditionDataset1D`, `OnlineModularAdditionDataset2D` -- generate samples on the fly (GPU-accelerated)
- Offline builders: `build_modular_addition_sequence_dataset_1d`, `_2d`, `_D3`, `_generic`
- Group datasets: `cn_dataset`, `cnxcn_dataset`, `group_dataset` -- full group multiplication tables for TwoLayerNet (sketched below)
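For the k = 2 case used by `TwoLayerNet`, a full multiplication-table dataset can be sketched as follows (illustrative only; the function name and shapes are assumptions, matching the flattened-pair input format in the model table above):

```python
import numpy as np

def multiplication_table_dataset(n: int, templates: np.ndarray):
    """All n^2 ordered pairs (a, b) of C_n as flattened inputs, with the
    template of the product a + b mod n as the target."""
    a, b = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    a, b = a.ravel(), b.ravel()
    x = np.concatenate([templates[a], templates[b]], axis=1)  # (n^2, 2n)
    y = templates[(a + b) % n]                                # (n^2, n)
    return x, y
```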
- Group templates: `one_hot`, `fixed_cn`, `fixed_cnxcn`, `fixed_group`
- 1D synthetic: `fourier_1d`, `gaussian_1d`, `onehot_1d` (the `custom_fourier` idea is sketched below)
- 2D synthetic: `gaussian_mixture_2d`, `unique_freqs_2d`, `fixed_2d`, `hexagon_tie_2d`, `ring_isotropic_2d`, `gaussian_2d`
- MNIST-based: `mnist`, `mnist_1d`, `mnist_2d`
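The `custom_fourier` template type in the example config prescribes the power carried by each frequency (the `powers` list). A minimal sketch of that idea for a 1D cyclic template, assuming `powers[j]` is the desired squared magnitude of the j-th DFT coefficient (the repository's `template.py` may use a different convention):

```python
import numpy as np

def custom_fourier_template_1d(n, powers, rng):
    """Real template over C_n with |FFT coefficient|^2 at frequency j equal
    to powers[j]; random phases. Assumes len(powers) <= n // 2 for simplicity."""
    coefs = np.zeros(n, dtype=complex)
    coefs[0] = np.sqrt(powers[0])  # DC coefficient must be real
    for j in range(1, len(powers)):
        phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
        coefs[j] = np.sqrt(powers[j]) * phase
        coefs[n - j] = np.conj(coefs[j])  # conjugate symmetry keeps the template real
    return np.fft.ifft(coefs).real
```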
- `group_fourier(group, template)` -- Fourier coefficients via irreducible representations
- `group_fourier_inverse(group, fourier_coefs)` -- reconstruct template from Fourier coefficients
- `GroupPower` -- power spectrum of a template over any `escnn` group
- `CyclicPower` -- specialized for cyclic groups via FFT (sketched below)
- `model_power_over_time` -- track how the model's learned power spectrum evolves during training
- `theoretical_loss_levels_1d`, `_2d` -- predict staircase loss plateaus from template power
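For cyclic groups the power spectrum reduces to the squared magnitudes of the DFT, which is presumably what `CyclicPower` computes via the FFT. A sketch of the underlying math (the normalization convention is an assumption):

```python
import numpy as np

def cyclic_power_spectrum(template: np.ndarray) -> np.ndarray:
    """Power at each frequency of a template over C_n: squared magnitude
    of the DFT, normalized by n (normalization assumed)."""
    n = template.shape[-1]
    return np.abs(np.fft.fft(template, axis=-1)) ** 2 / n
```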
Plotting functions for training analysis: `plot_train_loss_with_theory`, `plot_predictions_1d`, `plot_predictions_2d`, `plot_predictions_group`, `plot_power_1d`, `plot_power_group`, `plot_wmix_structure`, `plot_irreps`, and more.
- `train(model, loader, criterion, optimizer, ...)` -- epoch-based offline training
- `train_online(model, loader, criterion, optimizer, ...)` -- step-based online training
```bash
# All tests (unit + integration)
pytest test/ -v

# Notebook tests only (requires jupyter/nbconvert)
NOTEBOOK_TEST_MODE=1 pytest test/test_notebooks.py -v
```

```bash
# Install pre-commit hooks
pre-commit install

# Run linting
ruff check .
ruff format --check .
```

If you find this work useful, please cite:
```bibtex
@article{marchetti2026sequential,
  title   = {Sequential Group Composition: A Window into the Mechanics of Deep Learning},
  author  = {Marchetti, Giovanni Luca and Kunin, Daniel and Myers, Adele and Acosta, Francisco and Miolane, Nina},
  journal = {arXiv preprint arXiv:2602.03655},
  year    = {2026}
}
```

This project is licensed under the MIT License. See LICENSE for details.
