Paper • PDF • Install • Usage • Citation
Giovanni Luca Marchetti* · Daniel Kunin* · Adele Myers · Francisco Acosta · Nina Miolane
How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation?
We introduce the sequential group composition task: networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. We prove that two-layer networks learn this task one irreducible representation at a time, in an order determined by the Fourier statistics of the encoding -- producing a characteristic staircase in the training loss.
Two-layer networks learn group composition one irreducible representation at a time. Top: power spectrum of the learned function over training. Bottom: training loss showing a characteristic staircase. Each column is a different finite group.
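For concreteness, here is a minimal sketch of the task for the cyclic group C_n (an illustration of the setup, not the repository's data pipeline; all names here are hypothetical): each group element is encoded by a fixed template vector, and the target for a length-k sequence is the template of the cumulative product.

```python
import numpy as np

def cyclic_composition_batch(n, k, batch_size, rng):
    """Sequential composition task over C_n: the input is a sequence of k
    encoded elements; the target encodes their cumulative sum mod n."""
    templates = rng.standard_normal((n, n))       # one encoding vector per element
    g = rng.integers(0, n, size=(batch_size, k))  # random group elements
    x = templates[g]                              # inputs:  (batch_size, k, n)
    y = templates[g.sum(axis=1) % n]              # targets: (batch_size, n)
    return x, y

rng = np.random.default_rng(0)
x, y = cyclic_composition_batch(n=10, k=2, batch_size=32, rng=rng)
```

The Fourier statistics of the `templates` matrix are what, per the paper, determine the order in which the irreducible representations are learned.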
- Conda (Miniconda or Anaconda)
- `gfortran` (Linux only, required by some numerical dependencies)
```bash
# Linux only: install gfortran
sudo apt install -y gfortran

# Create and activate the conda environment
conda env create -f conda.yaml
conda activate group-agf

# Install all Python dependencies (pinned versions from poetry.lock)
poetry install
```

Train a model on a specific group:

```bash
python src/main.py --config src/config_d3.yaml
```

Results (loss curves, predictions, power spectra) are saved to a timestamped directory under `runs/`.
The repository includes preconfigured experiments for five groups:
| Group | Config | Order | Architecture |
|---|---|---|---|
| Cyclic | `src/config_c10.yaml` | 10 | QuadraticRNN |
| Product | `src/config_c4x4.yaml` | 16 | QuadraticRNN |
| Dihedral | `src/config_d3.yaml` | 6 | TwoLayerNet |
| Octahedral | `src/config_octahedral.yaml` | 24 | TwoLayerNet |
| Icosahedral | `src/config_a5.yaml` | 60 | TwoLayerNet |
Run experiments across multiple configurations and random seeds:
```bash
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml
```

Multi-GPU support:

```bash
# Auto-detect and use all available GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus auto

# Use specific GPUs
python src/run_sweep.py --sweep src/sweep_configs/example_sweep.yaml --gpus 0,1,2,3
```

Sweep results are saved to `sweeps/{sweep_name}_{timestamp}/` with per-seed results and aggregated summaries.
Key parameters in the YAML config files:
| Parameter | Options | Description |
|---|---|---|
| `data.group_name` | `cn`, `cnxcn`, `dihedral`, `octahedral`, `A5` | Group to learn |
| `data.k` | integer | Number of elements to compose |
| `data.template_type` | `mnist`, `fourier`, `gaussian`, `onehot`, `custom_fourier` | Template generation method |
| `model.model_type` | `QuadraticRNN`, `SequentialMLP`, `TwoLayerNet` | Architecture |
| `model.hidden_dim` | integer | Hidden layer size |
| `model.init_scale` | float | Weight initialization scale |
| `training.optimizer` | `auto`, `adam`, `per_neuron`, `hybrid` | Optimizer (`auto` recommended) |
| `training.learning_rate` | float | Base learning rate |
| `training.mode` | `online`, `offline` | Training mode |
| `training.epochs` | integer | Number of epochs (offline mode) |
Example config -- D3 with custom Fourier template
```yaml
data:
  group_name: dihedral
  group_n: 3
  k: 2
  template_type: custom_fourier
  powers: [0.0, 30.0, 3000.0]
model:
  model_type: TwoLayerNet
  hidden_dim: 180
  init_scale: 0.001
training:
  optimizer: per_neuron
  learning_rate: 0.01
  mode: offline
  epochs: 2000
```

Repository structure:

```
group-agf/
├── src/                 # Source code
│   ├── main.py          # Training entry point (CLI)
│   ├── model.py         # TwoLayerNet, QuadraticRNN, SequentialMLP
│   ├── optimizer.py     # PerNeuronScaledSGD, HybridRNNOptimizer
│   ├── dataset.py       # Dataset generation and loading
│   ├── template.py      # Template construction functions
│   ├── fourier.py       # Group Fourier transforms
│   ├── power.py         # Power spectrum computation
│   ├── viz.py           # Plotting and visualization
│   ├── train.py         # Training loops (offline and online)
│   ├── run_sweep.py     # Parameter sweep runner
│   └── config_*.yaml    # Group-specific configurations
├── test/                # Unit and integration tests
├── notebooks/           # Jupyter notebooks for exploration
├── pyproject.toml       # Project metadata and dependencies
├── poetry.lock          # Pinned dependency versions
└── conda.yaml           # Conda environment specification
```
Module details
| Model | Description | Input |
|---|---|---|
| `TwoLayerNet` | Two-layer feedforward network with configurable nonlinearity (`square`, `relu`, `tanh`, `gelu`) | Flattened binary pair `(N, 2 * group_size)` |
| `QuadraticRNN` | Recurrent network: `h_t = (W_mix h_{t-1} + W_drive x_t)^2` | Sequence `(N, k, p)` |
| `SequentialMLP` | Feedforward MLP with k-th power activation, permutation-invariant for commutative groups | Sequence `(N, k, p)` |
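As a minimal sketch of the `QuadraticRNN` recurrence above (illustrative PyTorch, not the repository's `model.py`; the readout layer mapping the final hidden state back to the template space is omitted):

```python
import torch
import torch.nn as nn

class QuadraticRNNSketch(nn.Module):
    """Recurrence h_t = (W_mix h_{t-1} + W_drive x_t)^2, applied over a
    sequence of k group elements encoded in R^p."""

    def __init__(self, p: int, hidden_dim: int):
        super().__init__()
        self.w_mix = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_drive = nn.Linear(p, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, k, _ = x.shape  # x: (N, k, p)
        h = x.new_zeros(n, self.w_mix.in_features)
        for t in range(k):
            h = (self.w_mix(h) + self.w_drive(x[:, t])) ** 2
        return h  # final hidden state; a readout would map this back to R^p
```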
| Optimizer | Description | Recommended for |
|---|---|---|
| PerNeuronScaledSGD | SGD with per-neuron learning rate scaling exploiting model homogeneity | SequentialMLP, TwoLayerNet |
| HybridRNNOptimizer | Scaled SGD for MLP weights + Adam for recurrent weights | QuadraticRNN |
| Adam (PyTorch built-in) | Standard Adam | QuadraticRNN |
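The per-neuron scaling idea can be sketched as follows. This illustrates the general technique of exploiting the homogeneity of a two-layer model; the particular rule shown (step size proportional to each neuron's combined weight norm) is an assumption, not necessarily the rule used by `PerNeuronScaledSGD`:

```python
import torch

def per_neuron_scaled_sgd_step(w_in: torch.Tensor, w_out: torch.Tensor, lr: float):
    """One SGD step where each hidden neuron's step size is scaled by that
    neuron's weight norm (hypothetical rule). Expects .grad to be populated.

    w_in:  (hidden_dim, p)      input weights of the hidden layer
    w_out: (p_out, hidden_dim)  output weights of the hidden layer
    """
    with torch.no_grad():
        # Per-neuron scale: squared norm of incoming plus outgoing weights.
        scale = w_in.norm(dim=1) ** 2 + w_out.norm(dim=0) ** 2  # (hidden_dim,)
        w_in -= lr * scale.unsqueeze(1) * w_in.grad
        w_out -= lr * scale.unsqueeze(0) * w_out.grad
```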
- Online datasets: `OnlineModularAdditionDataset1D`, `OnlineModularAdditionDataset2D` -- generate samples on the fly (GPU-accelerated)
- Offline builders: `build_modular_addition_sequence_dataset_1d`, `_2d`, `_D3`, `_generic`
- Group datasets: `cn_dataset`, `cnxcn_dataset`, `group_dataset` -- full group multiplication tables for TwoLayerNet (sketched below)
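For the k = 2 case used by `TwoLayerNet`, a full multiplication-table dataset can be sketched as follows (illustrative only; the function name and shapes are assumptions, matching the flattened-pair input format in the model table above):

```python
import numpy as np

def multiplication_table_dataset(n: int, templates: np.ndarray):
    """All n^2 ordered pairs (a, b) of C_n as flattened inputs, with the
    template of the product a + b mod n as the target."""
    a, b = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    a, b = a.ravel(), b.ravel()
    x = np.concatenate([templates[a], templates[b]], axis=1)  # (n^2, 2n)
    y = templates[(a + b) % n]                                # (n^2, n)
    return x, y
```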
- Group templates: `one_hot`, `fixed_cn`, `fixed_cnxcn`, `fixed_group`
- 1D synthetic: `fourier_1d`, `gaussian_1d`, `onehot_1d` (the `custom_fourier` idea is sketched below)
- 2D synthetic: `gaussian_mixture_2d`, `unique_freqs_2d`, `fixed_2d`, `hexagon_tie_2d`, `ring_isotropic_2d`, `gaussian_2d`
- MNIST-based: `mnist`, `mnist_1d`, `mnist_2d`
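The `custom_fourier` template type in the example config prescribes the power carried by each frequency (the `powers` list). A minimal sketch of that idea for a 1D cyclic template, assuming `powers[j]` is the desired squared magnitude of the j-th DFT coefficient (the repository's `template.py` may use a different convention):

```python
import numpy as np

def custom_fourier_template_1d(n, powers, rng):
    """Real template over C_n with |FFT coefficient|^2 at frequency j equal
    to powers[j]; random phases. Assumes len(powers) <= n // 2 for simplicity."""
    coefs = np.zeros(n, dtype=complex)
    coefs[0] = np.sqrt(powers[0])  # DC coefficient must be real
    for j in range(1, len(powers)):
        phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
        coefs[j] = np.sqrt(powers[j]) * phase
        coefs[n - j] = np.conj(coefs[j])  # conjugate symmetry keeps the template real
    return np.fft.ifft(coefs).real
```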
- `group_fourier(group, template)` -- Fourier coefficients via irreducible representations
- `group_fourier_inverse(group, fourier_coefs)` -- reconstruct template from Fourier coefficients
- `GroupPower` -- power spectrum of a template over any `escnn` group
- `CyclicPower` -- specialized for cyclic groups via FFT (sketched below)
- `model_power_over_time` -- track how the model's learned power spectrum evolves during training
- `theoretical_loss_levels_1d`, `_2d` -- predict staircase loss plateaus from template power
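For cyclic groups the power spectrum reduces to the squared magnitudes of the DFT, which is presumably what `CyclicPower` computes via the FFT. A sketch of the underlying math (the normalization convention is an assumption):

```python
import numpy as np

def cyclic_power_spectrum(template: np.ndarray) -> np.ndarray:
    """Power at each frequency of a template over C_n: squared magnitude
    of the DFT, normalized by n (normalization assumed)."""
    n = template.shape[-1]
    return np.abs(np.fft.fft(template, axis=-1)) ** 2 / n
```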
Plotting functions for training analysis: `plot_train_loss_with_theory`, `plot_predictions_1d`, `plot_predictions_2d`, `plot_predictions_group`, `plot_power_1d`, `plot_power_group`, `plot_wmix_structure`, `plot_irreps`, and more.
- `train(model, loader, criterion, optimizer, ...)` -- epoch-based offline training
- `train_online(model, loader, criterion, optimizer, ...)` -- step-based online training
```bash
# All tests (unit + integration)
pytest test/ -v

# Notebook tests only (requires jupyter/nbconvert)
NOTEBOOK_TEST_MODE=1 pytest test/test_notebooks.py -v
```

```bash
# Install pre-commit hooks
pre-commit install

# Run linting
ruff check .
ruff format --check .
```

If you find this work useful, please cite:
```bibtex
@article{marchetti2026sequential,
  title   = {Sequential Group Composition: A Window into the Mechanics of Deep Learning},
  author  = {Marchetti, Giovanni Luca and Kunin, Daniel and Myers, Adele and Acosta, Francisco and Miolane, Nina},
  journal = {arXiv preprint arXiv:2602.03655},
  year    = {2026}
}
```

This project is licensed under the MIT License. See LICENSE for details.
