A framework for training Transformer models on synthetic data generated by Epsilon Machines (Hidden Markov Models) and analyzing their ability to learn statistical dependencies.
This project enables:
- Process Generation: Defining stochastic processes (e.g., Mess3, RRXOR) via HMMs.
- Transformer Training: Training PyTorch models to predict these processes.
- Analysis: Measuring performance with vectorized KL-divergence metrics against empirical N-gram statistics and ground-truth Markov states.
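As an illustrative sketch of what "defining a process via an HMM" means, the toy below samples tokens from a two-state hidden Markov model. The states, tokens, and probabilities here are hypothetical; the repository's actual Mess3 and RRXOR definitions and class APIs will differ.

```python
import random

# Hypothetical two-state HMM (NOT the repo's Mess3/RRXOR definitions).
# emission[s][x]   = probability of emitting token x while in state s
# next_state[s][x] = state reached after emitting x from state s
emission = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}
next_state = {0: {"A": 0, "B": 1}, 1: {"A": 0, "B": 1}}

def sample_sequence(length, state=0, seed=0):
    """Sample a token sequence from the toy HMM, starting in `state`."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        token = rng.choices(list(emission[state]),
                            weights=list(emission[state].values()))[0]
        tokens.append(token)
        state = next_state[state][token]
    return tokens

print(sample_sequence(10))
```

The hidden state is never emitted, which is what makes recovering the process statistics a nontrivial learning problem for the Transformer.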
Clone the repository:

```shell
git clone https://github.com/ilatims-b/epsilon-transformers.git
cd epsilon-transformers
```
Install dependencies:

```shell
pip install -e .
```
Define experiments using YAML configuration files.
```yaml
model:
  d_model: 256
  n_head: 4
  n_layers: 2
  n_ctx: 10
  d_vocab: 5

dataset:
  process: "mess3"
  batch_size: 64
  num_tokens: 5000000

kl_analysis:
  ngram_analysis:
    enabled: true
    n_values:
  markov_kl_analysis:
    enabled: true
```
Run the training script with your configuration:
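For example (a hypothetical invocation; the actual entry-point script and flag names may differ in this repository):

```shell
python -m epsilon_transformers.training --config configs/mess3.yaml
```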
The training pipeline automatically:
- Builds empirical N-Gram statistics from the training set (GPU-accelerated).
- Trains the model on the synthetic process.
- Evaluates KL divergence metrics (N-Gram and Markov) periodically.
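The N-gram KL metric in the last step can be illustrated with a minimal pure-Python sketch. The repository's analyzers are vectorized over GPU tensors, and the function names and KL direction here are assumptions for illustration only:

```python
import math
from collections import Counter

def empirical_ngram_dist(tokens, n):
    """Empirical next-token distribution for each (n-1)-gram context."""
    counts = {}
    for i in range(len(tokens) - n + 1):
        ctx, nxt = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
        counts.setdefault(ctx, Counter())[nxt] += 1
    return {ctx: {t: c / sum(cnt.values()) for t, c in cnt.items()}
            for ctx, cnt in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared token alphabet, with a floor on q for stability."""
    return sum(pv * math.log(pv / max(q.get(t, 0.0), eps))
               for t, pv in p.items() if pv > 0)

tokens = [0, 1, 0, 1, 0, 1, 0, 1]
p = empirical_ngram_dist(tokens, 2)[(0,)]   # empirical P(next token | context 0)
q = {0: 0.1, 1: 0.9}                        # stand-in for the model's prediction
print(kl_divergence(p, q))
```

A divergence near zero means the model's conditional predictions match the empirical N-gram frequencies of the training data.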
- epsilon_transformers/process: Contains the HMM process definitions and dataset generators.
- epsilon_transformers/training: Training logic and configuration schemas using Pydantic.
- epsilon_transformers/analysis: Vectorized analyzers for computing N-Gram and Markov KL divergence.
- test/loss: Cross-entropy loss.
- test/relative_loss: Ratio of loss to the theoretical Myopic Entropy.
- test/kl_div_ngram_N: KL divergence between model predictions and empirical N-gram frequencies.
- test/kl_div_markov: KL divergence between model predictions and the true hidden Markov process.
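As a concrete reading of test/relative_loss, the sketch below computes a toy cross-entropy and divides it by a myopic-entropy value. The myopic-entropy number here is purely hypothetical; in the project it is derived from the process itself, and the ratio approaches 1.0 as the model nears the best achievable loss:

```python
import math

def cross_entropy(pred_probs, targets):
    """Mean negative log-likelihood (nats) of the target tokens."""
    return -sum(math.log(p[t]) for p, t in zip(pred_probs, targets)) / len(targets)

# Toy example: two positions, model assigns 0.9 to the correct token each time.
preds = [{0: 0.9, 1: 0.1}, {1: 0.9, 0: 0.1}]
targets = [0, 1]
loss = cross_entropy(preds, targets)

myopic_entropy = 0.08  # hypothetical theoretical floor for this process (nats)
relative_loss = loss / myopic_entropy
```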