Epsilon Transformers

A framework for training Transformer models on synthetic data generated by Epsilon Machines (Hidden Markov Models) and analyzing their ability to learn statistical dependencies.

Overview

This project enables:

  1. Process Generation: Defining stochastic processes (e.g., Mess3, RRXOR) via HMMs.
  2. Transformer Training: Training PyTorch models to predict these processes.
  3. Analysis: Measuring performance using vectorized KL Divergence metrics against empirical N-Gram statistics and ground-truth Markov states.
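To make step 1 concrete, here is a minimal, self-contained sketch of sampling tokens from a hidden Markov process. The transition and emission matrices below are illustrative toy values, not the actual Mess3 or RRXOR parameters defined in this repository, and `sample_hmm` is not the repo's API.

```python
import numpy as np

def sample_hmm(transition, emission, n_tokens, rng=None):
    """Sample a token sequence from an HMM.

    transition: (S, S) row-stochastic state-transition matrix
    emission:   (S, V) row-stochastic token-emission matrix
    """
    rng = rng or np.random.default_rng(0)
    n_states = transition.shape[0]
    state = rng.integers(n_states)
    tokens = []
    for _ in range(n_tokens):
        # Emit a token from the current hidden state, then transition.
        tokens.append(rng.choice(emission.shape[1], p=emission[state]))
        state = rng.choice(n_states, p=transition[state])
    return tokens

# Toy 2-state, 2-token process for illustration only.
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
emission = np.array([[0.7, 0.3],
                     [0.1, 0.9]])
seq = sample_hmm(transition, emission, 10)
```

The key property a Transformer must learn is that the optimal next-token distribution depends on the hidden state, which is only inferable from the observed history.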

Installation

  1. Clone the repository:

    git clone https://github.com/ilatims-b/epsilon-transformers.git
    cd epsilon-transformers
    
  2. Install dependencies:

    pip install -e .
    

Usage

1. Configuration

Define experiments using YAML configuration files.

    model:
      d_model: 256
      n_head: 4
      n_layers: 2
      n_ctx: 10
      d_vocab: 5

    dataset:
      process: "mess3"
      batch_size: 64
      num_tokens: 5000000

    kl_analysis:
      ngram_analysis:
        enabled: true
        n_values:
      markov_kl_analysis:
        enabled: true

2. Training

Run the training script with your configuration file.
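A typical invocation might look like the following; the entry-point script name and flag are assumptions, so check the repository for the actual training script.

```shell
# Hypothetical command -- script path and --config flag are assumed.
python train.py --config configs/mess3.yaml
```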

The training pipeline automatically:

  • Builds empirical N-Gram statistics from the training set (GPU-accelerated).
  • Trains the model on the synthetic process.
  • Evaluates KL divergence metrics (N-Gram and Markov) periodically.
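The N-gram statistics step above can be sketched in plain Python. The repository's implementation is vectorized and GPU-accelerated; this stand-in (with an assumed function name) only illustrates the counting: each (N−1)-token context is mapped to an empirical next-token distribution.

```python
from collections import Counter, defaultdict

def ngram_next_token_probs(tokens, n):
    """Map each (n-1)-token context to an empirical next-token distribution."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    # Normalize raw counts into probabilities per context.
    return {
        ctx: {tok: c / sum(counter.values()) for tok, c in counter.items()}
        for ctx, counter in counts.items()
    }

probs = ngram_next_token_probs([0, 1, 0, 1, 0], n=2)
# In this toy sequence, 0 is always followed by 1 and vice versa.
```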

Key Components

  • epsilon_transformers/process: Contains the HMM process definitions and dataset generators.
  • epsilon_transformers/training: Training logic and configuration schemas using Pydantic.
  • epsilon_transformers/analysis: Vectorized analyzers for computing N-Gram and Markov KL divergence.
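Since the training configs are validated with Pydantic, a schema mirroring the YAML layout above might look like this. The actual classes live in `epsilon_transformers/training` and may differ in names, nesting, and defaults; this is only a sketch of the pattern.

```python
from pydantic import BaseModel

# Assumed field names mirror the example YAML; not the repo's real schema.
class ModelConfig(BaseModel):
    d_model: int = 256
    n_head: int = 4
    n_layers: int = 2
    n_ctx: int = 10
    d_vocab: int = 5

class DatasetConfig(BaseModel):
    process: str = "mess3"
    batch_size: int = 64
    num_tokens: int = 5_000_000

class ExperimentConfig(BaseModel):
    model: ModelConfig
    dataset: DatasetConfig

# Nested dicts (e.g. parsed from YAML) are coerced into the sub-models.
cfg = ExperimentConfig(model={}, dataset={})
```

The benefit of this pattern is that a malformed YAML file fails loudly at load time rather than mid-training.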

Metrics

  • test/loss: Cross-entropy loss.
  • test/relative_loss: Ratio of loss to the theoretical Myopic Entropy.
  • test/kl_div_ngram_N: KL divergence between model predictions and empirical N-gram frequencies.
  • test/kl_div_markov: KL divergence between model predictions and the true hidden Markov process.
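The KL metrics above share one formula: the divergence of the model's predicted next-token distribution q from a reference distribution p (empirical N-gram frequencies or the true Markov next-token probabilities). A minimal dense-vector sketch, not the repo's vectorized implementation:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)).

    p, q: sequences of probabilities over the same token vocabulary.
    eps guards against log of zero when the model assigns ~0 mass.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions give zero divergence.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # → 0.0
```

A model that has fully learned the process drives `test/kl_div_markov` toward zero, while `test/kl_div_ngram_N` for small N bottoms out at the gap between N-gram statistics and the true hidden-state process.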
