This release doubles down on transformers and introduces a training loop program, `hala`. Pretraining bidirectional models with a token denoising objective (aka masked LM) is available via `hala --objective denoise`. The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
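For context, a masked denoising objective corrupts a fraction of the input tokens and scores the model's reconstruction of just those positions. A minimal PyTorch sketch of the general technique follows; the masking rate, `mask_token_id`, and model interface are illustrative assumptions, not `hala`'s actual internals:

```python
import torch
import torch.nn.functional as F

def denoise_loss(model, tokens, mask_token_id, mask_rate=0.15):
    # Choose a random subset of positions to corrupt.
    corrupt = torch.rand_like(tokens, dtype=torch.float) < mask_rate
    # Replace chosen positions with the mask token. (BERT-style recipes
    # additionally substitute random tokens or keep some originals.)
    inputs = tokens.masked_fill(corrupt, mask_token_id)
    logits = model(inputs)  # (batch, seq, vocab); attention is bidirectional
    # Cross-entropy only on the corrupted positions, against the clean tokens.
    return F.cross_entropy(logits[corrupt], tokens[corrupt])
```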
Existing causal models can now be finetuned with a conditional language modeling objective via `hala --objective cond`.
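Conditional LM finetuning typically keeps the causal next-token loss but scores only the completion, not the conditioning prefix. A hedged sketch under that assumption (the `prefix_len` convention is illustrative, not `hala`'s exact recipe):

```python
import torch
import torch.nn.functional as F

def cond_lm_loss(model, tokens, prefix_len):
    # Standard next-token shift: predict tokens[:, t+1] from tokens[:, :t+1].
    logits = model(tokens[:, :-1])   # (batch, seq-1, vocab)
    targets = tokens[:, 1:].clone()  # (batch, seq-1)
    # Mask out the conditioning prefix so only the completion is scored.
    targets[:, : prefix_len - 1] = -100  # -100 is cross_entropy's ignore_index
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )
```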
`hat` is now a REPL for both causal and bidirectional models. The `hat` REPL now supports history thanks to readline.
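In Python, history comes nearly for free once the readline module is imported, since `input()` then picks up line editing and history recall. A minimal sketch of the pattern (the history file path is illustrative, not `hat`'s actual file):

```python
import atexit
import os
import readline

histfile = os.path.expanduser("~/.hat_history")  # illustrative path
try:
    readline.read_history_file(histfile)
except FileNotFoundError:
    pass
atexit.register(readline.write_history_file, histfile)

# Once readline is imported, input() gains arrow-key editing and history.
while True:
    try:
        line = input(">>> ")
    except EOFError:
        break
```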
The RNN training program `hal` now supports training from `u16` binary datasets, just like `hala`. This allowed me to train a world model on VQ-VAE-tokenized images.
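A `u16` binary dataset is presumably a flat file of uint16 token ids. One common way to consume such files is to memory-map them and slice random training windows; a sketch under that assumption, with the file name and block size made up for illustration:

```python
import numpy as np
import torch

def get_batch(path, block_size=256, batch_size=32):
    # Memory-map the flat array of uint16 token ids so large files stay cheap.
    data = np.memmap(path, dtype=np.uint16, mode="r")
    starts = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
    # Inputs and one-step-shifted targets for next-token prediction.
    x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in starts])
    y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in starts])
    return x, y

x, y = get_batch("tokens.u16.bin")  # hypothetical file name
```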
New randomly initialized checkpoints can be created with the new `hai` program.
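In spirit, a fresh checkpoint is just a randomly initialized model saved before any training step. A hypothetical sketch (the model class and checkpoint layout are stand-ins, not `hai`'s actual format):

```python
import torch
import torch.nn as nn

# Stand-in for the real model class; hai's config and layout may differ.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)
torch.save({"model": model.state_dict(), "step": 0}, "init.pt")  # hypothetical path
```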