
Gradient Checkpointing, Serialized Tokenizers, and an Initial Implementation of Schema Generation

0.4.0 is a big release! The release of transformers 4.3.0 caused a breaking issue that required a prompt release, so a bug-fix 0.4.1 release is likely, and new documentation is in progress. I have demos of the new features planned as well!

Updated transformers and pytorch-lightning

The minimum version of transformers has been bumped to 4.3.0, which includes a number of performance improvements such as faster GPT-2 training and generation. Fast tokenizers are now the default package-wide as well. pytorch-lightning was bumped to 1.2.0, though that is less exciting.

tokenizers was removed as a dependency since transformers pins its own. Speaking of...

Serialized custom tokenizers

By default, train_tokenizer() will create a serialized, one-file tokenizer (e.g. aitextgen.tokenizer.json). This file also correctly supports the added_tokens parameter.

You can load the tokenizer when loading an aitextgen model with the tokenizer_file param. (You can still use the merges_file and vocab_file params if you have them.)
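
A minimal sketch of the round trip, assuming a local input.txt to train on and the default output file name described above:

```python
from aitextgen import aitextgen
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU

# Train a custom tokenizer; writes a single serialized aitextgen.tokenizer.json
train_tokenizer("input.txt", vocab_size=1000)

# Load an aitextgen model that uses the serialized tokenizer
# (GPT2ConfigCPU is used here just to keep the sketch small)
ai = aitextgen(tokenizer_file="aitextgen.tokenizer.json", config=GPT2ConfigCPU())
```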

Gradient checkpointing + Layer freezing

Gradient checkpointing is now supported for GPT-2 models, allowing finetuning of larger models such as the 355M and 774M GPT-2 models!

In theory, this also enables the 1558M model to be finetuned. I also added the ability to freeze layers so the model can be trained within VRAM constraints, but the results are mixed; more analysis will be done.
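
As a rough sketch of finetuning a larger model on a GPU with limited VRAM (the gradient_checkpointing flag reflects this release; the other arguments are standard aitextgen usage):

```python
from aitextgen import aitextgen

# Load the 355M GPT-2 from OpenAI's TF weights with gradient checkpointing,
# trading extra compute for a much smaller activation-memory footprint.
ai = aitextgen(tf_gpt2="355M", gradient_checkpointing=True)

# Finetune with a small batch size to stay within VRAM constraints
ai.train("input.txt", batch_size=1, num_steps=2000)
```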

Schema-based generation

This release includes a draft implementation of schema-based generation, leveraging the new custom tokenizers.
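
As a conceptual sketch only: the schema field tokens below are hypothetical examples registered through train_tokenizer's added_tokens parameter, and the generation-side API for schemas is still a draft.

```python
from aitextgen.tokenizers import train_tokenizer

# Hypothetical schema field tokens: the tokenizer learns them as single
# tokens, which schema-based generation can then key off of.
train_tokenizer(
    "input.txt",
    added_tokens=["<|title|>", "<|body|>"],
)
```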

Misc bug fixes

  • Fixed the TensorFlow weights URL
  • Fixed an issue where the too-long-prompt assert checked the prompt's character length instead of its token length (#90)
  • Worked around a breaking issue in transformers 4.3.0 by moving special-token stripping into aitextgen instead of the tokenizer (#90)
  • Added an lstrip param to generation, which strips all whitespace at the beginning of generated text (related to the point above; see the sketch after this list)
  • The training refresh rate now defaults to every 20 steps, for better performance in Jupyter/Colab
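
A minimal sketch of the new lstrip param, assuming the default 124M GPT-2 model:

```python
from aitextgen import aitextgen

# Loads the default 124M GPT-2
ai = aitextgen()

# lstrip=True strips any whitespace at the start of the generated text
ai.generate(prompt="The meaning of life is", lstrip=True)
```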