Releases: kuprel/min-dalle
v0.4
- Fixed a critical CUDA runtime error that occurred when generating tokens larger than the VQGAN's vocabulary
- Added `generate_images_stream` and `generate_images` to generate individual images. These are in active use in the Discord bot.
- Faster inference: a 9x9 grid now generates in 38 seconds on an A100
- Added `temperature`, `top_k`, and `supercondition_factor` parameters
- Added a simple Tkinter UI (thanks to @20kdc)
- Added an option to tile images in token space instead of pixel space. This creates a seamless effect where the borders between images are blended.
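To illustrate how the `temperature`, `top_k`, and `supercondition_factor` parameters interact, here is a minimal NumPy sketch of sampling one image token. This is not the library's implementation, and the super-conditioning blend formula is an assumption:

```python
import numpy as np

def sample_token(cond_logits, uncond_logits, temperature=1.0, top_k=256,
                 supercondition_factor=16.0, rng=None):
    """Sketch of super conditioning + temperature + top-k sampling."""
    rng = rng or np.random.default_rng(0)
    # Super conditioning: push logits away from the text-unconditioned
    # prediction (assumed blend; min-dalle's exact formula may differ).
    logits = uncond_logits + supercondition_factor * (cond_logits - uncond_logits)
    # Temperature: higher values flatten the distribution.
    logits = logits / temperature
    # Top-k: keep only the k most likely tokens, mask the rest out.
    kth_largest = np.sort(logits)[-top_k]
    logits = np.where(logits >= kth_largest, logits, -np.inf)
    # Softmax over the surviving tokens, then sample.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

With `top_k=1` this reduces to greedy decoding on the blended logits; larger `temperature` values spread probability mass across more of the top-k candidates.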
v0.3
- Added an `is_reusable` parameter. Turning it off saves memory (e.g. for the command line script), and keeping it on makes multiple calls to `generate_image` faster
- Added a `log2_k` parameter to control top-k image token sampling
- Added a `log2_supercondition_factor` parameter to control the super conditioning amount
- Added `log2_mid_count` and `generate_image_stream` to stream intermediate outputs. Incomplete token sequences are detokenized to an image multiple times during the decoding process. This adds very little time to the overall run time
- Added a `dtype` parameter to autocast operations to `float32`, `float16`, or `bfloat16`
- A grid size of 8x8 now generates in 35 seconds on an A100
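The streaming behavior can be sketched as a generator: decoding proceeds token by token, and the partial sequence is detokenized at `2 ** log2_mid_count` evenly spaced points. A minimal sketch with stand-in components (hypothetical helper names, not the library's API):

```python
def generate_image_stream_sketch(decode_step, detokenize, total_tokens=256,
                                 log2_mid_count=3):
    """Yield intermediate 'images' while decoding total_tokens image tokens.

    decode_step(tokens) -> next token, detokenize(tokens) -> image;
    both are stand-ins for the real model components.
    """
    mid_count = 2 ** log2_mid_count       # number of intermediate outputs
    interval = total_tokens // mid_count  # detokenize every `interval` tokens
    tokens = []
    for i in range(total_tokens):
        tokens.append(decode_step(tokens))
        if (i + 1) % interval == 0:
            # Detokenizing a partial sequence is cheap relative to decoding,
            # which is why streaming adds little to the total run time.
            yield detokenize(tokens)
```

With `log2_mid_count=3` and 256 image tokens, this yields 8 progressively more complete images, the last being the final result.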
v0.2
- Added to PyPI, so the entire setup process is now `pip install min-dalle`
- Pre-converted PyTorch weights are downloaded when needed from the Hugging Face Hub; no more converting from Flax
Breaking Changes
- `MinDalleTorch` is now `MinDalle`
- `MinDalleFlax` and the Flax-to-Torch conversion code have been moved to a separate repository
v0.1.1
Important Bug Fixes
- Image tokens were mistakenly being computed twice in the command line script when using Torch
- The tokenizer was not working correctly on some machines (e.g. Windows). Files are now read with UTF-8 encoding.
New Features
- The `is_expendable` argument reduces memory usage for the command line script by loading and then unloading the encoder/decoder/detokenizer as needed
- A simpler 4D `attention_state` replaces the 5D `keys_values_state`, with faster inference time
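The memory-saving idea behind `is_expendable` can be sketched as: load each stage, run it, then free it before loading the next, so only one stage is resident at a time. A toy illustration with stand-in loader callables (not the real model classes):

```python
def run_pipeline_expendable(load_encoder, load_decoder, load_detokenizer, text):
    """Run encoder -> decoder -> detokenizer, holding one stage at a time."""
    encoder = load_encoder()
    encoded = encoder(text)
    del encoder          # free the encoder before loading the decoder
    decoder = load_decoder()
    tokens = decoder(encoded)
    del decoder          # free the decoder before loading the detokenizer
    detokenizer = load_detokenizer()
    image = detokenizer(tokens)
    del detokenizer
    return image
```

Peak memory is the largest single stage rather than the sum of all three, at the cost of reloading weights on every call, which is why `is_reusable`/`is_expendable` trade speed for memory.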