My minimal/hackable PyTorch implementation of PixArt-α, from the paper "PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis".
I wanted to train a small text-to-image model, but trust me, the original PixArt codebase (like most image-gen codebases) is something else, definitely not minimal. So I decided to spend time understanding it and recreating it, both straight from the paper. It's more fun to build my own version anyway :)
PS: Currently working on training/sampling (and on actually training a tiny model).
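
To give a rough idea of what the core of the model looks like (this is an illustrative sketch with assumed names and sizes, not the repo's actual code), PixArt-α is essentially a DiT block with adaLN-single timestep modulation plus a cross-attention layer that injects the text embeddings:

```python
# Minimal sketch of a PixArt-alpha style transformer block (assumed names/sizes).
import torch
import torch.nn as nn


class PixArtBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        # Per-block learnable offsets added to the shared adaLN-single output
        # (6 params: shift/scale/gate for self-attention and for the MLP).
        self.adaln = nn.Parameter(torch.zeros(6, dim))

    def forward(self, x, text_tokens, t_emb):
        # t_emb: (B, 6, dim) global modulation from a single shared timestep MLP
        shift1, scale1, gate1, shift2, scale2, gate2 = (
            (t_emb + self.adaln).chunk(6, dim=1)
        )
        h = self.norm1(x) * (1 + scale1) + shift1
        x = x + gate1 * self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention to the (projected) text-encoder tokens.
        x = x + self.cross_attn(x, text_tokens, text_tokens, need_weights=False)[0]
        h = self.norm2(x) * (1 + scale2) + shift2
        x = x + gate2 * self.mlp(h)
        return x


if __name__ == "__main__":
    block = PixArtBlock(dim=256, n_heads=4)
    x = torch.randn(2, 64, 256)      # patch tokens of the noisy latent
    text = torch.randn(2, 77, 256)   # projected text-encoder tokens
    t = torch.randn(2, 6, 256)       # shared adaLN-single modulation
    print(block(x, text, t).shape)   # torch.Size([2, 64, 256])
```

The full model stacks a bunch of these blocks between a patch-embedding layer and an unpatchify head; the actual code in this repo may differ in details.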