
Conditional denoising diffusion probabilistic model trained in latent space.


artem-gorodetskii/WikiArt-Latent-Diffusion


WikiArt-Latent-Diffusion

Conditional denoising diffusion probabilistic model trained in latent space to generate paintings by famous artists. See the animation of the latent diffusion process in the figure below.

Fig. 1. The animation of the latent diffusion process.

Generalization to Different Sizes

The model generalizes to image sizes other than the training resolution. See the generated examples below.

Fig. 2. Generated painting in the style of Ivan Aivazovsky.

Fig. 3. Generated painting in the style of Ivan Aivazovsky.

Fig. 4. Generated painting in the style of Ivan Aivazovsky.

Fig. 5. Generated painting in the style of Martiros Saryan.

Fig. 6. Generated painting in the style of Camille Pissarro.

Fig. 7. Generated painting in the style of Pyotr Konchalovsky.

Fig. 8. Generated painting in the style of Pierre Auguste Renoir.

Repository structure:

Dataset

We used the WikiArt dataset, which contains 81,444 pieces of visual art by various artists. All images were cropped and resized to 512x512 resolution. To convert images into latent representations, we apply the pretrained VQ-VAE from the Stable Diffusion model implemented by StabilityAI.
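The preprocessing arithmetic can be sketched as follows. This is only an illustration: it assumes a center crop to a square (the README does not say how images were cropped) and assumes the autoencoder downsamples by a factor of 8 into 4 latent channels, which is common for the Stable Diffusion autoencoder but is not stated here. The function names are hypothetical.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def latent_shape(image_size=512, downsample=8, channels=4):
    """Shape (C, H, W) of the latent for a square image.

    downsample=8 and channels=4 are assumptions, not values from the README.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample
    return channels, side, side


# A 1024x768 image is cropped to its central 768x768 square before resizing.
box = center_crop_box(1024, 768)
shape = latent_shape(512)
print(box, shape)
```

Under these assumptions the diffusion model operates on 4x64x64 latents rather than 3x512x512 pixels, roughly 48 times fewer values per image.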

Diffusion Model

We adapted the 2D UNet model from the Hugging Face diffusers package by adding three embedding layers that control painting style: artist name, genre name, and style name. Before the style embeddings are added to the time embedding, each type of style embedding is passed through a PreNet module.
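The data flow of this conditioning scheme might look roughly like the sketch below. NumPy stands in for the deep learning framework, and everything specific is an assumption made for illustration: the embedding width, the vocabulary sizes, the use of one PreNet per conditioning type, and the PreNet being a small two-layer MLP. Only the overall pattern (look up an embedding, pass it through a PreNet, add it to the time embedding) comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 16  # hypothetical embedding width


def make_prenet(dim):
    """Hypothetical PreNet: a two-layer MLP with a ReLU in between."""
    w1 = rng.normal(0.0, 0.1, (dim, dim))
    w2 = rng.normal(0.0, 0.1, (dim, dim))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2


# One lookup table and one PreNet per conditioning type (vocabulary sizes are made up).
vocab = {"artist": 10, "genre": 5, "style": 7}
tables = {name: rng.normal(size=(n, EMB_DIM)) for name, n in vocab.items()}
prenets = {name: make_prenet(EMB_DIM) for name in vocab}


def condition(time_emb, labels):
    """Add the PreNet-processed style embeddings to the time embedding."""
    out = time_emb.copy()
    for name, idx in labels.items():
        out = out + prenets[name](tables[name][idx])
    return out


time_emb = rng.normal(size=(2, EMB_DIM))  # batch of 2 timestep embeddings
labels = {
    "artist": np.array([3, 1]),
    "genre": np.array([0, 4]),
    "style": np.array([2, 6]),
}
cond_emb = condition(time_emb, labels)
print(cond_emb.shape)
```

Because the style information is summed into the time embedding, it reaches every UNet block that already consumes the time embedding, without changing the block interfaces.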

The network is trained to predict the unscaled noise component using the Huber loss function, which produces better results on this dataset than L2 loss. During evaluation, the generated latent representations are decoded into images using the pretrained VQ-VAE.
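A minimal sketch of one training-step objective is shown below, again in NumPy rather than the actual training framework. The linear beta schedule, the 1000 timesteps, the Huber threshold `delta=1.0`, and the zero-valued placeholder prediction are all assumptions for illustration; the repository's real settings may differ.

```python
import numpy as np


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


def add_noise(x0, noise, t, alphas_cumprod):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t].reshape(-1, 1, 1, 1)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = rng.normal(size=(2, 4, 8, 8))     # stand-in for a batch of latents
noise = rng.normal(size=x0.shape)      # the target the network must predict
t = rng.integers(0, 1000, size=2)      # one random timestep per sample
x_t = add_noise(x0, noise, t, alphas_cumprod)

pred = np.zeros_like(noise)            # placeholder for model(x_t, t, labels)
loss = huber_loss(pred, noise)
print(x_t.shape, loss)
```

The Huber loss grows only linearly for large errors, so it is less sensitive to outliers than L2. That may be one reason it works better here, although the README does not give an explanation.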
