
Anime Face VAE

A Variational Autoencoder (VAE) trained on anime face images. The model learns a compressed latent representation of anime faces and can generate new ones by sampling from that latent space.

How It Works

Training:   image → Encoder → (μ, σ) → sample z → Decoder → reconstructed image
Generation: random z ~ N(0,1) → Decoder → new anime face

The encoder compresses a 64×64 RGB image (12,288 values) into a 128-dimensional latent vector, and the decoder maps that vector back to an image. A KL divergence term keeps the latent distribution close to a standard normal, so sampling z ~ N(0,1) at generation time produces plausible faces.
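Concretely, the (μ, σ) → z step uses the reparameterization trick so that sampling stays differentiable. A minimal sketch, assuming PyTorch; the function name is illustrative, not necessarily what cvae_anime.py uses:

```python
import torch

def reparameterize(mu, logvar):
    # The encoder outputs log(sigma^2); convert it to sigma
    std = torch.exp(0.5 * logvar)
    # One standard-normal draw per latent dimension
    eps = torch.randn_like(std)
    # z = mu + sigma * eps is a sample from N(mu, sigma^2) that gradients can flow through
    return mu + eps * std
```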

Results (Epoch 35)

Generated Faces

Sampled from random z ~ N(0,1) — no input image used.

[image: faces generated at epoch 35]

Reconstructed Faces

Top row: original images. Bottom row: encoder → decoder output.

[image: reconstructions at epoch 35]

Architecture

| Component  | Layers                                                              |
|------------|---------------------------------------------------------------------|
| Encoder    | Linear 12288 → 1024 → 256, then two heads: μ and log σ² (128 each)  |
| Decoder    | Linear 128 → 256 → 1024 → 12288, Tanh output                        |
| Latent dim | 128                                                                 |
| Loss       | MSE reconstruction + KL divergence                                  |
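A sketch of the layer layout in the table above, assuming PyTorch and ReLU activations between the hidden linear layers (the hidden activation is not stated in this README); class and function names are illustrative rather than the ones in cvae_anime.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnimeVAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: 12288 (64*64*3) -> 1024 -> 256, then two heads for mu and log sigma^2
        self.encoder = nn.Sequential(
            nn.Linear(64 * 64 * 3, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # Decoder: 128 -> 256 -> 1024 -> 12288, Tanh output (pixels in [-1, 1])
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 64 * 3), nn.Tanh(),
        )

    def forward(self, x):
        h = self.encoder(x.view(x.size(0), -1))   # flatten the image batch
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)      # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # MSE reconstruction term plus KL divergence from N(0, I)
    mse = F.mse_loss(recon, x.view(x.size(0), -1), reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + kld
```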

Dataset

Anime Face Dataset — place images in ./data/anime/images/.
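One way to wire that folder into training, assuming torchvision: since ./data/anime/ contains a single images/ subfolder, ImageFolder can treat it as one class. The normalization to [-1, 1] matches the decoder's Tanh output. This is an assumption about the loading code, not a quote from it:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),                       # scales pixels to [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # shifts to [-1, 1], matching the Tanh decoder
])

# ImageFolder treats ./data/anime/images/ as a single class folder
dataset = datasets.ImageFolder(root="./data/anime", transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```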

Setup

pip install -r requirements.txt
python cvae_anime.py

Config

| Parameter     | Value |
|---------------|-------|
| Image size    | 64×64 |
| Batch size    | 64    |
| Epochs        | 50    |
| Learning rate | 1e-3  |
| Latent dim    | 128   |
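These settings drop into a standard training loop. A minimal sketch, assuming PyTorch and an Adam optimizer (the optimizer is not stated in this README), reusing the illustrative AnimeVAE, vae_loss, and loader from the sketches above:

```python
import torch

model = AnimeVAE(latent_dim=128)                           # latent dim from the config
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate from the config

for epoch in range(50):                                    # epochs from the config
    for x, _ in loader:                                     # batches of 64 from the dataset sketch
        recon, mu, logvar = model(x)
        loss = vae_loss(recon, x, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```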

Outputs

Every 5 epochs the script saves:

| Folder               | Contents                                              |
|----------------------|-------------------------------------------------------|
| generated_anime/     | 64 new faces sampled from random z                    |
| reconstructed_anime/ | Original (top row) vs reconstructed (bottom row)      |
| interpolated_anime/  | Smooth walk between two faces in latent space         |
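A sketch of how these three kinds of output can be produced from a trained model; the grid size and number of interpolation steps are illustrative, and saving the image grids is omitted:

```python
import torch

model.eval()
with torch.no_grad():
    # generated_anime/: decode 64 random latent vectors
    z = torch.randn(64, 128)
    generated = model.decoder(z).view(-1, 3, 64, 64)

    # reconstructed_anime/: encode a batch, then decode it again
    x, _ = next(iter(loader))
    recon, _, _ = model(x)
    reconstructed = recon.view(-1, 3, 64, 64)

    # interpolated_anime/: walk linearly between the latent means of two faces
    h = model.encoder(x[:2].view(2, -1))
    mu = model.fc_mu(h)
    steps = torch.linspace(0, 1, 8).unsqueeze(1)   # 8 points between the two endpoints
    walk = model.decoder(mu[0] + steps * (mu[1] - mu[0])).view(-1, 3, 64, 64)
```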
