A Variational Autoencoder (VAE) trained on anime face images. The model learns a compressed latent representation of anime faces and can generate new ones by sampling from that latent space.
Training: image → Encoder → (μ, σ) → sample z → Decoder → reconstructed image
Generation: random z ~ N(0,1) → Decoder → new anime face
The encoder compresses a 64×64 RGB image (12,288 values) into a 128-dimensional latent vector. The decoder reverses this. KL divergence loss keeps the latent space organized so random sampling produces valid faces.
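The sampling step and the KL term described above can be sketched in a few lines. This is a minimal NumPy illustration (not the project's code; function names are illustrative): `z` is drawn via the reparameterization trick, and the KL divergence is computed against a standard normal prior.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I): the reparameterization trick."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over the batch."""
    return -0.5 * np.mean(np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))

rng = np.random.default_rng(0)
mu = np.zeros((64, 128))        # encoder means for a batch of 64 images
log_var = np.zeros((64, 128))   # encoder log-variances
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (64, 128)
```

When the posterior exactly matches the prior (mu = 0, log sigma^2 = 0), the KL penalty is zero; it grows as the encoder drifts away from N(0, I), which is what keeps the latent space organized.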
Sampled from random z ~ N(0,1) — no input image used.
Top row: original images. Bottom row: encoder → decoder output.
| Component | Layers |
|---|---|
| Encoder | Linear 12288→1024→256, then two heads: μ and log σ² (128 each) |
| Decoder | Linear 128→256→1024→12288, Tanh output |
| Latent dim | 128 |
| Loss | MSE reconstruction + KL divergence |
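The layer sizes in the table can be checked with a shape-level NumPy sketch using untrained random weights. This is only a dimensional walkthrough, not the trained model in `cvae_anime.py`; the hidden-layer activations here are assumed for illustration, while the Tanh output matches the table.

```python
import numpy as np

rng = np.random.default_rng(0)
dense = lambda n_in, n_out: rng.standard_normal((n_in, n_out)) * 0.01

# Encoder: 12288 -> 1024 -> 256, then two 128-dim heads (mu and log sigma^2).
W1, W2 = dense(12288, 1024), dense(1024, 256)
W_mu, W_lv = dense(256, 128), dense(256, 128)
# Decoder: 128 -> 256 -> 1024 -> 12288, Tanh on the output layer.
D1, D2, D3 = dense(128, 256), dense(256, 1024), dense(1024, 12288)

x = rng.standard_normal((1, 12288))              # one flattened 64x64 RGB image
h = np.tanh(np.tanh(x @ W1) @ W2)                # encoder trunk (activations assumed)
mu, log_var = h @ W_mu, h @ W_lv                 # (1, 128) each
z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
x_hat = np.tanh(np.tanh(np.tanh(z @ D1) @ D2) @ D3)  # Tanh keeps output in [-1, 1]
print(x_hat.shape)  # (1, 12288)
```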
Anime Face Dataset — place images in `./data/anime/images/`.
```
pip install -r requirements.txt
python cvae_anime.py
```

| Parameter | Value |
|---|---|
| Image size | 64×64 |
| Batch size | 64 |
| Epochs | 50 |
| Learning rate | 1e-3 |
| Latent dim | 128 |
Every 5 epochs the script saves:
| Folder | What |
|---|---|
| generated_anime/ | 64 new faces sampled from random z |
| reconstructed_anime/ | Original (top row) vs reconstructed (bottom row) |
| interpolated_anime/ | Smooth walk between two faces in latent space |
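The latent-space walk in the interpolation grid is a straight line between two latent codes. A hedged sketch of just that step (the decoder call is omitted; names are illustrative):

```python
import numpy as np

def interpolate(z1, z2, steps=8):
    """Linearly blend two latent vectors: z(t) = (1 - t) * z1 + t * z2."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z1 + t * z2 for t in ts])

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(128), rng.standard_normal(128)
path = interpolate(z1, z2)  # (8, 128); decode each row to render the face strip
print(np.allclose(path[0], z1), np.allclose(path[-1], z2))  # True True
```

Each intermediate row decodes to a face that morphs smoothly between the two endpoints, which is a common qualitative check that the latent space is well organized.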