Benchmarking synthetic data generation methods for visual classification. Measures how well models trained on AI-generated images transfer to real-world photos in zero-shot and few-shot settings.
Dataset & Models: huggingface.co/datasets/LakshC/SynthBench
- Synthetic data generation — create training images via text-to-image models and programmatic augmentation (see the sketch after this list)
- Zero-shot evaluation — train on synthetic data only, test on real photos
- Few-shot evaluation — mix synthetic data with 5/10/25/50 real examples per class
- Ablations — prompt diversity and dataset size experiments
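For the text-to-image path, generate_t2i.py might look roughly like the following. This is a minimal sketch assuming Stable Diffusion through the diffusers library; the model ID, prompt, and output path are illustrative, not necessarily what the script uses.

import torch
from diffusers import StableDiffusionPipeline

# load a text-to-image pipeline (model choice is an assumption)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("mps")  # or "cuda" / "cpu"

# one prompt per class; vary the wording for the prompt-diversity ablation
prompt = "a photo of a computer mouse on a wooden desk"
image = pipe(prompt).images[0]
image.save("data/synthetic_t2i/mouse/mouse_0000.jpg")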
Classes: mouse, pen, phone, laptop, water bottle, Rubik's cube
Model: ResNet-18 (via timm)
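A minimal sketch of how the classifier might be constructed; train.py may configure it differently, but the timm call below is the standard way to get a ResNet-18 with a 6-way head.

import timm
import torch

# ResNet-18 backbone with a 6-class output layer
model = timm.create_model("resnet18", pretrained=True, num_classes=6)

# use the Apple Silicon GPU when available, else fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = model.to(device)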
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Verify PyTorch MPS (Apple Silicon GPU):
python -c "import torch; print(torch.backends.mps.is_available())"
Take photos with an iPhone of each object class. Organize into:
data/raw/
├── mouse/
├── pen/
├── phone/
├── laptop/
├── water_bottle/
└── rubiks_cube/
Convert HEIC → JPEG, resize to 224x224, split into train/test:
python scripts/preprocess.py --input data/raw --output data/real --split 0.2
Options: --size (default 224), --quality (default 95), --seed (default 42), --classes
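Under the hood, the HEIC → JPEG step could be implemented roughly like this (a sketch assuming pillow-heif and Pillow; file names are illustrative):

from PIL import Image
import pillow_heif

pillow_heif.register_heif_opener()  # lets Pillow open .heic files

# convert one raw photo, resize to 224x224, save as JPEG
img = Image.open("data/raw/mouse/IMG_0001.HEIC").convert("RGB")
img = img.resize((224, 224), Image.Resampling.LANCZOS)
img.save("data/real/train/mouse/IMG_0001.jpg", quality=95)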
Project layout:
├── data/
│ ├── raw/ # Raw iPhone photos (HEIC)
│ ├── real/ # Preprocessed real images (train/test)
│ ├── synthetic_t2i/ # Text-to-image generated
│ └── synthetic_aug/ # Programmatic augmentation
├── scripts/
│ ├── preprocess.py # HEIC conversion + resize
│ ├── generate_t2i.py # Text-to-image generation
│ ├── generate_aug.py # Programmatic augmentation
│ ├── train.py # Model training
│ ├── evaluate.py # Evaluation + metrics
│ ├── few_shot.py # Few-shot experiments
│ ├── ablations.py # Ablation studies
│ └── visualize.py # Charts and figures
├── results/ # Saved metrics
├── figures/ # Generated charts
└── models/ # Saved checkpoints
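The programmatic-augmentation path (generate_aug.py) could be as simple as applying random transforms to a handful of seed photos. The sketch below assumes torchvision; the actual script may use a different recipe and seed images.

from PIL import Image
from torchvision import transforms

# random crops, flips, and color jitter to multiply a small set of seed images
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
])

seed = Image.open("data/real/train/pen/IMG_0042.jpg")  # hypothetical seed photo
for i in range(20):
    augment(seed).save(f"data/synthetic_aug/pen/pen_{i:03d}.jpg")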