This project is a complete implementation of Neural Style Transfer (NST) built on a pretrained feedforward transformer network, enabling fast, real-time stylization of both images and videos.
It includes:
- Style transfer for single images and videos
- A Streamlit-based GUI for a user-friendly experience
- CLI scripts for training and inference pipelines
- Configurable settings like image width, temporal smoothing, and batch processing
- Downloaders for pretrained models and training datasets
| Feature | Description | 
|---|---|
| Image NST | Upload or choose images, apply artistic styles using a fast transformer net | 
| Video NST | Upload or choose videos, with optional temporal smoothing | 
| Streamlit UI | Intuitive web UI for both image and video stylization | 
| CLI Support | Script-based style transfer using configurable arguments | 
| Custom Model Training | Train your own models using MS-COCO or any dataset | 
```
.
├── app.py                          # Streamlit GUI application
├── image_nst_script.py             # Script for stylizing images
├── video_nst_script.py             # Script for stylizing videos
├── model_training_script.py        # Model training entrypoint
├── models/
│   ├── definitions/
│   │   ├── transformer_net.py         # Transformer feedforward network
│   │   └── perceptual_loss_net.py     # VGG16-based perceptual loss extractor
│   └── binaries/                      # Pretrained .pth models
├── utils/
│   ├── utils.py                         # Shared preprocessing, postprocessing, I/O, and dataset utils
│   ├── app_utils.py                     # Utility helpers for Streamlit app
│   ├── pretrained_models_downloader.py  # Script to download pre-trained style models
│   └── training_dataset_downloader.py   # Script to download and extract COCO dataset
└── data/
    ├── input/              # Input images and videos
    ├── styles/             # Styling base images
    └── output/             # Stylized results
```
```bash
git clone https://github.com/your-username/neural-style-transfer.git
cd neural-style-transfer
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

This must be run before using the GUI or CLI to stylize:

```bash
python utils/pretrained_models_downloader.py
```

This will download the pretrained .pth files and place them in `models/binaries/`.
To train your own style model, download the MS-COCO dataset:

```bash
python utils/training_dataset_downloader.py
```

This downloads and extracts the COCO dataset under `data/train/`.
```bash
streamlit run app.py
```

- Image Tab: Upload or select an image, choose a model, apply the style, and download the result.
- Video Tab: Upload or select a video, choose a model, optionally tune smoothing, and download the result.
Train your own model with a content-style dataset:

```bash
python model_training_script.py --dataset_path ./data/train --style_image ./styles/starry_night.jpg \
    --epochs 2 --batch_size 4 --style_weight 5e5 --content_weight 1e0
```

Stylize a single image from the command line:

```bash
python image_nst_script.py --content_input lion.jpg --model_name mosaic.pth --img_width 512
```

Stylize a video from the command line:

```bash
python video_nst_script.py --input_video sample.mp4 --model_name mosaic.pth --img_width 500 --smoothing_alpha 0.3
```

- Streamlit GUI with two tabs: Image and Video
- For images:
  - Uses `stylize_static_image(config, return_pil=True)` and shows the original alongside the styled image
- For videos:
  - Uses `stylize_video(config)` and applies frame-wise style with smoothing
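A minimal sketch of how the Image tab might wire widgets to these functions; the widget labels, config keys, and import paths are assumptions, not verbatim repo code:

```python
# Sketch of the Image tab wiring; labels, config keys, and imports are assumed.
import streamlit as st

from image_nst_script import stylize_static_image  # assumed import path
from utils.app_utils import pil_to_bytes           # assumed import path

tab_img, tab_vid = st.tabs(["Image", "Video"])

with tab_img:
    uploaded = st.file_uploader("Content image", type=["jpg", "jpeg", "png"])
    model_name = st.selectbox("Style model", ["mosaic.pth", "candy.pth"])
    if uploaded is not None and st.button("Stylize"):
        # Pack the user's choices into a config dict and run the image pipeline.
        config = {"content_input": uploaded, "model_name": model_name}
        result = stylize_static_image(config, return_pil=True)
        st.image(result, caption="Stylized result")
        st.download_button("Download", pil_to_bytes(result), "stylized.png")
```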
- Defines `stylize_static_image(config, return_pil=False)` (see the sketch below)
- Loads the model and processes either:
  - A single image (optionally returned as a PIL image)
  - A directory (batch image processing)
 
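A condensed sketch of the single-image path, assuming the import path from the repo layout above and the output scaling noted in the comments:

```python
# Minimal single-image stylization sketch; normalization and exact signatures
# are assumptions, not taken verbatim from this repo.
import numpy as np
import torch
from PIL import Image

from models.definitions.transformer_net import TransformerNet  # path per repo tree

def stylize_one_image(image_path, model_path, img_width=512, device="cpu"):
    # Load the trained feedforward network in inference mode.
    model = TransformerNet().to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()

    # Resize the content image (keeping aspect ratio) and convert to NCHW float.
    img = Image.open(image_path).convert("RGB")
    height = int(img.height * img_width / img.width)
    img = img.resize((img_width, height))
    x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float().unsqueeze(0)

    # A single forward pass produces the stylized image.
    with torch.no_grad():
        out = model(x.to(device)).squeeze(0).cpu()

    # Clamp to a displayable range; the exact scaling depends on how the
    # model was trained (assumed 0-255 here).
    out = out.clamp(0, 255).permute(1, 2, 0).numpy().astype(np.uint8)
    return Image.fromarray(out)
```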
- Frame-by-frame video processing using OpenCV
- Applies style using TransformerNet
- Uses `cv2.addWeighted()` if smoothing is enabled (see the sketch below)
- Saves stylized video
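The temporal smoothing can be pictured as a simple blend of consecutive stylized frames. A minimal sketch, assuming frames arrive as HxWx3 uint8 arrays and that `alpha` weights the previous frame (the repo's exact semantics may differ):

```python
# Sketch of temporal smoothing via cv2.addWeighted; alpha semantics assumed.
import cv2

def smooth_frames(stylized_frames, alpha=0.3):
    """Blend each stylized frame with the previous output to reduce flicker."""
    prev = None
    for frame in stylized_frames:  # each frame is an HxWx3 uint8 array
        if prev is None:
            out = frame
        else:
            # Higher alpha gives more weight to the previous frame,
            # i.e. stronger smoothing but more ghosting.
            out = cv2.addWeighted(prev, alpha, frame, 1.0 - alpha, 0)
        prev = out
        yield out
```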
- Loads COCO dataset and chosen style image
- Computes perceptual loss using VGG
- Optimizes TransformerNet
- Supports live TensorBoard logs
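A single optimization step, sketched under the assumption that the loss network returns a namedtuple of VGG activations and that Gram matrices of the style image are precomputed (and expanded to the batch size); weight defaults mirror the CLI example above:

```python
# Sketch of one perceptual-loss training step (Johnson et al.); the loss-net
# interface and the style_grams layout are assumptions.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # Channel-by-channel feature correlations, used for the style loss.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)

def training_step(transformer, loss_net, content_batch, style_grams,
                  optimizer, content_weight=1e0, style_weight=5e5):
    optimizer.zero_grad()
    stylized = transformer(content_batch)

    out_feats = loss_net(stylized)            # VGG features of the output
    content_feats = loss_net(content_batch)   # VGG features of the content

    # Content loss: match one mid-level activation (e.g. relu2_2).
    content_loss = F.mse_loss(out_feats.relu2_2, content_feats.relu2_2)

    # Style loss: match Gram matrices across all extracted layers.
    style_loss = sum(F.mse_loss(gram_matrix(o), g)
                     for o, g in zip(out_feats, style_grams))

    total = content_weight * content_loss + style_weight * style_loss
    total.backward()
    optimizer.step()
    return total.item()
```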
- Feedforward CNN
- Structure:
  - Conv → IN → ReLU
  - 5 residual blocks
  - Upsample → Conv → IN → ReLU
- Outputs the stylized image in a single forward pass (see the sketch below)
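A minimal sketch of this layout; channel counts and kernel sizes follow Johnson's paper and are assumptions here, not this repo's exact definition:

```python
# Sketch of the Conv-IN-ReLU / residual-block architecture described above.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)  # identity skip connection

class TransformerNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        def conv(in_c, out_c, k, s):
            # Conv -> InstanceNorm -> ReLU building block.
            return nn.Sequential(
                nn.Conv2d(in_c, out_c, k, stride=s, padding=k // 2),
                nn.InstanceNorm2d(out_c, affine=True),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            conv(3, 32, 9, 1),
            conv(32, 64, 3, 2),                        # downsample
            conv(64, 128, 3, 2),                       # downsample
            *[ResidualBlock(128) for _ in range(5)],   # 5 residual blocks
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv(128, 64, 3, 1),                       # upsample + Conv + IN + ReLU
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv(64, 32, 3, 1),
            nn.Conv2d(32, 3, 9, padding=4),            # final conv back to RGB
        )

    def forward(self, x):
        return self.net(x)  # stylized image in one pass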
- Loads pretrained VGG16 from torchvision
- Extracts intermediate features (e.g., relu1_2, relu2_2, relu3_3) for loss computation
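A sketch of such an extractor; the slice indices match torchvision's vgg16 layer ordering, while the class and field names here are illustrative:

```python
# Sketch of a frozen VGG16 feature extractor for perceptual loss.
from collections import namedtuple

import torch.nn as nn
from torchvision import models

VggFeatures = namedtuple("VggFeatures", ["relu1_2", "relu2_2", "relu3_3"])

class PerceptualLossNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        # Slice the pretrained network at the activations used for the loss.
        self.slice1 = vgg[:4]    # up to relu1_2
        self.slice2 = vgg[4:9]   # up to relu2_2
        self.slice3 = vgg[9:16]  # up to relu3_3
        for p in self.parameters():
            p.requires_grad = False  # the loss network stays frozen

    def forward(self, x):
        h1 = self.slice1(x)
        h2 = self.slice2(h1)
        h3 = self.slice3(h2)
        return VggFeatures(h1, h2, h3)
```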
Core helpers:
- `prepare_img(path, width, device)` → tensor
- `post_process_image(tensor)` → RGB image
- `save_and_maybe_display_image(config, img)` → save logic
- `SimpleDataset` → supports batch image processing
- `frame_to_tensor()` and `tensor_to_frame()` for video
- `pil_to_bytes(pil_image)` → converts a PIL object for Streamlit download (see the sketch below)
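For example, `pil_to_bytes` can be as small as this; only the name comes from the list above, the body is an assumption:

```python
# Sketch of serializing a PIL image into raw bytes for st.download_button.
import io

def pil_to_bytes(pil_image, fmt="PNG"):
    buf = io.BytesIO()
    pil_image.save(buf, format=fmt)
    return buf.getvalue()
```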
- Downloads multiple pretrained `.pth` style models from known URLs
- Saves them into `models/binaries/`
- Must be run before the GUI or scripts can be used
- Downloads and unzips the MS-COCO dataset
- Extracts `train2014.zip` into `data/train/train2014/`
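In essence the script does something like the following; the archive URL is the well-known COCO download for train2014, and the local paths are assumptions:

```python
# Sketch of downloading and extracting the COCO train2014 archive (~13 GB).
import os
import urllib.request
import zipfile

COCO_URL = "http://images.cocodataset.org/zips/train2014.zip"
DEST_DIR = "data/train"

os.makedirs(DEST_DIR, exist_ok=True)
zip_path = os.path.join(DEST_DIR, "train2014.zip")

# Download the archive, then unpack it under data/train/train2014/.
urllib.request.urlretrieve(COCO_URL, zip_path)
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(DEST_DIR)
```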
| Model File | Style | 
|---|---|
| vg_starry_night.pth | Vincent van Gogh's Starry Night |
| candy.pth | Bright pastel stroke style | 
Place these inside `models/binaries/`.
| Input Image | Style | Output | 
|---|---|---|
| Uploaded image | Starry Night | Stylized version | 
Instead of running the full app immediately, you can explore the project using the interactive Jupyter notebooks:
- General_NST_Notebook.ipynb: explains and implements Johnson's Fast Neural Style Transfer using PyTorch
- Image_NST_Notebook.ipynb: demonstrates neural style transfer on images using a feedforward Transformer network
- Video_NST_Notebook.ipynb: applies a feedforward neural style transfer model to a video
- NST_Model_Training_Notebook.ipynb: demonstrates how to train a Transformer network for fast neural style transfer
- Add batch image GUI support
- Utilize temporally-aware networks instead of the current feedforward model for video stylization
- Add Semantic Segmentation feature for videos
This project is licensed under the MIT License.
- Based on *Perceptual Losses for Real-Time Style Transfer and Super-Resolution* (Johnson et al., 2016)
- Uses `torchvision.models.vgg16` for perceptual loss
- Portions of the code and implementation were adapted from and inspired by Aleksa Gordić's excellent repository:
 gordicaleksa/pytorch-neural-style-transfer-johnson
Chaitanya Malani
Email: contact@chaitanymalani.com