We present a real-time multiplane image (MPI) network. Unlike existing MPI-based approaches, which often rely on a separate depth estimation network to guide MPI parameter estimation, our method predicts these parameters directly from a single RGB image. To guide the network, we use a multimodal training strategy with joint supervision from view synthesis and depth estimation losses. More details can be found in the paper.
Please head to the Project Page for the supplementary materials.
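For context, an MPI represents a scene as a stack of fronto-parallel RGBA planes that are alpha-composited from back to front. The snippet below is a minimal sketch of that compositing step, with assumed tensor shapes for illustration; it is not the renderer used in this repository.

import torch

def composite_mpi(rgb, alpha):
    """Back-to-front "over" compositing of MPI layers.

    rgb:   (D, 3, H, W) color for each of D planes, back plane first.
    alpha: (D, 1, H, W) per-plane opacity in [0, 1].
    Returns a (3, H, W) rendered image.
    """
    out = torch.zeros_like(rgb[0])
    for c, a in zip(rgb, alpha):        # iterate from back plane to front plane
        out = c * a + out * (1.0 - a)   # standard "over" operator
    return out

# Example with random layers (32 planes at 256x256):
rgb = torch.rand(32, 3, 256, 256)
alpha = torch.rand(32, 1, 256, 256)
image = composite_mpi(rgb, alpha)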
- Clone the repository:
git clone https://github.com/Realistic3D-MIUN/RT-MPINet
cd RT-MPINet
- Install dependencies:
pip install -r requirements.txt
- Install PyTorch3D after the general dependencies have been installed:
pip install "pytorch3d @ git+https://github.com/facebookresearch/pytorch3d.git@89653419d0973396f3eff1a381ba09a07fffc2ed"
Pretrained model checkpoints (trained on 40K COCO images) should be placed in the checkpoint/ directory. Example filenames:
checkpoint_RT_MPI_Small.pth
checkpoint_RT_MPI_Medium.pth
checkpoint_RT_MPI_Large.pth
| Model | Size | Parameters | Checkpoint |
|---|---|---|---|
| Small | 26 MB | 6.6 Million | Download |
| Medium (Default) | 278 MB | 69 Million | Download |
| Large | 1.2 GB | 288 Million | Download |
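If you want to sanity-check a downloaded checkpoint before running the scripts, a quick inspection with plain torch.load is enough. This assumes the file is a standard PyTorch state dict (possibly nested under a "state_dict" key); adjust if the actual checkpoint format differs.

import torch

# Load on CPU so no GPU is required for the check.
state = torch.load("./checkpoint/checkpoint_RT_MPI_Medium.pth", map_location="cpu")

# Some checkpoints nest the weights under a key such as "state_dict".
state_dict = state.get("state_dict", state) if isinstance(state, dict) else state

num_params = sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
print(f"{len(state_dict)} tensors, ~{num_params / 1e6:.1f} M parameters")

For the medium model, the reported parameter count should be close to the 69 Million listed in the table above.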
You can load any image and re-run model inference each time the camera position changes. The frame rate is limited by the inference speed of your GPU.
python renderLiveWithMouseControl.py \
--input_image <path_to_image> \
--model_type <small|medium|large> \
--checkpoint_path <path_to_checkpoint> \
--height <height> \
--width <width>
Example:
python renderLiveWithMouseControl.py \
--input_image ./samples/moon.jpg \
--model_type medium \
--checkpoint_path ./checkpoint/checkpoint_RT_MPI_Medium.pth \
--height 256 \
--width 256
The predicted MPIs can be used for offline rendering, which is much faster since the model isn't queried each time the camera changes. This requires two steps:
- First, predict the MPIs:
python predictMPIs.py \
--input_image <path_to_image> \
--model_type <small|medium|large> \
--checkpoint_path <path_to_checkpoint> \
--save_dir <output_dir> \
--height <height> \
--width <width>
- Second, load the MPIs and render views without invoking the model:
python renderPreProcessedWithMouseControl.py \
--layer_path <output_dir> \
--height <height> \
--width <width>
Example:
python predictMPIs.py \
--input_image ./samples/moon.jpg \
--model_type medium \
--checkpoint_path ./checkpoint/checkpoint_RT_MPI_Medium.pth \
--save_dir ./processedLayers/ \
--height 384 \
--width 384
python renderPreProcessedWithMouseControl.py \
--layer_path ./processedLayers/ \
--height 384 \
--width 384
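Offline rendering is fast because, once the MPI is fixed, each plane only needs to be warped into the novel view with a plane-induced homography and then composited. Below is a minimal sketch of that warp for a single fronto-parallel plane, with assumed intrinsics and pose; it is a generic illustration, not the AdaMPI renderer shipped under ./utils.

import numpy as np
import cv2

def plane_homography(K, R, t, depth):
    """Homography mapping source-view pixels to target-view pixels for a
    fronto-parallel plane at Z = depth in the source camera frame.

    Convention: target camera coordinates X' = R @ X + t.
    K: (3, 3) intrinsics shared by both views; R: (3, 3); t: (3,).
    """
    n = np.array([0.0, 0.0, 1.0])                        # plane normal (facing camera)
    H = K @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K)
    return H / H[2, 2]

# Hypothetical example: warp one 384x384 RGBA layer under a small x-translation.
K = np.array([[384.0, 0.0, 192.0], [0.0, 384.0, 192.0], [0.0, 0.0, 1.0]])
layer = np.zeros((384, 384, 4), dtype=np.float32)        # one RGBA plane of the MPI
H = plane_homography(K, np.eye(3), np.array([0.05, 0.0, 0.0]), depth=2.0)
warped = cv2.warpPerspective(layer, H, (384, 384))

Repeating this warp for every plane (with its own depth) and compositing the warped layers back to front produces the novel view.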
You can run the Hugging Face demo app locally to utilize your own GPU for faster inference:
python app.py
We have tested our model with the following resolutions:
- 256x256
- 384x384
- 512x512
- 256x384
- 384x512
Note: If using a non-square aspect ratio, you need to modify the torch transform accordingly.
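For example, the resize step of the preprocessing (assuming a typical torchvision pipeline; check the actual transform in the scripts) would change from a square size to an explicit (height, width) pair:

import torchvision.transforms as T

# Hypothetical preprocessing pipeline; (height, width) must match the
# --height/--width arguments passed to the scripts, e.g. 384x512.
transform = T.Compose([
    T.Resize((384, 512)),   # (height, width), not a single square size
    T.ToTensor(),
])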
- We thank the authors of AdaMPI for their implementation of the homography renderer, which is used in this codebase under the ./utils directory.
- We thank the author of the DeepView renderer template, which was used in our project page.
If you use our work, please use the following citation:
@inproceedings{gond2025rtmpi,
title={Real-Time View Synthesis with Multiplane Image Network using Multimodal Supervision},
author={Gond, Manu and Shamshirgarha, Mohammadreza and Zerman, Emin and Knorr, Sebastian and Sj{\"o}str{\"o}m, M{\aa}rten},
booktitle={2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP)},
pages={},
year={2025},
organization={IEEE}
}