An implementation of the virtual object insertion task based on deep learning.
Our pipeline for inserting a virtual sphere into a 2D image consists of 4 stages:
- Generation of feature maps: this stage takes a single 2D image and interactively prompts for the location of the object. It outputs several feature maps such as depth, normal, and albedo, together with files that aid the subsequent stages.
- Generation of render scripts: this stage generates scene description files that orchestrate the rendering of the virtual object.
- Virtual object rendering: this stage automates the rendering instructed in the generated scene description files. Our implementation employs Matt Pharr's pbrt to handle the shading of the object.
- Virtual object insertion: this stage assembles the rendered scenes and programmatically inserts the shaded object into the target 2D image.
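Once everything below is set up, a full run can be scripted end-to-end. The sketch below chains the four stage scripts with Python's subprocess module; all paths are placeholders, and the options are explained in the usage sections further down:

import subprocess

img = "path/to/some/im.png"                 # image to insert into (placeholder)
py37 = "path/to/tensor/Scripts/python.exe"  # tensor environment's interpreter (placeholder)

# Stage 1: feature maps (interactive prompts will still appear).
subprocess.run(["python", "mapgen.py", "-cuda", "--img", img, "--py37", py37], check=True)
# Stage 2: render scripts.
subprocess.run(["python", "pbrgen.py", "--res-dir", "path/to/gen"], check=True)
# Stage 3: rendering with pbrt.
subprocess.run(["python", "pbrren.py", "-gpu", "--pbrt-dir", "path/to/pbrt/folder",
                "--res-dir", "path/to/gen/pbrt"], check=True)
# Stage 4: insertion into the target image.
subprocess.run(["python", "objput.py", "--res-dir", "path/to/gen/pbrt", "--target", img], check=True)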
The implementation depends heavily on Python and some external packages like OpenEXR for handling HDR content. In addition, the workflow leverages different deep learning models to estimate feature maps. Unfortunately, some of these models rely on outdated libraries, which makes it cumbersome to set up the right working environment.
This section describes the required components that need to be installed to set up the working environment for our repo.
- These are optional but highly advisable components to install for performance gains with GPU support.
- To check the NVIDIA driver and the latest compatible CUDA version:
nvidia-smi
- It is required to install exactly either CUDA Toolkit version `11.8` or `12.1`. The reason is that `pbrt` requires the CUDA toolkit for GPU rendering. However, once installed, PyTorch will use this CUDA version instead of its prebuilt CUDA runtime. Because PyTorch only works with CUDA `11.8` and `12.1`, the choice of CUDA toolkit version is limited to those two (see the sanity check after this list).
- OptiX SDK will enable the rendering of `pbrt` on the GPU. Any OptiX version from `7.1` to `7.7` is applicable.
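To confirm which toolkit version ended up on the `PATH`, a quick check via `nvcc` (the toolkit's compiler driver) can be run from Python; this is only a convenience sketch:

import subprocess

# Print the CUDA toolkit version reported by nvcc; expect release 11.8 or 12.1.
result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
print(result.stdout)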
`pbrt` must be built from source. The build instructions are detailed on its GitHub homepage.
- An example of the installation process of `pbrt`:
git clone --recursive https://github.com/mmp/pbrt-v4.git pbrt
cd pbrt
mkdir build
- If the GPU is compatible:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DPBRT_OPTIX7_PATH="path/to/OptiX/v7.7.0"
- For CPU:
cmake -B build -DCMAKE_BUILD_TYPE=Release
- The build can be invoked indirectly through CMake.
- If using Visual Studio or Xcode:
cmake --build build --config Release --target ALL_BUILD
- If using Make or Ninja:
cmake --build build
- The process will generate several binaries, among which are the `pbrt` executable and the `imgtool` program.
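As a quick sanity check that the build succeeded, the sketch below looks for the two binaries and asks `pbrt` for its usage text; the `build` layout is the one assumed by the commands above (Visual Studio places binaries under `build/Release`):

import subprocess
from pathlib import Path

# Assumed layout: binaries sit directly in build/ (or build/Release on Windows).
build = Path("build")
for name in ("pbrt", "imgtool"):
    found = (build / name).exists() or (build / "Release" / f"{name}.exe").exists()
    print(name, "found:", found)

# A successful build prints its usage text with --help.
subprocess.run([str(build / "pbrt"), "--help"])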
- This package helps streamline the management of HDR content.
- The OpenEXR documentation provides an installation guide for all operating systems.
- On Windows, it might be necessary to find the correct snapshot of `vcpkg` that provides the desired version of OpenEXR:
git log --color=always --pretty='%Cred%h%Creset -%C(auto)%d%Creset %s %Cgreen(%ad)' --date=short | select-string openexr
- Check out the right commit using the commit hash. The following, for instance, will check out the commit of `vcpkg` that contains OpenEXR version `3.2.3`:
git checkout 52650f28f
- Then run the installation as normal:
vcpkg install openexr
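Once installed, HDR images can be handled conveniently from Python via the `pyexr` wrapper (installed later in the `torch` environment). A minimal sketch with a hypothetical file name:

import numpy as np
import pyexr

# Read an HDR image into a float numpy array (hypothetical file name).
img = pyexr.read("render.exr")
print(img.shape, img.dtype)

# Apply a simple exposure adjustment and write the result back out.
pyexr.write("render_bright.exr", (img * 2.0).astype(np.float32))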
- There will be 2 separate Python environments required. The reason is that our implementation employs Pratul et al.'s lighthouse, which bases its code heavily on TensorFlow `1.15.0`.
- Since the latest Python version supporting TensorFlow `1.15.0` is `3.7.x`, some major features required by most packages used in this project would be missing.
- Therefore, a separate Python environment shall be set up to run lighthouse independently.
- We will create a main Python environment that handles most major executions and a small one to take on lighthouse. The former will be referred to as `torch` and the latter `tensor`.
- It is recommended to use Python's `venv` module to set up virtual environments.
- For example, to create a virtual environment named `torch` in the current working directory:
python -m venv torch
- To activate the virtual environment on Windows (PowerShell):
torch\Scripts\activate
- On Linux and MacOS (bash/zsh):
source torch/bin/activate
- Deactivate an activated shell:
deactivate
- Note that the Python interpreter version inside a virtual environment is the version of the Python interpreter that was used to run the `venv` module.
- Despite not being the latest, Python `3.10` shall be used for this environment.
- The main responsibility of `torch` is to handle packages from PyTorch.
- With the `torch` environment activated, install PyTorch with support for CUDA `12.1` (a verification sketch follows this list):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
- PyTorch provides the lower layers for models maintained by the Hugging Face community used in our implementation:
pip install transformers diffusers[torch] xformers
- Finally, several handy packages:
pip install Pillow opencv-python scipy h5py matplotlib coloredlogs pyexr
- A frozen snapshot of `torch` can be found in the `requirements-torch.txt` file.
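To verify the PyTorch installation mentioned above, run a short check with the `torch` interpreter:

import torch

# Expect the CUDA runtime chosen earlier (11.8 or 12.1) and a usable GPU.
print("torch:", torch.__version__)
print("cuda runtime:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())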
- Unlike `torch`, Python `3.7` must be used for this environment. This means there must be 2 separate installations of Python on the machine.
- With the `tensor` environment activated (a sanity check follows this list):
pip install tensorflow==1.15.0 matplotlib==2.2.3 scipy==1.1.0 protobuf==3.20.3 numpy==1.16.0 absl-py
- A snapshot of `tensor` can be found in the `requirements-tensor.txt` file.
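As with `torch`, a quick check run with the `tensor` interpreter confirms the pinned versions import cleanly:

import sys
import tensorflow as tf

# Expect Python 3.7.x and TensorFlow 1.15.0 in this environment.
print("python:", sys.version.split()[0])
print("tensorflow:", tf.__version__)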
- First clone the project:
git clone https://github.com/ndming/virtual-object-insertion.git
cd virtual-object-insertion
- Download checkpoints for Pratul et al.'s lighthouse and place them in `lighthouse/model`
- Download checkpoints for Li et al.'s inverse rendering and place them in `irois/models`
The Python scripts that govern the 4 stages of the pipeline are:
- `mapgen.py`: generates feature maps
- `pbrgen.py`: generates render scripts
- `pbrren.py`: automates the rendering with `pbrt`
- `objput.py`: inserts the rendered object into a 2D image
To get usage hints for these scripts:
python *.py --help
Note that the Python interpreter corresponding to the `torch` virtual environment shall be used to execute these scripts.
`mapgen.py` requires the path to the image that we would like to insert an object into and the path to the Python executable in the `tensor` environment.
For example, on Windows:
python mapgen.py -cuda --img path/to/some/im.png --py37 path/to/tensor/Scripts/python.exe
The `-cuda` option should only be passed if PyTorch was installed with CUDA.
The script will first prompt for 4 coordinates that define the plane receiving the shadow cast by the object. The second prompt will be for the position of the object within this plane. It will then generate all necessary resources, including the ones estimated by Li et al.'s and Pratul et al.'s models.
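The prompting is handled by the script itself; purely for illustration, here is a minimal sketch of how such an interactive pick could be implemented with matplotlib (not the repo's actual code):

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Show the image and collect mouse clicks (hypothetical file name).
img = mpimg.imread("im.png")
plt.imshow(img)
corners = plt.ginput(4)    # 4 clicks outlining the shadow-receiving plane
center = plt.ginput(1)[0]  # 1 click for the object position inside the plane
plt.close()
print("plane corners:", corners, "object position:", center)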
`pbrgen.py` only requires the path to the directory containing resources generated by `mapgen.py`.
It is advisable to adjust the weights applied to Li et al.'s and Pratul et al.'s environment maps, for example:
python pbrgen.py -upscale --res-dir path/to/gen --w-irois 0.5 --w-house 1.5
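The two weights presumably control a weighted blend of the two estimated environment maps before rendering; a rough sketch of that assumed behavior, with hypothetical file names:

import numpy as np
import pyexr

# Load the two estimated environment maps (hypothetical file names).
env_irois = pyexr.read("irois_env.exr")
env_house = pyexr.read("house_env.exr")

# Weighted combination mirroring --w-irois 0.5 --w-house 1.5 (assumed behavior).
env = 0.5 * env_irois + 1.5 * env_house
pyexr.write("combined_env.exr", env.astype(np.float32))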
Note that if `-upscale` is specified, the script will supersample Pratul et al.'s environment map to 4 times its original size using Stable Diffusion. Depending on the GPU capability, the upscaling process may take up to half an hour, for which the `-cache` option might be helpful: the generation of the environment map is skipped and the script will use the previously generated files:
python pbrgen.py -cache --res-dir path/to/gen --w-irois 0.5 --w-house 1.5
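For reference, this kind of 4x upscaling is what diffusers' Stable Diffusion upscale pipeline provides; a sketch of what that step may look like (not necessarily the script's exact code, file names hypothetical):

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# The x4 upscaler quadruples the resolution of a low-res input image.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

low_res = Image.open("house_env.png").convert("RGB")  # hypothetical file name
upscaled = pipe(prompt="an indoor environment map", image=low_res).images[0]
upscaled.save("house_env_x4.png")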
`pbrren.py` needs to know the folder containing the `pbrt` and `imgtool` executables, and the path to the directory containing resources generated by `pbrgen.py`.
As an example:
python pbrren.py -gpu --pbrt-dir path/to/pbrt/folder --res-dir path/to/gen/pbrt
The `-gpu` option should only be passed if `pbrt` was built with GPU support.
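The automation presumably boils down to invoking the renderer once per generated scene file; a minimal sketch using pbrt-v4's command line (file names hypothetical):

import subprocess

# Render one generated scene description on the GPU, writing an EXR output.
subprocess.run([
    "path/to/pbrt/folder/pbrt", "--gpu",
    "--outfile", "scene_render.exr",
    "scene.pbrt",
], check=True)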
Finally, `objput.py` takes the path to the directory containing resources generated by `pbrren.py` and inserts the object into the image specified with the `--target` option. If `--target` is omitted, the script will insert the object into the `target.png` file sitting in the same folder.
python objput.py --res-dir path/to/gen/pbrt --target path/to/some/target.png
The insertion result will be saved to the file specified by the `--output` option. If this option is omitted, the result is placed in the resource directory. Note that the result is itself an EXR file, for which an HDR image viewer like tev could be helpful.
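The insertion step in pipelines like this is commonly implemented as differential rendering: the difference between renders with and without the object is composited onto the background. A rough numpy sketch of that idea (assumed approach, hypothetical file names):

import numpy as np
import pyexr

# Differential rendering: add the object's contribution (render with the object
# minus render without it) on top of the original background (assumed approach).
background = pyexr.read("target.exr")        # hypothetical HDR version of the target
with_obj = pyexr.read("with_object.exr")
without_obj = pyexr.read("without_object.exr")

composite = background + (with_obj - without_obj)
pyexr.write("result.exr", np.clip(composite, 0.0, None).astype(np.float32))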