- Objectives
- Key Features
- Project Scope
- Potential Benefits
- Techniques for Development
- Comparison of Techniques
- Data Collection
- 3D Reconstruction
- XR Integration
- Tools & Resources
- Comparative Analysis
- Evaluation
- To design and develop a realistic virtual clinical environment that extends beyond the capabilities of 360-degree videos and images, granting users six degrees of freedom (6DoF) to explore and engage with the content.
- To compare several environment development techniques such as photogrammetry, NeRFs, and Gaussian splatting.
- Realistic Virtual Environment
Create a realistic simulation of a clinical setting where staff and students can practice and learn. - 6DoF Navigation
Unlike traditional 360-degree videos or images, users will be able to move freely in any direction—forward/backward, up/down, left/right, pitch, yaw, and roll. - XR Head-Mounted Display Compatibility
Ensure that the environment is optimised for and compatible with various standalone XR head-mounted displays for immersive experiences.
The project will involve the following steps:
- Data collection: Capture video and image data of a clinical environment using a suitable camera system.
- 3D reconstruction: Use photogrammetry, NeRFs and Gaussian Splatting to construct individual 3D models of the clinical environment from the captured image data.
- XR integration: Integrate the captured virtual environments into a commercially available game engine (Unreal Engine), then add support for an XR head-mounted display so that users can explore the environments with 6DoF.
- Comparative analysis: Compare the 3D virtual environments on the following criteria.
- Resolution and detail: Compare the level of detail and resolution achieved by each of the three techniques, and how realistically each represents the clinical environment.
- Performance: Assess the real-time rendering performance of each technique, especially on standalone XR head-mounted displays.
- Flexibility: Evaluate the ability of each technique to allow changes or updates to the environment.
- Development time & cost: Analyse the time and resources required to develop the environment using each technique.
- Evaluation: Assess the usability and effectiveness of the virtual environments within a clinical educational context.
- Enhanced training: Clinical staff and students can immerse themselves in cost-effective, realistic scenarios, improving the quality of training and education.
- Safe learning: Allows students and staff to practice without real-world consequences.
- Accessibility: Can be accessed anytime, anywhere, offering flexibility in learning and training.
Photogrammetry is the science and technology of obtaining reliable information about physical objects and the environment through the process of recording, measuring and interpreting photographic images and patterns of electromagnetic radiant imagery and other phenomena.
It is a non-contact method of 3D measurement that uses multiple images of an object to reconstruct its 3D shape and position. Photogrammetry is used in a wide variety of applications, including surveying, mapping, engineering, archaeology, and special effects.
Photogrammetry can be performed using a variety of equipment, including traditional cameras, drones, and satellite sensors. The choice of equipment depends on the specific application.
To perform photogrammetry, a series of images of the object or scene of interest are taken from different angles. The images are then processed using software to create a 3D model. The software matches points in the different images to create a point cloud. The point cloud is then used to generate a mesh, which is a surface that represents the 3D shape of the object or scene.
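The point-matching step rests on triangulation: once the same feature is located in two images with known camera poses, its 3D position is the intersection of the two viewing rays. The following is a minimal sketch of linear (DLT) triangulation with toy cameras and a hand-picked point, not the actual pipeline a photogrammetry package such as COLMAP runs:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D observations (u, v) of the same point in each image.
    Returns the estimated 3D point in world coordinates.
    """
    # Each observation gives two linear constraints on the homogeneous point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The point is the null vector of A (right singular vector with
    # the smallest singular value).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras with identity intrinsics (normalised image coordinates):
# one at the origin, one shifted 1 unit along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Projections of the world point (0, 0, 5) into each camera.
x1 = (0.0, 0.0)
x2 = (-0.2, 0.0)
print(triangulate(P1, P2, x1, x2))  # ≈ [0, 0, 5]
```

Photogrammetry software repeats this for millions of matched features, then refines all points and camera poses jointly (bundle adjustment) before meshing the point cloud.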
NeRF stands for Neural Radiance Field. It is a machine learning approach to 3D reconstruction and rendering that represents a scene as a neural network predicting the colour and volume density at any 3D point, viewed from any direction.
To train a NeRF, a set of images of the scene is captured from different angles. The network is then optimised so that rays rendered through the predicted volume reproduce the colour of each pixel in each training image. Once trained, the NeRF can render the scene from any viewpoint.
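The rendering rule at the heart of this training loop is classical volume rendering: the network's density and colour predictions at samples along a camera ray are composited into one pixel colour. A minimal sketch with made-up sample values (not a trained network):

```python
import math

def render_ray(densities, colours, deltas):
    """Composite a pixel colour along one ray (NeRF volume-rendering rule).

    densities: predicted density (sigma) at each sample along the ray
    colours:   predicted RGB colour at each sample
    deltas:    distance between consecutive samples
    """
    colour = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, c, delta in zip(densities, colours, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # its contribution to the pixel
        colour = [acc + weight * ch for acc, ch in zip(colour, c)]
        transmittance *= 1.0 - alpha
    return colour

# A semi-transparent green sample in front of a nearly opaque red one.
densities = [0.5, 100.0]
colours = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
deltas = [1.0, 1.0]
print(render_ray(densities, colours, deltas))  # ≈ [0.607, 0.393, 0.0]
```

Because this composite is differentiable, the loss between rendered and photographed pixel colours can be backpropagated into the network weights.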
NeRFs have several advantages over traditional 3D reconstruction techniques. First, they are compact: an entire scene is stored in the weights of a single neural network. Second, they are flexible, capturing complex geometry and view-dependent effects such as reflections and semi-transparency that mesh-based photogrammetry struggles with. Their main drawback is rendering cost: producing an image requires many network evaluations per ray, so real-time frame rates, particularly on standalone XR headsets, require additional acceleration techniques.
NeRFs have been used in a range of applications, including novel-view synthesis, 3D reconstruction from images, and, via mesh extraction, 3D printing.
📄 NeRF: Neural Radiance Fields (matthewtancik.com)
Gaussian splatting is a rasterization technique for 3D reconstruction and rendering. It represents a 3D scene as millions of particles, each of which is a 3D Gaussian function. Each particle has a position, rotation, scale, opacity, and view-dependent colour.
To render a Gaussian splatting scene, the particles are first converted into 2D space ("splatted") and then organized and sorted for efficient rendering. The splatting process involves projecting each particle onto the image plane and then blending it with the existing pixels. The opacity and view-dependent colour of the particle determine how much it contributes to the final colour of the pixel.
Gaussian splatting has several advantages over traditional 3D rendering techniques. First, it is very efficient, as it can be implemented using standard GPU rasterization hardware. Second, it is very flexible, as it can represent a wide variety of 3D scenes, including those with complex geometries and materials. Third, it is very scalable, as it can render scenes composed of millions of Gaussians in real time.
Gaussian splatting has been used in a variety of applications, including real-time rendering of 3D scenes, 3D reconstruction from images, and 3D printing. It is a powerful technique that has the potential to revolutionise the way we interact with 3D content.
📄 3D Gaussian Splatting for Real-Time Radiance Field Rendering
Photogrammetry is a well-established technique for 3D reconstruction, but it can be time-consuming and requires a large amount of data. NeRFs are a newer technique that is less data-intensive, but they can be more difficult to train and deploy. Gaussian splatting is a rendering technique that is well-suited to XR applications, as it is efficient and scalable.
For this project, we will compare the performance of photogrammetry, NeRFs and Gaussian splatting for reconstructing a clinical environment. We will also evaluate the usability and effectiveness of rendering the virtual environment in XR using Unreal Engine.
Data collection involves capturing video and image data of the environments. We captured three clinical environments using a mixture of devices and settings. The most practical way to gather enough images for model training was to record video and then extract still images from individual frames.
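Frame extraction can be done with FFMPEG. The file name, output directory, and 2 fps extraction rate below are illustrative, not the settings actually used for the captures:

```shell
# Extract 2 frames per second from a capture as high-quality JPEGs.
# -vf fps=2 controls the sampling rate; -qscale:v 2 keeps JPEG quality high.
mkdir -p frames
ffmpeg -i capture.mov -vf fps=2 -qscale:v 2 frames/frame_%04d.jpg
```

Sampling at a fixed rate rather than taking every frame avoids feeding thousands of near-duplicate images into the reconstruction step.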
Patient Consultation Room
Patient consultation room - this is part of the clinical simulation suite at University Hospitals Birmingham NHS Trust
| | Setup 1 | Setup 2 | Setup 3 | Setup 4 | Setup 5 | Setup 6 | Setup 7 | Setup 8 |
|---|---|---|---|---|---|---|---|---|
| Device | iPhone 13 | iPhone 13 | iPhone 13 | iPhone 13 | Canon R6 | Canon R6 | Canon R6 | Canon R6 |
| Lens | - | - | - | - | 24-105mm | 24-105mm | 24-105mm | 24-105mm |
| Capture length | 2 min | 5 min | 2 min | 5 min | 2 min | 5 min | 2 min | 5 min |
| Shutter speed | | | | | | | | |
| Aperture | | | | | | | | |
| ISO | | | | | | | | |
| Resolution | 1080p | 1080p | 4K | 4K | 1080p | 1080p | 4K | 4K |
| FPS | 60 | 60 | 30 | 30 | 50 | 50 | 25 | 25 |
| Format | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
Michael Rubloff has provided guidance on camera settings.
📄 What are the Best Camera Settings to take a NeRF?
📄 What’s the best Focal Length to take a NeRF?
As of writing, implementing Gaussian splatting on a local machine remains challenging. I would like to extend my gratitude to Jon Stephens: his comprehensive tutorial was instrumental in navigating the complexities of setting up and running the 3D Gaussian Splatting for Real-Time Radiance Field Rendering process, including image preparation and data training. His detailed guide is linked in the resources below.
The following steps were taken: Image preparation ➡️ Model Training ➡️ Data Visualisation ➡️ Export to Unreal Engine
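The first two of those steps can be sketched as shell commands. This assumes the setup from Jon Stephens's tutorial, i.e. the official graphdeco-inria `gaussian-splatting` repository and its conda environment; the data path is illustrative:

```shell
# Run from the root of the gaussian-splatting repository checkout,
# with its conda environment activated.

# 1. Image preparation: run COLMAP over the extracted frames
#    (expects the images in data/consultation_room/input).
python convert.py -s data/consultation_room

# 2. Model training: optimise the 3D Gaussians; results are written
#    under ./output as point_cloud.ply checkpoints.
python train.py -s data/consultation_room
```

Visualisation is then done in the repository's viewer, and the trained `point_cloud.ply` can be brought into Unreal Engine via one of the plugins listed below.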
The implementation and testing were conducted on a machine running Windows 11, with the following software prerequisites installed:
- Git
- Conda
- CUDA Toolkit (tested with version 11.8 only)
- Visual Studio (be sure to install the Desktop development with C++ workload)
- COLMAP
- ImageMagick
- FFMPEG
🔨 XVERSE: A free Unreal Engine 5 Gaussian Splatting plugin
🔨 Gauzilla: A 3D Gaussian Splatting (3DGS) renderer written in Rust for platform-agnostic WebAssembly (WASM)
🔨 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
📺 Getting Started With 3D Gaussian Splatting for Windows: (Beginner Tutorial)
📄 Getting Started With 3D Gaussian Splatting for Windows: gaussian-splatting-Windows GitHub repository
🔨 Akiya Research Institute 3D Gaussians Plugin for UE5
🔨 TurboNeRF
🔨 nerfstudio
🔨 Luma Unreal Engine Plugin