42 changes: 32 additions & 10 deletions README.md
@@ -2,25 +2,47 @@

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) **Google Chrome 222.2** on
Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Daniel Chen
* Tested on: Chromium 144 - Windows 11, AMD Ryzen 7 8845HS w/ Radeon 780M Graphics (3.80 GHz), RTX 4070 notebook

This project is a WebGPU implementation of point cloud and Gaussian splat rendering.

Gaussian splat rendering is a rendering technique that depicts a set of oriented, scaled, colored points as small, volumetric Gaussian distributions in 3D space. This model is often used in photogrammetry, which recreates 3D scenes from photo data, by gradually making the set of points converge onto a progressively more accurate depiction of a scene. The resulting data can be stored in a `.ply` file ([samples](https://drive.google.com/drive/folders/1KOoKk4plvl720-nQEiqLcuTCMFizt0cc?usp=sharing)). Alongside a JSON file with camera data, this web app reads the `.ply` file and renders it in real time either as a point cloud or as the full set of Gaussian splats. The point cloud mode draws only the splat centers, which is faster since there is less to render, and lets you visualize the scene through the density of points.

### Live Demo

[![](img/thumb.png)](http://TODO.github.io/Project4-WebGPU-Forward-Plus-and-Clustered-Deferred)
https://enamchae.github.io/Project5-WebGPU-Gaussian-Splat-Viewer/

[![](./images/cover.png)](https://enamchae.github.io/Project5-WebGPU-Gaussian-Splat-Viewer/)

### Demo Video/GIF

[![](img/video.mp4)](TODO)
https://github.com/user-attachments/assets/e26cffa1-f91a-46f7-9415-3bf389344195

### Performance

For many scenes, there is a load time of several seconds while splat data is read from the `.ply` file. The analysis below only covers the render time after all of this data has been loaded.

#### Preprocessing workgroup size
For the bonsai scene above at the default angle, using the Gaussian splat renderer at the default splat scale, it is difficult to compare workgroup sizes for the preprocessing compute shader because setting the workgroup size too low causes my GPU to hang. Workgroup sizes of 64 or below hang immediately on the bonsai scene. A size of 128 holds steady at around 7 to 28 ms/frame but hangs as more splats move onto the screen, and 256 has no issues while achieving a similar frame time. Smaller workgroup sizes do work when sorting is disabled, which suggests that the number of workgroup dispatches needed for the radix sort may be what causes problems at lower sizes. When sorting is disabled, the frame time hovers in the 6 to 28 ms/frame range at all workgroup sizes between 16 and 256.
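
As a rough illustration of how the workgroup size changes the dispatch size, here is a minimal sketch that assumes each shader invocation handles exactly one splat. The helper `workgroupCount` is hypothetical and not taken from this project's code; the splat counts are the ones quoted in this README.

```typescript
// Hypothetical helper (an assumption, not this project's code): number of
// workgroups to dispatch when each invocation processes one splat.
function workgroupCount(numSplats: number, workgroupSize: number): number {
  return Math.ceil(numSplats / workgroupSize);
}

// Splat counts quoted in this README for the two test scenes.
const SMALL_SCENE_SPLATS = 272_956;
const LARGE_SCENE_SPLATS = 1_063_091;

for (const size of [64, 128, 256]) {
  console.log(
    `workgroup size ${size}:`,
    workgroupCount(SMALL_SCENE_SPLATS, size),
    "vs",
    workgroupCount(LARGE_SCENE_SPLATS, size),
    "workgroups",
  );
}
```

Halving the workgroup size roughly doubles the number of workgroups in every per-splat pass. This only shows the scaling and does not by itself explain the hangs described above.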

#### Half-precision packing
To save memory, some `f32` fields on the `Splat` struct are packed in pairs as `f16` values. This reduces each `Splat` from 48 bytes to 32 bytes (there are too many fields to reach 16 bytes), which is beneficial since a scene is made up of many splats. The render time remains roughly the same.
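
The packing idea can be sketched on the CPU side as follows. This is only an illustration: the function names are hypothetical, the conversion flushes subnormals to zero and rounds half up for brevity, and the bit layout (first value in the low 16 bits) follows WGSL's built-in `pack2x16float`. Which `Splat` fields are actually paired is not specified here.

```typescript
const scratch = new DataView(new ArrayBuffer(4));

// Convert a number to IEEE 754 half-precision bits (simplified sketch:
// subnormal results are flushed to zero, rounding is round-half-up).
function f32ToF16Bits(value: number): number {
  scratch.setFloat32(0, value);
  const bits = scratch.getUint32(0);
  const sign = (bits >>> 16) & 0x8000;
  const exponent = (bits >>> 23) & 0xff;
  const mantissa = bits & 0x7fffff;

  if (exponent === 0xff) return sign | 0x7c00 | (mantissa !== 0 ? 0x200 : 0); // Inf or NaN
  const halfExponent = exponent - 127 + 15; // re-bias from f32 to f16
  if (halfExponent >= 0x1f) return sign | 0x7c00; // too large: becomes Inf
  if (halfExponent <= 0) return sign; // too small: flushed to signed zero
  const roundUp = (mantissa & 0x1000) !== 0 ? 1 : 0;
  return (sign | (halfExponent << 10) | (mantissa >>> 13)) + roundUp;
}

// Decode half-precision bits back to a number.
function f16BitsToNumber(half: number): number {
  const sign = (half & 0x8000) !== 0 ? -1 : 1;
  const exponent = (half >>> 10) & 0x1f;
  const mantissa = half & 0x3ff;
  if (exponent === 0) return sign * mantissa * 2 ** -24;
  if (exponent === 0x1f) return mantissa !== 0 ? NaN : sign * Infinity;
  return sign * (1 + mantissa / 1024) * 2 ** (exponent - 15);
}

// Pack two values into one 32-bit word, first value in the low 16 bits.
function packHalf2x16(x: number, y: number): number {
  return ((f32ToF16Bits(y) << 16) | f32ToF16Bits(x)) >>> 0;
}

// Recover the two half-precision values from a packed word.
function unpackHalf2x16(packed: number): [number, number] {
  return [f16BitsToNumber(packed & 0xffff), f16BitsToNumber(packed >>> 16)];
}

console.log(packHalf2x16(1.0, -2.0).toString(16));
console.log(unpackHalf2x16(packHalf2x16(0.5, 0.25)));
```

Going from 48 to 32 bytes per splat saves 16 bytes each, which for the 1 063 091 splats of the bicycle scene comes to roughly 17 MB of buffer space.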

#### View frustum culling
In the preprocess step, we flag splats as culled if they lie outside the camera's view frustum plus a 10% margin in either dimension. On the bicycle scene above, the benefits of view frustum culling are most noticeable when a significant portion of the model is off-screen. At roughly the angle pictured below, the render time is about 28 ms/frame with view frustum culling and about 50 ms/frame without it. When the full scene is visible, both take about 70 ms/frame.

![](./images/occluded.png)
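
The culling test described above can be sketched as a clip-space check. This is a minimal sketch under assumptions: `clipPos` is a hypothetical name for the splat center after the view-projection transform, the 10% margin is interpreted as extending the normalized x and y bounds from ±1 to ±1.1, and the behind-the-camera check is an addition for completeness rather than something the text above states.

```typescript
type Vec4 = readonly [number, number, number, number];

// Returns true if a splat center (in clip space) should be flagged as culled.
// `margin` widens the visible region so splats whose centers are slightly
// off-screen but whose extents still overlap the frame are kept.
function isCulled(clipPos: Vec4, margin = 0.1): boolean {
  const [x, y, , w] = clipPos;
  if (w <= 0) return true; // assumption: treat points behind the camera as culled
  const limit = (1 + margin) * w; // equivalent to |x / w| > 1 + margin
  return Math.abs(x) > limit || Math.abs(y) > limit;
}

console.log(isCulled([0, 0, 0.5, 1])); // center of the screen
console.log(isCulled([1.05, 0, 0.5, 1])); // just off-screen, inside the margin
console.log(isCulled([1.2, 0, 0.5, 1])); // outside the margin
```

Comparing against `limit` instead of dividing by `w` avoids a division per splat and keeps the check well defined for small `w`.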

#### Effects of scene complexity
There is a noticeable performance difference between the bonsai and bicycle scenes above, so the number of splats likely makes a difference. In particular, rendering the entire bonsai scene takes around 6 to 21 ms/frame whereas the bicycle scene can take around 70 ms/frame, even when the two scenes cover a similar proportion of the frame.

### (TODO: Your README)
|Bonsai|Bicycle|
|-|-|
|![](./images/bonzaiframe.png)|![](./images/bicycleframe.png)|

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
The cleaned bicycle scene above has about 4 times as many splats (1 063 091) as the bonsai scene (272 956), which could mean that more threads have to run in sequence in the preprocessing and sorting steps. An additional bottleneck could be the atomic sort count: threads that reach the `atomicAdd` on the same counter may have to be executed one after another. One way to avoid this could be to use a prefix sum instead of a linear addition, ideally achieving `O(log n)` parallel time complexity rather than `O(n)`.
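
To illustrate the prefix sum idea, here is a sequential simulation of a Hillis-Steele inclusive scan used to assign output slots to visible splats. It is a sketch, not this project's implementation: the function names are hypothetical, and on a GPU each inner loop would run as one parallel pass, so the number of passes (about log2 of the input length) is what stands in for parallel time.

```typescript
// Inclusive prefix sum in the Hillis-Steele style. Each outer iteration
// corresponds to one pass in which all elements could update in parallel.
function inclusiveScan(input: readonly number[]): { result: number[]; passes: number } {
  let current = [...input];
  let passes = 0;
  for (let stride = 1; stride < current.length; stride *= 2) {
    const next = current.slice();
    for (let i = stride; i < current.length; i++) {
      next[i] = current[i] + current[i - stride];
    }
    current = next;
    passes++;
  }
  return { result: current, passes };
}

// Assign each visible splat a unique output slot without a shared counter.
// Culled splats get slot -1.
function assignSlots(visible: readonly boolean[]): { slots: number[]; count: number; passes: number } {
  const flags = visible.map((v) => (v ? 1 : 0));
  const { result, passes } = inclusiveScan(flags);
  const slots = visible.map((v, i) => (v ? result[i] - 1 : -1));
  const count = result.length > 0 ? result[result.length - 1] : 0;
  return { slots, count, passes };
}

console.log(assignSlots([true, false, true, true, false, true, false, false]));
```

Unlike an atomic counter, the scan assigns slots in input order, so the result is deterministic from frame to frame. The trade-off is extra passes and a temporary buffer.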

This assignment has a considerable amount of performance analysis compared
to implementation work. Complete the implementation early to leave time!

### Credits

299 changes: 299 additions & 0 deletions deno.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Binary file added images/bicycleframe.png
Binary file added images/bonzaiframe.png
Binary file added images/cover.png
Binary file added images/occluded.png