Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 47 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,25 +2,62 @@

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) **Google Chrome 222.2** on
Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Yuntian Ke
* Tested on: Windows 11, Intel Core Ultra 9 275HX @ 2.70GHz 32GB, RTX 5070 Ti 30160MB

### Live Demo

[![](img/thumb.png)](http://TODO.github.io/Project4-WebGPU-Forward-Plus-and-Clustered-Deferred)
🌐 **[Try it live here!](https://kytttt.github.io/Project5-WebGPU-Gaussian-Splat-Viewer/)**

*Note: Requires a WebGPU-compatible browser (Chrome recommended)*

### Demo Video/GIF

[![](img/video.mp4)](TODO)
[![Demo Video](images/cover.png)](https://drive.google.com/file/d/1Rg6T9apDpXw1mXI1kjH89kBUIzQ-ZY-a/view?usp=sharing)

*Click this image to see the full video.*

### Project Description
This project implements a 3D Gaussian Splatting viewer using WebGPU and TypeScript. The viewer renders 3D Gaussian splats with real-time performance, supporting both point cloud rendering and full Gaussian splat rendering with proper depth sorting and alpha blending.

### Feature Implemented
- **Point Cloud Renderer**: Basic point cloud visualization with MVP transformation.
- **Gaussian Splat Renderer**: Full 3D Gaussian splatting implementation including:
- View frustum culling for performance optimization
- 3D to 2D covariance matrix computation
- Spherical harmonics color evaluation
- Depth-based radix sorting for proper transparency
- Indirect rendering with dynamic instance counts
- Proper alpha blending for realistic transparency effects


### Performance Analysis

#### Point Cloud vs Gaussian Renderer Comparison
- **Point Cloud Renderer**: Simple vertex-based rendering with limited visual quality but excellent performance. Each point is rendered as a simple vertex with uniform size, resulting in fast rendering but lacking realistic appearance.
- **Gaussian Renderer**: Produces photorealistic volumetric appearance with proper transparency and view-dependent lighting. Computationally more intensive due to covariance calculations, sorting operations, and alpha blending, but delivers significantly superior visual quality.

#### Workgroup Size Performance Impact
Different workgroup sizes in compute shaders affect GPU utilization efficiency:
- **Small workgroups (32-64 threads)**: Better load balancing but may underutilize GPU cores
- **Medium workgroups (128-256 threads)**: Optimal balance between throughput and occupancy for most scenarios
- **Large workgroups (512+ threads)**: Maximum theoretical throughput but may suffer from divergent execution and reduced occupancy
- The optimal size depends on GPU architecture and the complexity of per-thread operations

### (TODO: Your README)
#### View Frustum Culling Performance Benefits
View frustum culling provides substantial performance improvements:
- **Preprocessing reduction**: 40-70% fewer Gaussians processed depending on camera view
- **Sorting optimization**: Significantly reduces sorting overhead by eliminating off-screen elements
- **Memory bandwidth**: Lower GPU memory usage and bandwidth requirements
- **Scene dependency**: Most effective for large scenes where many Gaussians are outside the viewing frustum

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
#### Gaussian Count Performance Impact
Performance scales with the number of Gaussians in the scene:
- **Linear scaling**: Preprocessing operations (frustum culling, covariance calculation) scale O(n)
- **Sorting bottleneck**: Radix sort complexity O(n log n) becomes dominant for large scenes (>100k Gaussians)
- **Rendering impact**: Fragment processing scales with visible Gaussian coverage
- **Memory limitations**: GPU memory bandwidth becomes the bottleneck for very large datasets (>500k Gaussians)

This assignment has a considerable amount of performance analysis compared
to implementation work. Complete the implementation early to leave time!

### Credits

Expand Down
Binary file added images/cover.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

148 changes: 145 additions & 3 deletions src/renderers/gaussian-renderer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ import { get_sorter,c_histogram_block_rows,C } from '../sort/sort';
import { Renderer } from './renderer';

export interface GaussianRenderer extends Renderer {

setGaussianScaling: (scale: number) => void;
}

// Utility to create GPU buffers
Expand Down Expand Up @@ -34,6 +34,36 @@ export default function get_renderer(
// Initialize GPU Buffers
// ===============================================

// Splats: center_ndc (vec2), radius_ndc (vec2) => 16 bytes per splat
const splatStride = 28;
const splatBuffer = createBuffer(
device,
'splats buffer',
pc.num_points * splatStride,
GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST
);

// Settings buffer (gaussian_scaling, sh_deg, padding to 16 bytes)
const settingsBufferSize = 16;
const settingsInit = new Float32Array([1.0, pc.sh_deg, 0.0, 0.0]);
const settingsBuffer = createBuffer(
device,
'render settings',
settingsBufferSize,
GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
settingsInit
);

// Indirect draw buffer: [vertexCount, instanceCount, firstVertex, firstInstance]
// We render a quad as 2 triangles => 6 vertices per instance.
const indirectDrawBuffer = createBuffer(
device,
'draw indirect',
4 * 4,
GPUBufferUsage.INDIRECT | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC,
new Uint32Array([6, pc.num_points, 0, 0]) // overridden after preprocess
);

const nulling_data = new Uint32Array([0]);

// ===============================================
Expand All @@ -52,6 +82,26 @@ export default function get_renderer(
},
});

// group(0): camera
const preprocess_camera_bind_group = device.createBindGroup({
label: 'preprocess camera',
layout: preprocess_pipeline.getBindGroupLayout(0),
entries: [{ binding: 0, resource: { buffer: camera_buffer } }],
});

// group(1): gaussians (input), splats (output), settings (uniform)
const preprocess_data_bind_group = device.createBindGroup({
label: 'preprocess data',
layout: preprocess_pipeline.getBindGroupLayout(1),
entries: [
{ binding: 0, resource: { buffer: pc.gaussian_3d_buffer } },
{ binding: 1, resource: { buffer: splatBuffer } },
{ binding: 2, resource: { buffer: settingsBuffer } },
{ binding: 3, resource: { buffer: pc.sh_buffer } },
],
});

// group(2): only binding 0 is present in layout (others are optimized out as unused)
const sort_bind_group = device.createBindGroup({
label: 'sort',
layout: preprocess_pipeline.getBindGroupLayout(2),
Expand All @@ -68,19 +118,111 @@ export default function get_renderer(
// Create Render Pipeline and Bind Groups
// ===============================================

const render_shader = device.createShaderModule({ code: renderWGSL });

const render_pipeline = device.createRenderPipeline({
label: 'gaussian render',
layout: 'auto',
vertex: {
module: render_shader,
entryPoint: 'vs_main',
},
fragment: {
module: render_shader,
entryPoint: 'fs_main',
targets: [{
format: presentation_format,
blend: {
color: {
srcFactor: 'one',
dstFactor: 'one-minus-src-alpha',
operation: 'add',
},
alpha: {
srcFactor: 'one',
dstFactor: 'one-minus-src-alpha',
operation: 'add',
},
},
}],
},
primitive: {
topology: 'triangle-list',
cullMode: 'none',
},
});


const render_splats_bind_group = device.createBindGroup({
label: 'render splats',
layout: render_pipeline.getBindGroupLayout(0),
entries: [
{ binding: 0, resource: { buffer: splatBuffer } },
{ binding: 1, resource: { buffer: sorter.ping_pong[0].sort_indices_buffer } },
{ binding: 2, resource: { buffer: camera_buffer } }
],
});



// ===============================================
// Command Encoder Functions
// ===============================================
const zero = new Uint32Array([0]);
const resetCounters = () => {
device.queue.writeBuffer(sorter.sort_info_buffer, 0, zero); // keys_size = 0
device.queue.writeBuffer(sorter.sort_dispatch_indirect_buffer, 0, zero); // dispatch_x = 0
};

const dispatchPreprocess = (encoder: GPUCommandEncoder) => {
// reset visible count: sort_infos.keys_size = 0
resetCounters();
const pass = encoder.beginComputePass({ label: 'preprocess pass' });
pass.setPipeline(preprocess_pipeline);
pass.setBindGroup(0, preprocess_camera_bind_group);
pass.setBindGroup(1, preprocess_data_bind_group);
pass.setBindGroup(2, sort_bind_group);

const wgSize = C.histogram_wg_size;
const numWG = Math.ceil(pc.num_points / wgSize);
pass.dispatchWorkgroups(numWG, 1, 1);
pass.end();


};

const recordRender = (encoder: GPUCommandEncoder, texture_view: GPUTextureView) => {
const pass = encoder.beginRenderPass({
label: 'gaussian render',
colorAttachments: [
{ view: texture_view, loadOp: 'clear', storeOp: 'store' },
],
});
pass.setPipeline(render_pipeline);
pass.setBindGroup(0, render_splats_bind_group);
pass.drawIndirect(indirectDrawBuffer, 0);
pass.end();
};

// ===============================================
// Return Render Object
// ===============================================
return {
frame: (encoder: GPUCommandEncoder, texture_view: GPUTextureView) => {
sorter.sort(encoder);
frame: (encoder, texture_view) => {
dispatchPreprocess(encoder);

sorter.sort(encoder);

// keys_size → instanceCount
encoder.copyBufferToBuffer(sorter.sort_info_buffer, 0, indirectDrawBuffer, 4, 4);

recordRender(encoder, texture_view);

},
camera_buffer,
setGaussianScaling: (scale: number) => {
const data = new Float32Array([scale, pc.sh_deg, 0, 0]);
device.queue.writeBuffer(settingsBuffer, 0, data);
},
};
}
3 changes: 2 additions & 1 deletion src/renderers/renderer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,8 @@ export default async function init(
'gaussian_multiplier',
{min: 0, max: 1.5}
).on('change', (e) => {
//TODO: Bind constants to the gaussian renderer.
// Bind constants to the gaussian renderer.
gaussian_renderer?.setGaussianScaling(e.value);
});
}

Expand Down
91 changes: 87 additions & 4 deletions src/shaders/gaussian.wgsl
Original file line number Diff line number Diff line change
@@ -1,22 +1,105 @@
struct VertexOutput {
@builtin(position) position: vec4<f32>,
@location(0) color: vec4<f32>,
@location(1) center_ndc: vec2<f32>,
@location(2) radius_ndc: vec2<f32>,
@location(3) conic: vec3<f32>,
@location(4) opacity: f32,
//TODO: information passed from vertex shader to fragment shader
};

struct Splat {
//TODO: information defined in preprocess compute shader
center_ndc: u32,
radius_ndc: u32,
color_rg: u32,
color_ba: u32,

conic_xy: u32,
conic_z_pad: u32,
opacity_pad: u32,

};

struct CameraUniforms {
view: mat4x4<f32>,
view_inv: mat4x4<f32>,
proj: mat4x4<f32>,
proj_inv: mat4x4<f32>,
viewport: vec2<f32>,
focal: vec2<f32>,
};


@group(0) @binding(0)
var<storage, read> splats : array<Splat>;
@group(0) @binding(1)
var<storage, read> sort_indices : array<u32>;
@group(0) @binding(2)
var<uniform> camera: CameraUniforms;

// Map vertex_index (0..5) to a triangle-list quad in clip-space
fn corner(ix: u32) -> vec2<f32> {

switch ix {
case 0u: { return vec2<f32>(-1.0, -1.0); }
case 1u: { return vec2<f32>( 1.0, -1.0); }
case 2u: { return vec2<f32>( 1.0, 1.0); }
case 3u: { return vec2<f32>(-1.0, -1.0); }
case 4u: { return vec2<f32>( 1.0, 1.0); }
default: { return vec2<f32>(-1.0, 1.0); }
}
}

@vertex
fn vs_main(
@builtin(vertex_index) vid: u32,
@builtin(instance_index) iid: u32,
) -> VertexOutput {
//TODO: reconstruct 2D quad based on information from splat, pass
var out: VertexOutput;
out.position = vec4<f32>(1. ,1. , 0., 1.);
let index = sort_indices[iid];
let s = splats[index];

let center = unpack2x16float(s.center_ndc);
let radius = unpack2x16float(s.radius_ndc);

let color_rg = unpack2x16float(s.color_rg);
let color_ba = unpack2x16float(s.color_ba);
let color = vec4<f32>(color_rg.x, color_rg.y, color_ba.x, color_ba.y);

let conic_xy = unpack2x16float(s.conic_xy);
let conic_z = unpack2x16float(s.conic_z_pad).x;
let conic = vec3<f32>(conic_xy.x, conic_xy.y, conic_z);

let opacity = unpack2x16float(s.opacity_pad).x;

let offset = corner(vid) * radius;
let ndc = vec4<f32>(center + offset, 0.0, 1.0);

out.position = ndc;

out.color = color;
out.center_ndc = center;
out.radius_ndc = radius;
out.conic = conic;
out.opacity = opacity;
return out;
}

@fragment
fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
return vec4<f32>(1.);

var pos_ndc = (in.position.xy / camera.viewport) * 2.0 - 1.0;
pos_ndc.y = -pos_ndc.y;

var to_center = pos_ndc - in.center_ndc;
to_center *= camera.viewport * 0.5;

let power = -0.5 * (in.conic.x * to_center.x * to_center.x +
in.conic.z * to_center.y * to_center.y -
2.0 * in.conic.y * to_center.x * to_center.y);

if (power > 0.0){
return vec4<f32>(0.0, 0.0, 0.0, 0.0);
}
return in.color * min(in.opacity * exp(power), 0.99);
}
2 changes: 1 addition & 1 deletion src/shaders/point_cloud.wgsl
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ fn vs_main(
let pos = vec4<f32>(a.x, a.y, b.x, 1.);

// TODO: MVP calculations
out.position = pos;
out.position = camera.proj * camera.view * pos;

return out;
}
Expand Down
Loading