diff --git a/.gitignore b/.gitignore index 054e565..54b237d 100644 --- a/.gitignore +++ b/.gitignore @@ -24,4 +24,5 @@ dist-ssr *.sw? /.vite -*/scenes \ No newline at end of file +*/scenes +/scenes diff --git a/README.md b/README.md index edffdaf..7e8f226 100644 --- a/README.md +++ b/README.md @@ -2,25 +2,102 @@ **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) **Google Chrome 222.2** on - Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +- Jacqueline Guan + - [LinkedIn](https://www.linkedin.com/in/jackie-guan/) + - [Personal website](https://jyguan18.github.io/) +- Tested on my personal laptop: + - Windows 11 Pro 26100.4946 + - Processor AMD Ryzen 9 7945HX with Radeon Graphics + - 32 GB RAM + - Nvidia GeForce RTX 4080 + +### Demo Video/GIF + +
+
+![](images/gif.gif)
+
+</div
 ### Live Demo
 
-[![](img/thumb.png)](http://TODO.github.io/Project4-WebGPU-Forward-Plus-and-Clustered-Deferred)
+[![](images/pic.png)](https://jyguan18.github.io/Project5-WebGPU-Gaussian-Splat-Viewer/)
 
-### Demo Video/GIF
+The live demo for this project can be found [here](https://jyguan18.github.io/Project5-WebGPU-Gaussian-Splat-Viewer/).
+
+## Introduction
+
+For this project, I implemented a real-time Gaussian Splatting viewer using WebGPU. Gaussian Splatting is a rendering technique that represents a 3D scene as a collection of 3D Gaussian primitives, which are projected onto the screen and rendered as 2D splats. The renderer converts each 3D Gaussian into a 2D splat through a series of transformations, including view-frustum culling, covariance projection, and depth-based sorting for correct alpha blending.
+
+## Implementation
+
+### Point Clouds
+
+I implemented a basic point cloud renderer for comparison, which renders each Gaussian as a simple point primitive without computing covariance or performing alpha blending.
+
+### Gaussian Renderer
+
+#### Preprocessing Stage
+
+In the preprocessing stage, each Gaussian undergoes the following transformations:
+
+- View-Frustum Culling: Gaussians are transformed from world space to view space to normalized device coordinates (NDC). Any Gaussian outside the viewing frustum (with a 1.2x tolerance) or behind the camera is culled.
+- Covariance Projection: For visible Gaussians, I compute the 3D covariance matrix from the rotation (quaternion) and scale parameters, then project it to 2D screen space using the Jacobian of the perspective projection. This determines each splat's size and orientation on screen.
+- Spherical Harmonics Evaluation: Colors are computed by evaluating spherical harmonic coefficients against the viewing direction, enabling view-dependent appearance effects.
+- Atomic Compaction: Instead of storing splats at their original Gaussian indices, I use atomicAdd to assign consecutive output indices to visible splats only. This eliminates gaps in the output arrays caused by culled Gaussians, keeping memory access dense and preventing rendering artifacts.
+- Depth Sorting Setup: Each visible splat's depth is stored as 100.0 - viewPos.z bitcast to a uint32 (positive floats bitcast to integers sort in the same order as the floats, so this works as long as the value stays positive), along with its compacted index, so the radix sort can establish a back-to-front draw order.
+
+#### Indirect Drawing
+
+Instead of passing draw parameters from the CPU, we can store them in an indirect buffer that lives on the GPU. The GPU can then change how many instances get drawn without the CPU ever knowing the count.
+
+In this project, the compute shader decides which Gaussians are visible (or culled) and therefore how many splats actually need to be drawn. The shader writes the number of visible splats into a counter, and that count is copied into the indirect draw buffer as the instance count. So instead of drawing all of the points, we draw only the visible ones, with the count determined dynamically on the GPU.
+
+This makes the renderer fully GPU-driven.
+
+#### Rendering Stage
+
+The vertex shader reconstructs a 2D screen-space quad for each sorted splat, and the fragment shader evaluates the Gaussian function to determine per-pixel coverage and opacity, producing smooth alpha-blended results through depth-sorted compositing.
+
+## Performance Analysis
+
+With both the bicycle and the bonsai splat scenes, the average frame rate on my machine remained essentially the same regardless of the settings.
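+
+As an aside on methodology: the viewer's FPS readout is a single-frame 1000 / timeReturn() sample. A sliding-window average, like the hedged standalone sketch below (not part of the project code), would smooth that readout when comparing settings:
+
+```ts
+// Standalone sketch: average frame time over the last ~120 frames,
+// since a single-frame delta makes an FPS monitor noisy.
+const deltas: number[] = [];
+let last = performance.now();
+
+function measure() {
+  const now = performance.now();
+  deltas.push(now - last);
+  last = now;
+  if (deltas.length > 120) deltas.shift();
+
+  const avgMs = deltas.reduce((a, b) => a + b, 0) / deltas.length;
+  console.log(`avg ${avgMs.toFixed(2)} ms/frame (${(1000 / avgMs).toFixed(1)} FPS)`);
+  requestAnimationFrame(measure);
+}
+requestAnimationFrame(measure);
+```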
+And so the analysis below focuses on theoretical performance implications rather than on measured data.
+
+### Compare your results from point-cloud and gaussian renderer, what are the differences?
+
+The point cloud renderer draws discrete points with visible gaps between them, which creates a sparse appearance that reads as an outline of the shape, and every point is a single flat color. The Gaussian renderer produces smooth, continuous surfaces by blending overlapping splats. The alpha-blended Gaussians fill gaps naturally, creating photorealistic results that closely match the original captured scene.
+
+The point cloud renderer is simpler, but it requires many more points to achieve decent coverage. The Gaussian renderer achieves a better performance-to-quality ratio despite the additional preprocessing: each Gaussian can cover multiple pixels smoothly, whereas a point cloud would need many more primitives to achieve similar (and still inferior) coverage.
+
+### For gaussian renderer, how does changing the workgroup-size affect performance? Why do you think this is?
+
+Workgroup size has a measurable impact on GPU performance because of how modern GPUs schedule and execute compute work.
+
+Small workgroups would likely show lower occupancy and more overhead from launching many workgroups. Very large workgroups would probably show diminishing returns or worse performance due to register pressure or cache contention.
+
+GPUs schedule workgroups onto streaming multiprocessors (SMs). Larger workgroups should improve occupancy by keeping more threads active, but only up to a point: they also consume more resources per SM, potentially limiting how many workgroups can run concurrently. In this implementation, multiple threads perform atomicAdd operations, so moderate workgroup sizes should balance parallelism against atomic serialization overhead. Threads in the same workgroup also share the L1 cache; workgroups that are too large can thrash it, while ones that are too small don't exploit that locality effectively.
+
+### Does view-frustum culling give performance improvement? Why do you think this is?
+
+I expect view-frustum culling to provide substantial performance gains, especially for scenes where large portions fall outside the camera view.
+
+Without culling, the pipeline would process and sort every Gaussian regardless of visibility. With culling, it should only process the visible ones, often just 30-60% of the total depending on the view.
+
+The radix sort then operates on fewer elements, the indirect draw call renders fewer instances, compact indexing means the renderer touches a smaller, more cache-friendly portion of GPU memory, and threads handling culled Gaussians return early in the compute shader, freeing execution resources for other work. The improvement should be most noticeable when zooming in on a specific area, when viewing the scene from angles that push many Gaussians off-screen, and when using tight view frustums.
+
+The 1.2x frustum tolerance should keep edge Gaussians from popping in and out during camera movement while still culling most off-screen content.
+
+### Does number of gaussians affect performance? Why do you think this is?
+
+I expect the number of Gaussians to directly impact every stage of the pipeline, with performance degrading as the count increases; the sketch below outlines where that count enters each stage.
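+
+The following is a hedged, hypothetical sketch rather than project code: `perFrameWork`, `wgSize`, and the pass count are made-up stand-ins, but they show how the Gaussian count propagates through each stage.
+
+```ts
+// Hypothetical cost model for one frame of this pipeline.
+function perFrameWork(numGaussians: number, visibleFraction: number, wgSize = 256) {
+  const visible = Math.ceil(numGaussians * visibleFraction);
+  const sortPasses = 4; // stand-in: radix sort runs a fixed number of passes
+  return {
+    // preprocess launches one thread per Gaussian, culled or not
+    preprocessWorkgroups: Math.ceil(numGaussians / wgSize),
+    // every sorting pass touches every visible key
+    sortKeyTouches: visible * sortPasses,
+    // the indirect draw emits a 6-vertex quad per visible splat
+    verticesDrawn: visible * 6,
+  };
+}
+```
+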
-[![](img/video.mp4)](TODO)
+Gaussian count should matter because each Gaussian requires matrix multiplications for the view/projection transforms, a quaternion-to-rotation-matrix conversion, the 3D-to-2D covariance projection, and spherical harmonics evaluation, all of which would scale linearly.
-### (TODO: Your README)
+More Gaussians would also mean larger input buffers to read from, larger output buffers to write to, more data moving through the GPU's memory hierarchy, and potential cache thrashing with very large datasets.
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+Even with indirect drawing, the GPU has to iterate through more instances, generate more vertices (6 per splat), and process more fragments (overlapping splats multiply fragment work).
-This assignment has a considerable amount of performance analysis compared
-to implementation work. Complete the implementation early to leave time!
+More Gaussians likely mean more overlapping splats, causing the fragment shader to run multiple times per pixel, which could dramatically increase fragment-shader load in dense scenes.
 
 ### Credits
 
diff --git a/images/gif.gif b/images/gif.gif
new file mode 100644
index 0000000..945d4a6
Binary files /dev/null and b/images/gif.gif differ
diff --git a/images/pic.png b/images/pic.png
new file mode 100644
index 0000000..56f5dc5
Binary files /dev/null and b/images/pic.png differ
diff --git a/package-lock.json b/package-lock.json
index 04843bd..694c409 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -12,6 +12,7 @@
         "@loaders.gl/ply": "^4.2.2",
         "@petamoriken/float16": "^3.8.7",
         "tweakpane": "^3.1.8",
+        "tweakpane-plugin-file-import": "^0.2.0",
         "wgpu-matrix": "^3.2.0"
       },
       "devDependencies": {
diff --git a/src/renderers/gaussian-renderer.ts b/src/renderers/gaussian-renderer.ts
index 1684523..f1e9a72 100644
--- a/src/renderers/gaussian-renderer.ts
+++ b/src/renderers/gaussian-renderer.ts
@@ -1,11 +1,11 @@
-import { PointCloud } from '../utils/load';
-import preprocessWGSL from '../shaders/preprocess.wgsl';
-import renderWGSL from '../shaders/gaussian.wgsl';
-import { get_sorter,c_histogram_block_rows,C } from '../sort/sort';
-import { Renderer } from './renderer';
+import { PointCloud } from "../utils/load";
+import preprocessWGSL from "../shaders/preprocess.wgsl";
+import renderWGSL from "../shaders/gaussian.wgsl";
+import { get_sorter, c_histogram_block_rows, C } from "../sort/sort";
+import { Renderer } from "./renderer";
 
 export interface GaussianRenderer extends Renderer {
-
+  render_settings_buffer: GPUBuffer;
 }
 
 // Utility to create GPU buffers
@@ -25,26 +25,122 @@ export default function get_renderer(
   pc: PointCloud,
   device: GPUDevice,
   presentation_format: GPUTextureFormat,
-  camera_buffer: GPUBuffer,
+  camera_buffer: GPUBuffer
 ): GaussianRenderer {
   const sorter = get_sorter(pc.num_points, device);
 
   // ===============================================
   // Initialize GPU Buffers
   // ===============================================
 
   const nulling_data = new Uint32Array([0]);
 
+  // a single zero, copied over GPU-side counters to reset them each frame
+  const nulling_buffer = createBuffer(
+    device,
+    "null_buffer",
+    nulling_data.byteLength,
+    GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
+    nulling_data
+  );
+
+  // 6 packed u32s per splat: pos/size, conic/opacity, color (two f16s each)
+  const bytesPerSplat = 24;
+  const splatBufferSize = pc.num_points * bytesPerSplat;
+  const splat_buffer = createBuffer(
+    device,
+    "splat buffer",
+    splatBufferSize,
+    GPUBufferUsage.STORAGE,
+    null
+  );
+
+  // DrawIndirect layout: [vertexCount, instanceCount, firstVertex, firstInstance];
+  // each splat is a 6-vertex quad, and instanceCount (0 here) is filled in on
+  // the GPU from the preprocess pass's visible-splat counter
+  const indirect_buffer_data = new Uint32Array([6, 0, 0, 
0]);
+
+  const indirect_buffer = createBuffer(
+    device,
+    "indirect buffer",
+    indirect_buffer_data.byteLength,
+    GPUBufferUsage.INDIRECT | GPUBufferUsage.COPY_DST,
+    indirect_buffer_data
+  );
+
+  // render settings uniform: [gaussian_scaling: f32, sh_deg: f32]
+  const render_settings_buffer = createBuffer(
+    device,
+    "render settings buffer",
+    8,
+    GPUBufferUsage.COPY_DST | GPUBufferUsage.UNIFORM,
+    new Float32Array([1.0, pc.sh_deg])
+  );
+
+  // ===============================================
+  // Create Render Pipeline and Bind Groups
+  // ===============================================
+  const render_pipeline = device.createRenderPipeline({
+    label: "render",
+    layout: "auto",
+    vertex: {
+      module: device.createShaderModule({
+        code: renderWGSL,
+      }),
+      entryPoint: "vs_main",
+      buffers: [],
+    },
+    fragment: {
+      module: device.createShaderModule({
+        code: renderWGSL,
+      }),
+      entryPoint: "fs_main",
+      targets: [
+        {
+          format: presentation_format,
+          // premultiplied-alpha "over" blending; splats arrive back-to-front
+          blend: {
+            color: {
+              srcFactor: "one",
+              dstFactor: "one-minus-src-alpha",
+              operation: "add",
+            },
+            alpha: {
+              srcFactor: "one",
+              dstFactor: "one-minus-src-alpha",
+              operation: "add",
+            },
+          },
+        },
+      ],
+    },
+    primitive: {
+      topology: "triangle-list",
+    },
+  });
+
+  const render_pipeline_bind_group = device.createBindGroup({
+    label: "render pipeline bind group",
+    layout: render_pipeline.getBindGroupLayout(0),
+    entries: [
+      {
+        binding: 0,
+        resource: { buffer: splat_buffer },
+      },
+      {
+        binding: 1,
+        resource: { buffer: sorter.ping_pong[0].sort_indices_buffer },
+      },
+      {
+        binding: 2,
+        resource: {
+          buffer: camera_buffer,
+        },
+      },
+    ],
+  });
 
   // ===============================================
   // Create Compute Pipeline and Bind Groups
   // ===============================================
   const preprocess_pipeline = device.createComputePipeline({
-    label: 'preprocess',
-    layout: 'auto',
+    label: "preprocess",
+    layout: "auto",
     compute: {
       module: device.createShaderModule({ code: preprocessWGSL }),
-      entryPoint: 'preprocess',
+      entryPoint: "preprocess",
       constants: {
         workgroupSize: C.histogram_wg_size,
         sortKeyPerThread: c_histogram_block_rows,
@@ -52,35 +148,134 @@ export default function get_renderer(
     },
   });
 
+  const preprocess_camera_bind_group = device.createBindGroup({
+    label: "preprocess camera",
+    layout: preprocess_pipeline.getBindGroupLayout(0),
+    entries: [
+      {
+        binding: 0,
+        resource: { buffer: camera_buffer },
+      },
+    ],
+  });
+
+  const gaussian_bind_group = device.createBindGroup({
+    label: "preprocess data",
+    layout: preprocess_pipeline.getBindGroupLayout(1),
+    entries: [
+      {
+        binding: 0,
+        resource: { buffer: pc.gaussian_3d_buffer },
+      },
+      {
+        binding: 1,
+        resource: { buffer: pc.sh_buffer },
+      },
+    ],
+  });
+
+  const compute_pipeline_bind_group = device.createBindGroup({
+    label: "compute pipeline bind group",
+    layout: preprocess_pipeline.getBindGroupLayout(3),
+    entries: [
+      {
+        binding: 0,
+        resource: { buffer: splat_buffer },
+      },
+      {
+        binding: 1,
+        resource: { buffer: render_settings_buffer },
+      },
+    ],
+  });
+
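+  // Group 2 hands the radix sorter's buffers to the preprocess shader: it
+  // appends one (depth key, splat index) pair per visible Gaussian and bumps
+  // the sort's indirect dispatch count as keys accumulate.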
   const sort_bind_group = device.createBindGroup({
-    label: 'sort',
+    label: "sort",
     layout: preprocess_pipeline.getBindGroupLayout(2),
     entries: [
       { binding: 0, resource: { buffer: sorter.sort_info_buffer } },
-      { binding: 1, resource: { buffer: sorter.ping_pong[0].sort_depths_buffer } },
-      { binding: 2, resource: { buffer: sorter.ping_pong[0].sort_indices_buffer } },
-      { binding: 3, resource: { buffer: sorter.sort_dispatch_indirect_buffer } },
+      {
+        binding: 1,
+        resource: { buffer: sorter.ping_pong[0].sort_depths_buffer },
+      },
+      {
+        binding: 2,
+        resource: { buffer: sorter.ping_pong[0].sort_indices_buffer },
+      },
+      {
+        binding: 3,
+        resource: { buffer: sorter.sort_dispatch_indirect_buffer },
+      },
     ],
   });
-
-  // ===============================================
-  // Create Render Pipeline and Bind Groups
-  // ===============================================
-
-  // ===============================================
   // Command Encoder Functions
-  // ===============================================
-
+  // ===============================================
 
   // ===============================================
   // Return Render Object
   // ===============================================
   return {
     frame: (encoder: GPUCommandEncoder, texture_view: GPUTextureView) => {
+      // reset the sorter's key counter and its indirect dispatch count
+      encoder.copyBufferToBuffer(
+        nulling_buffer,
+        0,
+        sorter.sort_info_buffer,
+        0,
+        4
+      );
+
+      encoder.copyBufferToBuffer(
+        nulling_buffer,
+        0,
+        sorter.sort_dispatch_indirect_buffer,
+        0,
+        4
+      );
+
+      // preprocess pass: cull, project, and compact visible splats
+      const preprocess_pass = encoder.beginComputePass({ label: "preprocess" });
+      preprocess_pass.setPipeline(preprocess_pipeline);
+      preprocess_pass.setBindGroup(0, preprocess_camera_bind_group);
+      preprocess_pass.setBindGroup(1, gaussian_bind_group);
+      preprocess_pass.setBindGroup(2, sort_bind_group);
+      preprocess_pass.setBindGroup(3, compute_pipeline_bind_group);
+
+      const workgroups = Math.ceil(pc.num_points / C.histogram_wg_size);
+      preprocess_pass.dispatchWorkgroups(workgroups);
+      preprocess_pass.end();
+
+      // sort the surviving splats by their depth keys
+      sorter.sort(encoder);
+
+      // copy the visible-splat count into instanceCount (byte offset 4)
+      // of the indirect draw buffer
+      encoder.copyBufferToBuffer(
+        sorter.sort_info_buffer,
+        0,
+        indirect_buffer,
+        4,
+        4
+      );
+
+      // render pass: one indirect draw of all visible splats
+      const render_pass = encoder.beginRenderPass({
+        label: "render pass",
+        colorAttachments: [
+          {
+            view: texture_view,
+            loadOp: "clear",
+            clearValue: [0, 0, 0, 1],
+            storeOp: "store",
+          },
+        ],
+      });
+
+      render_pass.setPipeline(render_pipeline);
+      render_pass.setBindGroup(0, render_pipeline_bind_group);
+      render_pass.drawIndirect(indirect_buffer, 0);
+      render_pass.end();
     },
     camera_buffer,
+    render_settings_buffer,
   };
 }
diff --git a/src/renderers/renderer.ts b/src/renderers/renderer.ts
index ffdf9ba..f3da2e0 100644
--- a/src/renderers/renderer.ts
+++ b/src/renderers/renderer.ts
@@ -1,15 +1,18 @@
-import { load } from '../utils/load';
-import { Pane } from 'tweakpane';
-import * as TweakpaneFileImportPlugin from 'tweakpane-plugin-file-import';
-import { default as get_renderer_gaussian, GaussianRenderer } from './gaussian-renderer';
-import { default as get_renderer_pointcloud } from './point-cloud-renderer';
-import { Camera, load_camera_presets} from '../camera/camera';
-import { CameraControl } from '../camera/camera-control';
-import { time, timeReturn } from '../utils/simple-console';
+import { load } from "../utils/load";
+import { Pane } from "tweakpane";
+import * as TweakpaneFileImportPlugin from "tweakpane-plugin-file-import";
+import {
+  default as get_renderer_gaussian,
+  GaussianRenderer,
+} from "./gaussian-renderer";
+import { default as get_renderer_pointcloud } from "./point-cloud-renderer";
+import { Camera, load_camera_presets } from "../camera/camera";
+import { CameraControl } from "../camera/camera-control";
+import { time, timeReturn } from "../utils/simple-console";
 
 export interface Renderer {
-  frame: (encoder: GPUCommandEncoder, texture_view: GPUTextureView) => void,
-  camera_buffer: GPUBuffer,
+  frame: (encoder: GPUCommandEncoder, texture_view: GPUTextureView) => void;
+  camera_buffer: GPUBuffer;
 }
 
 export default async function init(
@@ -17,14 +20,14 @@
context: GPUCanvasContext, device: GPUDevice ) { - let ply_file_loaded = false; - let cam_file_loaded = false; - let renderers: { pointcloud?: Renderer, gaussian?: Renderer } = {}; - let gaussian_renderer: GaussianRenderer | undefined; - let pointcloud_renderer: Renderer | undefined; - let renderer: Renderer | undefined; + let ply_file_loaded = false; + let cam_file_loaded = false; + let renderers: { pointcloud?: Renderer; gaussian?: Renderer } = {}; + let gaussian_renderer: GaussianRenderer | undefined; + let pointcloud_renderer: Renderer | undefined; + let renderer: Renderer | undefined; let cameras; - + const camera = new Camera(canvas, device); const control = new CameraControl(camera); @@ -35,108 +38,126 @@ export default async function init( camera.on_update_canvas(); }); observer.observe(canvas); - + const presentation_format = navigator.gpu.getPreferredCanvasFormat(); context.configure({ device, format: presentation_format, - alphaMode: 'opaque', + alphaMode: "opaque", }); - // Tweakpane: easily adding tweak control for parameters. const params = { fps: 0.0, gaussian_multiplier: 1, - renderer: 'pointcloud', - ply_file: '', - cam_file: '', + renderer: "pointcloud", + ply_file: "", + cam_file: "", }; const pane = new Pane({ - title: 'Config', + title: "Config", expanded: true, }); pane.registerPlugin(TweakpaneFileImportPlugin); { - pane.addMonitor(params, 'fps', { - readonly:true + pane.addMonitor(params, "fps", { + readonly: true, }); } { - pane.addInput(params, 'renderer', { - options: { - pointcloud: 'pointcloud', - gaussian: 'gaussian', - } - }).on('change', (e) => { - renderer = renderers[e.value]; - }); + pane + .addInput(params, "renderer", { + options: { + pointcloud: "pointcloud", + gaussian: "gaussian", + }, + }) + .on("change", (e) => { + renderer = renderers[e.value]; + }); } { - pane.addInput(params, 'ply_file', { - view: 'file-input', - lineCount: 3, - filetypes: ['.ply'], - invalidFiletypeMessage: "We can't accept those filetypes!" - }) - .on('change', async (file) => { - const uploadedFile = file.value; - if (uploadedFile) { - const pc = await load(uploadedFile, device); - pointcloud_renderer = get_renderer_pointcloud(pc, device, presentation_format, camera.uniform_buffer); - gaussian_renderer = get_renderer_gaussian(pc, device, presentation_format, camera.uniform_buffer); - renderers = { - pointcloud: pointcloud_renderer, - gaussian: gaussian_renderer, - }; - renderer = renderers[params.renderer]; - ply_file_loaded = true; - }else{ - ply_file_loaded = false; - } - }); + pane + .addInput(params, "ply_file", { + view: "file-input", + lineCount: 3, + filetypes: [".ply"], + invalidFiletypeMessage: "We can't accept those filetypes!", + }) + .on("change", async (file) => { + const uploadedFile = file.value; + if (uploadedFile) { + const pc = await load(uploadedFile, device); + pointcloud_renderer = get_renderer_pointcloud( + pc, + device, + presentation_format, + camera.uniform_buffer + ); + gaussian_renderer = get_renderer_gaussian( + pc, + device, + presentation_format, + camera.uniform_buffer + ); + renderers = { + pointcloud: pointcloud_renderer, + gaussian: gaussian_renderer, + }; + renderer = renderers[params.renderer]; + ply_file_loaded = true; + } else { + ply_file_loaded = false; + } + }); } { - pane.addInput(params, 'cam_file', { - view: 'file-input', - lineCount: 3, - filetypes: ['.json'], - invalidFiletypeMessage: "We can't accept those filetypes!" 
-    })
-    .on('change', async (file) => {
-      const uploadedFile = file.value;
-      if (uploadedFile) {
-        cameras=await load_camera_presets(file.value);
-        camera.set_preset(cameras[0]);
-        cam_file_loaded = true;
-      }else{
-        cam_file_loaded = false;
-      }
-    });
+    pane
+      .addInput(params, "cam_file", {
+        view: "file-input",
+        lineCount: 3,
+        filetypes: [".json"],
+        invalidFiletypeMessage: "We can't accept those filetypes!",
+      })
+      .on("change", async (file) => {
+        const uploadedFile = file.value;
+        if (uploadedFile) {
+          cameras = await load_camera_presets(file.value);
+          camera.set_preset(cameras[0]);
+          cam_file_loaded = true;
+        } else {
+          cam_file_loaded = false;
+        }
+      });
   }
   {
-    pane.addInput(
-      params,
-      'gaussian_multiplier',
-      {min: 0, max: 1.5}
-    ).on('change', (e) => {
-      //TODO: Bind constants to the gaussian renderer.
-    });
+    pane
+      .addInput(params, "gaussian_multiplier", { min: 0, max: 1.5 })
+      .on("change", (e) => {
+        // write the new scale into the renderer's settings uniform
+        if (gaussian_renderer) {
+          device.queue.writeBuffer(
+            gaussian_renderer.render_settings_buffer,
+            0,
+            new Float32Array([params.gaussian_multiplier])
+          );
+        }
+      });
   }
 
-  document.addEventListener('keydown', (event) => {
-    switch(event.key) {
-      case '0':
-      case '1':
-      case '2':
-      case '3':
-      case '4':
-      case '5':
-      case '6':
-      case '7':
-      case '8':
-      case '9':
+  document.addEventListener("keydown", (event) => {
+    switch (event.key) {
+      case "0":
+      case "1":
+      case "2":
+      case "3":
+      case "4":
+      case "5":
+      case "6":
+      case "7":
+      case "8":
+      case "9":
         const i = parseInt(event.key);
         console.log(`set to camera preset ${i}`);
         camera.set_preset(cameras[i]);
@@ -146,7 +167,7 @@ export default async function init(
 
   function frame() {
     if (ply_file_loaded && cam_file_loaded) {
-      params.fps=1.0/timeReturn()*1000.0;
+      params.fps = (1.0 / timeReturn()) * 1000.0;
       time();
       const encoder = device.createCommandEncoder();
       const texture_view = context.getCurrentTexture().createView();
diff --git a/src/shaders/gaussian.wgsl b/src/shaders/gaussian.wgsl
index 759226d..31fdaaa 100644
--- a/src/shaders/gaussian.wgsl
+++ b/src/shaders/gaussian.wgsl
@@ -1,22 +1,97 @@
 struct VertexOutput {
     @builtin(position) position: vec4<f32>,
     //TODO: information passed from vertex shader to fragment shader
+    @location(0) size: vec2<f32>,
+    @location(1) color: vec3<f32>,
+    @location(2) conic: vec4<f32>,
+    @location(3) center: vec2<f32>
 };
-
 struct Splat {
     //TODO: information defined in preprocess compute shader
+    pos_size: array<u32, 2>,
+    conic: array<u32, 2>,
+    color_sh: array<u32, 2>,
+};
+
+struct CameraUniforms {
+    view: mat4x4<f32>,
+    view_inv: mat4x4<f32>,
+    proj: mat4x4<f32>,
+    proj_inv: mat4x4<f32>,
+    viewport: vec2<f32>,
+    focal: vec2<f32>
 };
 
+@group(0) @binding(0) var<storage, read> splats: array<Splat>;
+@group(0) @binding(1) var<storage, read> sort_indices: array<u32>;
+@group(0) @binding(2) var<uniform> camera: CameraUniforms;
+
 @vertex
 fn vs_main(
+    @builtin(instance_index) instance_idx: u32,
+    @builtin(vertex_index) vertex_idx: u32
 ) -> VertexOutput {
-    //TODO: reconstruct 2D quad based on information from splat, pass
-    var out: VertexOutput;
-    out.position = vec4(1. ,1. , 0., 1.);
-    return out;
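+    // each instance is one sorted splat, expanded into a 6-vertex
+    // (two-triangle) quad in NDC and sized by the projected radius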
+    let idx = sort_indices[instance_idx];
+    let splat = splats[idx];
+
+    // unpack the NDC center and half-extents written by the preprocess pass
+    let xy = unpack2x16float(splat.pos_size[0]);
+    let wh = unpack2x16float(splat.pos_size[1]);
+
+    let x = xy.x;
+    let y = xy.y;
+    let w = wh.x * 2.0;
+    let h = wh.y * 2.0;
+
+    let quads = array<vec2f, 6>(
+        vec2f(x - w, y + h),
+        vec2f(x - w, y - h),
+        vec2f(x + w, y - h),
+        vec2f(x + w, y - h),
+        vec2f(x + w, y + h),
+        vec2f(x - w, y + h),
+    );
+
+    let conic01 = unpack2x16float(splat.conic[0]);
+    let conic23 = unpack2x16float(splat.conic[1]);
+    let conic = vec3(conic01.x, conic01.y, conic23.x);
+    let opacity = conic23.y;
+
+    var vertex_out: VertexOutput;
+
+    vertex_out.conic = vec4f(conic, opacity);
+
+    vertex_out.center = vec2f(x, y);
+
+    vertex_out.position = vec4f(quads[vertex_idx].x, quads[vertex_idx].y, 0.0f, 1.0f);
+
+    vertex_out.size = (wh * 0.5f + 0.5f) * camera.viewport.xy;
+
+    let rg = unpack2x16float(splat.color_sh[0]);
+    let ba = unpack2x16float(splat.color_sh[1]);
+    vertex_out.color = vec3(rg.x, rg.y, ba.x);
+
+    return vertex_out;
 }
 
 @fragment
 fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
-    return vec4(1.);
+    // recover this fragment's NDC position (y flipped to match the quad)
+    var pos = (in.position.xy / camera.viewport) * 2.0f - 1.0f;
+    pos.y = -pos.y;
+
+    var offset = pos.xy - in.center.xy;
+    offset = vec2f(-offset.x, offset.y) * camera.viewport * 0.5f;
+
+    // Gaussian falloff: power = -0.5 * d^T * conic * d
+    var power = (in.conic.x * pow(offset.x, 2.0f) +
+                in.conic.z * pow(offset.y, 2.0f)) *
+                -0.5f -
+                in.conic.y * offset.x * offset.y;
+
+    if (power > 0.0f) {
+        return vec4(0.0f, 0.0f, 0.0f, 0.0f);
+    }
+
+    let alpha = clamp(in.conic.w * exp(power), 0.0f, 0.99f);
+
+    // premultiplied alpha for the "over" blend state
+    return vec4(in.color * alpha, alpha);
 }
\ No newline at end of file
diff --git a/src/shaders/point_cloud.wgsl b/src/shaders/point_cloud.wgsl
index 01dded1..e49cd39 100644
--- a/src/shaders/point_cloud.wgsl
+++ b/src/shaders/point_cloud.wgsl
@@ -35,7 +35,8 @@ fn vs_main(
     let pos = vec4(a.x, a.y, b.x, 1.);
 
     // TODO: MVP calculations
-    out.position = pos;
+    out.position = camera.proj * camera.view * pos;
+
     return out;
 }
 
diff --git a/src/shaders/preprocess.wgsl b/src/shaders/preprocess.wgsl
index bbc63f5..6141af4 100644
--- a/src/shaders/preprocess.wgsl
+++ b/src/shaders/preprocess.wgsl
@@ -57,9 +57,23 @@ struct Gaussian {
 
 struct Splat {
     //TODO: store information for 2D splat rendering
+    pos_size: array<u32, 2>,
+    conic: array<u32, 2>,
+    color_sh: array<u32, 2>,
 };
 
 //TODO: bind your data here
+@group(0) @binding(0)
+var<uniform> camera: CameraUniforms;
+@group(1) @binding(0)
+var<storage, read> gaussians: array<Gaussian>;
+@group(1) @binding(1)
+var<storage, read> sh_coeffs: array<u32>;
+@group(3) @binding(0)
+var<storage, read_write> splats: array<Splat>;
+@group(3) @binding(1)
+var<uniform> render_settings: RenderSettings;
+
 @group(2) @binding(0)
 var<storage, read_write> sort_infos: SortInfos;
 @group(2) @binding(1)
@@ -72,7 +86,21 @@
 var<storage, read_write> sort_dispatch: DispatchIndirect;
 
 /// reads the ith sh coef from the storage buffer
 fn sh_coef(splat_idx: u32, c_idx: u32) -> vec3<f32> {
     //TODO: access your binded sh_coeff, see load.ts for how it is stored
-    return vec3(0.0);
+
+    // 16 coefficients x 3 channels per Gaussian, two f16s packed per u32,
+    // so each coefficient triple spans 1.5 u32s
+    let max_num_coefs = 16u;
+    let coef_offset = c_idx * 3u / 2u;
+    let base_idx = splat_idx * max_num_coefs * 3u / 2u;
+
+    let packed_rg = sh_coeffs[base_idx + coef_offset];
+    let packed_b = sh_coeffs[base_idx + coef_offset + 1u];
+    let color01 = unpack2x16float(packed_rg);
+    let color23 = unpack2x16float(packed_b);
+
+    // even coefficients start on a u32 boundary; odd ones start halfway in
+    if (c_idx % 2u == 0u) {
+        return vec3(color01.x, color01.y, color23.x);
+    } else {
+        return vec3(color01.y, color23.x, color23.y);
+    }
 }
 
 // spherical harmonics evaluation with Condon–Shortley phase
@@ -112,7 +140,138 @@ fn computeColorFromSH(dir: vec3<f32>, v_idx: u32, sh_deg: u32) -> vec3<f32> {
 fn preprocess(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) wgs: vec3<u32>) {
     let idx = gid.x;
     //TODO: set up pipeline as described in instruction
+    if (idx >= arrayLength(&gaussians)) {
+        return;
+    }
+
+    let gaussian = gaussians[idx];
+
+    // position and opacity are packed as pairs of f16s
+    let pos_packed = gaussian.pos_opacity[0];
+    let pos_x = unpack2x16float(pos_packed).x;
+    let pos_y = unpack2x16float(pos_packed).y;
+
+    let z_op_packed = gaussian.pos_opacity[1];
+    let pos_z = unpack2x16float(z_op_packed).x;
+    let opacity = unpack2x16float(z_op_packed).y;
+
+    let pos_world = vec4(pos_x, pos_y, pos_z, 1.0);
+
+    // transform to view space
+    let pos_view = camera.view * pos_world;
+
+    // project to clip space
+    let pos_clip = camera.proj * pos_view;
+
+    // convert to NDC
+    let pos_ndc = pos_clip.xy / pos_clip.w;
+
+    // view-frustum culling with a 1.2x tolerance, plus a behind-camera test
+    if (pos_ndc.x < -1.2f || pos_ndc.x > 1.2f ||
+        pos_ndc.y < -1.2f || pos_ndc.y > 1.2f ||
+        pos_view.z < 0.0f) {
+        return;
+    }
+
+    // quaternion (r, x, y, z) to rotation matrix
+    let r01 = unpack2x16float(gaussian.rot[0]);
+    let r23 = unpack2x16float(gaussian.rot[1]);
+    let rot = vec4(r01.x, r01.y, r23.x, r23.y);
+
+    let r = rot.x;
+    let x = rot.y;
+    let y = rot.z;
+    let z = rot.w;
+
+    let R = mat3x3f(
+        1.0f - 2.0f * (y * y + z * z), 2.0f * (x * y - r * z), 2.0f * (x * z + r * y),
+        2.0f * (x * y + r * z), 1.0f - 2.0f * (x * x + z * z), 2.0f * (y * z - r * x),
+        2.0f * (x * z - r * y), 2.0f * (y * z + r * x), 1.0f - 2.0f * (x * x + y * y)
+    );
+
+    let scale01 = unpack2x16float(gaussian.scale[0]);
+    let scale23 = unpack2x16float(gaussian.scale[1]);
+
+    // scales are stored in log space
+    let scale = exp(vec3f(scale01.x, scale01.y, scale23.x));
+
+    let S = mat3x3f(
+        scale.x * render_settings.gaussian_scaling, 0.0f, 0.0f,
+        0.0f, scale.y * render_settings.gaussian_scaling, 0.0f,
+        0.0f, 0.0f, scale.z * render_settings.gaussian_scaling
+    );
+
+    // 3D covariance, stored as its 6 unique entries
+    let covar_matrix_3D = transpose(S * R) * S * R;
+
+    let covar_3D = array<f32, 6>(
+        covar_matrix_3D[0][0],
+        covar_matrix_3D[0][1],
+        covar_matrix_3D[0][2],
+        covar_matrix_3D[1][1],
+        covar_matrix_3D[1][2],
+        covar_matrix_3D[2][2],
+    );
+
+    // Jacobian of the perspective projection at the splat center
+    let J = mat3x3f(
+        camera.focal.x / pos_view.z, 0.0f, -(camera.focal.x * pos_view.x) / (pos_view.z * pos_view.z),
+        0.0f, camera.focal.y / pos_view.z, -(camera.focal.y * pos_view.y) / (pos_view.z * pos_view.z),
+        0.0f, 0.0f, 0.0f
+    );
+
+    let W = transpose(mat3x3f(
+        camera.view[0].xyz, camera.view[1].xyz, camera.view[2].xyz
+    ));
+
+    let T = W * J;
+
+    let V = mat3x3f(
+        covar_3D[0], covar_3D[1], covar_3D[2],
+        covar_3D[1], covar_3D[3], covar_3D[4],
+        covar_3D[2], covar_3D[4], covar_3D[5],
+    );
+
+    // project to a 2D screen-space covariance; the 0.3 added on the diagonal
+    // keeps splats from degenerating below roughly a pixel
+    var covar_matrix_2D = transpose(T) * transpose(V) * T;
+    covar_matrix_2D[0][0] += 0.3f;
+    covar_matrix_2D[1][1] += 0.3f;
+
+    let covar_2D = vec3(
+        covar_matrix_2D[0][0],
+        covar_matrix_2D[0][1],
+        covar_matrix_2D[1][1]
+    );
+
+    let determinant = covar_2D.x * covar_2D.z - (covar_2D.y * covar_2D.y);
+
+    if (determinant == 0.0f) {
+        return;
+    }
+
+    // splat radius from the larger eigenvalue (3 sigma)
+    let mid = (covar_2D.x + covar_2D.z) * 0.5f;
+    let lambda1 = mid + sqrt(max(0.1f, mid * mid - determinant));
+    let lambda2 = mid - sqrt(max(0.1f, mid * mid - determinant));
+    let radius = ceil(3.0f * sqrt(max(lambda1, lambda2)));
+
+    let view_dir = normalize(pos_world.xyz - camera.view_inv[3].xyz);
+    let color = computeColorFromSH(view_dir, idx, u32(render_settings.sh_deg));
+
+    // atomic compaction: visible splats take consecutive output slots
+    let sortKeysIdx = atomicAdd(&sort_infos.keys_size, 1u);
+
+    splats[sortKeysIdx].pos_size[0] = pack2x16float(pos_ndc.xy);
+    splats[sortKeysIdx].pos_size[1] = pack2x16float(vec2(radius, radius) / camera.viewport);
+
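+    // the conic is the inverse 2D covariance (entries scaled by 1/det);
+    // the fragment shader evaluates exp(-0.5 * d^T * conic * d) with it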
+    let conic = vec3f(covar_2D.z / determinant, -covar_2D.y / determinant, covar_2D.x / determinant);
+    let conic01 = pack2x16float(conic.xy);
+    // pack the sigmoid-activated opacity alongside conic.z
+    let conic23 = pack2x16float(vec2(conic.z, 1.0f / (1.0f + exp(-opacity))));
+    splats[sortKeysIdx].conic[0] = conic01;
+    splats[sortKeysIdx].conic[1] = conic23;
+    splats[sortKeysIdx].color_sh[0] = pack2x16float(vec2(color.r, color.g));
+    splats[sortKeysIdx].color_sh[1] = pack2x16float(vec2(color.b, 1.0f));
+
+    // radix-sort key: 100.0 - z shrinks with distance, and positive floats
+    // bitcast to u32 keep their ordering, so ascending keys give a
+    // back-to-front draw order (valid while view-space depth stays below 100)
+    let depth = pos_view.z;
+    sort_depths[sortKeysIdx] = bitcast<u32>(100.0f - depth);
+    sort_indices[sortKeysIdx] = sortKeysIdx;
+
+    let keys_per_dispatch = workgroupSize * sortKeyPerThread;
-    let keys_per_dispatch = workgroupSize * sortKeyPerThread;
-    // increment DispatchIndirect.dispatchx each time you reach limit for one dispatch of keys
+    // grow the sort's indirect dispatch once per keys_per_dispatch keys
+    if (sortKeysIdx % keys_per_dispatch == 0u) {
+        atomicAdd(&sort_dispatch.dispatch_x, 1u);
+    }
 }
\ No newline at end of file