[WIP] Webgpu pathtracer prototype #705

Conversation
Amazing work! Thanks for putting the time in. In terms of getting an initial version merged I think these are some good goals - this should help get the general architecture defined and answer some of the open questions. Your other TODOs can be added in subsequent PRs, which will make things easier to review.
In terms of making this easier to understand - do you mind splitting some of the code out into separate files? The "PathTracerCore" file is pretty monolithic and difficult to follow at the moment. I think these changes will make it easier to give feedback.
And some other comments:
It would be helpful to know what data is needed at each step of the path tracer. But organizing things from the beginning so that tiled rendering is supported is, I think, our best bet. Once three.js supports requesting larger buffers users can rely on that, but we'll still need a graceful fallback either way.
Let's get to a point where we have a full resolution scene rendering so we can take a look at the pros / cons of the megakernel and wavefront approaches. The wavefront approach may also wind up having more of a noticeable improvement later on once we can terminate rays early and / or sort them, and therefore avoid running large blocks of work. Also, just a note that the "megakernel" path seems to not be working for me. I see this error:
Thanks for organizing things - it helps quite a bit. I've just taken a deeper look and I think I'm understanding how things roughly work now. I've added a few initial questions, but excuse me if I've asked anything obvious as I wrap my head around things 😅
```js
export const pcgInit = wgslFn( /* wgsl */`
	fn pcg_initialize(state: ptr<function, PcgState>, p: vec2u, frame: u32) -> void {
```
nit: is there a reason we can't store the "state" variable as a global variable? It seems simpler than passing a state argument in every time.
I initially didn't want to work out how to do that in the nodes system. I'll take a look at whether it's possible.
Agreed! I was just curious if there was a reason it couldn't be done.
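For reference, a minimal WGSL sketch of what the global-state variant could look like, assuming module-scope declarations can be shared between the kernels somehow - the seed hash and the constants below are the common PCG formulation, not this PR's code:

```wgsl
// Module-scope, private-address-space state: each invocation gets its own
// copy, so no pointer has to be threaded through every call.
var<private> pcg_state: u32;

fn pcg_initialize( p: vec2u, frame: u32 ) {
	// Fold the pixel coordinate and frame index into a per-thread seed.
	pcg_state = p.x + p.y * 9781u + frame * 26699u;
}

fn pcg_next() -> u32 {
	// Standard PCG step: LCG advance followed by an output permutation.
	pcg_state = pcg_state * 747796405u + 2891336453u;
	let word = ( ( pcg_state >> ( ( pcg_state >> 28u ) + 4u ) ) ^ pcg_state ) * 277803737u;
	return ( word >> 22u ) ^ word;
}
```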
```js
export const resetFn = wgslFn( /* wgsl */ `
	fn reset(
```
nit: Can we rename this function and comment it so it's clearer what it's resetting? What is this for?
This shader resets the result and sample count buffers, so any accumulated color and sample count stored become 0. Will adjust the name to better reflect the meaning.
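Something along these lines, perhaps (the names and buffer layout here are illustrative, not the final code):

```js
// Clears the per-pixel accumulation state so a fresh render can start:
// the accumulated color and the sample count both return to zero.
export const resetAccumulationFn = wgslFn( /* wgsl */ `
	fn resetAccumulation(
		resultBuffer: ptr<storage, array<vec4f>, read_write>,
		sampleCountBuffer: ptr<storage, array<u32>, read_write>,
		index: u32,
	) -> void {
		resultBuffer[ index ] = vec4f( 0.0 );
		sampleCountBuffer[ index ] = 0u;
	}
` );
```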
```js
export const lambertBsdfFunc = wgslFn( /* wgsl */`
	fn bsdfEval(rngState: ptr<function, PcgState>, normal: vec3f, view: vec3f) -> ScatterRecord {
		// ...
		const PI: f32 = 3.141592653589793;
```
We should probably move this to a constants node.
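For example, a single shared snippet that each kernel pulls in - the exact mechanism depends on how the constants node ends up looking, this is just a sketch:

```js
// Shared math constants, defined once instead of redeclared per kernel.
export const mathConstantsWGSL = /* wgsl */ `
	const PI: f32 = 3.141592653589793;
	const INV_PI: f32 = 0.3183098861837907;
`;
```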
```js
	1,
];

// 0. Clean queues
_renderer.compute( this.cleanQueuesKernel, 1 );
```
I'm wondering if we want to / need to clear these queues? The buffer size limits may make this more complicated, but I was originally thinking of these queues as ongoing "work queues" that would have work constantly pushed onto them over multiple frames. Rays from still-unterminated paths from sample 1 could still be in the queue while rays from sample 2 (or 3 or 4) are being pushed onto the same queue, for example. Then there can always be work available on the queue for the compute nodes to pick up.
"Finishing" a render could mean a few things:
- The maximum samples have been reached for every pixel and the last ray from the queue is complete.
- Some pixels can be terminated early if they are determined to already be "resolved" enough (see #198: "Add ability to report noise on each pass, allows halting on a noise threshold")
Small tiles can be run to start off new paths and fill up the queue as needed. This would also let users really tune how much work they want to do per frame - rather than samples, the user can say "only process X rays per frame", and that can be something that's auto-tunable.
And maybe a separate thought but I'm also wondering if sorting rays can somehow help with performance and thread utilization.
> rather than samples, the user can say "only process X rays per frame", and that can be something that's auto-tunable.
I see. This sounds interesting and promising in terms of adapting to the hardware's capabilities. However, I think this should be done later, after the essential stuff is set up. Right now the ray generation kernel fully loads the ray queue with rays without checking for overflows, so we have to clean the queues.
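For what it's worth, here's a hedged sketch of the kind of bounds-checked push that persistent queues would need - the `Ray` struct, the `capacity` parameter, and the function name are illustrative, not this PR's code:

```js
export const pushRayFn = wgslFn( /* wgsl */ `
	fn pushRay(
		queueSizes: ptr<storage, array<atomic<u32>>, read_write>,
		rayQueue: ptr<storage, array<Ray>, read_write>,
		ray: Ray,
		capacity: u32,
	) -> bool {
		// Reserve a slot; back out if the queue is already full.
		let slot = atomicAdd( &queueSizes[ 0 ], 1u );
		if ( slot >= capacity ) {
			atomicSub( &queueSizes[ 0 ], 1u );
			return false;
		}
		rayQueue[ slot ] = ray;
		return true;
	}
` );
```

The dispatch-size kernels would then clamp the counter to the capacity when computing workgroup counts, since the counter can transiently overshoot.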
> And maybe a separate thought but I'm also wondering if sorting rays can somehow help with performance and thread utilization.
This is an interesting idea. I can see how sorting rays in a certain way could improve thread utilization. Certainly a topic to explore.
Perhaps we should create issues for those and return to them later on?
I agree we can take a look at these later. I added some small notes to #692.
> rather than samples, the user can say "only process X rays per frame", and that can be something that's auto-tunable.

> I see. This sounds interesting and promising in terms of adapting to the hardware's capabilities. However, I think this should be done later
Once the resolution of the render in the demo is right and we can get this merged, I'll take a look at this if that's okay - I'm pretty interested in it.
```js
export const writeTraceRayDispatchSize = wgslFn( /* wgsl */ `
	fn writeTraceRayDispatchSize(
		outputBuffer: ptr<storage, array<u32>, read_write>,
		queueSizes: ptr<storage, array<atomic<u32>>, read_write>,
		workgroupSize: u32,
	) -> void {

		// Reset the bsdf and escaped-ray queue counters before the next pass fills them.
		atomicStore(&queueSizes[1], 0);
		atomicStore(&queueSizes[2], 0);

		// Convert the ray queue size into a workgroup count for the indirect dispatch.
		let size = atomicLoad(&queueSizes[0]);
		outputBuffer[0] = u32( ceil( f32(size) / f32( workgroupSize ) ) );
		outputBuffer[1] = 1;
		outputBuffer[2] = 1;

	}
` );

export const writeEscapedRayDispatchSize = wgslFn( /* wgsl */ `
	fn writeEscapedRayDispatchSize(
		outputBuffer: ptr<storage, array<u32>, read_write>,
		queueSizes: ptr<storage, array<atomic<u32>>, read_write>,
		workgroupSize: u32,
	) -> void {

		let size = atomicLoad(&queueSizes[2]);
		outputBuffer[0] = u32( ceil( f32(size) / f32( workgroupSize ) ) );
		outputBuffer[1] = 1;
		outputBuffer[2] = 1;

	}
` );

export const writeBsdfDispatchSize = wgslFn( /* wgsl */ `
	fn writeBsdfDispatchSize(
		queueSizes: ptr<storage, array<atomic<u32>>, read_write>,
		outputBuffer: ptr<storage, array<u32>, read_write>,
		workgroupSize: u32
	) -> void {

		// Reset the trace-ray queue counter before the next pass fills it.
		atomicStore(&queueSizes[0], 0);

		let count = atomicLoad(&queueSizes[1]);
		outputBuffer[0] = u32( ceil( f32(count) / f32( workgroupSize ) ) );
		outputBuffer[1] = 1;
		outputBuffer[2] = 1;

	}
` );
```
It looks like these are just copying data into indirect buffers so we can issue indirect compute commands, is that right? Is it the case that we can't use atomic storage buffers as indirect compute arguments?
> It looks like these are just copying data into indirect buffers so we can issue indirect compute commands, is that right?
Yeah, we are writing the dispatch size into the indirect buffer.
> Is it the case that we can't use atomic storage buffers as indirect compute arguments?
Well, we could possibly reuse the atomics storage buffer as one of the indirect buffers, but we can't get rid of them all. In the case of escaped rays and bsdf sampling we need two dispatch buffers, and if we use the queue sizes buffer for one of them, the relevant values for the other will be lost.
Theoretically we could create only one dispatch buffer for everything and dispatch with offsets, so values [0; 3) of the array would hold the dispatch size for the escaped ray kernel and [3; 6) for the bsdf sample kernel. But I don't think this is supported in three.js right now.
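For reference, a sketch of that single-buffer idea (hypothetical, since as noted the offset-based indirect dispatch isn't available):

```js
export const writeAllDispatchSizes = wgslFn( /* wgsl */ `
	fn writeAllDispatchSizes(
		dispatchBuffer: ptr<storage, array<u32>, read_write>,
		queueSizes: ptr<storage, array<atomic<u32>>, read_write>,
		workgroupSize: u32,
	) -> void {
		// [0; 3): escaped ray kernel, [3; 6): bsdf sample kernel.
		let escapedCount = atomicLoad( &queueSizes[ 2 ] );
		dispatchBuffer[ 0 ] = ( escapedCount + workgroupSize - 1u ) / workgroupSize;
		dispatchBuffer[ 1 ] = 1u;
		dispatchBuffer[ 2 ] = 1u;

		let bsdfCount = atomicLoad( &queueSizes[ 1 ] );
		dispatchBuffer[ 3 ] = ( bsdfCount + workgroupSize - 1u ) / workgroupSize;
		dispatchBuffer[ 4 ] = 1u;
		dispatchBuffer[ 5 ] = 1u;
	}
` );
```

The renderer would then issue the two indirect dispatches at byte offsets 0 and 12 into the same buffer.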
```js
// 2.1 Dispatch shaders
// When processing escaped rays, calculate new contribution and add that to the result image
_renderer.compute( this.escapedRayKernel, this.escapedRayDispatchBuffer );
// When processing hit results, sample a new ray according to material's properties
_renderer.compute( this.bsdfEvalKernel, this.bsdfDispatchBuffer );
```
I'm wondering if it's possible to merge this "escaped ray" kernel into one of the other ones?
It is possible. However, the separation is deliberate and should help with thread utilization. The main reason wavefront pathtracing would be faster than a regular megakernel is more uniform kernel execution duration and hence better thread utilization. This was my thinking when writing this.
However, I can see now that the original article had a unified "shade" stage for escaped and intersected rays. We could experiment with this in the future, once the performance baseline is established.
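If we do experiment with it later, the unified stage might roughly look like this - a sketch assuming a `HitResult` struct with a `didHit` flag, with the actual shading helpers left as comments:

```js
export const shadeFn = wgslFn( /* wgsl */ `
	fn shade(
		hitBuffer: ptr<storage, array<HitResult>, read_write>,
		index: u32,
	) -> void {
		let hit = hitBuffer[ index ];
		if ( hit.didHit == 0u ) {
			// Escaped ray: add the environment contribution to the result image.
		} else {
			// Intersected ray: evaluate the bsdf and enqueue the next bounce ray.
		}
	}
` );
```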
Hi @gkjohnson! Here's a simple lambertian pathtracer - both megakernel and wavefront, as discussed in #692. See the webgpu_primitives demo. There is still a lot of work to be done here. I will try to keep an up-to-date todo list here so you can follow the progress.
At what point do you propose measuring performance?
I think it is not so difficult to keep both the megakernel and wavefront pathtracers, as they share the essential code (for now at least). It could also be useful to have a point of reference when implementing additional features, and I'm not sure how relative performance will scale with features.
In WebGPU we can bind buffers of at most 128 MiB by default, and there is no way to request bigger limits in three.js. That leaves us with a max tile resolution of 1920x1080 (which I hardcoded in the example for now). I'm wondering whether requesting bigger WebGPU limits, or features for that matter, is a worthwhile feature for three.js. What do you think?
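For a rough sense of the arithmetic (assuming, for illustration, a 64-byte queue entry per pixel): 1920 × 1080 = 2,073,600 entries × 64 B ≈ 126.6 MiB, which just fits under the 134,217,728-byte (128 MiB) default binding limit, while the next common step up in resolution would not.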
This buffer limit also constrains the scene size, at least until a TLAS is deployed and tracing is done per-mesh instead of over the whole scene.
Also, by default we can bind a maximum of 8 storage buffers per shader stage, which is a bottleneck for the traceRay kernel. It needs a lot of buffers, so I had to move reading the material index into the next kernel (the bsdf eval phase). This could be fixed if three.js had support for array types in structs and for top-level storage-buffer-backed structs. At least I could not get those to work, so I had to put the queue sizes into a separate buffer and pass them separately. I also wonder if I should file an issue about this limitation in three.js - but maybe the support exists and I just didn't find it.
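For reference, this is the kind of binding that struct support would allow - valid WGSL (with some `Ray` struct assumed), but per the above not currently expressible through three.js:

```wgsl
// Queue counter and payload in one binding: a runtime-sized array is
// allowed as the last member of a storage-buffer struct.
struct RayQueue {
	size: atomic<u32>,
	rays: array<Ray>,
};

@group( 0 ) @binding( 0 ) var<storage, read_write> rayQueue: RayQueue;
```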
What's your policy on the three.js version for the WebGPU release here? Ideally we would use r181, as it includes support for indirect compute (which is essential for wavefront pathtracing to avoid reading values back from the GPU between calls). But it released like 5 days ago :) For now I copy-pasted the compute methods from WebGPURenderer and WebGPUBackend with small tweaks in order to compute indirectly - see the computeIndirect and computeBackendIndirect functions in PathTracerCore. Just saw that the library recently upgraded to r181 - perfect!
Would love any feedback!
For later: