Long running compute kernels in wgpu? #2353
-
Hey guys. I have some compute kernels that can take up to a minute to complete. Unfortunately, the Windows TDR (Timeout Detection and Recovery) resets the GPU driver if a kernel runs for too long (a couple of seconds). I am now splitting the big kernel into smaller loads, but I am still getting killed by the TDR. Here's what I have tried so far:

// One pass.
let mut encoder = device.create_command_encoder(..);
{
    let mut pass = encoder.begin_compute_pass(..);
    for _ in 0..many {
        pass.dispatch(..);
    }
}
queue.submit(Some(encoder.finish()));
// Many passes.
let mut encoder = device.create_command_encoder(..);
for _ in 0..many {
    let mut pass = encoder.begin_compute_pass(..);
    pass.dispatch(..);
}
queue.submit(Some(encoder.finish()));
// Many command buffers.
for _ in 0..many {
    let mut encoder = device.create_command_encoder(..);
    {
        let mut pass = encoder.begin_compute_pass(..);
        pass.dispatch(..);
    }
    queue.submit(Some(encoder.finish()));
}
I couldn't find any way to sync or flush the queue in the WebGPU spec. Of course, I could raise the TDR limit in the registry, but that's not a very satisfactory solution 🙂; I can't force users to do that, for example. If it matters, I am running headlessly, so there are no windows or surfaces.
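For what it's worth, wgpu itself (outside the WebGPU spec) does expose a blocking wait on the device. Below is a minimal sketch of the "many command buffers" variant with an explicit wait between submissions, so the driver never sees one long, unbroken stretch of work. This is an assumption-laden sketch: `pipeline` and `bind_group` come from your existing setup, the workgroup counts are placeholders, and the exact `poll` signature varies across wgpu versions (`wgpu::Maintain::Wait` here).

```rust
for _ in 0..many {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    {
        let mut pass =
            encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
        pass.set_pipeline(&pipeline);
        pass.set_bind_group(0, &bind_group, &[]);
        // Keep each chunk well under the TDR limit (~2 s by default).
        pass.dispatch(1024, 1, 1); // placeholder workgroup counts
    }
    queue.submit(Some(encoder.finish()));
    // Block until this submission has finished on the GPU before
    // queuing the next chunk.
    device.poll(wgpu::Maintain::Wait);
}
```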
-
My best guess is that TDR applies to queue submissions, not individual dispatches, so you should try to split the work into submissions that each take a reasonable amount of time (maybe 100 ms). That leaves you plenty of headroom if you underestimate how long things will take. As an extension, you could use timestamp queries to dynamically adjust the load per submission based on feedback; a rough sketch of that feedback loop follows.
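Here is a hedged sketch of that idea, assuming the names from the question's setup (`pipeline`, `bind_group`) plus an illustrative `total_dispatches`. It times each submission on the CPU around a blocking poll as a rough proxy for GPU time (timestamp queries would be more precise, but need `Features::TIMESTAMP_QUERY` plus a query-set resolve and readback) and halves or doubles the dispatches per submission to stay near the 100 ms budget:

```rust
use std::time::{Duration, Instant};

let budget = Duration::from_millis(100);
let mut chunk: u32 = 16; // initial dispatches per submission (a guess)
let mut remaining: u32 = total_dispatches; // illustrative total workload

while remaining > 0 {
    let n = chunk.min(remaining);
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    {
        let mut pass =
            encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
        pass.set_pipeline(&pipeline);
        pass.set_bind_group(0, &bind_group, &[]);
        for _ in 0..n {
            pass.dispatch(1024, 1, 1); // placeholder workgroup counts
        }
    }
    let start = Instant::now();
    queue.submit(Some(encoder.finish()));
    device.poll(wgpu::Maintain::Wait); // block until the GPU finishes
    let elapsed = start.elapsed();
    remaining -= n;

    // Crude feedback: shrink if over budget, grow if well under it.
    if elapsed > budget {
        chunk = (chunk / 2).max(1);
    } else if elapsed < budget / 2 {
        chunk = chunk.saturating_mul(2);
    }
}
```

The halve/double controller is deliberately conservative: one over-budget submission immediately cuts the chunk size, while growth is capped at 2× per step.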