[core] defer buffer/texture destruction while used by a command buffer #8706

uael · 2025-12-11T15:58:33Z

Description
Allow setting retain_command_buffer_references back to false by deferring buffer and texture destruction while they are used by a command buffer. Replaces #8694.
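
A minimal sketch of the deferral idea, not the code in this PR: a resource counts the command buffers that currently reference it, and a destroy request made while that count is non-zero is only honoured once the last reference is released. The `Resource` type and its `acquire`/`release`/`is_released` methods are hypothetical names chosen for illustration, and the `released` flag stands in for actually freeing GPU memory.

```rust
use std::sync::Mutex;

/// Hypothetical bookkeeping for one buffer or texture (not a wgpu-core type).
#[derive(Default)]
struct State {
    in_flight: usize,        // command buffers currently referencing the resource
    destroy_requested: bool, // the user called destroy()
    released: bool,          // stand-in for "backing memory has been freed"
}

#[derive(Default)]
pub struct Resource {
    state: Mutex<State>,
}

impl Resource {
    /// Called when a command buffer starts referencing the resource.
    pub fn acquire(&self) {
        self.state.lock().unwrap().in_flight += 1;
    }

    /// Called when a command buffer that referenced the resource is dropped
    /// or has finished executing. Performs a destruction that was deferred.
    pub fn release(&self) {
        let mut s = self.state.lock().unwrap();
        s.in_flight = s.in_flight.saturating_sub(1);
        if s.in_flight == 0 && s.destroy_requested {
            s.released = true;
        }
    }

    /// Destroys immediately when unused, otherwise defers until release().
    pub fn destroy(&self) {
        let mut s = self.state.lock().unwrap();
        s.destroy_requested = true;
        if s.in_flight == 0 {
            s.released = true;
        }
    }

    pub fn is_released(&self) -> bool {
        self.state.lock().unwrap().released
    }
}
```

The trade-off in this design is that memory is not recovered at the `destroy` call while any command buffer still references the resource.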

Testing
CTS and tests

Checklist

Run cargo fmt.
~~Run taplo format.~~
Run cargo clippy --tests. If applicable, add:
- ~~--target wasm32-unknown-unknown~~
Run cargo xtask test to run tests.
~~If this contains user-facing changes, add a CHANGELOG.md entry.~~

uael · 2025-12-11T19:35:20Z

CTS still hangs on Mac; I need to investigate this.

andyleiserson · 2025-12-11T21:31:58Z

The known cause of CTS hangs on Mac is #3084, although that doesn't totally make sense here, because the test list hasn't changed.

uael · 2025-12-12T04:30:52Z

Is this approach the right path forward? I'll add my measurements of memory usage with and without the auto-retain flag later today.

uael · 2025-12-12T07:36:53Z

@andyleiserson rebased to get your CTS filter changes. It's now hanging with 'webgpu:api,validation,encoding,cmds,copyTextureToTexture:texture_format_compatibility:*'. I just tested locally and they all pass (2327 / 2843; the others are skipped). Any idea where I should look first?

andyleiserson · 2025-12-15T23:26:35Z

I'm not sure about the approach. There is already a mechanism to defer destruction for buffers/textures that are in use -- see the code at the end of Buffer::destroy and Texture::destroy. There is a tension between safely keeping resources alive when needed, and enabling applications to destroy them immediately when desired to recover memory. The case where a resource is referenced in a command buffer, then destroyed before that command buffer is submitted, is intended to be supported (there is a check on submit that resources have not been destroyed).
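
To illustrate the "check on submit" described above, here is a simplified sketch. It is not wgpu-core's actual code: `Buffer`, `CommandBuffer`, `SubmitError`, and `submit` are hypothetical stand-ins. The point is only that `destroy` can take effect eagerly, and a command buffer that still references the destroyed resource is rejected at submission time.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a tracked GPU buffer.
pub struct Buffer {
    pub label: String,
    destroyed: AtomicBool,
}

impl Buffer {
    pub fn new(label: &str) -> Arc<Self> {
        Arc::new(Self {
            label: label.to_string(),
            destroyed: AtomicBool::new(false),
        })
    }

    /// Eager destroy: a real implementation would also free the memory here.
    pub fn destroy(&self) {
        self.destroyed.store(true, Ordering::Release);
    }
}

/// Hypothetical command buffer that records which buffers it uses.
pub struct CommandBuffer {
    pub used_buffers: Vec<Arc<Buffer>>,
}

#[derive(Debug, PartialEq)]
pub enum SubmitError {
    DestroyedBuffer(String),
}

/// Rejects the submission if any referenced buffer was destroyed after
/// being recorded but before being submitted.
pub fn submit(cmd: &CommandBuffer) -> Result<(), SubmitError> {
    for buffer in &cmd.used_buffers {
        if buffer.destroyed.load(Ordering::Acquire) {
            return Err(SubmitError::DestroyedBuffer(buffer.label.clone()));
        }
    }
    Ok(())
}
```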

It may be useful to look at #8129 and the "full set of changes" referenced in the description, in particular 42e4a04. This was a (never merged) attempt to address the resource lifetime problem a different way closer to what you are doing, where the resources would be kept alive via Arc references in the tracker and recovered only once all the references went away. It changed destroy to replace the Arc reference held in the hub with a tombstone, allowing the resource to be destroyed at that point if there weren't other references in trackers keeping it alive.
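
The tombstone idea can be sketched roughly as follows. This is an illustration under assumed names (`Hub`, `Slot`, `Texture`), not the code from #8129 or 42e4a04: `destroy` swaps the hub's `Arc` for a tombstone, and the underlying object is dropped only when the last remaining reference (for example, one held by a tracker) goes away.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical resource type; dropping it stands in for freeing GPU memory.
pub struct Texture {
    pub label: String,
}

/// What the hub stores for each id.
pub enum Slot {
    Live(Arc<Texture>),
    /// Tombstone: the user destroyed the resource; the id no longer resolves.
    Destroyed,
}

#[derive(Default)]
pub struct Hub {
    slots: HashMap<u32, Slot>,
}

impl Hub {
    /// Registers a texture and returns an extra reference, as a tracker would hold.
    pub fn insert(&mut self, id: u32, texture: Texture) -> Arc<Texture> {
        let arc = Arc::new(texture);
        self.slots.insert(id, Slot::Live(arc.clone()));
        arc
    }

    /// Replaces the hub's reference with a tombstone. The texture is dropped
    /// right away only if no other reference keeps it alive.
    pub fn destroy(&mut self, id: u32) {
        if let Some(slot) = self.slots.get_mut(&id) {
            *slot = Slot::Destroyed;
        }
    }

    /// Resolves an id; destroyed or unknown ids yield `None`.
    pub fn get(&self, id: u32) -> Option<Arc<Texture>> {
        match self.slots.get(&id) {
            Some(Slot::Live(texture)) => Some(texture.clone()),
            _ => None,
        }
    }
}
```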

I don't think it should be necessary to have both the in_flight_count mechanism and the schedule_resource_destruction mechanism for managing lifetime.

I would still be interested to see a test case. We have made changes over the past few months (in particular, encoding on finish) that should have made it harder to destroy resources and then still end up trying to use a command buffer that references them. The exception I know of is #7816. I believe that issue is still outstanding (other than by setting the retain references flag), and I'm not aware of a strategy for fixing it besides keeping all resources alive whenever they are referenced in a command buffer. But I think that might be a Metal bug, and I don't know if we want to remove the ability to eagerly destroy resources entirely if we can get it fixed in Metal.

uael · 2025-12-16T05:20:02Z

Thank you for the detailed explanation. I clearly didn't have enough context when approaching this, but what I'm sure about is that #7816 no longer happens with this proposed approach (or with the previous, closed one), even with unretained references (without #7842).

> I would still be interested to see a test case.

Unfortunately, all I have is memory measurements from our iOS app, and #7842 is definitely the culprit: memory accumulates indefinitely when this specific commit is cherry-picked. But that's on v25.

> I'm not sure about the approach.

In the end, I'm not even looking for this exact approach or behavior. I was just trying to get #7816 fixed without needing #7842, hoping that it would fix the following as well:

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:3459: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x1220c7a00 (label: (wgpu internal) Signal):
<MTLToolsObject: 0x600002b02450> -> <MTLSimBuffer: 0x60000382c300>
    label = Render Pass Vertex Buffer 
    length = 36 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    purgeableState = MTLPurgeableStateNonVolatile'
CoreSimulator 1048 - Device: iPad mini (A17 Pro) (395B57A4-D87A-4845-90CB-168FA4CE7140) - Runtime: iOS 26.0 (23A339) - DeviceType: iPad mini (A17 Pro)
Can't show file for stack frame : <DBGLLDBStackFrame: 0x827be4000> - stackNumber:12 - name:core::ptr::drop_in_place$LT$wgpu_hal..metal..Buffer$GT$::h4d4355beee563fc7 [inlined]. The file path does not exist on the file system: /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/core/src/ptr/mod.rs

Let me re-run my measurements against trunk instead of v25, with and without #7842.

> But I think that might be a Metal bug, and I don't know if we want to remove the ability to eagerly destroy resources entirely if we can get it fixed in Metal.

This appears to be true, but the validation error above makes me think that something might still be wrong in wgpu's tracking with respect to the behavior Metal expects. But again, that was on v25. Do you recall any recent change that could have fixed the above?

uael · 2025-12-17T05:17:30Z

Closing, as I'm unable to reproduce on trunk; it sounds like this has already been fixed. I should have started from there directly. Sorry for the lost time.

andyleiserson · 2025-12-17T17:50:47Z

No worries. I do think that turning the retained references flag back off is desirable -- I just haven't had time to dig into it.

Re: the MTLDebugDevice error, a lot has changed since v25 aimed at resolving this kind of issue, so I wouldn't be surprised if it has been resolved, but I can't say anything for sure.

uael mentioned this pull request Dec 11, 2025

[metal] explicitly retain resources used by command buffers #8694

Closed

6 tasks

ErichDonGubler requested a review from andyleiserson December 11, 2025 16:57

ErichDonGubler assigned andyleiserson Dec 11, 2025

[core] defer buffer/texture destruction while used by a command buffer

fe9eb2d

uael force-pushed the uael/command-buffer-in-flight branch from f44dcdb to fe9eb2d Compare December 12, 2025 06:51

uael closed this Dec 17, 2025

andyleiserson mentioned this pull request Dec 17, 2025

Turn off Metal retain_command_buffer_references flag #8747

Open
