Reduce the memory footprint of wide tile commands #1325

Merged

LaurenzV merged 21 commits into main from laurenz/props

Dec 29, 2025

Collaborator

LaurenzV commented Dec 16, 2025

This PR reduces the memory footprint of wide tile commands from 48 bytes to 16 bytes. This is achieved by introducing a number of changes:

- For fill/alpha fills, we store the core properties (paint, mask, etc.) in a separate buffer that is built on a per-path basis. The commands now simply contain an index that lets us fetch the properties when we need them, without duplicating them in each command.
- A similar pattern is used for clip commands.
- Clip commands now use u16 instead of u32 for x/width.
- Instead of storing the alpha index as a usize, we store an absolute offset on a per-path basis, and each command only stores the offset relative to the index of the path it belongs to. With that, a u32 per command is large enough to store the alpha index.
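To make the idea concrete, here is a minimal sketch of the pattern described in the list above. It is not the code from this PR: the type names (`FillAttrs`, `FatAlphaFill`, `Cmd`), their fields, and the helper `absolute_alpha_idx` are all made up for illustration. It only shows how moving per-path data into a side buffer and storing a relative alpha offset can shrink each command.

```rust
use std::mem::size_of;

/// Hypothetical per-path attributes, stored once per path instead of once per command.
#[allow(dead_code)]
struct FillAttrs {
    paint: [u8; 16],       // placeholder for the paint
    mask_id: Option<u32>,  // placeholder for an optional mask
    /// Absolute index of this path's first alpha value in the alpha buffer.
    alpha_base: usize,
}

/// Hypothetical "fat" command: every command duplicates the per-path data.
#[allow(dead_code)]
struct FatAlphaFill {
    x: u32,
    width: u32,
    alpha_idx: usize,
    paint: [u8; 16],
    mask_id: Option<u32>,
}

/// Hypothetical slim commands: they only carry indices into side buffers.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Cmd {
    Fill { x: u16, width: u16, attrs_idx: u32 },
    /// `alpha_offset` is relative to `FillAttrs::alpha_base` of the owning path.
    AlphaFill { x: u16, width: u16, attrs_idx: u32, alpha_offset: u32 },
}

/// Recover the absolute alpha index from the per-path base plus the relative offset.
fn absolute_alpha_idx(cmd: &Cmd, attrs: &[FillAttrs]) -> Option<usize> {
    match *cmd {
        Cmd::Fill { .. } => None,
        Cmd::AlphaFill { attrs_idx, alpha_offset, .. } => {
            Some(attrs[attrs_idx as usize].alpha_base + alpha_offset as usize)
        }
    }
}

fn main() {
    let attrs = vec![FillAttrs { paint: [0; 16], mask_id: None, alpha_base: 1_000_000 }];
    let cmd = Cmd::AlphaFill { x: 0, width: 4, attrs_idx: 0, alpha_offset: 42 };

    println!("fat command:  {} bytes", size_of::<FatAlphaFill>());
    println!("slim command: {} bytes", size_of::<Cmd>());
    println!("absolute alpha index: {:?}", absolute_alpha_idx(&cmd, &attrs));
}
```

The trade-off in this sketch is one extra indirection when a command is executed, in exchange for commands that are cheaper to store and copy.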

This leads to some nice speedups in many cases:

LaurenzV added 13 commits

December 16, 2025 11:35

Use cmd_props (90e4c6d)
Use u16 for clip commands (105895e)
Store alpha indices relative for alpha fill commands (bbd5e4c)
Collect clip properties in struct (502517d)
cleanup (0276efd)
more fixes (bafa4bd)
Pass props as a whole (8d0aff8)
Make absolute offset private (13cf85a)
Reformat (8b0b74a)
Add comment (d420a8a)
typo (5a2927c)
Adapt comment (2e3d971)
Imports (57b34bb)

grebmeg self-requested a review

December 19, 2025 00:08

Collaborator

grebmeg commented Dec 19, 2025

Hey @LaurenzV, awesome work! I’m adding myself as a reviewer to indicate that I plan to review the PR, but I’ll likely be able to get to it on Monday. 🙏

grebmeg reviewed

sparse_strips/vello_common/src/coarse.rs

    
    pub strips: Box<[Strip]>,
    #[cfg(feature = "multithreading")]
    /// The index of the thread that owns the alpha buffer.
    /// Always 0 in single-threaded mode.

Collaborator

grebmeg Dec 21, 2025

If I’m calculating the struct size correctly, it’s now 32 bytes instead of 24 bytes in single-threaded mode, so we’re paying an additional 8 bytes even though this field isn’t used there. While we probably won’t have thousands of clips simultaneously, would it still make sense to keep this behind a feature gate?

Collaborator Author

LaurenzV Dec 28, 2025

Is this really a problem though? That struct is only used to keep track of the clip stack, and realistically you aren't going to have more than a few nested clip paths, so I think the impact on memory is negligible (unlike the wide tile commands, where you could end up with thousands or even tens of thousands of commands depending on what you are drawing). I could add it back, but the main reason I removed it is that it keeps the code simpler, because we don't have to slap the cfg attribute on everywhere we use it.

Collaborator

grebmeg Dec 28, 2025

No, it’s not a problem, I just think paying for something you don’t use isn’t ideal, even if the cost is negligible and the code is slightly simpler.
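The size jump discussed in this thread comes from alignment padding. The sketch below illustrates it with made-up structs; the names and fields are assumptions, not the actual types in `coarse.rs`. On a 64-bit target, a struct holding a boxed slice plus one pointer-sized field is 24 bytes, and adding a single `u8` rounds it up to 32. The sketch also shows the opposite case, where a struct already has spare padding and an extra `u8` costs nothing.

```rust
use std::mem::size_of;

/// Placeholder for the real strip type.
#[allow(dead_code)]
struct Strip(u64);

/// Hypothetical clip entry without a thread index.
#[allow(dead_code)]
struct ClipWithoutThread {
    strips: Box<[Strip]>, // fat pointer: data pointer + length
    alpha_base: usize,    // assumed pointer-sized field
}

/// Same struct with a one-byte thread index added.
#[allow(dead_code)]
struct ClipWithThread {
    strips: Box<[Strip]>,
    alpha_base: usize,
    thread_idx: u8, // 1 byte of data, but the size is rounded up to the alignment
}

/// Hypothetical struct that already ends in padding.
#[allow(dead_code)]
struct AttrsWithoutThread {
    paint: u64,
    alpha_base: u32,
    has_mask: bool,
}

/// Here the extra `u8` fits into the existing padding.
#[allow(dead_code)]
struct AttrsWithThread {
    paint: u64,
    alpha_base: u32,
    has_mask: bool,
    thread_idx: u8,
}

fn main() {
    // On a 64-bit target these are 24 and 32 bytes.
    println!("clip without thread_idx:  {}", size_of::<ClipWithoutThread>());
    println!("clip with thread_idx:     {}", size_of::<ClipWithThread>());
    // These two have the same size, because the u8 lands in padding.
    println!("attrs without thread_idx: {}", size_of::<AttrsWithoutThread>());
    println!("attrs with thread_idx:    {}", size_of::<AttrsWithThread>());
}
```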

sparse_strips/vello_common/src/coarse.rs Outdated

    
    tiles,
    width,
    height,
    props: Props::default(),

Collaborator

grebmeg Dec 22, 2025

Would it make sense to assign reasonable starting capacities for vectors in Props, specifically for FillProps? I think we can assume it will need to grow during the first render since it’s the primary data-holding structure. It might also be worth checking whether this shows any impact in benchmarks.

Collaborator Author

LaurenzV Dec 28, 2025

Any suggestions for that? I think the impact will be negligible since, as before, we aren't likely to push a lot of elements into it, but if you have a preference for the starting capacity I can add that. I don't think it would have a performance impact because the benchmarks (or at least the Blend2D benchmarks) run with a "warm" render context, i.e. a render context that has previously been reset.

Collaborator

grebmeg Dec 28, 2025

No worries then, choosing the right capacity size isn’t easy. Regarding the warm render context, are you saying the render context isn’t recreated from scratch, but instead reset after each render?
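As a rough illustration of the two options in this thread, the sketch below contrasts pre-sizing a vector with reusing a "warm" one. Apart from the name `CommandAttrs` (taken from the later rename commit), the struct layout, field names, and methods are assumptions for illustration only. It relies on the standard-library guarantee that `Vec::clear` keeps the allocated capacity, so a context that is reset rather than recreated only pays for growth on its first frame.

```rust
/// Hypothetical per-path attributes.
#[allow(dead_code)]
#[derive(Default)]
struct FillAttrs {
    paint_id: u32,
}

/// Hypothetical attribute store owned by the render context.
#[derive(Default)]
struct CommandAttrs {
    fill: Vec<FillAttrs>,
}

impl CommandAttrs {
    /// Option 1: pre-size the buffer so the first frame does not reallocate.
    fn with_capacity(n: usize) -> Self {
        Self { fill: Vec::with_capacity(n) }
    }

    /// Option 2: reset between frames; `clear` drops the elements but keeps the allocation.
    fn reset(&mut self) {
        self.fill.clear();
    }
}

fn main() {
    // Cold context: starts empty and grows during the first frame.
    let mut attrs = CommandAttrs::default();
    for i in 0..100 {
        attrs.fill.push(FillAttrs { paint_id: i });
    }
    let warm_capacity = attrs.fill.capacity();

    // Warm context: after a reset the buffer is empty but still allocated.
    attrs.reset();
    println!("len after reset: {}", attrs.fill.len());
    println!("capacity kept:   {}", attrs.fill.capacity() == warm_capacity);

    // Pre-sized context for comparison.
    let presized = CommandAttrs::with_capacity(64);
    println!("pre-sized capacity is at least 64: {}", presized.fill.capacity() >= 64);
}
```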

LaurenzV added 7 commits

December 28, 2025 09:54

Rename Props to CommandAttrs (749c072)
Use u32 instead of usize for alpha indices (7a02fdc)
Apply suggestion (52431d6)
Apply suggestion (816b41b)
Remove irrelevant try into (e393ffa)
Add a TODO comment (10cb0d1)
Fix clippy (b5b7337)

LaurenzV requested a review from grebmeg

December 28, 2025 16:26

grebmeg approved these changes

sparse_strips/vello_common/src/coarse.rs

    
    /// The index of the thread that owns the alpha buffer
    /// containing the mask values at `alpha_idx`.
    /// Always 0 in single-threaded mode.
    pub thread_idx: u8,

Collaborator

grebmeg Dec 28, 2025

I guess we can consider this negligible in ClipProps as well? And for FillProps, not having thread_idx wouldn’t actually change the struct size, right?

Collaborator Author

LaurenzV Dec 29, 2025

I think so, yes. As mentioned though, I can change it back in case you consider it important. 😄


Part 2 of properties -> attributes (34a42b1)

Collaborator Author

LaurenzV commented Dec 29, 2025

> No worries then, choosing the right capacity size isn’t easy. Regarding the warm render context, are you saying the render context isn’t recreated from scratch, but instead reset after each render?

That's how it's currently done in the Blend2D benchmark suite, yep!

Collaborator Author

LaurenzV commented Dec 29, 2025

@grebmeg Thanks a lot!! I'll try to take a look at your other PRs soon.

LaurenzV added this pull request to the merge queue

Merged via the queue into main with commit f6283d7

17 checks passed

LaurenzV deleted the laurenz/props branch

December 29, 2025 08:50
