perf(vello_common): track has_opacities to skip alpha blending #1329

grebmeg · 2025-12-18T22:28:43Z

By tracking image opacities, we can skip alpha blending for fully opaque image fills and overwrite blend_buf directly, which gives a notable performance gain over the main branch (this change accounts for about a 24% improvement):

Low quality:

sparse_strips/vello_bench/src/integration.rs

sparse_strips/vello_common/src/pixmap.rs

sparse_strips/vello_cpu/src/fine/lowp/image.rs
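For readers skimming the description, here is a minimal sketch of the idea, assuming premultiplied RGBA8 pixels. `Rgba8`, `Image`, `fill`, and `blend_channel` are hypothetical names for illustration only; they are not the actual vello_common or vello_cpu types, and the real change operates on blend_buf with SIMD.

```rust
/// Hypothetical premultiplied RGBA8 pixel (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rgba8 {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

/// Hypothetical image that records once, at construction time,
/// whether any pixel is not fully opaque.
pub struct Image {
    pixels: Vec<Rgba8>,
    has_opacities: bool,
}

impl Image {
    pub fn new(pixels: Vec<Rgba8>) -> Self {
        // One pass over the pixels; the result is reused on every fill.
        let has_opacities = pixels.iter().any(|p| p.a != 255);
        Self { pixels, has_opacities }
    }

    pub fn has_opacities(&self) -> bool {
        self.has_opacities
    }
}

/// Source-over for one premultiplied channel: src + dst * (255 - src_a) / 255.
fn blend_channel(src: u8, dst: u8, src_a: u8) -> u8 {
    let inv = 255 - src_a as u32;
    let scaled_dst = (dst as u32 * inv + 127) / 255;
    (src as u32 + scaled_dst).min(255) as u8
}

/// Fill `dst` with `src`, skipping the blend when `src` is fully opaque.
pub fn fill(dst: &mut [Rgba8], src: &Image) {
    let n = dst.len().min(src.pixels.len());
    if !src.has_opacities {
        // Fast path: an opaque source fully replaces the destination.
        dst[..n].copy_from_slice(&src.pixels[..n]);
        return;
    }
    // Slow path: per-pixel source-over blending.
    for (d, s) in dst[..n].iter_mut().zip(&src.pixels[..n]) {
        *d = Rgba8 {
            r: blend_channel(s.r, d.r, s.a),
            g: blend_channel(s.g, d.g, s.a),
            b: blend_channel(s.b, d.b, s.a),
            a: blend_channel(s.a, d.a, s.a),
        };
    }
}
```

Both paths produce the same pixels for an opaque source; the fast path just avoids the per-channel multiply and divide.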

LaurenzV · 2026-01-05T15:49:07Z

sparse_strips/vello_cpu/src/fine/lowp/image.rs

+        // Widen to u16, then compute `256 - fx` to ensure fx + fx_inv = 256.
        let fx = self.simd.widen_u8x16(fx);
        let fy = self.simd.widen_u8x16(fy);
+        let fx_inv = u16x16::splat(self.simd, 256) - fx;


Are you sure it's correct to use 256 here? I believe the previous 255 was correct.

Just to explain: in the expression fract(y_positions + 0.5) * 256.0, fract(y_positions + 0.5) yields a value in the range [0.0, 1.0) (note that 1.0 is exclusive), so the product falls in the range [0.0, 256.0). If we end up with something like 255.5, it is truncated to 255 when converted to u8. The inverse should then be 0, because the overall value is supposed to be at most 255 (the maximum value of u8), not 256. It's been a while since I wrote this code, but I think that should be right, or am I missing something?

Let's say for example that we are sampling the exact center of the pixel (0.5, 0.5). In this case, fx ends up being 0, and using your code fx_inv would be 256. If we are now sampling a white pixel, we would then be calculating 256 * 255, which would overflow the u16. So I think it should be right that fx + fx_inv = 255. 🤔
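A scalar sketch of the weight computation discussed in this comment may help when following the numbers. `weight_256` and `inverse_weights` are hypothetical helpers, not functions from vello_cpu, and the sketch assumes the float-to-u8 conversion behaves like Rust's `as u8` (truncate toward zero, saturate at the bounds).

```rust
/// Fractional part in [0.0, 1.0).
fn fract(x: f32) -> f32 {
    x - x.floor()
}

/// Weight on the 256 scale: fract(pos + 0.5) * 256.0, converted to u8.
/// The product lies in [0.0, 256.0), so the result lies in 0..=255.
pub fn weight_256(pos: f32) -> u8 {
    (fract(pos + 0.5) * 256.0) as u8
}

/// The two candidate inverses under discussion, widened to u16:
/// (255 - fx, 256 - fx).
pub fn inverse_weights(fx: u8) -> (u16, u16) {
    (255 - fx as u16, 256 - fx as u16)
}
```

Sampling the exact pixel center (position 0.5) gives a weight of 0, so the inverse is 255 under the first convention and 256 under the second. A position of 0.498046875 produces 255.5 before conversion and a weight of 255 after it.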

LaurenzV · 2026-01-05T15:54:04Z

sparse_strips/vello_cpu/src/fine/lowp/image.rs

-        let ip1 = (p00 * fx_inv + p10 * fx) >> 8;
-        let ip2 = (p01 * fx_inv + p11 * fx) >> 8;
-        let res = self.simd.narrow_u16x16((ip1 * fy_inv + ip2 * fy) >> 8);
+        // Add rounding bias before shifting: round(x/256) = floor((x + 128) / 256).


When I wrote this code I was aware that >> 8 is technically not correct, because we really want to divide by 255 and not 256. I didn't consider it critical, though, since the various calculations already introduce a lot of imprecision, and the shift is much faster.

So, if we really want to fix this, I think the correct approach would be to use the div_255 method instead of bit-shifting. However, it is probably worth checking how much this slows down the benchmarks. What do you think?
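To make the trade-off concrete, here is a small scalar comparison of the shift against a rounded division by 255. `div_255_round` is a hypothetical stand-in built on a well-known shift-and-add identity; it is an assumption for illustration and is not necessarily how the div_255 method in this codebase is implemented.

```rust
/// Truncating approximation: divide by 256 instead of 255.
pub fn shift_8(x: u32) -> u32 {
    x >> 8
}

/// Rounded division by 255 using only adds and shifts.
/// Matches round(x / 255) for every product of two u8 values (x <= 255 * 255).
pub fn div_255_round(x: u32) -> u32 {
    let t = x + 128;
    (t + (t >> 8)) >> 8
}
```

With weights that sum to 255, a fully white input yields 255 * 255 = 65025. The shift maps that to 254, while the rounded division maps it to 255, which is the kind of small darkening the shift introduces.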

I used 256 because it allows a simple >> 8 shift for division, avoiding the need for div_255().

There’s no overflow risk: the maximum intermediate value is 256 × 255 + 128 = 65,408, which fits within u16. The +128 rounding bias compensates for some precision loss, since round(x / 256) = floor((x + 128) / 256).

You’re right that this introduces a small asymmetry at the edges: when fract() → 1.0, fx caps at 255 due to u8 clamping. This should be imperceptible in practice.
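The arithmetic in the two paragraphs above can be checked with a scalar model. `round_shift_8` and `lerp_256` are hypothetical names used for this sketch only; the actual code works on u16x16 SIMD vectors.

```rust
/// Rounded division by 256: floor((x + 128) / 256).
/// The caller must keep x <= 65407 so the addition stays within u16.
pub fn round_shift_8(x: u16) -> u16 {
    (x + 128) >> 8
}

/// Linear interpolation between `a` and `b` with weights that sum to 256.
pub fn lerp_256(a: u8, b: u8, f: u8) -> u8 {
    let f = f as u16;
    let f_inv = 256 - f;
    // Worst case: 255 * 256 = 65280, plus the 128 bias = 65408 <= u16::MAX.
    let sum = a as u16 * f_inv + b as u16 * f;
    round_shift_8(sum) as u8
}
```

A constant input is reproduced exactly for every weight, and no intermediate exceeds u16. The edge asymmetry shows up as `lerp_256(0, 255, 255)` returning 254 rather than 255, because the weight can never reach 256.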

I benchmarked your snippet. It behaves correctly and passes the tests, but it introduces a ~2.5% regression. Given that, would you prefer the more precise but slightly slower approach, or is the faster, less precise version acceptable here?

Ah true, I got the overflow part wrong, my bad!

LaurenzV · 2026-01-05T16:26:03Z

I think this is what it should look like. What do you think?

let fx = f32_to_u8(element_wise_splat(
    self.simd,
    fract(x_positions + 0.5).madd(255.0, 0.5),
));
let fy = f32_to_u8(element_wise_splat(
    self.simd,
    fract(y_positions + 0.5).madd(255.0, 0.5),
));

let fx = self.simd.widen_u8x16(fx);
let fy = self.simd.widen_u8x16(fy);
let fx_inv = u16x16::splat(self.simd, 255) - fx;
let fy_inv = u16x16::splat(self.simd, 255) - fy;

let x_pos1 = extend_x(x_positions - 0.5);
let x_pos2 = extend_x(x_positions + 0.5);
let y_pos1 = extend_y(y_positions - 0.5);
let y_pos2 = extend_y(y_positions + 0.5);

let p00 = self
    .simd
    .widen_u8x16(sample(self.simd, &self.data, x_pos1, y_pos1));
let p10 = self
    .simd
    .widen_u8x16(sample(self.simd, &self.data, x_pos2, y_pos1));
let p01 = self
    .simd
    .widen_u8x16(sample(self.simd, &self.data, x_pos1, y_pos2));
let p11 = self
    .simd
    .widen_u8x16(sample(self.simd, &self.data, x_pos2, y_pos2));

let ip1 = (p00 * fx_inv + p10 * fx).div_255();
let ip2 = (p01 * fx_inv + p11 * fx).div_255();
let res = self
    .simd
    .narrow_u16x16((ip1 * fy_inv + ip2 * fy).div_255());

I tried it locally and the tests still seem to pass.
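For reference, a scalar model of the snippet above, for a single channel. It reads `madd(255.0, 0.5)` as "multiply by 255.0, then add 0.5", which is an assumption, and it uses a plain rounded integer division in place of the SIMD div_255 method. `weight_255`, `div_255`, and `bilinear` are hypothetical names for this sketch.

```rust
/// Rounded division by 255 (scalar stand-in; not the SIMD implementation).
fn div_255(x: u32) -> u32 {
    (x + 127) / 255
}

/// Weight on the 255 scale: fract(pos + 0.5) * 255.0 + 0.5, truncated.
/// fract is below 1.0, so the value stays below 255.5 and the weight is 0..=255.
pub fn weight_255(pos: f32) -> u32 {
    let shifted = pos + 0.5;
    let fract = shifted - shifted.floor();
    (fract * 255.0 + 0.5) as u32
}

/// Bilinear interpolation of four samples with weights that sum to 255.
pub fn bilinear(p00: u8, p10: u8, p01: u8, p11: u8, fx: u32, fy: u32) -> u8 {
    let fx_inv = 255 - fx;
    let fy_inv = 255 - fy;
    // Interpolate horizontally on both rows, then vertically between them.
    let ip1 = div_255(p00 as u32 * fx_inv + p10 as u32 * fx);
    let ip2 = div_255(p01 as u32 * fx_inv + p11 as u32 * fx);
    div_255(ip1 * fy_inv + ip2 * fy) as u8
}
```

In this model both endpoints are reachable: a weight of 0 returns the first sample, a weight of 255 returns the second, and a uniform white neighbourhood stays at 255 for every weight pair.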

LaurenzV

Will leave it up to you if you want to change back to 256, but I think 2.5% is small enough that it's better to just use the more accurate version!

grebmeg force-pushed the gemberg/tests-diff-log-data branch from 52be151 to d9ec9a2 Compare December 22, 2025 01:31

Base automatically changed from gemberg/tests-diff-log-data to main December 22, 2025 02:03

grebmeg changed the base branch from main to gemberg/perf/image-rendering-improvements December 22, 2025 02:33

grebmeg force-pushed the gemberg/perf/image-rendering-improvements branch from 2d4526c to 422ece4 Compare December 22, 2025 02:35

grebmeg force-pushed the gemberg/gemberg/perf/image-rendering-improvements2 branch from 72652c0 to 10b89a6 Compare December 22, 2025 02:45

grebmeg force-pushed the gemberg/perf/image-rendering-improvements branch from 422ece4 to 5e74593 Compare December 29, 2025 23:37

grebmeg mentioned this pull request Dec 29, 2025

perf: eliminate overdraw for opaque image fills #1327

Merged

Base automatically changed from gemberg/perf/image-rendering-improvements to main December 29, 2025 23:53

grebmeg force-pushed the gemberg/gemberg/perf/image-rendering-improvements2 branch from 10b89a6 to c398a62 Compare December 30, 2025 00:01

LaurenzV reviewed Dec 30, 2025


grebmeg force-pushed the gemberg/gemberg/perf/image-rendering-improvements2 branch from c398a62 to facff60 Compare January 5, 2026 00:14

grebmeg requested a review from LaurenzV January 5, 2026 00:58

LaurenzV reviewed Jan 5, 2026


grebmeg requested a review from LaurenzV January 5, 2026 23:33

LaurenzV approved these changes Jan 6, 2026


LaurenzV mentioned this pull request Jan 7, 2026

Update the changelog for sparse strips renderers #1344

Merged

grebmeg added 3 commits January 12, 2026 17:47

perf(vello_common): track has_opacities to skip alpha blending

528349f

fix: correct rounding in bilinear image sampling

b33951c

.

0413d9e

grebmeg force-pushed the gemberg/gemberg/perf/image-rendering-improvements2 branch from d1a62b3 to 0413d9e Compare January 12, 2026 06:48

grebmeg added this pull request to the merge queue Jan 12, 2026

Merged via the queue into main with commit 7fa709f Jan 12, 2026
17 checks passed

grebmeg deleted the gemberg/gemberg/perf/image-rendering-improvements2 branch January 12, 2026 07:08
