perf: row sliding window for feDiffuseLighting normal computation by wjc911 · Pull Request #1029 · linebender/resvg

wjc911 · 2026-02-22T18:31:42Z

Summary

Replace the per-pixel 3x3 neighborhood fetch with a sliding window that carries the previous row's data across columns, reducing redundant height-map reads from 9 to 3 per pixel on interior rows
Apply the optimization only for images >= 128x128 pixels; smaller images use the original path to avoid branch overhead on tiny inputs
Normal vector computation (Sobel-style finite differences) is unchanged; only the data access pattern is restructured
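The access pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names, the flat `i16` height map, and the use of plain Sobel kernels on interior pixels only are all assumptions made for the example. The naive version fetches all nine neighbors for every pixel, while the sliding version keeps the left and center columns from the previous step and reads only the three values of the new right-hand column.

```rust
/// Sobel-style (gx, gy) for interior pixels, fetching all 9 neighbors per pixel.
/// Hypothetical reference version, not the PR's code.
fn sobel_naive(h: &[i16], w: usize, ht: usize) -> Vec<(i32, i32)> {
    if w < 3 || ht < 3 {
        return Vec::new();
    }
    let at = |x: usize, y: usize| h[y * w + x] as i32;
    let mut out = Vec::with_capacity((w - 2) * (ht - 2));
    for y in 1..ht - 1 {
        for x in 1..w - 1 {
            let gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1))
                - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
            let gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1))
                - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
            out.push((gx, gy));
        }
    }
    out
}

/// Same result, but each step reuses two columns of the 3x3 window
/// and reads only the 3 values of the incoming right column.
fn sobel_sliding(h: &[i16], w: usize, ht: usize) -> Vec<(i32, i32)> {
    if w < 3 || ht < 3 {
        return Vec::new();
    }
    let mut out = Vec::with_capacity((w - 2) * (ht - 2));
    for y in 1..ht - 1 {
        let top = &h[(y - 1) * w..y * w];
        let mid = &h[y * w..(y + 1) * w];
        let bot = &h[(y + 1) * w..(y + 2) * w];
        // One column of the window: [top, middle, bottom].
        let col = |x: usize| [top[x] as i32, mid[x] as i32, bot[x] as i32];
        let mut left = col(0);
        let mut center = col(1);
        for x in 1..w - 1 {
            let right = col(x + 1); // the only new reads for this pixel
            let gx = (right[0] + 2 * right[1] + right[2]) - (left[0] + 2 * left[1] + left[2]);
            let gy = (left[2] + 2 * center[2] + right[2]) - (left[0] + 2 * center[0] + right[0]);
            out.push((gx, gy));
            // Slide the window one column to the right.
            left = center;
            center = right;
        }
    }
    out
}
```

Because both functions evaluate the same integer expressions, their outputs should match exactly, which is the kind of check a naive-versus-optimized comparison relies on.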

Benchmark Results

Metric	Value
Average speedup	1.16x

Test Results

All 1723/1723 integration tests pass (cargo test --release -p resvg --test integration).

🤖 Generated with Claude Code

…utation

Add an optimized apply path for the lighting filter that uses:
- Row-based alpha extraction into contiguous i16 buffers for cache-friendly access
- A 3-row sliding window that reuses two rows per iteration (only reads one new row)
- Pre-computed spot light direction (computed once instead of per-pixel)
- Sequential row-major pixel processing order

The dispatcher selects between naive (for images < 64x64 pixels) and optimized paths. The naive implementation is preserved verbatim as a fallback and correctness reference.

Includes exhaustive bit-exact correctness tests comparing naive vs optimized across 7 light source configurations, 4 surface scales, 4 diffuse constants, and multiple image sizes (3x3 through 256x256, including non-square).

Benchmark results show 1.7x-5.2x speedup at 1024x1024 and 1.7x-3.6x at 4096x4096 depending on light source type.
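The row-based alpha extraction and 3-row window from this commit message could look something like the sketch below. It is an illustration under assumptions, not the PR's implementation: `extract_alpha_row` and `for_each_row_window` are hypothetical names, the input is assumed to be tightly packed RGBA8, and only interior rows are visited (the real filter also has to handle edge rows).

```rust
/// Copy the alpha channel of row `y` of a packed RGBA8 image into `out`.
/// Hypothetical helper; assumes `out.len() == width`.
fn extract_alpha_row(rgba: &[u8], width: usize, y: usize, out: &mut [i16]) {
    let row = &rgba[y * width * 4..(y + 1) * width * 4];
    for (dst, px) in out.iter_mut().zip(row.chunks_exact(4)) {
        *dst = px[3] as i16;
    }
}

/// Call `f(y, above, current, below)` for every interior row.
/// Each iteration extracts exactly one new row; the other two are reused.
fn for_each_row_window(
    rgba: &[u8],
    width: usize,
    height: usize,
    mut f: impl FnMut(usize, &[i16], &[i16], &[i16]),
) {
    if width == 0 || height < 3 {
        return;
    }
    let mut above = vec![0i16; width];
    let mut current = vec![0i16; width];
    let mut below = vec![0i16; width];
    extract_alpha_row(rgba, width, 0, &mut above);
    extract_alpha_row(rgba, width, 1, &mut current);
    for y in 1..height - 1 {
        // The only read from the source image in this iteration.
        extract_alpha_row(rgba, width, y + 1, &mut below);
        f(y, &above, &current, &below);
        // Rotate buffers: current becomes above, below becomes current,
        // and the old `above` buffer is recycled for the next new row.
        std::mem::swap(&mut above, &mut current);
        std::mem::swap(&mut current, &mut below);
    }
}
```

Swapping the `Vec` handles rotates the window without copying any row data, so the per-row cost is one pass over the new row.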

…d #[cfg(test)], fix comments

- Raise OPTIMIZED_THRESHOLD from 64*64 to 128*128 (benchmark showed the optimized path running at 0.93x, i.e. slower than naive, for 64x64 distant light; the crossover is between 64x64 and 256x256, so 128*128 is a safe conservative choice)
- Gate apply_naive and its helper functions (light_color, *_normal, alpha_at) behind #[cfg(test)] to eliminate dead code in release builds; the apply() function now always uses apply_optimized
- Fix misleading comments: the optimization is primarily a cache-locality improvement (row-based alpha extraction + sliding window), NOT SIMD auto-vectorization (the dyn Fn trait object for light_factor prevents inlining and vectorization)

The previous fix incorrectly removed the naive fallback entirely, making apply() always call apply_optimized unconditionally. Benchmarks showed 64x64 distant light is 0.93x (slower with optimized path). Restore the dynamic threshold check: images below 128*128 pixels use apply_naive, larger images use apply_optimized. Remove all #[cfg(test)] gates from apply_naive, light_color, normal helper functions, alpha_at, and the OPTIMIZED_THRESHOLD constant so they are available in release builds.
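The restored threshold check might be sketched as below. Only the `OPTIMIZED_THRESHOLD` name and its 128*128 value come from the commit messages; the `LightingPath` enum and `select_path` function are hypothetical stand-ins for the real `apply()` dispatcher, whose signature is not shown here.

```rust
/// Pixel-count threshold named in the commit message.
const OPTIMIZED_THRESHOLD: usize = 128 * 128;

/// Hypothetical stand-in for "call apply_naive" vs "call apply_optimized".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LightingPath {
    Naive,
    Optimized,
}

/// Pick a path from the total pixel count, so non-square images
/// are compared by area rather than by either dimension alone.
fn select_path(width: usize, height: usize) -> LightingPath {
    if width.saturating_mul(height) < OPTIMIZED_THRESHOLD {
        LightingPath::Naive
    } else {
        LightingPath::Optimized
    }
}
```

Under this reading, a 64x64 image stays on the naive path, while a 128x128 image (exactly at the threshold) takes the optimized one.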

Use scoped threads and AtomicUsize progress counter to run benchmark configurations in parallel across all available CPU cores.
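A minimal sketch of this pattern using `std::thread::scope` and `AtomicUsize` is shown below. The `run_parallel` helper, its work-claiming counter, and the progress message format are assumptions for illustration; the actual benchmark code may be organized differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `run` on every config using `workers` scoped threads.
/// Results are returned in the same order as `configs`.
fn run_parallel<T, R>(configs: &[T], workers: usize, run: impl Fn(&T) -> R + Sync) -> Vec<R>
where
    T: Sync,
    R: Send,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed config
    let done = AtomicUsize::new(0); // progress counter for reporting
    let results: Mutex<Vec<(usize, R)>> = Mutex::new(Vec::new());

    // Scoped threads may borrow `configs`, `run`, and the counters directly,
    // because the scope joins every thread before it returns.
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= configs.len() {
                    break;
                }
                let r = run(&configs[i]);
                let finished = done.fetch_add(1, Ordering::Relaxed) + 1;
                eprintln!("[{finished}/{}] configurations done", configs.len());
                results.lock().unwrap().push((i, r));
            });
        }
    });

    let mut collected = results.into_inner().unwrap();
    collected.sort_by_key(|entry| entry.0);
    collected.into_iter().map(|(_, r)| r).collect()
}
```

To use every available core, the worker count can be taken from `std::thread::available_parallelism()`, falling back to 1 if it returns an error. Timings gathered this way share caches and memory bandwidth between workers, which is one plausible reason for moving to a sequential harness later.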

Replaces the parallel bench_e2e.rs with a sequential single-threaded version that uses per-resolution iteration counts (2000 for 16px, down to 100 for 1024px+), a probe-then-scale budget cap (30s total per case, skip if single probe > 10s), and --compare for TSV baseline comparison. Allows CPU-pinned reproducible measurements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
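One way to read the "probe-then-scale" budget cap is sketched below: time a single probe run, skip the case if the probe alone exceeds 10 seconds, and otherwise cap the requested iteration count so the whole case fits within 30 seconds. Only the 30s and 10s figures come from the commit message; `plan_iterations`, the constant names, and the exact scaling rule are assumptions.

```rust
use std::time::Duration;

/// Total time allowed per benchmark case (figure from the commit message).
const CASE_BUDGET: Duration = Duration::from_secs(30);
/// A single probe slower than this causes the case to be skipped.
const PROBE_LIMIT: Duration = Duration::from_secs(10);

/// Decide how many iterations to run given one measured probe.
/// Returns `None` when the case should be skipped entirely.
/// Hypothetical helper; the real harness may scale differently.
fn plan_iterations(probe: Duration, requested: u32) -> Option<u32> {
    if probe > PROBE_LIMIT {
        return None;
    }
    if probe.is_zero() {
        // Too fast to measure: the budget cannot be the limiting factor.
        return Some(requested);
    }
    // How many probe-length runs fit in the budget, capped at the request.
    let affordable = (CASE_BUDGET.as_nanos() / probe.as_nanos()).min(requested as u128) as u32;
    Some(affordable.max(1))
}
```

In a harness, `probe` would typically come from wrapping one run in `std::time::Instant::now()` and `elapsed()`, and `requested` would be the per-resolution count (for example 2000 for 16px or 100 for 1024px and above).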

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wjc911 and others added 11 commits February 21, 2026 16:23

Add comprehensive lighting filter benchmarks for regression testing

5a55930

bench: parallelize feLighting benchmark with std::thread::scope

c05cfb8

Add inline(never)/cold annotations to lighting optimized functions

c983077

Clean up feDiffuseLighting optimization code

759e308

Update diffuse lighting benchmarks for real-world usage patterns

44dd285

Apply cargo fmt to bench_e2e.rs

f5bb9d0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Apply cargo fmt --all to fix CI formatting check

63fc970

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

wjc911 closed this Feb 23, 2026

wjc911 wants to merge 11 commits into linebender:main from wjc911:feDiffuseLighting_perf_optimize
