perf: row sliding window for feDiffuseLighting normal computation #1029

Closed
wjc911 wants to merge 11 commits into linebender:main from wjc911:feDiffuseLighting_perf_optimize

Conversation

wjc911 commented Feb 22, 2026

Summary

  • Replace the per-pixel 3x3 neighborhood fetch with a sliding window that carries the previous row's data across columns, cutting height-map reads from 9 to 3 per pixel on interior rows (the other 6 values are reused)
  • Apply the optimization only for images >= 128x128 pixels; smaller images use the original path to avoid branch overhead on tiny inputs
  • Normal vector computation (Sobel-style finite differences) is unchanged; only the data access pattern is restructured
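
One way to read the 9-to-3 claim is sketched below: within an interior row, keep the two columns already fetched and read only the new right-hand column for each pixel. The kernel is the interior-pixel Sobel form from the SVG Filter Effects specification. The function names, the `Normal` alias, and the single `scale` factor (standing in for surfaceScale and any alpha normalization) are assumptions made for illustration, not code from this PR.

```rust
/// Unnormalized surface normal (nx, ny); nz is 1.0 for this kernel.
type Normal = (f32, f32);

/// Reference path: fetch all nine neighbors of the interior pixel (x, y).
fn normal_naive(alpha: &[i16], width: usize, x: usize, y: usize, scale: f32) -> Normal {
    let a = |dx: isize, dy: isize| -> i32 {
        let xi = (x as isize + dx) as usize;
        let yi = (y as isize + dy) as usize;
        alpha[yi * width + xi] as i32
    };
    let nx = (a(1, -1) + 2 * a(1, 0) + a(1, 1)) - (a(-1, -1) + 2 * a(-1, 0) + a(-1, 1));
    let ny = (a(-1, 1) + 2 * a(0, 1) + a(1, 1)) - (a(-1, -1) + 2 * a(0, -1) + a(1, -1));
    (-scale * nx as f32 / 4.0, -scale * ny as f32 / 4.0)
}

/// Sliding path: normals for the interior pixels of interior row `y`.
/// Each step reads only the three values of the new right-hand column.
fn normals_sliding_row(alpha: &[i16], width: usize, y: usize, scale: f32) -> Vec<Normal> {
    assert!(width >= 3 && y >= 1, "needs an interior row of a width >= 3 image");
    let top = &alpha[(y - 1) * width..y * width];
    let mid = &alpha[y * width..(y + 1) * width];
    let bot = &alpha[(y + 1) * width..(y + 2) * width];
    // A column is stored as [top, middle, bottom].
    let col = |x: usize| [top[x] as i32, mid[x] as i32, bot[x] as i32];

    let mut left = col(0);
    let mut center = col(1);
    let mut out = Vec::with_capacity(width - 2);
    for x in 1..width - 1 {
        let right = col(x + 1); // the only new reads for this pixel
        let nx = (right[0] + 2 * right[1] + right[2]) - (left[0] + 2 * left[1] + left[2]);
        let ny = (left[2] + 2 * center[2] + right[2]) - (left[0] + 2 * center[0] + right[0]);
        out.push((-scale * nx as f32 / 4.0, -scale * ny as f32 / 4.0));
        // Slide the window one column to the right.
        left = center;
        center = right;
    }
    out
}
```

Both functions use the same integer sums and the same float expression, so their results should match exactly for interior pixels; edge and corner pixels use different kernels and are left out of this sketch.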

Benchmark Results

| Metric | Value |
|---|---|
| Average speedup | 1.16x |

Test Results

All 1723/1723 integration tests pass (cargo test --release -p resvg --test integration).

🤖 Generated with Claude Code

wjc911 and others added 11 commits February 21, 2026 16:23
…utation

Add an optimized apply path for the lighting filter that uses:
- Row-based alpha extraction into contiguous i16 buffers for cache-friendly access
- A 3-row sliding window that reuses two rows per iteration (only reads one new row)
- Pre-computed spot light direction (computed once instead of per-pixel)
- Sequential row-major pixel processing order
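
The row-buffer scheme in the list above can be sketched as follows, assuming an RGBA8 input with alpha in the fourth byte of each pixel. `extract_alpha_row` and `for_each_row_window` are hypothetical names, and only interior rows are visited; how the PR handles the first and last rows is not shown here.

```rust
/// Copies the alpha channel of row `y` of an RGBA8 image into a contiguous i16 buffer.
fn extract_alpha_row(rgba: &[u8], width: usize, y: usize, out: &mut [i16]) {
    let row = &rgba[y * width * 4..(y + 1) * width * 4];
    for (dst, px) in out.iter_mut().zip(row.chunks_exact(4)) {
        *dst = px[3] as i16;
    }
}

/// Calls `visit(y, prev, cur, next)` for every interior row, in row-major order.
/// Two of the three buffers are reused each iteration; only one new row is read.
fn for_each_row_window<F>(rgba: &[u8], width: usize, height: usize, mut visit: F)
where
    F: FnMut(usize, &[i16], &[i16], &[i16]),
{
    if width == 0 || height < 3 {
        return;
    }
    let mut prev = vec![0i16; width];
    let mut cur = vec![0i16; width];
    let mut next = vec![0i16; width];
    extract_alpha_row(rgba, width, 0, &mut prev);
    extract_alpha_row(rgba, width, 1, &mut cur);

    for y in 1..height - 1 {
        extract_alpha_row(rgba, width, y + 1, &mut next);
        visit(y, prev.as_slice(), cur.as_slice(), next.as_slice());
        // Rotate without reallocating: prev <- cur, cur <- next.
        // The old `prev` buffer ends up in `next` and is overwritten next iteration.
        std::mem::swap(&mut prev, &mut cur);
        std::mem::swap(&mut cur, &mut next);
    }
}
```

Swapping the `Vec` handles rather than copying their contents keeps the per-row cost at a single alpha extraction.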

The dispatcher selects between naive (for images < 64x64 pixels) and optimized
paths. The naive implementation is preserved verbatim as a fallback and
correctness reference.

Includes exhaustive bit-exact correctness tests comparing naive vs optimized
across 7 light source configurations, 4 surface scales, 4 diffuse constants,
and multiple image sizes (3x3 through 256x256, including non-square).

Benchmark results show 1.7x-5.2x speedup at 1024x1024 and 1.7x-3.6x at
4096x4096 depending on light source type.
…d #[cfg(test)], fix comments

- Raise OPTIMIZED_THRESHOLD from 64*64 to 128*128 (benchmark showed that at 64x64
  with a distant light the optimized path runs at 0.93x the naive speed, i.e. slower;
  the crossover is between 64x64 and 256x256, so 128*128 is a conservative choice)
- Gate apply_naive and its helper functions (light_color, *_normal, alpha_at)
  behind #[cfg(test)] to eliminate dead code in release builds; the apply()
  function now always uses apply_optimized
- Fix misleading comments: the optimization is primarily a cache-locality
  improvement (row-based alpha extraction + sliding window), NOT SIMD
  auto-vectorization (the dyn Fn trait object for light_factor prevents
  inlining and vectorization)

The previous fix incorrectly removed the naive fallback entirely, making
apply() always call apply_optimized unconditionally. Benchmarks showed that
at 64x64 with a distant light the optimized path runs at 0.93x the naive
speed (i.e. slower).

Restore the dynamic threshold check: images below 128*128 pixels use
apply_naive, larger images use apply_optimized. Remove all #[cfg(test)]
gates from apply_naive, light_color, normal helper functions, alpha_at,
and the OPTIMIZED_THRESHOLD constant so they are available in release
builds.
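
A minimal sketch of the threshold check described above, assuming the cutoff is a total pixel count and that an image of exactly 128*128 pixels takes the optimized path, as the PR summary states. The `Path` enum and `select_path` function are hypothetical; the PR's apply() presumably calls apply_naive or apply_optimized directly.

```rust
/// Pixel-count cutoff named in the commit message (assumed to be width * height).
const OPTIMIZED_THRESHOLD: usize = 128 * 128;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Path {
    Naive,
    Optimized,
}

/// Picks the implementation for an image of the given dimensions.
fn select_path(width: usize, height: usize) -> Path {
    // saturating_mul avoids overflow on absurdly large dimensions.
    if width.saturating_mul(height) < OPTIMIZED_THRESHOLD {
        Path::Naive
    } else {
        Path::Optimized
    }
}
```

With these assumptions, a 64x64 input (the case measured at 0.93x) stays on the naive path, while 256x256 and larger inputs use the optimized one.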

Use scoped threads and AtomicUsize progress counter to run benchmark
configurations in parallel across all available CPU cores.
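
The pattern named in this commit message could look roughly like the sketch below, which uses only the standard library. `run_parallel` and its parameters are hypothetical; the actual benchmark harness is not shown on this page. One atomic hands out work indices, and a second, caller-owned atomic counts completed configurations.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `run` on every config using `workers` scoped threads.
/// `progress` is incremented once per finished config so a caller can report it.
/// Results are returned in the same order as `configs`.
fn run_parallel<T, R, F>(configs: &[T], workers: usize, progress: &AtomicUsize, run: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed config
    let mut indexed: Vec<(usize, R)> = thread::scope(|s| {
        let handles: Vec<_> = (0..workers.max(1))
            .map(|_| {
                s.spawn(|| {
                    let mut local = Vec::new();
                    loop {
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= configs.len() {
                            break;
                        }
                        local.push((i, run(&configs[i])));
                        progress.fetch_add(1, Ordering::Relaxed);
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("benchmark worker panicked"))
            .collect()
    });
    // Threads finish in arbitrary order; restore the input order.
    indexed.sort_by_key(|entry| entry.0);
    indexed.into_iter().map(|(_, r)| r).collect()
}
```

A caller could pass `std::thread::available_parallelism()` (unwrapped to a `usize`) as `workers` to use all available cores. Timings taken this way are affected by contention between threads.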

Replaces the parallel bench_e2e.rs with a sequential single-threaded
version that uses per-resolution iteration counts (2000 for 16px,
down to 100 for 1024px+), a probe-then-scale budget cap (30s total
per case, skip if single probe > 10s), and --compare for TSV baseline
comparison. Allows CPU-pinned reproducible measurements.
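
The probe-then-scale cap can be sketched as a small planning function: time one probe iteration, skip the case if it exceeds 10s, otherwise cap the requested iteration count so the case fits in 30s. The function name, the choice to exclude the probe itself from the budget, and the minimum of one iteration are assumptions; only the 30s and 10s limits come from the commit message.

```rust
use std::time::Duration;

const CASE_BUDGET: Duration = Duration::from_secs(30); // total time allowed per case
const PROBE_LIMIT: Duration = Duration::from_secs(10); // skip if one iteration is slower

/// Returns `None` if the case should be skipped, otherwise the number of
/// iterations to run: at most `requested`, and at most what fits in the budget.
fn plan_iterations(probe: Duration, requested: u32) -> Option<u32> {
    if probe > PROBE_LIMIT {
        return None;
    }
    if probe.is_zero() {
        // Probe too fast to measure; nothing to scale against.
        return Some(requested);
    }
    let affordable = (CASE_BUDGET.as_secs_f64() / probe.as_secs_f64()).floor();
    let capped = affordable.min(requested as f64) as u32;
    Some(capped.max(1))
}
```

The `requested` count would come from the per-resolution table mentioned above (2000 at 16px down to 100 at 1024px and larger); the intermediate values are not given on this page, so the sketch takes the count as a parameter.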

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wjc911 closed this Feb 23, 2026