perf: row sliding window for feDiffuseLighting normal computation #1029
Closed
wjc911 wants to merge 11 commits into linebender:main from
…utation

Add an optimized apply path for the lighting filter that uses:

- Row-based alpha extraction into contiguous i16 buffers for cache-friendly access
- A 3-row sliding window that reuses two rows per iteration (only reads one new row)
- Pre-computed spot light direction (computed once instead of per pixel)
- Sequential row-major pixel processing order

The dispatcher selects between the naive path (for images smaller than 64x64 pixels) and the optimized path. The naive implementation is preserved verbatim as a fallback and correctness reference.

Includes exhaustive bit-exact correctness tests comparing naive vs. optimized across 7 light source configurations, 4 surface scales, 4 diffuse constants, and multiple image sizes (3x3 through 256x256, including non-square).

Benchmark results show 1.7x-5.2x speedup at 1024x1024 and 1.7x-3.6x at 4096x4096, depending on light source type.
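A minimal sketch of the 3-row sliding window, under simplifying assumptions: alpha is read from a tightly packed RGBA8 buffer, the output is only the raw integer Sobel sums (the SVG spec then multiplies by -surfaceScale and an edge-dependent factor), and out-of-range neighbours are clamped to the nearest row or column instead of using the spec's dedicated edge kernels. `extract_alpha_row`, `sobel`, `normals_naive`, and `normals_sliding` are hypothetical names, not the PR's functions. The naive version doubles as a bit-exact reference, in the spirit of the PR's naive-vs-optimized tests.

```rust
/// Copies the alpha channel of row `y` of a packed RGBA8 image into `out` as
/// i16, so the inner loop walks a contiguous row instead of striding by 4.
fn extract_alpha_row(rgba: &[u8], width: usize, y: usize, out: &mut [i16]) {
    let row = &rgba[y * width * 4..(y + 1) * width * 4];
    for (dst, px) in out.iter_mut().zip(row.chunks_exact(4)) {
        *dst = px[3] as i16;
    }
}

/// Raw Sobel sums (Kx, Ky) at column `x` of the middle row, clamping columns
/// at the image edge. The spec's -surfaceScale * factor scaling is omitted.
fn sobel(top: &[i16], mid: &[i16], bot: &[i16], x: usize) -> (i32, i32) {
    fn p(row: &[i16], i: usize) -> i32 {
        row[i] as i32
    }
    let (l, r) = (x.saturating_sub(1), (x + 1).min(mid.len() - 1));
    let nx = (p(top, r) + 2 * p(mid, r) + p(bot, r)) - (p(top, l) + 2 * p(mid, l) + p(bot, l));
    let ny = (p(bot, l) + 2 * p(bot, x) + p(bot, r)) - (p(top, l) + 2 * p(top, x) + p(top, r));
    (nx, ny)
}

/// Reference version: reads every neighbour straight from the RGBA buffer.
fn normals_naive(rgba: &[u8], width: usize, height: usize) -> Vec<(i32, i32)> {
    let a = |x: usize, y: usize| rgba[(y * width + x) * 4 + 3] as i32;
    let mut out = Vec::with_capacity(width * height);
    for y in 0..height {
        let (t, b) = (y.saturating_sub(1), (y + 1).min(height - 1));
        for x in 0..width {
            let (l, r) = (x.saturating_sub(1), (x + 1).min(width - 1));
            let nx = (a(r, t) + 2 * a(r, y) + a(r, b)) - (a(l, t) + 2 * a(l, y) + a(l, b));
            let ny = (a(l, b) + 2 * a(x, b) + a(r, b)) - (a(l, t) + 2 * a(x, t) + a(r, t));
            out.push((nx, ny));
        }
    }
    out
}

/// Sliding-window version: keeps three alpha rows and, per output row,
/// rotates them and extracts only one new row.
fn normals_sliding(rgba: &[u8], width: usize, height: usize) -> Vec<(i32, i32)> {
    let mut out = Vec::with_capacity(width * height);
    let (mut top, mut mid, mut bot) = (vec![0i16; width], vec![0i16; width], vec![0i16; width]);
    extract_alpha_row(rgba, width, 0, &mut mid);
    top.copy_from_slice(&mid); // row -1 is clamped to row 0
    extract_alpha_row(rgba, width, (height - 1).min(1), &mut bot);
    for y in 0..height {
        for x in 0..width {
            out.push(sobel(&top, &mid, &bot, x));
        }
        // Slide down one row: top <- mid, mid <- bot, then refill bot.
        std::mem::swap(&mut top, &mut mid);
        std::mem::swap(&mut mid, &mut bot);
        extract_alpha_row(rgba, width, (y + 2).min(height - 1), &mut bot);
    }
    out
}
```

Swapping the `Vec` handles moves only pointers, so each output row costs one fresh alpha extraction rather than three.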
…d #[cfg(test)], fix comments

- Raise OPTIMIZED_THRESHOLD from 64*64 to 128*128 (the benchmark showed 64x64 distant light running at 0.93x, i.e. slower, on the optimized path; the crossover lies between 64x64 and 256x256, so 128*128 is a safe, conservative choice)
- Gate apply_naive and its helper functions (light_color, *_normal, alpha_at) behind #[cfg(test)] to eliminate dead code in release builds; the apply() function now always uses apply_optimized
- Fix misleading comments: the optimization is primarily a cache-locality improvement (row-based alpha extraction + sliding window), NOT SIMD auto-vectorization (the dyn Fn trait object for light_factor prevents inlining and vectorization)
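The comment fix attributes the speedup to cache locality rather than SIMD because light_factor is called through a dyn Fn trait object. The contrast it relies on can be sketched as the same loop written once with dynamic dispatch and once with a generic closure parameter. `shade_dyn` and `shade_generic` are hypothetical names, and the `f32 -> f32` shape of the light factor is an assumption; whether the generic version is actually inlined or vectorized is left to the optimizer.

```rust
/// Light factor applied through a trait object: each call is an indirect
/// call that the optimizer generally cannot inline across.
fn shade_dyn(dots: &[f32], light_factor: &dyn Fn(f32) -> f32, out: &mut [f32]) {
    for (o, &d) in out.iter_mut().zip(dots) {
        *o = light_factor(d);
    }
}

/// Same loop, generic over the closure type: it is monomorphized per closure,
/// so the compiler is free to inline `light_factor` and, where profitable,
/// vectorize the loop.
fn shade_generic<F: Fn(f32) -> f32>(dots: &[f32], light_factor: F, out: &mut [f32]) {
    for (o, &d) in out.iter_mut().zip(dots) {
        *o = light_factor(d);
    }
}
```

Both functions produce identical results; only the dispatch mechanism, and therefore what the compiler can see through, differs.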
The previous fix incorrectly removed the naive fallback entirely, making apply() call apply_optimized unconditionally. Benchmarks showed 64x64 distant light running at 0.93x (i.e. slower) on the optimized path. Restore the dynamic threshold check: images below 128*128 pixels use apply_naive, larger images use apply_optimized. Remove all #[cfg(test)] gates from apply_naive, light_color, the normal helper functions, alpha_at, and the OPTIMIZED_THRESHOLD constant so they are available in release builds.
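The restored threshold check amounts to a comparison on total pixel count. A minimal sketch, assuming an image of exactly 128*128 pixels takes the optimized path (the commit only says that images below the threshold use apply_naive); `LightingPath` and `choose_path` are hypothetical stand-ins for the real dispatch inside apply().

```rust
/// Pixel-count cutoff named in the PR: images with fewer pixels than this
/// take the naive path.
const OPTIMIZED_THRESHOLD: usize = 128 * 128;

/// Hypothetical stand-in for the two code paths (the PR's apply_naive and
/// apply_optimized take the real filter inputs).
#[derive(Debug, PartialEq, Eq)]
enum LightingPath {
    Naive,
    Optimized,
}

/// Chooses a path from the total pixel count, so non-square images are
/// compared by area rather than by either side length.
fn choose_path(width: u32, height: u32) -> LightingPath {
    let pixels = width as usize * height as usize;
    if pixels < OPTIMIZED_THRESHOLD {
        LightingPath::Naive
    } else {
        LightingPath::Optimized
    }
}
```

Because the check is on area, a 64x256 image (16384 pixels) already takes the optimized path even though one side is only 64.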
Use scoped threads and an AtomicUsize progress counter to run benchmark configurations in parallel across all available CPU cores.
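The parallel runner described in this commit (later replaced by a sequential one for CPU-pinned measurements) can be sketched with the standard library alone. `run_parallel` is a hypothetical helper, and using a second AtomicUsize as a shared work index next to the progress counter is an assumption about how configurations are handed out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `run` on every config across all available cores and returns the
/// results in input order. Hypothetical helper, not the PR's bench code.
fn run_parallel<C, R, F>(configs: &[C], run: F) -> Vec<R>
where
    C: Sync,
    R: Send,
    F: Fn(&C) -> R + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let next = AtomicUsize::new(0); // index of the next unclaimed config
    let done = AtomicUsize::new(0); // progress counter: configs finished so far
    let slots: Mutex<Vec<Option<R>>> = Mutex::new((0..configs.len()).map(|_| None).collect());

    // Scoped threads may borrow `configs`, `run`, and the counters directly;
    // the scope joins every worker before returning.
    thread::scope(|s| {
        for _ in 0..workers.min(configs.len()) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= configs.len() {
                    break;
                }
                let result = run(&configs[i]);
                slots.lock().unwrap()[i] = Some(result);
                let finished = done.fetch_add(1, Ordering::Relaxed) + 1;
                eprintln!("[{}/{}] configs done", finished, configs.len());
            });
        }
    });

    slots.into_inner().unwrap().into_iter().map(|r| r.unwrap()).collect()
}
```

Claiming work through `fetch_add` keeps every core busy even when configurations differ widely in cost, which a static split of the list would not.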
Replaces the parallel bench_e2e.rs with a sequential single-threaded version that uses per-resolution iteration counts (2000 for 16px, down to 100 for 1024px+), a probe-then-scale budget cap (30s total per case, skip if single probe > 10s), and --compare for TSV baseline comparison. Allows CPU-pinned reproducible measurements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
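The probe-then-scale budget cap can be sketched as a small planning function: time one probe run, skip the case if the probe takes longer than 10 s, otherwise run at most as many iterations as fit into the 30 s per-case budget. `plan_iterations` is hypothetical; the nominal per-resolution count is passed in rather than tabulated because only the endpoints (2000 at 16px, 100 at 1024px and above) are given, and not charging the probe itself against the budget is an assumption.

```rust
use std::time::Duration;

/// Total wall-clock budget per benchmark case (from the commit message).
const CASE_BUDGET: Duration = Duration::from_secs(30);
/// A case whose single probe run exceeds this is skipped entirely.
const MAX_PROBE: Duration = Duration::from_secs(10);

/// Hypothetical planner: given the nominal iteration count for a resolution
/// and the duration of one probe run, return how many timed iterations to
/// run, or `None` to skip the case.
fn plan_iterations(nominal: u32, probe: Duration) -> Option<u32> {
    if probe > MAX_PROBE {
        return None; // too slow to measure meaningfully within the budget
    }
    if probe.is_zero() {
        return Some(nominal); // probe below timer resolution: no scaling needed
    }
    // How many iterations of this length fit into the budget (rounded down).
    // Since probe <= 10 s here, this is always at least 3.
    let affordable = (CASE_BUDGET.as_nanos() / probe.as_nanos()).min(u32::MAX as u128) as u32;
    Some(nominal.min(affordable))
}
```

For example, a 1024px case with the nominal 100 iterations and a 1 s probe would be cut to 30 iterations.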
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Benchmark Results
Test Results
All 1723/1723 integration tests pass (`cargo test --release -p resvg --test integration`).

🤖 Generated with Claude Code