understory_index: write `Grid::visit_rect` without a `HashSet` #91

tomcur · 2025-12-18T17:24:05Z

Rather than deduplicating with a `HashSet`, this checks one of the corners of the intersection between `rect` and `entry.aabb`.

This is a speedup, especially for smaller systems (where allocating and initializing the `HashSet` is relatively expensive). For larger systems with a lot of overlap, the check still appears to be somewhat cheaper than hashing and bucket lookup.
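As a rough illustration of the corner check, here is a minimal sketch. It is not the code from this PR: the `Aabb` and `Grid` types, the zero origin, and the choice of the minimum corner are all assumptions made for the example. An entry stored in several cells is reported only from the cell that contains the minimum corner of its intersection with `rect`, so no set of already-seen entries is needed.

```rust
use std::collections::HashMap;

/// Hypothetical axis-aligned bounding box (min corner x0/y0, max corner x1/y1).
#[derive(Clone, Copy, Debug)]
struct Aabb {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Hypothetical uniform grid with its origin at 0. Each entry is stored in
/// every cell its AABB touches. The `HashMap` here is only cell storage; it
/// is unrelated to the per-query `HashSet` that the corner check replaces.
struct Grid {
    cell_size: f64,
    cells: HashMap<(i32, i32), Vec<usize>>,
    entries: Vec<Aabb>,
}

impl Grid {
    fn new(cell_size: f64) -> Self {
        Grid { cell_size, cells: HashMap::new(), entries: Vec::new() }
    }

    /// Map a coordinate to its cell index, rounding toward negative infinity.
    fn cell_coord(&self, v: f64) -> i32 {
        (v / self.cell_size).floor() as i32
    }

    /// Store the entry in every cell its AABB touches and return its id.
    fn insert(&mut self, aabb: Aabb) -> usize {
        let id = self.entries.len();
        self.entries.push(aabb);
        for cy in self.cell_coord(aabb.y0)..=self.cell_coord(aabb.y1) {
            for cx in self.cell_coord(aabb.x0)..=self.cell_coord(aabb.x1) {
                self.cells.entry((cx, cy)).or_default().push(id);
            }
        }
        id
    }

    /// Call `f` exactly once for each entry that shares a cell with `rect`.
    fn visit_rect(&self, rect: Aabb, mut f: impl FnMut(usize)) {
        for cy in self.cell_coord(rect.y0)..=self.cell_coord(rect.y1) {
            for cx in self.cell_coord(rect.x0)..=self.cell_coord(rect.x1) {
                if let Some(ids) = self.cells.get(&(cx, cy)) {
                    for &id in ids {
                        let a = self.entries[id];
                        // Minimum corner of the intersection of `rect` and the entry.
                        let ix = rect.x0.max(a.x0);
                        let iy = rect.y0.max(a.y0);
                        // Report only from the one cell that owns that corner.
                        if self.cell_coord(ix) == cx && self.cell_coord(iy) == cy {
                            f(id);
                        }
                    }
                }
            }
        }
    }
}
```

The check relies on `cell_coord` being monotonic: the cell of the intersection's minimum corner is then the larger of the two minimum cell indices on each axis, which is always among the cells visited for `rect` and among the cells the entry was stored in.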

Timings against main using #90:

visit_rect_grid:

visit_rect_grid_f64/Grid(10.)/32
                        time:   [258.81 µs 259.31 µs 259.83 µs]
                        thrpt:  [3.9410 Melem/s 3.9490 Melem/s 3.9566 Melem/s]
                 change:
                        time:   [-42.138% -42.006% -41.869%] (p = 0.00 < 0.05)
                        thrpt:  [+72.025% +72.430% +72.824%]
                        Performance has improved.

visit_rect_grid_f64/Grid(10.)/64
                        time:   [297.79 µs 298.34 µs 298.95 µs]
                        thrpt:  [13.701 Melem/s 13.729 Melem/s 13.755 Melem/s]
                 change:
                        time:   [-31.003% -30.837% -30.660%] (p = 0.00 < 0.05)
                        thrpt:  [+44.216% +44.587% +44.934%]
                        Performance has improved.

visit_rect_grid_f64/Grid(10.)/128
                        time:   [379.81 µs 380.90 µs 382.05 µs]
                        thrpt:  [42.884 Melem/s 43.014 Melem/s 43.138 Melem/s]
                 change:
                        time:   [-22.038% -21.747% -21.466%] (p = 0.00 < 0.05)
                        thrpt:  [+27.334% +27.790% +28.268%]
                        Performance has improved.

visit_rect_overlap:

I added a 512x512-sized grid with overlap here to be quite sure this is never slower. Note that in this benchmark, each cell contains multiple AABBs, and the rectangle we're using to visit overlaps 18 cells on average.

visit_rect_overlap_f64/Grid(10.)/32
                        time:   [731.93 µs 733.00 µs 734.19 µs]
                        thrpt:  [1.3947 Melem/s 1.3970 Melem/s 1.3990 Melem/s]
                 change:
                        time:   [-21.316% -20.920% -20.538%] (p = 0.00 < 0.05)
                        thrpt:  [+25.846% +26.455% +27.091%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/64
                        time:   [861.14 µs 862.88 µs 864.84 µs]
                        thrpt:  [4.7361 Melem/s 4.7469 Melem/s 4.7565 Melem/s]
                 change:
                        time:   [-23.093% -22.726% -22.278%] (p = 0.00 < 0.05)
                        thrpt:  [+28.664% +29.410% +30.028%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/128
                        time:   [1.0383 ms 1.0404 ms 1.0428 ms]
                        thrpt:  [15.711 Melem/s 15.748 Melem/s 15.780 Melem/s]
                 change:
                        time:   [-16.744% -16.039% -15.540%] (p = 0.00 < 0.05)
                        thrpt:  [+18.399% +19.102% +20.112%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/512
                        time:   [1.5065 ms 1.5104 ms 1.5144 ms]
                        thrpt:  [173.10 Melem/s 173.56 Melem/s 174.00 Melem/s]
                 change:
                        time:   [-11.413% -8.3974% -5.3382%] (p = 0.00 < 0.05)
                        thrpt:  [+5.6392% +9.1672% +12.884%]
                        Performance has improved.

tomcur

Just realized there's a potential issue with how truncation of higher-order bits and grid cells is handled here. Say we allow geometry to be placed in some cell that's out of range of an i32 and just truncate the higher order bits, then cells don't have a real geometric meaning and we cannot do this trick.

Think it should still work, as the lower bits should still identify a cell. But if we allow such overflow of the grid cells in general, we should test that quite well (unrelated to this PR).

waywardmonkeys · 2025-12-23T11:54:43Z

Are your comments saying it should land or shouldn't?

I think it should.

I think that overflowing the grid should be an error of some sort.

tomcur · 2025-12-23T12:10:31Z

Under the assumption we want overflow, I first thought this couldn't be made to work, but now I think it can, and so I think it should land (regardless of handling overflow or not).

Separately: I only later realized overflow was cared about, in the lint ignore reason here:

understory/understory_index/src/backends/grid.rs

Lines 56 to 75 in 99f24d2

    
    impl GridScalar for f64 {
        #[allow(
            clippy::cast_possible_truncation,
            reason = "Grid cell indices are intentionally i32; higher bits are truncated by design."
        )]
        #[inline]
        fn cell_coord(value: Self, origin: Self, cell_size: Self) -> i32 {
            debug_assert!(
                cell_size > 0.0,
                "grid cell_size must be strictly positive (f64)"
            );
            let t = (value - origin) / cell_size;
            if t >= 0.0 {
                t as i32
            } else {
                let ti = t as i32;
                if (ti as Self) == t { ti } else { ti - 1 }
            }
        }
    }

At first glance that's sort of correct, but the word "truncated" should probably be "saturated", and perhaps `ti - 1` should be `ti.saturating_sub(1)`. In any case, if we choose to keep that overflow handling, which just pushes anything out of bounds into the grid "extrema", it should be tested/fixed, and that should then probably land before this PR.
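To make the "saturated" point concrete: float-to-integer `as` casts in Rust saturate at the target type's bounds (since Rust 1.45), so in the snippet above only the `ti - 1` step can overflow, namely when `ti` is already `i32::MIN`. The following is a small sketch of the suggested change; the free function `floor_to_i32_saturating` is a hypothetical name for illustration, not something in the crate.

```rust
/// Floor `t` to an `i32`, saturating at `i32::MIN` / `i32::MAX`.
/// Hypothetical helper mirroring the negative-value handling quoted above.
fn floor_to_i32_saturating(t: f64) -> i32 {
    if t >= 0.0 {
        // `as` saturates: values above `i32::MAX` become `i32::MAX`.
        t as i32
    } else {
        // `as` rounds toward zero and saturates at `i32::MIN`.
        let ti = t as i32;
        if (ti as f64) == t {
            ti
        } else {
            // With plain `ti - 1` this would overflow when `ti == i32::MIN`
            // (a panic in debug builds, wrap-around in release builds).
            ti.saturating_sub(1)
        }
    }
}
```

With this form, a very negative `t` lands in the `i32::MIN` cell rather than panicking or wrapping around to a large positive cell index.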

If an `f64`, `f32` or `i64` coordinate is such that it is out of range of cells representable by an `i32`, the grid cell coordinate is saturated to `i32::MIN` or `i32::MAX`. Negative fractional coordinates are still rounded toward negative infinity. This is probably enough for #91, but that requires a bit more thought.
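For integer scalars the `as` cast does not saturate: casting an `i64` to `i32` keeps only the low 32 bits. One possible way to get the saturating behavior described above for `i64` is sketched below. This is an assumption about how it could be done, not the implementation from #96, and `cell_coord_i64` is a hypothetical free function.

```rust
/// Saturating cell coordinate for `i64` inputs (hypothetical sketch).
fn cell_coord_i64(value: i64, origin: i64, cell_size: i64) -> i32 {
    debug_assert!(cell_size > 0, "grid cell_size must be strictly positive (i64)");
    // Widen to i128 so that `value - origin` cannot overflow.
    let offset = i128::from(value) - i128::from(origin);
    // For a positive divisor, `div_euclid` rounds toward negative infinity.
    let t = offset.div_euclid(i128::from(cell_size));
    // Clamp into the `i32` range first, so the final cast is lossless.
    t.clamp(i128::from(i32::MIN), i128::from(i32::MAX)) as i32
}
```

Widening to `i128` is just the simplest way to avoid overflow in the subtraction; checked or saturating `i64` arithmetic would be an alternative.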

tomcur added 2 commits December 18, 2025 18:23

Clippy

64e9572

tomcur mentioned this pull request Dec 29, 2025

index: define Grid's GridScalar::cell_coord to be saturating #96

Merged
