@tomcur tomcur commented Dec 18, 2025

Rather than using the `HashSet`, this checks one of the corners of the intersection between `rect` and `entry.aabb`.

This is a speedup, especially for smaller systems, where allocating and initializing the `HashSet` is relatively expensive. For larger systems with a lot of overlap, this check still appears to be somewhat cheaper than the hashing and bucket lookup.
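
For readers unfamiliar with the trick, here is a minimal sketch of how a corner check can stand in for a visited set. It is not the crate's implementation: `Aabb`, `Grid`, `cell`, `cell_range` and the field names are hypothetical, and it assumes each entry is stored in every cell its box overlaps. An entry is reported only from the cell that contains the minimum corner of the intersection of the query rectangle and the entry's box, so it is reported exactly once without any bookkeeping.

```rust
use std::collections::HashMap;

/// Axis-aligned bounding box (hypothetical stand-in for the crate's type).
#[derive(Clone, Copy, Debug)]
struct Aabb {
    lo: (f64, f64),
    hi: (f64, f64),
}

impl Aabb {
    fn new(x0: f64, y0: f64, x1: f64, y1: f64) -> Self {
        Self { lo: (x0, y0), hi: (x1, y1) }
    }

    /// Intersection of two boxes, or `None` if they are disjoint.
    fn intersect(&self, o: &Aabb) -> Option<Aabb> {
        let lo = (self.lo.0.max(o.lo.0), self.lo.1.max(o.lo.1));
        let hi = (self.hi.0.min(o.hi.0), self.hi.1.min(o.hi.1));
        if lo.0 <= hi.0 && lo.1 <= hi.1 { Some(Aabb { lo, hi }) } else { None }
    }
}

/// Uniform grid; every entry id is stored in each cell its box overlaps.
struct Grid {
    cell_size: f64,
    aabbs: Vec<Aabb>,
    cells: HashMap<(i32, i32), Vec<usize>>,
}

impl Grid {
    fn new(cell_size: f64) -> Self {
        Self { cell_size, aabbs: Vec::new(), cells: HashMap::new() }
    }

    /// Cell index along one axis (origin assumed to be 0 in this sketch).
    fn cell(&self, v: f64) -> i32 {
        (v / self.cell_size).floor() as i32
    }

    fn cell_range(&self, b: &Aabb) -> ((i32, i32), (i32, i32)) {
        ((self.cell(b.lo.0), self.cell(b.lo.1)), (self.cell(b.hi.0), self.cell(b.hi.1)))
    }

    fn insert(&mut self, aabb: Aabb) -> usize {
        let id = self.aabbs.len();
        self.aabbs.push(aabb);
        let (lo, hi) = self.cell_range(&aabb);
        for cy in lo.1..=hi.1 {
            for cx in lo.0..=hi.0 {
                self.cells.entry((cx, cy)).or_default().push(id);
            }
        }
        id
    }

    /// Calls `f` once per entry overlapping `rect`, with no visited set.
    fn visit_rect(&self, rect: Aabb, mut f: impl FnMut(usize)) {
        let (lo, hi) = self.cell_range(&rect);
        for cy in lo.1..=hi.1 {
            for cx in lo.0..=hi.0 {
                if let Some(ids) = self.cells.get(&(cx, cy)) {
                    for &id in ids {
                        if let Some(hit) = rect.intersect(&self.aabbs[id]) {
                            // Report only from the cell that owns the
                            // intersection's minimum corner.
                            if (self.cell(hit.lo.0), self.cell(hit.lo.1)) == (cx, cy) {
                                f(id);
                            }
                        }
                    }
                }
            }
        }
    }
}
```

The check relies on the cell mapping being monotonic: the cell of the intersection's minimum corner always lies inside both the query's cell range and the entry's cell range, so exactly one visited cell passes the test.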

Timings against main using #90:

  • visit_rect_grid:
visit_rect_grid_f64/Grid(10.)/32
                        time:   [258.81 µs 259.31 µs 259.83 µs]
                        thrpt:  [3.9410 Melem/s 3.9490 Melem/s 3.9566 Melem/s]
                 change:
                        time:   [-42.138% -42.006% -41.869%] (p = 0.00 < 0.05)
                        thrpt:  [+72.025% +72.430% +72.824%]
                        Performance has improved.

visit_rect_grid_f64/Grid(10.)/64
                        time:   [297.79 µs 298.34 µs 298.95 µs]
                        thrpt:  [13.701 Melem/s 13.729 Melem/s 13.755 Melem/s]
                 change:
                        time:   [-31.003% -30.837% -30.660%] (p = 0.00 < 0.05)
                        thrpt:  [+44.216% +44.587% +44.934%]
                        Performance has improved.

visit_rect_grid_f64/Grid(10.)/128
                        time:   [379.81 µs 380.90 µs 382.05 µs]
                        thrpt:  [42.884 Melem/s 43.014 Melem/s 43.138 Melem/s]
                 change:
                        time:   [-22.038% -21.747% -21.466%] (p = 0.00 < 0.05)
                        thrpt:  [+27.334% +27.790% +28.268%]
                        Performance has improved.
  • visit_rect_overlap:

I added a 512x512 grid with overlap here to be quite sure this is never slower. Note that in this benchmark each cell contains multiple AABBs, and the rectangle we're using for the visit overlaps 18 cells on average.

visit_rect_overlap_f64/Grid(10.)/32
                        time:   [731.93 µs 733.00 µs 734.19 µs]
                        thrpt:  [1.3947 Melem/s 1.3970 Melem/s 1.3990 Melem/s]
                 change:
                        time:   [-21.316% -20.920% -20.538%] (p = 0.00 < 0.05)
                        thrpt:  [+25.846% +26.455% +27.091%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/64
                        time:   [861.14 µs 862.88 µs 864.84 µs]
                        thrpt:  [4.7361 Melem/s 4.7469 Melem/s 4.7565 Melem/s]
                 change:
                        time:   [-23.093% -22.726% -22.278%] (p = 0.00 < 0.05)
                        thrpt:  [+28.664% +29.410% +30.028%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/128
                        time:   [1.0383 ms 1.0404 ms 1.0428 ms]
                        thrpt:  [15.711 Melem/s 15.748 Melem/s 15.780 Melem/s]
                 change:
                        time:   [-16.744% -16.039% -15.540%] (p = 0.00 < 0.05)
                        thrpt:  [+18.399% +19.102% +20.112%]
                        Performance has improved.

visit_rect_overlap_f64/Grid(10.)/512
                        time:   [1.5065 ms 1.5104 ms 1.5144 ms]
                        thrpt:  [173.10 Melem/s 173.56 Melem/s 174.00 Melem/s]
                 change:
                        time:   [-11.413% -8.3974% -5.3382%] (p = 0.00 < 0.05)
                        thrpt:  [+5.6392% +9.1672% +12.884%]
                        Performance has improved.

@tomcur tomcur left a comment

I just realized there's a potential issue with how truncation of higher-order bits interacts with grid cells here. If we allow geometry to be placed in a cell that's out of range of an `i32` and just truncate the higher-order bits, then cells no longer have a real geometric meaning and we cannot do this trick.

On reflection, I think it should still work, as the lower bits should still identify a cell. But if we allow such overflow of the grid cells in general, we should test that thoroughly (unrelated to this PR).
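
To make the truncation concern concrete, here is a small sketch of what Rust's `as` casts do at the `i32` boundary. The helper names are hypothetical and not from the crate. Integer-to-integer casts keep only the low 32 bits, so cells that are 2^32 apart share one index; float-to-integer casts saturate instead.

```rust
/// Integer-to-integer `as` keeps only the low 32 bits (wrapping truncation).
fn truncate_cell(cell: i64) -> i32 {
    cell as i32
}

/// Float-to-integer `as` saturates at the `i32` bounds (and maps NaN to 0).
fn saturate_cell(cell: f64) -> i32 {
    cell as i32
}
```

With truncation, `truncate_cell(5)` and `truncate_cell((1_i64 << 32) + 5)` both yield `5`: the index still identifies a bucket, but that bucket no longer corresponds to a single geometric cell.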

@waywardmonkeys

Are your comments saying it should land or shouldn't?

I think it should.

I think that overflowing the grid should be an error of some sort.

@tomcur
tomcur commented Dec 23, 2025

Under the assumption that we want overflow, I first thought this couldn't be made to work, but now I think it can, so I think it should land (regardless of whether we handle overflow).

Separately: I only realized later that overflow was deliberately accounted for, per the lint-ignore reason here:

    impl GridScalar for f64 {
        #[allow(
            clippy::cast_possible_truncation,
            reason = "Grid cell indices are intentionally i32; higher bits are truncated by design."
        )]
        #[inline]
        fn cell_coord(value: Self, origin: Self, cell_size: Self) -> i32 {
            debug_assert!(
                cell_size > 0.0,
                "grid cell_size must be strictly positive (f64)"
            );
            let t = (value - origin) / cell_size;
            if t >= 0.0 {
                t as i32
            } else {
                let ti = t as i32;
                if (ti as Self) == t { ti } else { ti - 1 }
            }
        }
    }

At first glance that's sort of correct, but the word "truncated" should probably be "saturated", and perhaps `ti - 1` should be `ti.saturating_sub(1)`. In any case, if we choose to keep that overflow handling, which just pushes anything out of bounds into the grid "extrema", it should be tested and fixed, and that should then probably land before this PR.
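
As a sketch of that suggestion (not the code that eventually landed; both function names are hypothetical): Rust's float-to-integer `as` cast already saturates, so the only step that can overflow is `ti - 1` when `ti` is `i32::MIN`, which panics in debug builds. Using `saturating_sub` closes that gap. A floor-then-cast variant is shown alongside for comparison.

```rust
/// The quoted `cell_coord` logic with the suggested `saturating_sub`.
fn cell_coord_adjusted(value: f64, origin: f64, cell_size: f64) -> i32 {
    debug_assert!(cell_size > 0.0, "cell_size must be strictly positive");
    let t = (value - origin) / cell_size;
    // Float-to-int `as` rounds toward zero and saturates at the i32 bounds.
    let ti = t as i32;
    if t >= 0.0 || (ti as f64) == t {
        ti
    } else {
        // Negative fractional value: step down one cell without overflowing.
        ti.saturating_sub(1)
    }
}

/// Alternative: round toward negative infinity first, then saturate.
fn cell_coord_floor(value: f64, origin: f64, cell_size: f64) -> i32 {
    debug_assert!(cell_size > 0.0, "cell_size must be strictly positive");
    ((value - origin) / cell_size).floor() as i32
}
```

The two agree for finite inputs; they differ for NaN, where the first returns `-1` and the second returns `0`, so NaN handling would need its own decision.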

tomcur added a commit to tomcur/understory that referenced this pull request Dec 29, 2025
If an `f64`, `f32` or `i64` coordinate is out of the range of cells
representable by an `i32`, the grid cell coordinate is saturated to
`i32::MIN` or `i32::MAX`. Negative fractional coordinates are still
rounded toward negative infinity.

This is probably enough for endoli#91, but that requires a bit more
thought.
tomcur added a commit that referenced this pull request Jan 2, 2026
If an `f64`, `f32` or `i64` coordinate is out of the range of cells
representable by an `i32`, the grid cell coordinate is saturated to
`i32::MIN` or `i32::MAX`. Negative fractional coordinates are still
rounded toward negative infinity.

This is probably enough for #91, but that requires a bit more
thought.