-
Notifications
You must be signed in to change notification settings - Fork 7
understory_box_tree: speed up hit testing #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hit testing is a hot path and used to pay avoidable per-query overhead. - Avoid allocation: use `IndexGeneric::visit_point` instead of `query_point` so hit testing does not allocate a `Vec` of candidates each call. - Avoid repeated work: reuse cached `world_transform_inverse` and `depth` from `Tree::commit` while scoring candidates. - Preserve behavior: still filters precisely by `local_bounds`, `local_clip`, and ancestor clips (after the coarse AABB query), and keeps the existing z/depth/newer tie-break. This improves hit testing across all index backends because the allocation/inverse/depth work was backend-agnostic overhead. Benchmark (synthetic `ui_box_tree`): `hit_test_point/flatvec` improved about 22% on my machine (~61us -> ~47us per iteration). Command: `cargo bench -p understory_benches --bench ui_box_tree -- ui_box_tree/hit_test_point/flatvec --noplot`
tomcur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice speed up! The timings you mentioned reproduce on my machine as well.
I've left some nits inline.
For some workloads we can go further; especially if visit_point yields many nodes from the same subtree for a given point, the repeated tree walking could comparatively get very expensive (as we're traversing the same ancestors over and over towards the root). Fixing that requires either allocating or some mutable scratch context.
|
|
||
| if let Some(local_clip) = node.local.local_clip | ||
| && !local_clip.contains(local_point) | ||
| // Transform once: most candidates fail at the bounds check, so avoid repeated work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't look entirely accurate to me: in the current box tree model, the local bounds are a node's real geometry, and the contains check is the first necessary condition for a hit by finely testing whether the point in local coordinates is within those bounds. The spatial index does that coarsely and it's probably accurate to say the comment as written applies more to the spatial index.
Perhaps instead:
| // Transform once: most candidates fail at the bounds check, so avoid repeated work. | |
| // Finely test whether `point` is within the node's bounds and the node's own clip. |
| if z > *z_best | ||
| || (z == *z_best | ||
| None => best = Some((id, z, depth)), | ||
| Some((best_id, z_best, depth_best)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Some((best_id, z_best, depth_best)) => { | |
| Some((id_best, z_best, depth_best)) => { |
or
| Some((best_id, z_best, depth_best)) => { | |
| Some((best_id, best_z, best_depth)) => { |
| // Walk ancestors checking their clips for precise hit filtering. | ||
| // | ||
| // This is intentionally only done for candidates that pass the local bounds/clip | ||
| // checks, since ancestor traversal is comparatively expensive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better than the original comment in this place, except that we may want to keep mentioning explicitly this walks towards the root (just so it's immediately clear which direction this is walking in).
| // Walk ancestors checking their clips for precise hit filtering. | |
| // | |
| // This is intentionally only done for candidates that pass the local bounds/clip | |
| // checks, since ancestor traversal is comparatively expensive. | |
| // Walk ancestors towards the node's root checking their clips for precise hit filtering. | |
| // | |
| // This is intentionally only done for candidates that pass the local bounds/clip | |
| // checks, since ancestor traversal is comparatively expensive. |
Hit testing is a hot path and used to pay avoidable per-query overhead.
IndexGeneric::visit_pointinstead ofquery_pointso hit testing does not allocate aVecof candidates each call.world_transform_inverseanddepthfromTree::commitwhile scoring candidates.local_bounds,local_clip, and ancestor clips (after the coarse AABB query), and keeps the existing z/depth/newer tie-break.This improves hit testing across all index backends because the allocation/inverse/depth work was backend-agnostic overhead.
Benchmark (synthetic
ui_box_tree):hit_test_point/flatvecimproved about 22% on my machine (~61us -> ~47us per iteration). Command:cargo bench -p understory_benches --bench ui_box_tree -- ui_box_tree/hit_test_point/flatvec --noplot