-
Notifications
You must be signed in to change notification settings - Fork 2
Island detection algorithm #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds an (optional) Haversine distance-based “island” detection pass to prune edges that belong to small/compact connected components during OMF graph vectorization.
Changes:
- Introduces
component_algorithm::island_detection_algorithmand unit tests for midpoint + island detection behavior. - Wires an optional island-detection configuration from app config → network run →
OmfGraphVectorized::new, and applies edge-list cleaning. - Adds
indexmapdependency for adjacency representation.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/bambam-omf/src/graph/serialize_ops.rs | Adds clean_omf_edge_list helper to filter edge-list-aligned vectors by a boolean mask. |
| rust/bambam-omf/src/graph/omf_graph.rs | Wires island detection + edge removal into graph construction. |
| rust/bambam-omf/src/graph/mod.rs | Registers the new component_algorithm module. |
| rust/bambam-omf/src/graph/component_algorithm.rs | Implements island detection algorithm + adjacency builder + tests. |
| rust/bambam-omf/src/app/omf_app.rs | Reads island detection configuration from config file and passes it into run. |
| rust/bambam-omf/src/app/network.rs | Defines IslandDetectionAlgorithmConfiguration and plumbs it into graph creation. |
| rust/bambam-omf/Cargo.toml | Adds indexmap workspace dependency. |
| rust/Cargo.toml | Adds workspace-level indexmap version pin. |
| configuration/bambam-omf/travel-mode-filter.json | Adds example island detection configuration block. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn island_detection_algorithm( | ||
| edge_lists: &[&EdgeList], | ||
| vertices: &[Vertex], | ||
| distance_threshold: f64, | ||
| distance_threshold_unit: DistanceUnit, | ||
| ) -> Result<Vec<(EdgeListId, EdgeId)>, OvertureMapsCollectionError> { | ||
| let forward_adjacency: DenseAdjacencyList = build_adjacency(edge_lists, vertices.len(), true) | ||
| .map_err(|s| { | ||
| OvertureMapsCollectionError::InternalError(format!( | ||
| "failed to compute adjacency matrix for island detection algorithm: {s}" | ||
| )) | ||
| })?; | ||
|
|
||
| let island_edges: Result<Vec<_>, _> = edge_lists | ||
| .par_iter() | ||
| .flat_map(|&el| el.0.par_iter()) | ||
| .filter_map(|edge| { | ||
| match is_component_island_parallel( | ||
| edge, | ||
| distance_threshold, | ||
| distance_threshold_unit, | ||
| edge_lists, | ||
| vertices, | ||
| &forward_adjacency, | ||
| ) { | ||
| Ok(true) => Some(Ok((edge.edge_list_id, edge.edge_id))), | ||
| Ok(false) => None, | ||
| Err(e) => Some(Err(e)), | ||
| } | ||
| }) | ||
| .collect(); |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
island_detection_algorithm runs a graph traversal starting from every edge in parallel. On realistic OMF-sized graphs this is likely to be extremely expensive (roughly O(E * explored_edges)) and can dominate runtime/memory. A more scalable approach is to compute connected components once (e.g., union-find / BFS per component), compute each component’s max extent, then mark all edges in components whose extent < threshold.
rust/bambam-omf/src/app/omf_app.rs
Outdated
| "island_algoritm_configuration", | ||
| ) | ||
| .map_err(|e| { | ||
| let msg = format!( | ||
| "error reading 'edge_lists' key in '{configuration_file}': {e}" |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config key string "island_algoritm_configuration" is misspelled ("algoritm"), which bakes a typo into the user-facing configuration format. Consider renaming to island_algorithm_configuration (and handling backwards compatibility if needed).
| "island_algoritm_configuration", | |
| ) | |
| .map_err(|e| { | |
| let msg = format!( | |
| "error reading 'edge_lists' key in '{configuration_file}': {e}" | |
| "island_algorithm_configuration", | |
| ) | |
| .or_else(|_| { | |
| // Backwards compatibility: support legacy misspelled key. | |
| config.get::<Option<IslandDetectionAlgorithmConfiguration>>( | |
| "island_algoritm_configuration", | |
| ) | |
| }) | |
| .map_err(|e| { | |
| let msg = format!( | |
| "error reading 'island_algorithm_configuration' (or legacy \ | |
| 'island_algoritm_configuration') key in '{configuration_file}': {e}" |
|
I ran the algorithm with the bbox that we've using and got 36,631 edges out of the initial 37,115, which seems like the right magnitude. On my laptop, run time was a couple of minutes and it seemed to be using 100% CPU all the time so I think we are being efficient. |
robfitzgerald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me! i agree with you that using the radius distance of the expanding frontier is a good option from both a compute cost and domain correctness perspective. i would prefer if we included some doc comments on the public functions added to bambam-omf but not worried enough to stop this from rolling in. thanks!
In this PR, I'm implementing a Haversine distance-based island detection algorithm. We discussed potential alternative criteria for this algorithm but I decided this was enough changes for a single PR.
For each each, we check in parallel if the distance from its midpoint to the midpoint of the initial edge is large enough. This way, we remove edges that are part of components that are small enough.
I assumed all edge lists together could make two vertices adjacent, so the processing is made considering all edge lists at the same time and therefore the "now let's remove the edges" part is a little repetitive. I also welcome feedback on the overall style of the implementation.