Description
Version
5.4.0-SNAPSHOT
Feature
This proposal enhances the spatial index with support for one index per graph and improves its serialization by using Kryo, via Apache Sedona's Kryo/JTS implementation.
This is an incremental improvement of the existing JTS-based in-memory implementation - it's not a complete overhaul such as a disk-based, incrementally updated, transaction-aware R-tree (if someone contributed that, then this issue's PR could be discarded 😄).
The impact of this work has been evaluated and presented in the proceedings of last year's GeoLD workshop:
Simon Bin, Claus Stadler, Lorenz Bühmann, and Michael Martin
Getting practical with GeoSPARQL and Apache Jena
Slides
The essence is presented on the following slides:
Using an index per graph (unsurprisingly) boosts the performance when multiple graphs have geometries and only a subset is queried (slide 15):
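To illustrate why a per-graph index helps, here is a minimal sketch in plain Java. It is not the code from the PR: the class `PerGraphIndexSketch`, its `Box` and `Entry` helpers, and the graph IRIs are hypothetical, and a simple list per graph stands in for the JTS tree. The point is only that a query restricted to one graph never scans entries from the other graphs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one spatial index per named graph (not Jena's actual classes). */
final class PerGraphIndexSketch {

    /** Axis-aligned bounding box, standing in for a JTS envelope. */
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Box o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    /** An indexed feature: its IRI plus the bounding box of its geometry. */
    static final class Entry {
        final String featureIri;
        final Box box;

        Entry(String featureIri, Box box) {
            this.featureIri = featureIri;
            this.box = box;
        }
    }

    // Graph IRI -> entries of that graph. A real index would hold one tree per graph.
    private final Map<String, List<Entry>> indexByGraph = new HashMap<>();

    void insert(String graphIri, String featureIri, Box box) {
        indexByGraph.computeIfAbsent(graphIri, g -> new ArrayList<>())
                    .add(new Entry(featureIri, box));
    }

    /** Searches only the index of the requested graph; other graphs are not touched. */
    List<String> query(String graphIri, Box searchBox) {
        List<String> hits = new ArrayList<>();
        for (Entry e : indexByGraph.getOrDefault(graphIri, List.of())) {
            if (e.box.intersects(searchBox)) {
                hits.add(e.featureIri);
            }
        }
        return hits;
    }
}
```

With a single shared index, the same query would have to visit candidates from every graph and filter them by graph afterwards, which is the overhead the per-graph layout avoids.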
As for serialization performance (slide 16): index building became a bit slower, but this is outweighed by near-instant loading of the spatial index. The writing overhead arises because the index is now serialized as a tree; before, the items were written out as a flat list, and the tree had to be rebuilt from scratch on restart.
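The difference between the two persistence strategies can be sketched as follows. This is a deliberately simplified stand-in, not the PR's code: it uses `java.io` object serialization instead of Kryo, and a balanced binary tree over plain numbers instead of a JTS spatial tree. All names (`TreeSerializationSketch`, `saveFlat`, `loadFlat`, `saveTree`, `loadTree`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: persisting a flat item list vs. persisting the built tree. */
final class TreeSerializationSketch {

    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final double key;
        final Node left, right;
        Node(double key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** The expensive step: sort the items and build a balanced tree. */
    static Node build(List<Double> items) {
        List<Double> sorted = new ArrayList<>(items);
        Collections.sort(sorted);
        return build(sorted, 0, sorted.size());
    }

    private static Node build(List<Double> s, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = (lo + hi) >>> 1;
        return new Node(s.get(mid), build(s, lo, mid), build(s, mid + 1, hi));
    }

    static boolean contains(Node n, double key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    /** Old style: write only the items. */
    static byte[] saveFlat(List<Double> items) { return write(new ArrayList<>(items)); }

    /** Old style: read the items, then rebuild the tree from scratch. */
    @SuppressWarnings("unchecked")
    static Node loadFlat(byte[] data) { return build((List<Double>) read(data)); }

    /** New style: write the tree structure itself (more to write). */
    static byte[] saveTree(Node root) { return write(root); }

    /** New style: the tree comes back ready to query, with no rebuild. */
    static Node loadTree(byte[] data) { return (Node) read(data); }

    private static byte[] write(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    private static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch `loadFlat` pays the sort-and-build cost on every restart, while `loadTree` only deserializes; the price is that `saveTree` has to write out every node and link rather than just the items.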
A new geosparql:indexPerGraph option (boolean) is added to the geosparql:GeosparqlDataset assembler.
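As a rough idea of how the option might appear in an assembler file, here is a hedged configuration fragment. Only `geosparql:indexPerGraph` and `geosparql:GeosparqlDataset` are taken from this issue; the namespace IRI, the `geosparql:dataset` property, the resource names `<#geoDataset>` and `<#baseDataset>`, and the choice of an in-memory base dataset are assumptions made for illustration and should be checked against the actual assembler vocabulary.

```turtle
PREFIX rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ja:        <http://jena.hpl.hp.com/2005/11/Assembler#>
# Assumed namespace for the GeoSPARQL assembler vocabulary.
PREFIX geosparql: <http://jena.apache.org/geosparql#>

<#geoDataset> rdf:type geosparql:GeosparqlDataset ;
    # Option proposed in this issue: build one spatial index per graph.
    geosparql:indexPerGraph true ;
    # Assumed property linking to the wrapped dataset.
    geosparql:dataset <#baseDataset> .

# Assumed base dataset, in-memory for simplicity.
<#baseDataset> rdf:type ja:MemoryDataset .
```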
The implementation was mainly done by @LorenzBuehmann, the writing and presentation are the work of @SimonBin, and I supported the evaluation.
As for compatibility, I still need to check whether the change is backward compatible, but I think that, due to the change of serializer, existing spatial indexes would have to be rebuilt.
For reference, a bit of related discussion has happened in #2645.
Are you interested in contributing a solution yourself?
Yes