One maintainer & one contributor in the Spring Data ArangoDB project have refused to accept the inheritance-related contributions implemented here. That decision has obviously (& without doubt) been driven not by rational considerations about technology, but by something else. In the process of blocking these contributions, the upstream Spring Data ArangoDB project has become tainted by extremely severe inefficiencies & irrationality. The developer who provided these contributions believes that what is now in upstream is so irrational that it cannot be used as is, & therefore has to use a fork that provides a rational & efficient implementation of a mainstream persistence-related inheritance type: the canonical COLLECTION-PER-CLASS approach (similar to the TABLE-PER-CLASS inheritance type in JPA). The expression "canonical COLLECTION-PER-CLASS type of inheritance" is used here not as something set in stone, but just to avoid a more ambiguous phrase like "classes that have a declared @Document annotation". The bottom line is that this implementation is now more efficient than upstream even for projects that don't use any persistence-related inheritance at all, because the upstream project has become inefficient & irrational for all records (whether or not any persistence-related inheritance is involved in them).
- Inefficiencies & other issues in Spring Data ArangoDB optimized by this implementation
- Test report comparisons (showing that all upstream functionality is preserved, it is just optimized (not less, just better))
- Brief history
1. Data pollution & disk space waste: the amount of data persisted & processed when using this implementation is up to 4 times smaller.
2. This data pollution & disk space waste in turn entails more memory utilization at run-time.
3. It also entails unnecessary bandwidth utilization.
4. All of the above also entail usage of more CPU cycles at run-time (for storing the unnecessary data, retrieving it, & processing it).
5. Issues 1 through 4 can lead to a considerable & even noticeable increase in latency (i.e., reduced responsiveness).
6. Issues 1 through 4 (especially when using a Platform as a Service) eventually (for a PaaS, quite quickly) translate to additional operating expenses (yes, there is also a cash aspect involved).
7. Extremely absurd clutter when looking at the data, even for classes that have nothing to do with inheritance (namely, classes that don't extend another entity/document & are not extended). This is actually a big factor once one takes a look at it, as can be seen below.
8. Issue 7 will most likely have a negative effect on developer & DB admin productivity, by inhibiting concentration on useful data due to the presence of a lot of useless data.
9. Unnecessary tight coupling of DB records to Java classes: moving any @Document Java class to a different package (or renaming any @Document class that already has a customized collection name) now requires running a query to update all relevant DB records. This is a major code smell. It reveals a conflict (& bizarre duplication) between the inheritance-support implementation that focuses on non-@Documents & the semantics of the @Document value attribute: the former prevents the latter from freely decoupling DB records from the name of the Java class.
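The tight coupling described in the last item above can be illustrated with a minimal, self-contained Java sketch that uses no Spring or ArangoDB classes. It only models the idea: a record that carries a fully-qualified class name stops matching the code once that class is moved or renamed, while a record without such a hint is unaffected. The attribute name `_class`, the class `StoredTypeHintDemo`, the helper methods, and the package names are illustrative assumptions, not the upstream mapping code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the upstream mapping code.
final class StoredTypeHintDemo {

    // Assumed name of the attribute holding the fully-qualified class name.
    static final String TYPE_KEY = "_class";

    // Builds a toy record; pass null to model a record without a type hint.
    static Map<String, Object> record(String name, String storedClassName) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name);
        if (storedClassName != null) {
            doc.put(TYPE_KEY, storedClassName);
        }
        return doc;
    }

    // Returns true if the record has no type hint, or if the hinted class
    // can still be loaded under the stored name.
    static boolean typeHintResolves(Map<String, Object> doc) {
        Object hint = doc.get(TYPE_KEY);
        if (hint == null) {
            return true; // nothing stored, so nothing can go stale
        }
        try {
            Class.forName(hint.toString());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // class was moved or renamed after the record was written
        }
    }

    public static void main(String[] args) {
        // java.lang.String stands in for a class that still lives where the record says.
        Map<String, Object> current = record("a", "java.lang.String");
        // Hypothetical old package name, standing in for a class that was refactored.
        Map<String, Object> stale = record("b", "com.example.oldpkg.Customer");
        Map<String, Object> noHint = record("c", null);

        System.out.println("current resolves: " + typeHintResolves(current));
        System.out.println("stale resolves:   " + typeHintResolves(stale));
        System.out.println("no hint resolves: " + typeHintResolves(noHint));
    }
}
```

In this toy model, only the record written under the old package name needs a data migration; the record without a type hint is independent of where the Java class lives.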
Absurd in upstream Spring Data ArangoDB:
Normal record provided with this implementation (the size is up to 3.69 times smaller (35/129 bytes)):
Absurd in upstream Spring Data ArangoDB:
Normal record provided with this implementation (in this example, the size is 1.97 times smaller (59/116 bytes)):
Absurd in upstream Spring Data ArangoDB:
Normal record provided with this implementation:
Absurd in upstream Spring Data ArangoDB (with (automatic) join, in this case redundant data would be present in all 3 entities/documents that get retrieved):
Normal record provided with this implementation:
Taking the example of a single record & estimating that a single record is 3.69 times smaller (35/129 bytes), in each of the following 2 (also quite simple) examples, which involve JOINs into 2 other collections, the effect would be cumulative: the absolute size of data (stored, transferred, processed, etc.) would be multiplied by a factor of 3 (i.e., 1 + 1 + 1 or 1 + 2):
1. @Document class A { B b; } @Document class B { C c; } @Document class C { }
2. @Document class D { C c; E e; } @Document class C { } @Document class E { }
A. If one adds to example 2 an eager retrieval of a simple List of 5 instances of some class F, the cumulative effect would be even more noticeable:
@Document class D { C c; E e; List<F> f; } @Document class F { }
So in this example, the absolute size of data (stored, transferred, processed, etc.) would be multiplied by a factor of 8 (i.e., 3 documents as in example 2 + 5 more for the list). Thus a smaller size per record provides a cumulative effect for operations involving JOINs or multiple records matching a query, etc. (with propagating efficiencies & benefits in terms of memory, bandwidth, CPU, latency, operational expenses, productivity, as well as visual & perceptional aspects (simpler due to less clutter, less ambiguous), etc.).
3. For a graph of x entities with x edges, the effect would also be cumulative (the effect for x entities AND the effect for x edges): potentially doubling some of the effects (e.g., amount of storage used) for graph traversal use-cases.
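The arithmetic behind these cumulative estimates can be sketched in a few lines of Java. The sketch assumes, as the text does, that every record has the size of the single-record example (129 bytes upstream, 35 bytes with this implementation), and it additionally assumes that edges are the same size as documents. The class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical back-of-the-envelope estimate; all names are illustrative.
final class CumulativeSizeEstimate {

    // Per-record sizes taken from the single-record example (assumed uniform).
    static final int UPSTREAM_BYTES = 129;
    static final int FORK_BYTES = 35;

    // One root document plus the documents joined in plus eagerly loaded list elements.
    static int documentsRetrieved(int joinedDocuments, int listElements) {
        return 1 + joinedDocuments + listElements;
    }

    // Absolute data size for a given number of equally sized records.
    static long totalBytes(int records, int bytesPerRecord) {
        return (long) records * bytesPerRecord;
    }

    // A graph of x entities with x edges involves 2x records.
    static int graphRecords(int x) {
        return 2 * x;
    }

    // Size ratio between an upstream record and a record from this implementation.
    static double ratio(int upstreamBytes, int forkBytes) {
        return (double) upstreamBytes / forkBytes;
    }

    public static void main(String[] args) {
        int[] recordCounts = {
            documentsRetrieved(2, 0), // examples 1 and 2: 3 documents
            documentsRetrieved(2, 5), // example A: 3 documents + 5 list elements
            graphRecords(10)          // 10 entities + 10 edges
        };
        for (int n : recordCounts) {
            System.out.printf("%d records: upstream %d bytes, this implementation %d bytes%n",
                    n, totalBytes(n, UPSTREAM_BYTES), totalBytes(n, FORK_BYTES));
        }
        System.out.printf("per-record ratio: %.2f%n", ratio(UPSTREAM_BYTES, FORK_BYTES));
    }
}
```

The per-record ratio stays the same in every case; what grows with the number of records touched is the absolute number of bytes saved.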
Assuming average record size difference to be as shown in example above for single record:
Conclusion: this implementation is significantly more efficient in terms of disk space, memory, bandwidth, & CPU usage, as well as in terms of latency, operational expenses, & productivity; & is better in terms of visual & perceptional aspects (simpler due to less clutter, less ambiguous), & in terms of DB records not being tightly-coupled with Java classes.
Test report comparisons (showing that all upstream functionality is preserved, it is just optimized (not less, just better))
Modified (branch) Upstream (original) Diff
Feel free to repeat the steps shown in Diff using the following (more recent) tag pairs:
- 2.3.0 (upstream) & 2.3.0.1 (-rational)
- 3.2.3 (upstream) & 3.2.3.1 (-rational)
Modified (branch) Upstream (original)
Modified (branch) Upstream (original) Diff
ArangoDB Spring Data had no support for inheritance in @Documents, so an issue was logged on March 13, 2018 focusing on support for a mainstream inheritance type: canonical COLLECTION-PER-CLASS (similar to TABLE-PER-CLASS in JPA). On March 24th, a pull request was provided for it. This pull request didn't receive the same quick treatment that others get. On April 5th, a strange issue was opened by another contributor to support inheritance in properties of interface type. That strange request was followed by a request to not merge the pull request for mainstream inheritance support of type COLLECTION-PER-CLASS. On April 12th, that same contributor submitted a pull request that focuses on inheritance in non-@Documents by persisting the fully-qualified class name. On April 17th, despite it having been stated that storing the fully-qualified class name is 100% unnecessary for the canonical COLLECTION-PER-CLASS type of inheritance, that alternative PR got merged into upstream Spring Data ArangoDB. Despite the fact that the inefficiencies introduced by the alternative PR had been clearly shown, the maintainer of ArangoDB Spring Data refused to merge the original pull request & closed it on May 22nd. That pull request had been updated to avoid persisting the fully-qualified class name for @Documents (because it's unnecessary & causes many issues & inefficiencies), leaving other cases as is (i.e., leaving them up to whatever ArangoDB Spring Data in general wants to do with them, such as based on the alternative PR).

To make it clear, the developer of this fork never made a request to not merge the alternative PR, or to revert it; but the other developer requested that the contributions here not be merged, & that's how the PR got closed by the maintainer. The maintainer has also taken (on July 2, 2018) an insane position of refusing to accept a PR that removes fully-qualified class name storage, retrieval, & processing for @Edges, without having been able to provide a single (!) use-case for which fully-qualified class names need to be stored for @Edges. Thus, to have a rational mapping implementation for ArangoDB spring-data, there is a need for an alternative implementation: hence, this project.