The distributed implementation has been through quite a lot of changes in the last few versions. We now have a coherent way to determine which transactions have been applied to a database, and a way to request recent partial changes when a specific database on a specific node is missing them. In terms of the protocol that keeps the general data consistent, OrientDB is currently in quite good shape, so there is no need to design or introduce a new consensus protocol in the general transaction flow.
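To make that concrete, here is a hedged sketch of the kind of bookkeeping this implies; the names and structures are hypothetical, not the actual OrientDB API. Each node tracks the last transaction sequence it has applied per database, and a lagging node asks a peer only for the missing tail instead of a full copy:

```java
// Hypothetical sketch of per-database transaction tracking; illustrative only.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.LongStream;

public class AppliedTxTracker {
    // last applied transaction sequence id, per database
    private final Map<String, Long> lastApplied = new ConcurrentHashMap<>();

    void markApplied(String database, long sequenceId) {
        lastApplied.merge(database, sequenceId, Math::max);
    }

    // a node that is behind requests only the changes it is missing
    List<Long> missingSequences(String database, long peerLastApplied) {
        long local = lastApplied.getOrDefault(database, 0L);
        return LongStream.rangeClosed(local + 1, peerLastApplied).boxed().toList();
    }
}
```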
But this claim contrasts a bit with the feedback from users, who struggle to use OrientDB in a distributed setup. It is true that OrientDB still has issues in distributed mode, and I'm going to detail them a bit; spoiler ahead: the issues are not in the data consensus protocol.
The first major issue as of today is what OrientDB calls the "distributed configuration"; users often stumble on it in the form of "distributed-config.json" and "default-distributed-config.json". This data keeps the network topology state: specifically, which nodes are online, which databases are present on each node, and whether a database on a specific node is available to participate in transactions.
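For readers who have never opened one of these files, a default-distributed-config.json looks roughly like this (abridged and illustrative; exact fields vary between versions):

```json
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": { "*": "master" },
  "clusters": {
    "internal": {},
    "*": { "servers": ["<NEW_NODE>"] }
  }
}
```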
This is quite critical information, because it affects when nodes need to check whether they have to sync with each other, and which nodes participate in the quorum for data transactions.
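As a rough illustration of the dependency (hypothetical code, not the OrientDB implementation): a "majority" write quorum is derived directly from how many copies the topology state reports as online, so a wrong topology means a wrong quorum.

```java
// Illustrative only: deriving a majority write quorum from the topology state.
import java.util.List;

public class QuorumSketch {
    static int majorityQuorum(List<String> onlineServers) {
        return onlineServers.size() / 2 + 1;
    }

    public static void main(String[] args) {
        // three copies of the database reported as ONLINE -> quorum of 2
        System.out.println(majorityQuorum(List.of("node1", "node2", "node3")));
    }
}
```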
As of today, though, the protocol for keeping this metadata updated is not consistent. It is one of the implementations left over from the various distributed protocols we have had, and it is more or less the first thing that needs to be redesigned. The current implementation is based on Hazelcast network events and database status events, which each node catches and uses to, more or less concurrently, try to merge into a "distributed configuration". This may sound OK, but it is not: the final distributed configuration may differ depending on the order in which the events are applied (CRDT rules cannot be applied to the current distributed configuration), with the result that each node may have a different view of the state of the network.
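A tiny, self-contained illustration of the problem (not actual OrientDB code): the same pair of status events, merged in different orders on two nodes, leaves them with divergent views.

```java
// Demonstrates why a naive "last event wins" merge is order-dependent.
import java.util.HashMap;
import java.util.Map;

public class MergeOrderDemo {
    static void apply(Map<String, String> config, String node, String status) {
        config.put(node, status); // naive merge, as in the event-driven approach
    }

    public static void main(String[] args) {
        Map<String, String> viewA = new HashMap<>();
        Map<String, String> viewB = new HashMap<>();
        // node2 flaps: an OFFLINE and an ONLINE event race across the cluster
        apply(viewA, "node2", "OFFLINE"); apply(viewA, "node2", "ONLINE");
        apply(viewB, "node2", "ONLINE");  apply(viewB, "node2", "OFFLINE");
        System.out.println(viewA); // {node2=ONLINE}
        System.out.println(viewB); // {node2=OFFLINE} -- divergent views
    }
}
```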
The good thing is that these events are few and not too common in the general flow of a distributed environment, so even with the inconsistent protocol, if the nodes come up slowly one after the other and do not flap between offline and online too often, it is possible to end up with a consistent topology across the network that allows OrientDB to process transactions correctly. The bad thing is that when the topology is not correct, it is still used to decide the quorum, resulting in the quorum issues that are well known in the OrientDB issue tracker.
So the solution in this area is to write a consistent protocol to manage the topology. The protocol will likely be based on the data protocol, which has proven to be consistent so far, plus probably a limited use of CRDTs, which are needed on the first bootstrap of the network, when a quorum cannot yet be reached.
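As a hedged sketch of the kind of CRDT rule that could help at bootstrap (the structure and tie-break rule here are illustrative, not the actual design): a last-writer-wins status map whose merge is commutative and idempotent, so the result no longer depends on event order.

```java
// Hedged sketch, assuming a LWW (last-writer-wins) register per node status.
import java.util.HashMap;
import java.util.Map;

public class LwwStatusMap {
    record Entry(long timestamp, String origin, String status) {}

    private final Map<String, Entry> statuses = new HashMap<>();

    // applying the same set of events in any order converges to the same state
    void merge(String node, Entry incoming) {
        statuses.merge(node, incoming, (current, in) -> {
            if (in.timestamp() != current.timestamp())
                return in.timestamp() > current.timestamp() ? in : current;
            // deterministic tie-break so every node picks the same winner
            return in.origin().compareTo(current.origin()) > 0 ? in : current;
        });
    }

    String statusOf(String node) {
        Entry e = statuses.get(node);
        return e == null ? "UNKNOWN" : e.status();
    }
}
```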
This is not the only thing that needs to evolve in the distributed implementation. Another is the initial discovery of the nodes that participate in the network. Today we rely on a quite outdated version of Hazelcast for this; in fact, the only things Hazelcast is used for in OrientDB today are node discovery and network topology management. Since network topology management is going to be redesigned outside Hazelcast, it may make sense to introduce something more specific and more modern to handle node discovery. One good candidate would be JGroups, which has quite wide support and does not require buying into any pre-designed protocols or flows.
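For reference, node discovery in JGroups is a small API surface; a minimal sketch (cluster name is illustrative, and a real deployment would pick a protocol stack explicitly) looks like this:

```java
import org.jgroups.JChannel;
import org.jgroups.Receiver;
import org.jgroups.View;

public class DiscoveryDemo {
    public static void main(String[] args) throws Exception {
        // default protocol stack (UDP multicast); a production setup would
        // choose a stack via an XML config, e.g. new JChannel("tcp.xml")
        JChannel channel = new JChannel();
        channel.setReceiver(new Receiver() {
            @Override
            public void viewAccepted(View view) {
                // fires every time the cluster membership changes
                System.out.println("members: " + view.getMembers());
            }
        });
        channel.connect("orientdb-discovery"); // illustrative cluster name
        Thread.sleep(10_000); // stay in the cluster for a bit
        channel.close();
    }
}
```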
On the data flow side there are improvements to be made as well. As of today, there is no re-coordination of a pending transaction when the original coordinator goes offline, which may cause some transactions to stall when a node goes down (the data stays consistent, though). This can be improved, but it is secondary to the topology work.
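One plausible shape for that re-coordination, sketched purely as an illustration (nothing like this exists in OrientDB today, as said above): when a member leaves, the surviving replicas deterministically pick a new coordinator for its pending transactions, so exactly one node takes over without needing an extra election round.

```java
// Purely illustrative: deterministic coordinator takeover on member failure.
import java.util.List;
import java.util.Optional;

public class ReCoordinationSketch {
    // every survivor computes the same answer, e.g. the smallest member id,
    // so exactly one node resumes the stalled transactions
    static Optional<String> newCoordinator(List<String> survivors) {
        return survivors.stream().sorted().findFirst();
    }

    public static void main(String[] args) {
        // "node1" (the old coordinator) just went offline
        System.out.println(newCoordinator(List.of("node3", "node2"))); // node2
    }
}
```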
Also, as of today, the amount of data kept in the distributed logs of the nodes is decided by the needs of the specific node rather than by the network state. This can trigger full database transfers more often than needed and, if data partitioning is introduced, may cause consistency issues. Work will be done to make sure there is some relation between the cleanup of the distributed log and the network state; this too is secondary to the topology work.
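The kind of relation meant here can be sketched in a few lines (again hypothetical, not the current implementation): truncate the log only up to the point every known member has acknowledged, so a temporarily offline node can still catch up from the log instead of requiring a full database transfer.

```java
// Illustrative rule tying distributed-log cleanup to network state.
import java.util.Map;

public class LogRetentionSketch {
    // min acknowledged sequence over all known nodes (up to some staleness
    // bound): anything below this can be deleted without forcing a full sync
    static long safeTruncationPoint(Map<String, Long> ackedSequenceByNode) {
        return ackedSequenceByNode.values().stream()
                .mapToLong(Long::longValue).min().orElse(0L);
    }

    public static void main(String[] args) {
        System.out.println(safeTruncationPoint(
                Map.of("node1", 120L, "node2", 95L, "node3", 110L))); // 95
    }
}
```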
If anyone wants to help improve the distributed module, the area where an outsider can probably contribute the most is a new implementation of node discovery; feel free to join this conversation with questions about it.