Implementation of a snapshot algorithm that retrieves the current total balance (of bitcakes - currency) in a distributed peer to peer system in which a large number of transactions are constantly being made. The snapshot algorithm is a mix between the Lai Yang - Li and the Spezialetti-Kearns algorithms.
The peer to peer network is a graph where communication between nodes is asynchronous and non-FIFO.
Two nodes can communicate only if they are connected (adjacent) on the graph.
Each node has a starting bitcake amount. Constant transactions rapidly change this value, which in addition to the asynchronous and non-FIFO communication makes it hard to retrieve the current total currency in the system, due to late or unreceived transaction messages at the time of computation.
This problem is solved by a mix of two algorithms:
- Lai Yang - Li variation (enables each node to perform multiple snapshots at a time)
- Specialetti - Kearns (enables multiple nodes to perform a snapshot concurrently by working together)
The system supports scripted launching, for running multiple nodes simultaneously and providing input commands to nodes using text files as input.
To run the system, a MultipleServentStarter class is provided. This class starts separate Node programs using the ProcessBuilder.
The user specifies a network graph in the config file which the MultipleServentStarter reads and provides Nodes with their specified port and id number via the program arguments.
Also the System.out, System.err and System.in are redirected to files /output/serventID_out.txt, /error/serventID_err.txt and /input/serventID_in.txt, to allow the user to supply all nodes with input commands simultaneously.
The user can also interact with nodes using the CLI (command line interface).
The sending of each message is delayed by a small random amount to simulate a realistic distributed system (because the system is tested locally on one machine).
This algorithm is able to compute the correct total balance by storing a separate history, for all potential snapshot initiators, of all sent and received transaction messages for all adjacent (neighbor) nodes. This history is used to detect unreceived transaction messages, and add them to the total balance.
This algorithm enables multiple nodes to compute the snapshot concurrently by forming so called "regions". When a node receives a snapshot message from a certain initiator, if it's the first snapshot message it receives during concurrent snapshot initiation, it saves that initiator and belongs to "his" region. All other initiators are declined and a border is formed with their regions. Once these regions are formed, snapshots are computed within them, after which the initiators exchange results in multiple rounds (starting with neighbor regions) until the final result is formed (all regions exchanged results).
- pause X (pauses the CLI for X amount of seconds, useful for timing certain input commands during testing)
- transaction_burst (sends a burst of 5 transaction messages (random small amount of bitcakes) to all neighbors)
- bitcake_info (initiates a snapshot)
- stop (stops the node)
Parameters are read and set during application launch and cannot be changed during operation.
File structure:
servent_count=16 - number of nodes in the system
clique=false - (not important, leftover from past implementation)
fifo=false - non fifo communication
snapshot=ly - (not important, leftover from past implementation)
servent0.port=1100 - port numbers
servent1.port=1200
servent2.port=1300
...
servent0.neighbors=1,2 - graph connections defined
servent1.neighbors=0
servent2.neighbors=0,3,4
...
servent0.init=true - only initiator nodes can initiate a snapshot
servent1.init=false
servent2.init=false
...
In this example there are 16 nodes in the network, 4 of them are initiators (nodes 0, 6, 10, 14) and initiate a snapshot concurrently.
Each node started with 1000 bitcakes, so there are 16000 bitcakes in the system.
If the algorithm works correctly the snapshot result should always be 16000, no matter how many transactions are concurrently happening, and at what time.
Here we take a look at the algorithm output on node[0] after the snapshot is initiated.
In this image we can see in the highlighted area that during the snapshot computation process node[0] didn't receive all sent messages from node[1].
The missing amount on node[0] is caught, thanks to the transaction history of both nodes, and added to the bitcake sum.
In this photo we see the bitcake sum result of this region (node[0] region).
The final part of combining region results is initiated.
Multiple rounds of regions exchanging results.
Finally after all regions exchanged results, in the highlighted area we can see the final bitcake sum result of the entire system.
This project was an assignment as a part of the course - Concurrent and Distributed Systems during the 8th semester at the Faculty of Computer Science in Belgrade. All system functionalities were defined in the assignment specifications.
- Stefan Ginic - stefangwars@gmail.com