tip: 440
title: Transaction cache optimization
author: jell7@163.com
discussions to: https://github.com/tronprotocol/TIPs/issues/440
status: Final
type: Standards Track
category: Core
created: 2022-07-20
This TIP describes an optimization for transaction cache to reduce memory usage.
Analyzing the program runtime's memory, the transaction cache is shown to take up the overwhelming majority of memory. This article details an optimization for the transaction cache.
At present, hashmap is used to cache the recent transactions to filter duplicated ones, which leads to nearly 2GB memory consumption during program runtime(Probably 200-300 transactions per block). Meanwhile, frequent memory requests will trigger the JVM’s GC mechanism, which indirectly affects the processing time of the program. The above problems can be alleviated by introducing a new data structure instead of hashmap.
In order to record the recent transactions, a pair of Bloom Filters is introduced instead of a hashmap. Both filters record transactions’ IDs.
BloomFilter<byte[]>[] bloomFilters = new BloomFilter[2];
One is called the active filter, which records newer transactions and is prepared to add future transactions. The other is called an inactive filter, which will be transformed from an active filter when it is fully filled. These two filters can save the most recent block transaction hash with a minimum of 65536 and a maximum of 65536*2. When a transaction hash could not be found in both filters, it can be concluded that the transaction is not duplicated, otherwise, further querying of the transaction database is required to determine whether the transaction really exists.
The Bloom Filter is a space-efficient probabilistic data structure. It uses a constant memory space to cache elements and can identify whether a given element exists or not in a probabilistic way, that is, “definitely not in set“ or “possible in set“. Instead of requesting memory for each element, as a hashmap does, bloom filter records all elements in shared memory, which makes it very memory efficient. To identify duplicated transactions, recent transactions' IDs are recorded in a bloom filter. Then the filter checks the incoming transactions with it. If a query returns as “definitely not in set“, then the transaction is accepted and processed; otherwise, it means this transaction is “possible in set“, and then the filter queries the database for exact results.
However, the Bloom Filter cannot remove specified elements in it. In order to update the contents of the bloom filter, a pair of bloom filters is added. One filter is aimed to record earlier transactions, and it is called “ inactive filter“ as no transactions will be added to it; while the other one, called “active filter“, is to record fresher transactions, and it keeps adding new transactions.
When add a transaction to transaction cache:
- Initialization, create a Bloom Filter as the active filter
- Load recent transactions from the database to fill the active filter
- Put transaction hash in the active filter
- If the active filter is full, execute the filter rotation a. Drop inactive Filter b. Change active filter to inactive filter c. Create a new Bloom Filter as the active filter
- Continue step 3
When looking for a transaction in the transaction cache, two filters check are required.
With a pair of Bloom Filters, recent transactions can be tracked with reasonable and stable memory usage. The experimental conditions of the experiment are CPU with 16 cores and 18GB JVM heap memory. The results are as follows.
Solution | Memory Use | CMS Avg pause time | CMS Avg pause Interval |
---|---|---|---|
Hashmap | 2,084MB | 0.712s | 23977s |
Bloom Filter | 119MB | 0.109s | 33184s |
The above test shows that using Bloom Filters can significantly reduce memory usage and has a good optimization effect on JVM CMS Garbage Collection.
When a Fullnode is started, create a pair of bloom filters with a reasonable size to lower false positive probability.
private BloomFilter<byte[]>[] bloomFilters = new BloomFilter[2];
The currentFilterIndex records the index of the active filter. Initially, currentFilterIndex is assigned to be 0, which means, bloomFilters[0] is an active filter; bloomFilters[1] is assigned to be an inactive filter. Then load the recent transactions from the database to bloomFilters[1].
The filterStartBlock is used to record the start block of the active filter, by which it is determined whether the active filter has been filled. If the inactive filter is out-of-date, clear the whole filter's contents and swaps the pair of filters' roles by reassigning currentFilterIndex.
public void put(byte[] key, byte[] value) {
if (key == null || value == null) {
return;
}
Long blockNum = Longs.fromByteArray(value);
if (filterStartBlock == -1) {
// the first transaction, record the block num
filterStartBlock = blockNum;
currentFilterIndex = 0;
} else if (blockNum - filterStartBlock > MAX_SIZE) {
// active filter is full
currentFilterIndex ^= 1;
filterStartBlock = blockNum;
bloomFilters[currentFilterIndex] =
BloomFilter.create(Funnels.byteArrayFunnel(), MAX_SIZE * TRANSACTION_COUNT);
}
bloomFilters[currentFilterIndex].put(key);
}
The get method will check possible existed transactions in both filters.
public byte[] get(byte[] key) {
if (!bloomFilters[0].mightContain(key) && !bloomFilters[1].mightContain(key)) {
return null;
}
// this means exist
return DEFAULT_VALUE;
}