Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add persistent cache to BucketListDB #3696

Open
SirTyson opened this issue Apr 4, 2023 · 6 comments · May be fixed by #4565
Open

Add persistent cache to BucketListDB #3696

SirTyson opened this issue Apr 4, 2023 · 6 comments · May be fixed by #4565

Comments

@SirTyson
Copy link
Contributor

SirTyson commented Apr 4, 2023

Currently, there is no persistent database caching layer that persists between ledgers. While LedgerTxnRoot maintains a cache for a single ledger, this cache is cleared when the ledger is committed. Before BucketListDB, the backend SQL implementation maintained its own cache. However, now that we are transitioning to BucketListDB, it is worth investigating a database wide cache.

For read/write entries, the BucketList structure provides an implicit cache already. Recently modified entries live in the smallest, top level buckets. These buckets are already very quick to search because they use an individual key index and are small enough to fit in OS file page caches. However, Soroban will introduce large, read-only data types that are frequently accessed but rarely written, such as Contract Code. These entries do not benefit from the implicit cache design of the BucketList since they are never rewritten to the top level buckets. We should investigate a persistent cache for these read-only entry types.

Possible implementations to consider include a single BucketList wide cache, a per-bucket cache, or a cache line per entry class (i.e. one cache for ContractCode, one for ContractData, one for Stellar Classic types).

@MonsieurNicolas
Copy link
Contributor

We'll need more data to prioritize this one: BucketDB beat SQL replaying historical ledgers, which includes a lot of "read only" access (trade attempts, etc), so OS file caches may already provide the bulk of the gains.
It's possible that new patterns such as "network settings" will need some level of caching, but even for those it will depend on how the code is written/frequency.

@graydon
Copy link
Contributor

graydon commented Apr 11, 2023

There's a hair to split here I want to be careful about. I suggested this in conversation with @SirTyson and I think there's a good idea here that's adjacent to a much-less-productive idea.

I do not think we should replace the existing ltx entry cache with a single unified "database wide cache". I think that would be just as fragile as the existing ltx cache -- the cache's content would be at risk of getting out of sync with the (mutable) bucketlist, and we'd likely wind up flushing it on every bucketlist-modifying operation like we do on the ltx cache today. And we might well have another cache-invalidation catastrophe bug. These are super bad when they happen, we don't want to tempt fate by asking to start anew debugging another fragile cache.

But I do think per-bucket entry caches make sense, since buckets are immutable. It's logically challenging to imagine a cache-invalidation bug on a single bucket, because a bucket literally never changes after it's written -- there's no conceptual "invalidation" event to perform. So I think adding such per bucket caches would let us remove the existing fragile ltx cache. Bucketlist lookup would still probe at worst 10 levels * 2 = 20 buckets, it'd just have (like with the index and the bloom filter) a cache to accelerate the read that results from a hit on one of those 20 probes.

It would (incidentally) probably perform a bit better since the cache (and any hot entries in it) for the last-level bucket would remain warm across multiple transactions and ledgers. But my main interest here is actually "simplifying the caching we do in the ltx already", moving from "a single fragile cache that can be invalidated" to "a bunch of little robust caches that can't meaningfully be invalidated".

@SirTyson
Copy link
Contributor Author

SirTyson commented Mar 5, 2024

Closing, no longer makes sense know that we have BucketList snapshoting.

@SirTyson
Copy link
Contributor Author

Originally, this was closed because a per-bucket cache was redundant with parallel bucketlist snapshots + LTX based cache. However, now that overlay and the transaction queue bypasses the LTX cache, we are seeing performance degradation due to a lack of BucketList level caching.

At first, I thought a global cache made the most sense. The BucketList is read-only, with the exception of a single addBatch write function, such that cache invalidation would be easy to implement and audit. However, inserting new entries into a global cache would be challenging. BucketList reads all occur based on snapshot objects which maintain separate pointers and file streams. However, snapshots may be based on different ledgers at any given time. If snapshot A looks up an entry from it's personal file streams, it can't automatically add that entry to the primary, BucketList wide cache, as the actual BucketList (and cache) may be on a different ledger.

For this reason, I think we should go back to per-bucket caches. This is much simpler and probably doesn't lose us much performance wise, if at all. A global cache allows us to bypass the BucketIndex lookup step, while a per-bucket cache does not. However, the BucketIndex is all in-memory, so we probably don't do any additional disk IO if we just have a per-bucket cache (except for the occasional bloom miss).

For these reasons, I am back in the per-bucket cache camp.

@MonsieurNicolas
Copy link
Contributor

if by "per bucket cache" you mean a simple random eviction cache, that sounds fine to me (as we get to control the amount of RAM to allocate for this). I know you mentioned before keeping entire buckets in RAM which may not be doable for a while as it depends on other hardware requirements, especially in the context of watcher nodes that also have to accommodate for other services like Stellar-RPC.

@SirTyson
Copy link
Contributor Author

SirTyson commented Nov 5, 2024

Ya, for now this is per bucket random eviction cache with configurable size parameters. This should get us parity (or better) with the previous performance before switching to BucketList snapshots for TX queue. We should do this and enable it by default.

In addition, I think there are two other interesting experimental flags to expose. One is just to do all reads based on an in-memory version of the BucketList. I think this would be interesting for things like parallel catchup and max TPS test modes for now. Eventually, when we have state archival on classic and require beefier validator SKUs, I think this mode could be the default.

On a shorter time period, we could also introduce a flag that caches all Soroban entries in memory. This flag we could enable today in the current protocol without issue since the amount of Soroban state is small, and it's growth is capped via the target BucketList size config setting preventing DOS. I think validators currently use < 2 GB, and we require 16 GB SKUs. This might have some impact on reducing OS disk caches, but would allow us to crank up a lot of the IO related soroban resource limits.

@SirTyson SirTyson linked a pull request Dec 6, 2024 that will close this issue
6 tasks
@SirTyson SirTyson assigned SirTyson and unassigned ThomasBrady Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants