Split the block cache into block pointer cache and block data cache #6037
a3d1291 to 4d76568
4d76568 to a2acdaa
lutter left a comment
There was a problem hiding this comment.
Nice! This should enable a much better/logical block caching strategy
      &self,
      hash: &BlockHash,
-   ) -> Result<Option<(String, BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+   ) -> Result<Option<(String, BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
There was a problem hiding this comment.
Are all these Options still justified? I think they will all always be Some. It would also be nicer to have a struct for this. Maybe call it BlockPointer, since it's one row from that table (and BlockPtr is then a small excerpt from that).
Also, this method should be renamed to block_pointer.
There was a problem hiding this comment.
There's not always a timestamp; on the shared storage model it can still be None.
The Option<BlockTime> is a little weird, but I kept it because there is a difference between Some(epoch time) and None. It's more idiomatic to have an Option than to check BlockTime == BlockTime::NONE or MIN, which are in fact the same value (I didn't really get why).
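A minimal sketch of the struct suggested above, assuming one row of the block_pointers table holds a hash, a number, an optional timestamp, and an optional parent hash. The type definitions are stand-ins so the snippet compiles on its own, and the field names and the timestamp_secs helper are hypothetical, not taken from the PR.

```rust
/// Stand-in types so the sketch is self-contained; the real definitions
/// live in graph-node and may differ.
pub type BlockNumber = i32;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BlockHash(pub Vec<u8>);

/// Assumed representation: seconds since the Unix epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockTime(pub i64);

/// Hypothetical struct replacing the tuple return value: one row of the
/// `block_pointers` table.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BlockPointer {
    pub hash: BlockHash,
    pub number: BlockNumber,
    /// `None` when the store has no timestamp for this block.
    pub timestamp: Option<BlockTime>,
    /// `None` when no parent hash is recorded.
    pub parent_hash: Option<BlockHash>,
}

impl BlockPointer {
    /// Seconds since the epoch, if a timestamp was stored.
    pub fn timestamp_secs(&self) -> Option<i64> {
        self.timestamp.map(|t| t.0)
    }
}
```

With an Option, a block stamped at the epoch (Some(BlockTime(0))) stays distinguishable from a block with no timestamp at all, which is the distinction the reply above wants to preserve.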
      &self,
      block_hash: &BlockHash,
-   ) -> Result<Option<(BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+   ) -> Result<Option<(BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
There was a problem hiding this comment.
And this could also just be called block_pointer
@@ -0,0 +1,40 @@
+ DATABASE_TEST_VAR_NAME := "THEGRAPH_STORE_POSTGRES_DIESEL_URL"
+ DATABASE_URL := "postgresql://graph-node:let-me-in@localhost:5432/graph-node"
There was a problem hiding this comment.
What's a justfile? This should be your local file, not something in the repo
There was a problem hiding this comment.
This is similar to a makefile. It's intentionally in the repo and provides some shortcuts for common operations. You don't need to use it yourself, but it's useful to have for others.
+ # Requires test-deps to be running, see test-deps-up
+ it-test *ARGS:
+     just _run_in_bash cargo test --test integration_tests -- --nocapture {{ ARGS }}
There was a problem hiding this comment.
These can be just aliases in ~/.cargo/config.toml. I have e.g.
[alias]
store = "test -p graph-store-postgres"
tst = "test --workspace --exclude graph-tests"
docs = "doc --workspace --document-private-items"
gm = "install --bin graphman --path node --locked"
gmt = "install --bin graphman --path node --locked --root /var/tmp/cargo"
rt = "test -p graph-tests --test runner_tests"
it = "test -p graph-tests --test integration_tests -- --nocapture"
There was a problem hiding this comment.
And that's local; this works for everyone.
store/postgres/src/chain_store.rs
Outdated
  INSERT INTO {nsp}.version VALUES ({version}) ON CONFLICT DO NOTHING;
  ",
  nsp = nsp,
  version = Storage::CHAINS_SCHEMA_VERSION,
There was a problem hiding this comment.
You don't need this version table and mechanism, and in a way it's a denormalization.
You can find out from information_schema.tables whether the block_pointers table exists and decide based on that whether the migration needs to be run. Since everything this migration does happens in one transaction, you can be sure that the changes to the blocks table also happened and don't need to check for that.
There was a problem hiding this comment.
I thought about doing it this way, but it's entirely possible there will be other changes in the future. Having a version makes it easy to figure out the current version of the schema and implement the different changes sequentially; it's much simpler than trying to figure out each step through pg metadata.
There was a problem hiding this comment.
The version table is completely unnecessary; if there are more changes in the future, they can also look at the information_schema to determine whether they have been applied or not. Plus, over time, people will forget what these version numbers mean. In any event, it would be good if the comment on this method actually explained what the migration is doing.
There was a problem hiding this comment.
The argument was never that it is necessary; it is that it is simpler to use and understand (portable too). But whatever, I'll change it to use psql tables...
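A rough sketch of the information_schema approach discussed in this thread, assuming the check only needs to know whether block_pointers exists in the chain's schema. The constant name, the needs_migration helper, and the use of a bound parameter `$1` for the schema name are illustrative assumptions, not the code that ended up in the PR.

```rust
/// Query reporting whether `block_pointers` exists in a given schema.
/// `$1` is assumed to be bound to the chain's namespace (schema name).
pub const BLOCK_POINTERS_EXISTS: &str = "\
select exists (
    select 1
      from information_schema.tables
     where table_schema = $1
       and table_name = 'block_pointers'
)";

/// Hypothetical decision helper: given the table names found in the
/// chain's schema, the migration runs only if `block_pointers` is missing.
pub fn needs_migration(tables_in_schema: &[&str]) -> bool {
    !tables_in_schema.iter().any(|t| *t == "block_pointers")
}
```

Because the migration runs in a single transaction, the presence of block_pointers is taken here as proof that the accompanying changes to the blocks table were applied as well.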
  /// The id of the sole publisher in the test data
  static ref PUB1: IdVal = IdType::Bytes.parse("0xb1");
  /// The chain we actually put into the chain store, blocks 0 to 3
  // static ref CHAIN: Vec<FakeBlock> = vec![GENESIS_BLOCK.clone(), BLOCK_ONE.clone(), BLOCK_TWO.clone(), BLOCK_THREE.clone()];
store/test-store/src/block_store.rs
Outdated
  /// The parts of an Ethereum block that are interesting for these tests:
  /// the block number, hash, and the hash of the parent block
- #[derive(Clone, Debug, PartialEq)]
+ #[derive(Default, Clone, Debug, PartialEq)]
There was a problem hiding this comment.
This doesn't need to be Default (and there's not really a sensible default for a block)
There was a problem hiding this comment.
The Default here allows you to use { number: x, ..Default::default() }. It's really just to make the tests a little less verbose, but it turns out I didn't actually use it 😆
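For readers unfamiliar with the pattern, here is a small sketch of struct update syntax with Default. The field names and types are assumptions based on the doc comment (number, hash, parent hash), and block_with_number is a hypothetical helper, not something from the test suite.

```rust
/// Hypothetical test block; field names and types are assumptions.
#[derive(Default, Clone, Debug, PartialEq)]
pub struct FakeBlock {
    pub number: u64,
    pub hash: String,
    pub parent_hash: String,
}

/// Build a block where only the number matters to the test; every other
/// field takes its `Default` value (empty strings here).
pub fn block_with_number(number: u64) -> FakeBlock {
    FakeBlock {
        number,
        ..Default::default()
    }
}
```

The reviewer's objection still applies: an empty hash is not a meaningful block, so the derive buys brevity in tests at the cost of allowing nonsensical values.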
graph/src/data_source/offchain.rs
Outdated
  self.mapping.handler.clone(),
  BlockPtr::new(Default::default(), self.creation_block.unwrap_or(0)),
- BlockTime::NONE,
+ BlockTime::MIN,
There was a problem hiding this comment.
From testing, I'll revert. It's the exact same value, not sure why either.
graph/src/blockchain/types.rs
Outdated
  }
  }

  impl FromStr for BlockTime {
There was a problem hiding this comment.
This impl is very unintuitive to me: parsing a string will try to interpret the string as a hex/decimal number.
There was a problem hiding this comment.
That's how it was used; I just moved the implementation somewhere that was easier to find. The previous function was try_parse_timestamp or something similar. If it's the naming, I can change it to a method?
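One way the "change it to a method" option could look, as a sketch only: an explicitly named constructor instead of a FromStr impl, so the hex/decimal behavior is visible at the call site. The BlockTime stand-in, the method name parse_hex_or_decimal, and the rule that a "0x" prefix selects hex are all assumptions; the PR's actual parsing rules are not shown in this thread.

```rust
use std::num::ParseIntError;

/// Stand-in for the real type; assumed to wrap seconds since the epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockTime(pub i64);

impl BlockTime {
    /// Parse seconds since the epoch from either a `0x`-prefixed hex
    /// string or a plain decimal string.
    pub fn parse_hex_or_decimal(s: &str) -> Result<Self, ParseIntError> {
        let secs = match s.strip_prefix("0x") {
            // Hex digits follow the prefix
            Some(hex) => i64::from_str_radix(hex, 16)?,
            // No prefix: treat the whole string as decimal
            None => s.parse::<i64>()?,
        };
        Ok(BlockTime(secs))
    }
}
```

A named method keeps `"...".parse::<BlockTime>()` from silently accepting numeric strings, which seems to be the source of the surprise raised above.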
graph/src/blockchain/types.rs
Outdated
  /// have a timestamp
  pub const NONE: Self = Self(Timestamp::NONE);
  // /// A timestamp from a long long time ago used to indicate that we don't
  // /// have a timestamp
There was a problem hiding this comment.
Seems like some extra comment signs snuck in
1e10c68 to 1f7a117
store/postgres/src/chain_store.rs
Outdated
  fn make_ddl(nsp: &str) -> String {
      format!(
          "
          CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
There was a problem hiding this comment.
There's no need to make this idempotent. You run all this in one transaction, so either it all succeeds or none of it succeeds. There's no way that this table gets created but other statements later on do not succeed.
      format!(
          "
          CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
              hash BYTEA not null primary key,
There was a problem hiding this comment.
Yes, you can't use the number as a pk, I was talking about a synthetic pk, like an auto-incrementing counter. But thinking about this more, what we want in the fullness of time to avoid storing block hashes redundantly is to move the data column to the block_pointers table. Really, the main point of this PR is to add a timestamp column to the blocks table without requiring a rewrite/truncation of that table. The PR is a good first step to that, and we'll address the duplication by figuring out how to get the data into the block_pointers table at some point.