Root specific lookup #84
tests/test_hexary_trie.py
db = {}
trie = HexaryTrie(db=db)
for key, val in changes:
    if val is None:
        del trie[key]
        missing_by_root[trie.root_hash].add(key)
Adding this value to a set instead of just assigning = key might be a little overboard, since the trie never goes back to an old state root right now. But it seemed easy enough to keep it general right now, in case we change the input data later.
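A minimal sketch of the trade-off in this comment, assuming made-up `(root, key)` events rather than the real test data: with a `defaultdict(set)` every deleted key is retained even if a root recurs, while plain assignment keeps only the most recent key for that root.

```python
from collections import defaultdict

# Hypothetical (root, deleted_key) events; root b"r1" recurs on purpose.
events = [(b"r1", b"a"), (b"r2", b"b"), (b"r1", b"c")]

missing_by_root = defaultdict(set)   # general form: a set of keys per root
last_missing_by_root = {}            # simpler form: one key per root

for root, key in events:
    missing_by_root[root].add(key)     # keeps every key seen at this root
    last_missing_by_root[root] = key   # overwrites any earlier key
```

As long as the trie never revisits an old root, the two forms hold the same information; the set only matters once the input data changes.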
e0c7ff5
to
69f872f
Compare
TODO: use a different API, since the
|
ideas

# local mutation
with trie.at_root(root_hash):
    trie.get(key)

# non-mutative to local trie
with trie.at_root(root_hash) as other:
    other.get(key)

or no context manager at all and encourage chaining.

trie.at_root(root_hash).get(key)
I was avoiding dealing with what happens if you try to change a trie at a different root. There are a lot of open questions about how this should work, especially if pruning is turned on:
with trie.at_root(root_hash) as trie_at_root:
    trie_at_root[b'key'] = b'val'
(It's neat that using the context manager means you get all the methods on the trie upgraded to work with any root, for ~free) I could make
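For readers following the thread, here is a rough sketch of the non-mutative shape being discussed. `ToyTrie` is a hypothetical stand-in that stores a full key/value snapshot per integer root id; it is not the `HexaryTrie` implementation, and it raises a plain `ValueError` where the PR uses a `ValidationError`.

```python
import contextlib
import copy


class ToyTrie:
    """Hypothetical stand-in: maps each root id to a full snapshot dict."""

    def __init__(self, prune=False):
        self.is_pruning = prune
        self._snapshots = {0: {}}   # root id -> key/value pairs at that root
        self.root = 0

    def __setitem__(self, key, value):
        # Every write creates a new root and leaves older snapshots intact.
        state = dict(self._snapshots[self.root])
        state[key] = value
        new_root = len(self._snapshots)
        self._snapshots[new_root] = state
        self.root = new_root

    def __getitem__(self, key):
        return self._snapshots[self.root][key]

    @contextlib.contextmanager
    def at_root(self, root):
        if self.is_pruning:
            raise ValueError("cannot look up old roots while pruning")
        other = copy.copy(self)     # shares snapshots, has its own root pointer
        other.root = root
        yield other


trie = ToyTrie()
trie[b"k"] = b"old"
old_root = trie.root
trie[b"k"] = b"new"

with trie.at_root(old_root) as other:
    old_value = other[b"k"]         # read against the old root
current_value = trie[b"k"]          # the original trie is untouched
```

Because the context manager yields a separate object, every read method works against the requested root without changing the root of the trie it was called on.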
69f872f
to
3bfb4ab
Compare
3bfb4ab
to
fda28be
Compare
tests/test_hexary_trie.py
def test_hexary_trie_raises_on_pruning_snapshot():
    db = {}
    trie = HexaryTrie(db, prune=True)
Style flub due to copy-paste. Could combine previous lines since we don't need later access to db. I'll save that fix until after review.
I have nothing but nitpicks and vague concerns, left you some feedback but feel free to !
@@ -535,6 +542,14 @@ def squash_changes(self):
            yield memory_trie
        self.root_node = memory_trie.root_node

    @contextlib.contextmanager
    def at_root(self, at_root_hash):
        if self.is_pruning:
This seems a little dangerous, though I'm not sure how it could be fixed.
I guess the idea is: we're about to yield a copy of the trie and the two versions are likely to be used concurrently, if we allowed is_pruning to be true then operations such as delete() in one trie could mess with the other trie.
I scanned through get() and set() and some of the other methods and it seems this always works: is_pruning being false means the underlying trie won't change out from under us, but there's also nothing which enforces that being true. This is probably just me being paranoid, but "no operations on the other trie will change our nodes" and "we don't prune away outdated nodes" don't seem like equivalent concepts.
In particular, an optimization which modified nodes in place rather than building new ones and pruning the old ones would be allowed under the is_pruning interface but would break this method.
Definitely not suggesting this PR needs to do this! But: right now there are no restrictions on self.db, a solution might involve requiring db to be something which supports snapshotting. This would remove some potential for bugs and it'd also allow us to call at_root on tries which have pruning enabled.
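One way to picture the "db which supports snapshotting" idea is sketched below. `SnapshotDB` and its `snapshot()` method are hypothetical names invented for illustration; nothing in the PR defines such an interface, and a real store would want something cheaper than a full copy.

```python
from types import MappingProxyType


class SnapshotDB(dict):
    """Hypothetical dict-backed node store that can hand out frozen views."""

    def snapshot(self):
        # Copy, then wrap read-only: later writes or deletes on the live
        # store cannot reach the view, and the view itself cannot be written.
        return MappingProxyType(dict(self))


db = SnapshotDB()
db[b"node-1"] = b"old node body"

frozen = db.snapshot()

del db[b"node-1"]                  # simulate pruning in the live store
db[b"node-2"] = b"new node body"   # simulate a later write
```

A trie reading through `frozen` would be insulated from both pruning and in-place modification, which is the property the `is_pruning` check only approximates.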
I'm sad because, of Piper's suggestions, I prefer the last two; the fluent interface is pretty!
The first one might be better though:
# local mutation
with trie.at_root(root_hash):
    trie.get(key)
If it mutates the old trie then there's no chance for users to accidentally concurrently modify a copy.
What about inverting this and having the copy do something like acquire a lock on the parent trie, and the parent trie refuses to prune if that lock isn't free? That should allow more freedom of movement in this API while still providing roughly equal levels of safety.
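A rough sketch of how such a lock might look, assuming a simple reader count rather than a real lock. `PruneGuard`, `reading()` and `try_prune()` are hypothetical names; the comment above does not specify an implementation.

```python
import contextlib


class PruneGuard:
    """Hypothetical reader count a parent trie could consult before pruning."""

    def __init__(self):
        self._readers = 0

    @contextlib.contextmanager
    def reading(self):
        # Held by a copy of the trie for as long as it is in use.
        self._readers += 1
        try:
            yield
        finally:
            self._readers -= 1

    def try_prune(self, db, key):
        if self._readers:
            return False        # refuse while any copy is still open
        db.pop(key, None)
        return True


db = {b"old-node": b"..."}
guard = PruneGuard()

with guard.reading():
    refused = not guard.try_prune(db, b"old-node")
    still_there = b"old-node" in db

pruned_after = guard.try_prune(db, b"old-node")
```

This sketch simply refuses the prune; whether refused prunes should be deferred, retried, or raised as errors is one of the open design questions.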
In particular, an optimization which modified nodes in place rather than building new ones and pruning the old ones would be allowed under the is_pruning interface but would break this method.
A trie that modifies in place (removing references to old nodes from the db) would certainly be "pruning" semantically, so is_pruning should be True there.
Definitely not suggesting this PR needs to do this! But: right now there are no restrictions on self.db, a solution might involve requiring db to be something which supports snapshotting. This would remove some potential for bugs and it'd also allow us to call at_root on tries which have pruning enabled.
Yup, I am on board with doing this in future work. Having an immutable version of the trie has been on my wishlist for a long time. I listed it as a potential breaking change to make the default trie immutable: #85
The first one might be better though:
# local mutation
with trie.at_root(root_hash):
    trie.get(key)
If it mutates the old trie then there's no chance for users to accidentally concurrently modify a copy.
Here's my concern with that API. root_hash mutability can really mess with you in an async context:
class TrieUser:
    def __init__(self, trie):
        self.trie = trie

    async def look_up_old_val(self, key, root):
        trie = self.trie
        with trie.at_root(root):
            # this checks if nodes are missing from the trie
            while busted(trie):
                await fix_up_trie(trie)  # this is adding nodes to the trie
            # this is returning the key at whatever root hash `trie` has
            return trie[key]

    def trie_update(self, key, val):
        # Updates might come in while awaiting above.
        self.trie[key] = val
If trie_update() runs while await-ing fix_up_trie(), then it:
- modifies against a different state root than the class was initialized with
- sets the new state root, which affects the trie await-ing inside the context above
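The hazard can be reproduced with a small runnable sketch. `MutableRootTrie` is a hypothetical toy whose `at_root()` swaps its own root in place (the "local mutation" API), and `asyncio.sleep(0)` stands in for `await fix_up_trie(trie)`.

```python
import asyncio
import contextlib


class MutableRootTrie:
    """Hypothetical trie whose at_root() temporarily swaps its own root."""

    def __init__(self):
        self._snapshots = {0: {}}   # root id -> key/value pairs at that root
        self.root = 0

    def __setitem__(self, key, value):
        state = dict(self._snapshots[self.root])
        state[key] = value
        new_root = len(self._snapshots)
        self._snapshots[new_root] = state
        self.root = new_root        # writes move the shared root pointer

    def __getitem__(self, key):
        return self._snapshots[self.root][key]

    @contextlib.contextmanager
    def at_root(self, root):
        previous, self.root = self.root, root
        try:
            yield
        finally:
            self.root = previous


async def look_up_old_val(trie, key, root):
    with trie.at_root(root):
        await asyncio.sleep(0)      # stand-in for `await fix_up_trie(trie)`
        return trie[key]            # reads at whatever root `trie` has now


async def main():
    trie = MutableRootTrie()
    trie[b"k"] = b"old"
    old_root = trie.root
    lookup = asyncio.create_task(look_up_old_val(trie, b"k", old_root))
    await asyncio.sleep(0)          # let the lookup enter the context and wait
    trie[b"k"] = b"new"             # update lands while the lookup is awaiting
    return await lookup


result = asyncio.run(main())
```

In this toy, the lookup asked for the old root but ends up reading the value written by the interleaved update, because both coroutines share one mutable root pointer.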
trie/hexary.py
@@ -8,7 +8,11 @@
    keccak,
)

from eth_utils import to_list, to_tuple
from eth_utils import (
    ValidationError,
nitpick: trie.exceptions has a ValidationError which might be better
import fnmatch
import itertools
import json
import os

from eth_utils import (
nitpick of all nitpicks: I think our convention is to alphabetize these as well, maybe that could be part of this PR?
@@ -166,13 +168,20 @@ def unread_keys(self):
def test_hexary_trie_saves_each_root():
nitpick: This test now tests a couple different things.
Well, it seems like an oversight that it was previously only checking whether there were extra items at all! It's a nice addition that it also verifies that at least part of the older tries are accessible.
However at_root maybe deserves its own test.
Also, Lines 166 to 176 in fda28be
|
- Split big test in two
- Alphabetize some imports
- Use as_root in get_from_proof
- Use trie.exceptions.ValidationError instead of eth_utils
I'll hold on merging, in case someone still wants to convince me that mutable context manager is the better API. :)
pinged you in chat but will do so here too. Re-suggesting my lock-based approach as I'm not sure you saw it:
with trie.at_root(new_root) as other_trie:
    ...
Implementation Details:
I'm sure I'm missing something but this seems ideal.
Sorry I didn't explicitly respond, I thought it was aimed at enabling the mutable API. This is the part I had trouble imagining a good solution for:
These are the implementations of that idea I came up with so far:
Concerns about these approaches:
Any other implementations I'm missing? Not supporting pruning seems like a relatively clean way to get this feature out now. I don't have a need for pruning in the syncing context. We can figure out how to support it if/when we need it later.
Added: Any objections to me merging this and making a follow-up issue?
What was wrong?
Want to look up the data in a node without first switching the root hash, e.g., update the API from:
to:
How was it fixed?
Builds on #83 -- this review can focus on just the last two commits (bf726c and e0c7ff5 currently) and I won't merge until #83 is approved.
Dropped a few places where HexaryTrie was relying on self.root_hash exclusively. Keep the old behavior for backwards compatibility.
Extended a current test for checking values against old roots.
Cute Animal Picture