Root specific lookup #84

carver · 2019-03-15T18:47:05Z

What was wrong?

Want to look up the data in a node without first switching the root hash, eg, update API from:

original_root_hash = trie.root_hash
trie.root_hash = target_root_hash
assert trie.get(key) == expected_value_at_target_hash
trie.root_hash = original_root_hash

to:

assert trie.get(key, target_root_hash) == expected_value_at_target_hash

How was it fixed?

~~Builds on #83 -- this review can focus on just the last two commits (bf726c and e0c7ff5 currently) and I won't merge until #83 is approved.~~

Dropped a few places where HexaryTrie was relying on self.root_hash exclusively. Keep the old behavior for backwards compatibility.

Extended a current test for checking values against old roots.

Cute Animal Picture

carver · 2019-03-15T18:48:57Z

tests/test_hexary_trie.py

    db = {}
    trie = HexaryTrie(db=db)
    for key, val in changes:
        if val is None:
            del trie[key]
+            missing_by_root[trie.root_hash].add(key)


Adding this value to a set instead of just assigning = key might be a little overboard, since the trie never goes back to an old state root right now. But it seemed easy enough to keep it general right now, in case we change the input data later.

carver · 2019-03-19T00:33:09Z

TODO: use a different API, since the Mapping API already has a meaning for the 2nd-position: get(key, default). Some options:

get(key, *, at_root_hash=None)
lookup(key, at_root_hash)
get_by_root(key, at_root_hash)

pipermerriam · 2019-03-19T00:36:45Z

ideas

# local mutation
with trie.at_root(root_hash):
    trie.get(key)

# non-mutative to local trie
with trie.at_root(root_hash) as other:
    other.get(key)

or no context manager at all and encourage chaining.

trie.at_root(root_hash).get(key)

carver · 2019-03-19T18:07:57Z

I was avoiding dealing with what happens if you try to change a trie at a different root. There are a lot of open questions about how this should work, especially if pruning is turned on:

with trie.at_root(root_hash) as trie_at_root:
  trie_at_root[b'key'] = b'val'

(It's neat that using the context manager means you get all the methods on the trie upgraded to work with any root, for ~free)

I could make trie_at_root read-only, I suppose, but that concept isn't implemented yet, so would take some extra work. Maybe the open questions go away if pruning is disallowed in combination with this feature. Have to think about it a bit.

carver · 2019-03-19T18:50:48Z

tests/test_hexary_trie.py

+
+def test_hexary_trie_raises_on_pruning_snapshot():
+    db = {}
+    trie = HexaryTrie(db, prune=True)


Style flub due to copy-paste. Could combine previous lines since we don't need later access to db. I'll save that fix until after review.

lithp

I have nothing but nitpicks and vague concerns, left you some feedback but feel free to !

lithp · 2019-03-20T02:42:37Z

trie/hexary.py

@@ -535,6 +542,14 @@ def squash_changes(self):
            yield memory_trie
        self.root_node = memory_trie.root_node

+    @contextlib.contextmanager
+    def at_root(self, at_root_hash):
+        if self.is_pruning:


This seems a little dangerous, though I'm not sure how it could be fixed.

I guess the idea is: we're about to yield a copy of the trie and the two versions are likely to be used concurrently, if we allowed is_pruning to be true then operations such as delete() in one trie could mess with the other trie.

I scanned through get() and set() and some of the other methods and it seems this always works, is_pruning means the underlying trie won't change out from under us, but there's also nothing which enforces that being true. This is probably just me being paranoid, but "no operations on the other trie will change our nodes" and "we don't prune away outdated nodes" don't seem like equivalent concepts.

In particular, an optimization which modified nodes in place rather than building new ones and pruning the old ones would be allowed under the is_pruning interface but would break this method.

Definitely not suggesting this PR needs to do this! But: right now there are no restrictions on self.db, a solution might involve requiring db to be something which supports snapshotting. This would remove some potential for bugs and it'd also allow us to call at_root on tries which have pruning enabled.

I'm sad because of Piper's suggestions I prefer the last two, the fluent interface is pretty!

The first one might be better though:

# local mutation with trie.at_root(root_hash): trie.get(key)

If it mutates the old trie then there's no chance for users to accidentally concurrently modify a copy.

What about inverting this and having the copy do something like acquire a lock on the parent trie, and the parent trie refuses to prune if that lock isn't free? That should allow more freedom of movement in this API while still providing roughly equal levels of safety.

In particular, an optimization which modified nodes in place rather than building new ones and pruning the old ones would be allowed under the is_pruning interface but would break this method.

A trie that modifies in place (removing references to old nodes from the db) would certainly be "pruning" semantically, so is_pruning should be True there.

Definitely not suggesting this PR needs to do this! But: right now there are no restrictions on self.db, a solution might involve requiring db to be something which supports snapshotting. This would remove some potential for bugs and it'd also allow us to call at_root on tries which have pruning enabled.

Yup, I am on board with doing this in future work. Having an immutable version of the trie has been on my wishlist for a long time. I listed it as a potential breaking change to make the default trie immutable: #85

The first one might be better though:

# local mutation with trie.at_root(root_hash): trie.get(key)

If it mutates the old trie then there's no chance for users to accidentally concurrently modify a copy.

Here's my concern with that API. root_hash mutability can really mess with you in an async context:

class TrieUser: def __init__(self, trie): self.trie = trie async def look_up_old_val(self, key, root): trie = self.trie with trie.at_root(root): # this checks if nodes are missing from the trie while busted(trie): await fix_up_trie(trie) # this is adding nodes to the trie # this is returning the key at whatever root hash `trie` has return trie[key] def trie_update(self, key, val): # Updates might come in while awaiting above. self.trie[key] = val

If trie_update() runs while await-ing fix_up_trie(), then it:

modifies against a different state root than the class was initialized with

sets the new state root, which affects the trie await-ing inside the context above

lithp · 2019-03-20T02:44:12Z

trie/hexary.py

@@ -8,7 +8,11 @@
    keccak,
 )

-from eth_utils import to_list, to_tuple
+from eth_utils import (
+    ValidationError,


nitpick: trie.exceptions has a ValidationError which might be better

lithp · 2019-03-20T02:44:30Z

tests/test_hexary_trie.py

 import fnmatch
+import itertools
 import json
 import os

 from eth_utils import (


nitpick of all nitpicks: I think our convention is to alphabetize these as well, maybe that could be part of this PR?

lithp · 2019-03-20T03:03:21Z

tests/test_hexary_trie.py

@@ -166,13 +168,20 @@ def unread_keys(self):
 def test_hexary_trie_saves_each_root():


nitpick: This test now tests a couple different things.

Well, it seems like an oversight that it was previously only checking whether there were extra items at all! It's a nice addition that it also verifies that at least part of the older tries are accessible.

However at_root maybe deserves it's own test.

lithp · 2019-03-20T03:11:46Z

Also, get_from_proof does the root_hash thing, maybe it could be updated as part of this PR:

py-trie/trie/hexary.py

Lines 166 to 176 in fda28be

    
           @classmethod 
        
           def get_from_proof(cls, root_hash, key, proof): 
        
               trie = cls({}) 
        
               for node in proof: 
        
                   trie._set_raw_node(node) 
        
               trie.root_hash = root_hash 
        
               try: 
        
                   return trie.get(key) 
        
               except KeyError as e: 
        
                   raise BadTrieProof("Missing proof node with hash {}".format(e.args))

- Split big test in two - Alphabetize some imports - Use as_root in get_from_proof - Use trie.exceptions.ValidationError instead of eth_utils

carver · 2019-03-20T20:05:25Z

I'll hold on merging, in case someone still wants to convince me that mutable context manager is the better API. :)

pipermerriam · 2019-03-20T20:10:50Z

pinged you in chat but will do so here too.

Re-suggesting my lock-based approach as I'm not sure you saw it:

with trie.at_root(new_root) as other_trie:
    ...

Implementation Details:

other_trie would be a separate object from trie, but sharing the same database.
other_trie would acquire a "lock" from trie which isn't released until the context exits.
trie will refuse to prune if this lock is not free.
other_trie would be created with pruning disabled.

I'm sure I'm missing something but this seems ideal.

allows use of the API even when pruning is enabled on the base trie
seems to protect against the weird concurrency issues you referred to.

carver · 2019-03-20T21:13:17Z

Sorry I didn't explicitly respond, I thought it was aimed at enabling the mutable API. This is the part I had trouble imagining a good solution for:

trie will refuse to prune if this lock is not free.

These are the implementations of that idea I came up with so far:

raise an exception if trying to prune while "lock" is held by context
hold an asyncio.Lock() until context ends
hold a threading.Lock() until context ends

Concerns about these approaches:

Getting a surprise exception sometime later during a get or set seems worse than just disabling the option up-front, until we have a solution that actually works for a snapshot on a pruning trie.
AFAICT requires all of .get(), .set() etc to be converted to async, and drop the old APIs -- I don't see the win as big enough to justify this kind of API upset
If the trie is being used by an async method, using a threading.Lock() would lock up the event loop indefinitely

Any other implementations I'm missing?

Not supporting pruning seems like a relatively clean way to get this feature out now. I don't have a need for pruning in the syncing context. We can figure out how to support it if/when we need it later. ~~Is there something I'm missing about why it needs that support right away?~~

Added: Any objections to me merging this and making a follow-up issue?
Otherwise, I am happy to leave it open and get back to syncing stuff.

pipermerriam · 2019-03-20T22:14:04Z

carver commented Mar 15, 2019

View reviewed changes

carver requested a review from lithp March 15, 2019 18:49

carver force-pushed the root-specific-lookup branch from e0c7ff5 to 69f872f Compare March 18, 2019 19:35

carver force-pushed the root-specific-lookup branch from 69f872f to 3bfb4ab Compare March 19, 2019 18:36

Enable root-specific lookups on trie

fda28be

carver force-pushed the root-specific-lookup branch from 3bfb4ab to fda28be Compare March 19, 2019 18:40

carver commented Mar 19, 2019

View reviewed changes

lithp approved these changes Mar 20, 2019

View reviewed changes

Review fixups

9daafaf

- Split big test in two - Alphabetize some imports - Use as_root in get_from_proof - Use trie.exceptions.ValidationError instead of eth_utils

Add release notes

6460a95

pipermerriam mentioned this pull request Mar 20, 2019

asyncio support #87

Open

carver merged commit 76071d2 into ethereum:master Mar 21, 2019

carver deleted the root-specific-lookup branch March 21, 2019 00:38

pacrob pushed a commit to pacrob/py-trie that referenced this pull request May 12, 2023

introduce contributing doc + main branch convention (ethereum#84)

da9b95f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Root specific lookup #84

Root specific lookup #84

carver commented Mar 15, 2019 •

edited

Loading

carver Mar 15, 2019 •

edited

Loading

carver commented Mar 19, 2019 •

edited

Loading

pipermerriam commented Mar 19, 2019

carver commented Mar 19, 2019

carver Mar 19, 2019 •

edited

Loading

lithp left a comment

lithp Mar 20, 2019

lithp Mar 20, 2019

pipermerriam Mar 20, 2019

carver Mar 20, 2019 •

edited

Loading

lithp Mar 20, 2019

lithp Mar 20, 2019

lithp Mar 20, 2019

lithp commented Mar 20, 2019

carver commented Mar 20, 2019

pipermerriam commented Mar 20, 2019 •

edited

Loading

carver commented Mar 20, 2019 •

edited

Loading

pipermerriam commented Mar 20, 2019

		@@ -166,13 +168,20 @@ def unread_keys(self):
		def test_hexary_trie_saves_each_root():

Root specific lookup #84

Root specific lookup #84

Conversation

carver commented Mar 15, 2019 • edited Loading

What was wrong?

How was it fixed?

Cute Animal Picture

carver Mar 15, 2019 • edited Loading

Choose a reason for hiding this comment

carver commented Mar 19, 2019 • edited Loading

pipermerriam commented Mar 19, 2019

carver commented Mar 19, 2019

carver Mar 19, 2019 • edited Loading

Choose a reason for hiding this comment

lithp left a comment

Choose a reason for hiding this comment

lithp Mar 20, 2019

Choose a reason for hiding this comment

lithp Mar 20, 2019

Choose a reason for hiding this comment

pipermerriam Mar 20, 2019

Choose a reason for hiding this comment

carver Mar 20, 2019 • edited Loading

Choose a reason for hiding this comment

lithp Mar 20, 2019

Choose a reason for hiding this comment

lithp Mar 20, 2019

Choose a reason for hiding this comment

lithp Mar 20, 2019

Choose a reason for hiding this comment

lithp commented Mar 20, 2019

carver commented Mar 20, 2019

pipermerriam commented Mar 20, 2019 • edited Loading

carver commented Mar 20, 2019 • edited Loading

pipermerriam commented Mar 20, 2019

carver commented Mar 15, 2019 •

edited

Loading

carver Mar 15, 2019 •

edited

Loading

carver commented Mar 19, 2019 •

edited

Loading

carver Mar 19, 2019 •

edited

Loading

carver Mar 20, 2019 •

edited

Loading

pipermerriam commented Mar 20, 2019 •

edited

Loading

carver commented Mar 20, 2019 •

edited

Loading