refactor: use batch confirmation that was upstreamed to eigenda client #192
Conversation
Force-pushed from e76a187 to 5b74b93.
verify/cert.go
Outdated
// 1. verify batch is actually onchain at the batchMetadata's state confirmedBlockNumber
// This assert is technically not necessary, but it's a good sanity check.
// It could technically happen that a confirmed batch's block gets reorged out,
// yet the tx is included in an earlier or later block, making this check fail...
confirmationBlockNumber := batchMetadata.GetConfirmationBlockNumber()
confirmationBlockNumberBigInt := big.NewInt(0).SetInt64(int64(confirmationBlockNumber))
_, err := cv.retrieveBatchMetadataHash(ctx, batchID, confirmationBlockNumberBigInt)
if err != nil {
-	return fmt.Errorf("failed to get context block: %w", err)
+	return fmt.Errorf("batch not found onchain at supposedly confirmed block %d: %w", confirmationBlockNumber, err)
Is this true? If so, I think we should get rid of this check, but document exactly why we are getting rid of it. It would have saved me a lot of time if this had been documented in the initial code.
I need to review the other PR, but this function seems useful for determining whether a failure is EigenDA's fault or an ETH re-org.
Interesting. But I'm not so sure, because the case I'm describing is this:
0. (assumption: we wait for a 5-block confirmation depth)
- the batcher confirms the batch onchain at block 1000
- the eigenda-client waits for 5 blocks and returns a BlobStatus
- the proxy receives it one block later, at block 1006, but the batch has actually been reorged and included in block 1001 instead
- in this case the check highlighted here would fail, even though the batch is actually confirmed at depth 5 (the next assert below, around line 70, would catch it if it were not)
For this reason I think this check is actually wrong here, and I still think we should remove it. Does this make sense?
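To make the scenario concrete, here is a minimal sketch of what a depth-based check could look like instead of an exact-block check. This is illustration only, not the proxy's actual code; the function name, signature, and the `lookup` callback are assumptions:

```go
package verify

import (
	"context"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/ethclient"
)

// verifyConfirmedAtDepth checks that the batch metadata is visible onchain at
// (head - depth), rather than at the exact ConfirmationBlockNumber reported by
// the eigenda-client. A reorg that shifts the confirmation tx from block 1000
// to 1001 (as in the example above) does not break this check, as long as the
// batch is still confirmed with the required depth.
func verifyConfirmedAtDepth(
	ctx context.Context,
	client *ethclient.Client,
	depth uint64,
	// lookup queries the batch metadata hash (e.g. via BatchIdToBatchMetadataHash)
	// at the given block number.
	lookup func(ctx context.Context, blockNumber *big.Int) ([32]byte, error),
) error {
	header, err := client.HeaderByNumber(ctx, nil) // nil = latest head
	if err != nil {
		return fmt.Errorf("failed to fetch latest header: %w", err)
	}
	queryBlock := new(big.Int).Sub(header.Number, new(big.Int).SetUint64(depth))
	if queryBlock.Sign() < 0 {
		queryBlock = big.NewInt(0)
	}
	if _, err := lookup(ctx, queryBlock); err != nil {
		return fmt.Errorf("batch not found onchain at depth %d (block %s): %w", depth, queryBlock, err)
	}
	return nil
}
```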
The EigenDA batcher detects reorgs and updates the confirmation number accordingly: https://github.com/Layr-Labs/eigenda/blob/master/disperser/batcher/finalizer.go#L198
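For reference, the general idea behind that finalizer behavior can be sketched as follows (a simplification, not the actual EigenDA finalizer code; the function name and signature are made up for illustration):

```go
package verify

import (
	"context"
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

// detectReorgedConfirmation re-fetches the confirmation tx receipt and compares
// the block it currently sits in against the previously recorded confirmation
// block. If they differ, the confirmation tx was reorged into another block and
// the stored confirmation block number should be updated.
func detectReorgedConfirmation(
	ctx context.Context,
	client *ethclient.Client,
	txHash common.Hash,
	recordedBlock uint64,
) (reorged bool, currentBlock uint64, err error) {
	receipt, err := client.TransactionReceipt(ctx, txHash)
	if err != nil {
		return false, 0, fmt.Errorf("failed to fetch confirmation receipt: %w", err)
	}
	currentBlock = receipt.BlockNumber.Uint64()
	return currentBlock != recordedBlock, currentBlock, nil
}
```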
verify/cert.go
Outdated
expectedHash, err := cv.manager.BatchIdToBatchMetadataHash(&bind.CallOpts{BlockNumber: blockNumber}, id)
// 2. verify that the confirmation status has been reached
// We retry 5 times waiting 12 seconds (block time) between each retry,
// in case our eth node is behind that of the eigenda_client's node that deemed the batch confirmed
Why would we ever use two nodes? Or is the idea to have some resiliency to transient failures, where the provider could be unreliable due to poor configuration?
Both the eigenda-client and the proxy dial the endpoint individually, establishing independent connections. If the nodes are behind a load balancer, the two connections could end up on different eth nodes, and it's very common for RPC provider nodes to be out of sync (this bit me very hard once) for some dumb reason (web3 standards...).
Even if we injected the dialed connection from the proxy into the eigenda-client, the underlying TCP connection could drop between the eigenda-client returning and the proxy making its call, which would reconnect to another node, with the same effect.
Makes sense, and I was thinking that too - I got burnt in the past using a node cluster with an ALB ingress, where subsequent RPC queries could fail due to nodes being out of sync. I know OP clusters typically use proxyd to mitigate this. From what I remember, the connection is held with the ALB directly, with different nodes being queried for each new request on the client --> ALB connection. Feel free to resolve this comment!
Actually, on second thought - isn't this a bit naive when we have access to the L1ReferenceBlock number at which the batch should've been confirmed? Also, we're adding a 12-second cool-off, which is aligned with Ethereum's block production interval but not with syncing.
proxyd doesn't seem to solve this problem; they say: "we consider healthy is... latest block lag ≤ configurable threshold". The way to solve this is to use sticky sessions.
Not sure I understand your last comment. Is #192 (comment) related to what you are talking about?
Talked offline. Let's update to retrying every 2 seconds for the same amount of time (60 seconds).
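As a rough sketch of that agreement (illustration only; the helper name and the shape of the `check` callback are assumptions), the retry loop would poll every 2 seconds with the same 60-second overall budget:

```go
package verify

import (
	"context"
	"fmt"
	"time"
)

// retryConfirmationCheck polls the onchain confirmation check every 2 seconds
// for up to 60 seconds, instead of 5 attempts spaced by the 12-second block
// time. The total wait budget is unchanged, but an eth node that is only
// slightly behind is detected as caught up much sooner.
func retryConfirmationCheck(ctx context.Context, check func(ctx context.Context) error) error {
	const (
		retryInterval = 2 * time.Second
		totalTimeout  = 60 * time.Second
	)
	ctx, cancel := context.WithTimeout(ctx, totalTimeout)
	defer cancel()

	ticker := time.NewTicker(retryInterval)
	defer ticker.Stop()

	var lastErr error
	for {
		if lastErr = check(ctx); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("confirmation not reached within %s: %w", totalTimeout, lastErr)
		case <-ticker.C:
			// fall through and try again on the next tick
		}
	}
}
```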
@epociask I'll let you review my answers before rebasing and fixing conflicts, so that the commit links still work.
…tion depth guarantee
- flags: add new eigenda-client flags for blob confirmation depth; also pass those flags to verifier (until we also upstream verification to the eda client)
- comment: was pointing to wrong eigenda-client repo in TODO comment
- fix: go.mod to point to PR commit instead of using local replace directive
- chore: go mod tidy to generate go.sum
- chore: use proto Getter functions instead of fields (that are potentially nil)
- ci: upgrade golangci-lint version 1.60->1.61
- fix: verifySecurityParams func arguments after rebase
- chore: make more robust verifyBatchConfirmedOnchain logic (added retry logic and better comments)
- style: Onchain -> OnChain
- docs: better comment describing eth_getBlockByNumber call args
- style: better error msg when memstore enabled but cert verification is not
- fix: verifier.WaitForFinalization was not set
- fix(flags): deleted deprecated flags that had same name as new ones in other package, causing panic
- style(flags): merged WaitForFinalizationFlagName into ConfirmationDepth flag (it now accepts a uint conf depth or the 'finalized' string)
- chore: remove unused utils.EqualBytes function (same as stdlib exp function anyway)
- chore: remove log line added for debugging
Force-pushed from de950b2 to ba94b43.
LGTM
This PR depends on Layr-Labs/eigenda#821 being merged first.
The main simplification this allows is getting rid of the very complex retry logic we had, where an inner function returned an error that was caught by a for loop much further out, which was hard to understand. This PR makes that logic more local, but I opted to keep the check in eigenda-proxy as well, as an assert/safety guarantee. I think we also need it in the Get route... but I'm not super familiar with that code path; cc @epociask to make sure the logic in this PR still works for it.