-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fallback-reorg-sweep-case #10506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fallback-reorg-sweep-case #10506
Conversation
We have two versions: for itests, we just use one conf, but in prod, we'll scale the number of confirmations.
This'll be useful for the set up upcoming itests.
In this commit, we add a new param that'll allow us to scale up the number of confirmations before we act on a new close. We'll use this later to improve the current on chain handling logic.
We wnt to add better handling, but not break any UIs or wallets. So we'll continue to send out a notification after a single confirmation, then send another after things are fully confirmed.
…re n is num confs In this commit, we update the close logic to handle re-ogs up to the final amount of confirmations. This is done generically, so we're able to handle events such as: coop close confirm, re-org, breach confirm, re-org, force close confirm, re-org, etc. The upcoming set of new tests will exercise all of these cases. We modify the block beat handling to unify the control flow. As it's possible we get the beat, then see the spend, or the oher way around.
We'll use this for all the upcoming tests.
All the tests need to send a confirmation _after_ the spend is detected now.
This set of new tests ensures that if have created N RBF variants of the coop close transaction, that any of then can confirm, and be re-org'd, with us detecting the final spend once it confirms deeploy enough.
In this commit, we add a set of generic close re-org tests. The most important test is the property based test, they will randomly confirm transactions, generate a re-org, then assert that eventually we dtect the final version.
This ensures that during the RBF process, if one confirms, a re-org occurs, then another confirms, that we'll properly detect this case.
…oses In this commit, we add a fast-path optimization to the chain watcher's closeObserver that immediately dispatches close events when only a single confirmation is required (numConfs == 1). This addresses a timing issue with integration tests that were designed around the old synchronous blockbeat behavior, where close events were dispatched immediately upon spend detection. The recent async confirmation architecture (introduced in commit f6f716a) properly handles reorgs by waiting for N confirmations before dispatching close events. However, this created a race condition in integration tests that mine blocks synchronously and expect immediate close notifications. With the build tag setting numConfs to 1 for itests, the async confirmation notification could arrive after the test already started waiting for the close event, causing timeouts. We introduce a new handleSpendDispatch method that checks if numConfs == 1 and, if so, immediately calls handleCommitSpend to dispatch the close event synchronously, then returns true to skip the async state machine. This preserves the old behavior for integration tests while maintaining the full async reorg protection for production (where numConfs >= 3). The implementation adds the fast-path check in both spend detection paths (blockbeat and spend notification) to ensure consistent behavior regardless of which detects the spend first. We also update the affected unit tests to remove their expectation of confirmation registration, since the fast-path bypasses that step entirely. This approach optimizes for the integration test scenario without compromising production safety, as the fast-path only activates when a single confirmation is sufficient - a configuration that only exists in the controlled test environment.
With this change, we'll go to chain even earlier to ensure that we have enough time to sweep a potentially contested HTLC, now that we're waiting longer before sweeps to ensure that the commitment transaction is sufficeitnyl burried before we sweep.
This increases the minimum CLTV delta allowed for invoice creation to provide more headroom above DefaultFinalCltvRejectDelta (19 blocks). The previous value of 18 was below the reject threshold, which could allow users to create invoices with CLTV deltas that would be rejected when receiving payments.
For confTarget >= 1008, estimateFeeRate uses min relay fee, so a sweep could proceed even if the estimator would otherwise return a fee equal to the max fee. However for confTarget below (1007) if the fee-estimate returns a fee estimate greater or equal to the maxfee we would not sweep the input until we hit the l.width=1, now we always at least check the min-relay-fee and use it as a fallback before not broadcasting the sweep at all.
Summary of ChangesHello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of channel closure processes against blockchain reorganizations. It introduces a sophisticated state machine within the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to channel closing logic, particularly for handling blockchain re-organizations. The chain_watcher is refactored into a state machine to robustly track spend confirmations and re-orgs. It also introduces dynamic scaling for the number of confirmations required for channel closes based on channel capacity, improving both security for large channels and user experience for small ones. The fee logic for sweeping inputs is also made more resilient. The changes are well-tested with an extensive new test suite, including property-based tests for re-org scenarios. My review found one potential issue in the new chain_watcher state machine logic that could lead to an inconsistent state if a confirmation registration fails.
| return spendProcessResult{ | ||
| pendingSpend: currentPendingSpend, | ||
| confNtfn: currentConfNtfn, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If RegisterConfirmationsNtfn fails after a different spend was detected and the old currentConfNtfn was cancelled, this function returns the now-cancelled currentConfNtfn. The closeObserver would then use this cancelled notification, which could cause the goroutine to exit unexpectedly when it reads from the closed notification channels.
To prevent this, it would be safer to transition back to the spendStateNone by returning an empty spendProcessResult. This ensures the watcher resets its state and waits for a new spend detection, which is a more robust failure handling approach.
return spendProcessResult{}|
|
||
| return nil, ErrZeroFeeRateDelta | ||
| // The delta is zero, we at least try to broadcast the tx with | ||
| // the min relay fee if we have enough budget to cover it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct - when the delta is zero, it simply means the deadline delta is too large or the budget is too small, in that sense we should fail the fee func because the fee delta increased every block will be zero, effectively making the RBF not work. If we noticed zero delta in the wild, we can instead increase the precision here for the delta here, like increase mSatPerKWeight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this came up during testing on regtest:
so the problem is the following:
imagine you have a budget which cannot suffice the fee-rate required, for example estimator has not enough data and fallsback to the "fallback" fee:
Regtest:
HTLC output was 10k sats => budget was 50% => we return the maxFee (endfeeRate)
Line 147 in eb7fbd4
| return maxFeeRate, nil |
now in that case the delta is always 0, I find it very reasonable if we can pay the min_relay_fee we should pay it and publish it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come the delta is always 0 when it's using the maxFeeRate? in addition, when the delta is 0, it means the fee bumping won't work, this is why it returns an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because startingFee = endFeeRate so 0/width = 0 ?
The idea of this PR is to always to use the MinRelay Fee if we can at least pay for it instead of waiting until thhe width is 1 (while the delta is 0), so an additional check. Feebumping will still work because we use the min_relay as a starting fee.
With this change the reorg PR behaves the same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last commit adds a test which describes what we are fixing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before the reorg change we did not see this behaviour because we would for HTLC which have no deadline use the default of 1008 which uses the min_relay as start-fee by default
builds on top of #10331