Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 15 additions & 2 deletions routing/payment_lifecycle.go
Original file line number Diff line number Diff line change
Expand Up @@ -1114,8 +1114,9 @@ func (p *paymentLifecycle) patchLegacyPaymentHash(
}

// reloadInflightAttempts is called when the payment lifecycle is resumed after
// a restart. It reloads all inflight attempts from the control tower and
// collects the results of the attempts that have been sent before.
// a restart. It reloads all inflight attempts from the control tower,
// reconciles each attempt via the configured ReconcileAttempt callback, and
// then collects the results of the attempts that have been confirmed in-flight.
func (p *paymentLifecycle) reloadInflightAttempts() (paymentsdb.DBMPPayment,
error) {

Expand All @@ -1134,6 +1135,18 @@ func (p *paymentLifecycle) reloadInflightAttempts() (paymentsdb.DBMPPayment,
// it's a legacy payment.
a = p.patchLegacyPaymentHash(a)

// Execute the configured reconciliation strategy.
if err := p.router.cfg.ReconcileAttempt(&a); err != nil {
log.Warnf("Reconciliation failed for attempt "+
"%v in payment %v: %v. Skipping result "+
"collection; will retry on next restart.",
a.AttemptID, p.identifier, err)

continue
}
Comment on lines +1139 to +1146

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

A critical logic flaw exists here: if the ReconcileAttempt callback returns an error, the payment lifecycle skips result collection but continues to wait for the attempt, causing it to block indefinitely and potentially lock funds. To remediate this, the attempt should be failed in the database (e.g., via p.router.cfg.Control.FailAttempt) if reconciliation fails with an unresolvable error. Additionally, the current log message does not adhere to the repository's structured logging style guide and should be converted to use log.WarnS with key-value pairs, requiring "log/slog" to be imported.

log.WarnS("Reconciliation failed, skipping result collection",
	slog.Uint64("attempt_id", a.AttemptID),
	slog.String("payment_hash", p.identifier.String()),
	slog.String("error", err.Error()))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist This is expected. The primary goal is that existing lnd users are not impacted - which is guaranteed as we default to a no-op reconciler which cannot return an error here. The concern would only apply to someone running the lnd ChannelRouter type in a remote process - but even there, the current behavior seems to be correct approach. Consider:

  • The lifecycle goroutine exits cleanly on router shutdown and retries on restart. The "hang" is an idle goroutine waiting for a result that will arrive on a future lifecycle (after restart when the Switch is reachable again). In MPP scenarios, results for successfully reconciled shards are still collected normally — only the unreconciled shard's result is deferred to the next restart.
  • The unreconciled attempt's HTLC (if it exists on the Switch) will resolve via the normal HTLC timeout mechanism. If it settles, the preimage is collected on the next restart. If it times out, the funds return to the sender.
  • The suggestion to call FailAttempt when reconciliation fails would create a real fund safety risk. FailAttempt is only safe when we have definitive knowledge that the attempt failed. Reconciliation failure means "we could not determine the state of this attempt" — the correct response is to leave it as-is (in-flight) and try again later. The entire point of this addition is to provide a life-cycle hook for users to confirm the status of the htlc dispatch via "write-first" recovery - retrying the dispatch request and either receiving nil or AlreadyExists error before proceeding to result collection. When there is doubt about htlc dispatch status, we must be conservative and avoid making destructive state changes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation, @calvinrzachman. I understand your reasoning now regarding the ReconcileAttempt error handling. Your clarification that a reconciliation failure means "we could not determine the state of this attempt" and that FailAttempt would introduce a fund safety risk is well-taken. The current approach of skipping result collection and retrying on restart to maintain a conservative stance on fund safety is indeed a valid design choice for this specific context.

However, the logging message still does not adhere to the repository's structured logging style guide. It should be converted to use log.WarnS with key-value pairs to ensure consistency and better parseability of logs. This is a medium severity issue.

Suggested change
if err := p.router.cfg.ReconcileAttempt(&a); err != nil {
log.Warnf("Reconciliation failed for attempt "+
"%v in payment %v: %v. Skipping result "+
"collection; will retry on next restart.",
a.AttemptID, p.identifier, err)
continue
}
log.WarnS("Reconciliation failed, skipping result collection",
slog.Uint64("attempt_id", a.AttemptID),
slog.String("payment_hash", p.identifier.String()),
slog.String("error", err.Error()))


// The HTLC attempt is confirmed to be in-flight, so it
// is safe to proceed to awaiting the attempt result.
p.resultCollector(&a)
}

Expand Down
76 changes: 76 additions & 0 deletions routing/payment_lifecycle_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ func newTestPaymentLifecycle(t *testing.T) (*paymentLifecycle, *mockers) {
TrafficShaper: fn.Some[htlcswitch.AuxTrafficShaper](
&mockTrafficShaper{},
),
ReconcileAttempt: noOpReconcile,
},
quit: quitChan,
}
Expand Down Expand Up @@ -1862,3 +1863,78 @@ func TestReloadInflightAttemptsLegacy(t *testing.T) {
// Assert the result is received as expected.
require.Equal(t, result, r.result)
}

// TestReloadInflightAttemptsWithReconciliation checks that when a
// ReconcileFunc is configured, it is called for each in-flight attempt and
// result collection proceeds on success.
func TestReloadInflightAttemptsWithReconciliation(t *testing.T) {
t.Parallel()

p, m := newTestPaymentLifecycle(t)

// Create an attempt.
paymentAmt := 10_000
preimage := lntypes.Preimage{1}
attempt := makeSettledAttempt(t, paymentAmt, preimage)

// Track which attempts the callback receives.
var reconciledIDs []uint64
p.router.cfg.ReconcileAttempt = func(a *paymentsdb.HTLCAttempt) error {
reconciledIDs = append(reconciledIDs, a.AttemptID)
return nil
}

// Mock FetchPayment and InFlightHTLCs.
m.control.On("FetchPayment", p.identifier).Return(m.payment, nil).Once()
m.payment.On("InFlightHTLCs").Return(
[]paymentsdb.HTLCAttempt{*attempt},
).Once()

// Call the method under test.
payment, err := p.reloadInflightAttempts()
require.NoError(t, err)
require.Equal(t, m.payment, payment)

// The callback should have been invoked with the attempt.
require.Len(t, reconciledIDs, 1)
require.Equal(t, attempt.AttemptID, reconciledIDs[0])

// Result collection should have proceeded.
require.Equal(t, 1, m.collectResultsCount)
}

// TestReloadInflightAttemptsReconciliationError checks that when
// ReconcileFunc returns an error, result collection is skipped for that
// attempt.
func TestReloadInflightAttemptsReconciliationError(t *testing.T) {
t.Parallel()

p, m := newTestPaymentLifecycle(t)

// Create an attempt.
paymentAmt := 10_000
preimage := lntypes.Preimage{1}
attempt := makeSettledAttempt(t, paymentAmt, preimage)

// Configure a ReconcileFunc that always fails, simulating
// the case where the htlc's in-flight status cannot be
// confirmed.
p.router.cfg.ReconcileAttempt = func(a *paymentsdb.HTLCAttempt) error {
return errors.New("htlc status unknown: dispatch unavailable")
}

// Mock FetchPayment and InFlightHTLCs.
m.control.On("FetchPayment", p.identifier).Return(m.payment, nil).Once()
m.payment.On("InFlightHTLCs").Return(
[]paymentsdb.HTLCAttempt{*attempt},
).Once()

// Call the method under test.
payment, err := p.reloadInflightAttempts()
require.NoError(t, err)
require.Equal(t, m.payment, payment)

// Result collection should NOT have been called — the attempt was
// skipped because reconciliation failed.
require.Equal(t, 0, m.collectResultsCount)
}
33 changes: 33 additions & 0 deletions routing/router.go
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,24 @@ type MissionControlQuerier interface {
amt lnwire.MilliSatoshi, capacity btcutil.Amount) float64
}

// ReconcileFunc defines the callback invoked for each in-flight HTLC attempt
// during startup before result collection begins. This enables write-first
// recovery for remote dispatchers that need to confirm dispatch status after
// a crash.
//
// Implementations should re-dispatch the attempt idempotently, returning nil
// once the attempt is confirmed in-flight (newly dispatched or duplicate
// acknowledged). Transient failures should be retried internally; only return
// an error for unresolvable cases.
type ReconcileFunc func(a *paymentsdb.HTLCAttempt) error

// noOpReconcile is the default ReconcileFunc that proceeds directly to result
// tracking, preserving historic behavior of the daemon. The router and switch
// share a crash domain, so no additional reconciliation is needed.
func noOpReconcile(_ *paymentsdb.HTLCAttempt) error {
return nil
}

// FeeSchema is the set fee configuration for a Lightning Node on the network.
// Using the coefficients described within the schema, the required fee to
// forward outgoing payments can be derived.
Expand Down Expand Up @@ -295,6 +313,14 @@ type Config struct {
// TrafficShaper is an optional traffic shaper that can be used to
// control the outgoing channel of a payment.
TrafficShaper fn.Option[htlcswitch.AuxTrafficShaper]

// ReconcileAttempt is the strategy executed for each in-flight
// HTLC attempt during startup before result collection begins.
// See ReconcileFunc for details.
//
// Defaults to noOpReconcile when not set, preserving standard
// lnd behavior where the router and switch share a crash domain.
ReconcileAttempt ReconcileFunc
}

// EdgeLocator is a struct used to identify a specific edge.
Expand Down Expand Up @@ -338,6 +364,13 @@ type ChannelRouter struct {
// channel graph is a subset of the UTXO set) set, then the router will proceed
// to fully sync to the latest state of the UTXO set.
func New(cfg Config) (*ChannelRouter, error) {
// Default to a no-op reconciliation callback, preserving
// standard lnd behavior where the router and switch share a
// crash domain.
if cfg.ReconcileAttempt == nil {
cfg.ReconcileAttempt = noOpReconcile
}

return &ChannelRouter{
cfg: &cfg,
quit: make(chan struct{}),
Expand Down
Loading