Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

htlcswitch+routerrpc: support unencrypted failure reasons #7067

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

joostjager
Copy link
Contributor

@joostjager joostjager commented Oct 21, 2022

This PR adds the possibility for users of the htlc interception api to supply arbitrary pre-encoded failure messages. LND will only add an hmac and encrypt the message for the first hop.

Interceptor applications that implement a virtual channel can use this to return for example a FeeInsufficient failure as if it was generated within the LND node.

The existing failure_code field on the htlc interceptor response is deprecated. Extending that with all possible failure codes and failure attributes doesn't look like an attractive path.

htlcswitch/interceptable_switch.go Outdated Show resolved Hide resolved
htlcswitch/interceptable_switch.go Outdated Show resolved Hide resolved
lnrpc/routerrpc/forward_interceptor.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch 3 times, most recently from 3b61a30 to 57823b6 Compare October 24, 2022 08:29
@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch from 57823b6 to 6067bc3 Compare October 24, 2022 10:40
@joostjager joostjager marked this pull request as ready for review October 24, 2022 11:55
@alexbosworth
Copy link
Contributor

not that i think this is in scope of the issue but it would also be cool generally if you could return the error as if it was from the next hop, if you know the private key associated with the next hop like for a virtual peer returning an invalid payment hash error

@joostjager
Copy link
Contributor Author

Assuming I understand the comment correctly, this is already supported on the htlc interceptor api via an encrypted failure message? For example in lnmux, we encrypt for the sender outside of lnd.

@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch 2 times, most recently from 221fe21 to 818c44e Compare October 25, 2022 12:15
@alexbosworth
Copy link
Contributor

Assuming I understand the comment correctly, this is already supported on the htlc interceptor api via an encrypted failure message? For example in lnmux, we encrypt for the sender outside of lnd.

yes exactly, what i mean is to do this at the API level in an unencrypted way, to avoid the caller having to build the encrypted failure message

@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch from 818c44e to 215c2c4 Compare October 25, 2022 15:10
bytes failure_message = 4;

// Indicates whether the failure message still needs to be encrypted for the
// first hop.
bool failure_message_unencrypted = 6;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly, what i mean is to do this at the API level in an unencrypted way, to avoid the caller having to build the encrypted failure message

Okay got it. I think it is debatable whether this is functionality that should live inside lnd. Passing in the private key is a security risk too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair, the problem though is that it seems like people using this API are messing up routing because they are returning errors from the wrong node and I hear reports about people needing to write workarounds for that. Another option to satisfy the no private key material constraint could be to not pass a private key but instead use an arbitrary child key for the message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This discussion thread still relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good point, and indeed just passing in the shared secret for the virtual hop and letting lnd do the encryption could be a middle way.

This PR is about adding support for unencrypted failure reasons though, not making it easier to use the encrypted reason functionality that is pre-existing. I think that would be more suitable as a follow-up? In that a failure_message_shared_secret could be added to indicate that failure_message still has to be encrypted. At that point, failure_message_unencrypted = false can be deprecated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joostjager IIUC, this is intended for a use case for "virtual nodes", so anycast-like behavior using special hop hints. In that case, how does this help? It'll let you send a plaintext error (from this interceptor sub-system) and have lnd encrypted that, where it's the source node. IIUC though, you actually want to be able to send errors from the "virtual node", which means you need to use the shared secret for that hop for the path finding retry/apply logic to work properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joostjager IIUC, this is intended for a use case for "virtual nodes", so anycast-like behavior using special hop hints. In that case, how does this help? It'll let you send a plaintext error (from this interceptor sub-system) and have lnd encrypted that, where it's the source node. IIUC though, you actually want to be able to send errors from the "virtual node", which means you need to use the shared secret for that hop for the path finding retry/apply logic to work properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC though, you actually want to be able to send errors from the "virtual node", which means you need to use the shared secret for that hop for the path finding retry/apply logic to work properly.

That is correct.

But sometimes though the virtual node may want to return an error originating from the preceding real lnd node. An example of this is when there is a routing fee associated with the virtual channel. NYDIG-OSS/lnmux#101 illustrates this. The real node has to encrypt the insufficient fee failure message.

Intercepted htlcs do not pass through the switch in the real lnd instance. This means that any forwarding logic (such as the fee check) needs to be replicated by the external interceptor, and the interceptor requires the ability to instruct the real node to produce certain failure messages.

This has been possible in master for some time (

lnrpc.Failure.FailureCode failure_code = 5;
) using the failure_code field. This however is not a complete API. Ideally the complete failure message structure would need to be replicated on the proto level to be able to signal any failure. This PR takes the approach of accepting an encoded but not yet encrypted failure message to avoid that.

In summary this PR:

  • Completes the interceptor API by allowing users to supply a full failure message rather than a restrictive set of failure codes without additional data fields.
  • Enable use cases that require different failure messages to be returned, such as a non-zero virtual channel fee.

lnrpc/routerrpc/forward_interceptor.go Outdated Show resolved Hide resolved
lnrpc/routerrpc/forward_interceptor.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch 6 times, most recently from dbc18c2 to a5504ea Compare October 27, 2022 18:18
@joostjager joostjager requested a review from Roasbeef May 23, 2023 11:18
@saubyk saubyk modified the milestones: v0.17.0, v0.17.1 Jun 15, 2023
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look pretty good, I think the last commit needs to be squashed somewhere, or re-worked slightly. Main question here is re the intended behavior now, and if what is wanted is actually already supported, eg: lnd node N, makes virtual nodes M_i..., then makes invoices with pubkey of node M, they can then process the onion twice to decode full contents, and utilize the existing method to send back the error as if it came from node M.

@@ -134,7 +135,7 @@ func (r *forwardInterceptor) resolveFromClient(
// message size + hmac + two uint16 lengths. See BOLT
// #4.
if len(in.FailureMessage) !=
lnwire.FailureMessageLength+32+2+2 {
lnwire.FailureMessageLength+sha256.Size+2+2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah had thought it was introduced in this diff for some reason, I guess a rebase removed an earlier portion (was stacked on something else?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the exact history, but may be the case that a later refactor left just this.

}

code, err := unmarshalFailCode(in.FailureCode)
err := unmarshallFailureResolution(in, fwdRes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it's blocking, but we could have this instead return the new value and assign it here, s.t the new function is a pure function. Then tests can just exercise that the right value is returned vs asserting that the input is mutated as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying out multiple ways to make it a pure function, but after the changes in the later commit that extend this function it never really worked out in a clean way. So settled for this.

bytes failure_message = 4;

// Indicates whether the failure message still needs to be encrypted for the
// first hop.
bool failure_message_unencrypted = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joostjager IIUC, this is intended for a use case for "virtual nodes", so anycast-like behavior using special hop hints. In that case, how does this help? It'll let you send a plaintext error (from this interceptor sub-system) and have lnd encrypted that, where it's the source node. IIUC though, you actually want to be able to send errors from the "virtual node", which means you need to use the shared secret for that hop for the path finding retry/apply logic to work properly.

bytes failure_message = 4;

// Indicates whether the failure message still needs to be encrypted for the
// first hop.
bool failure_message_unencrypted = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joostjager IIUC, this is intended for a use case for "virtual nodes", so anycast-like behavior using special hop hints. In that case, how does this help? It'll let you send a plaintext error (from this interceptor sub-system) and have lnd encrypted that, where it's the source node. IIUC though, you actually want to be able to send errors from the "virtual node", which means you need to use the shared secret for that hop for the path finding retry/apply logic to work properly.

@@ -93,6 +94,11 @@ type InterceptableSwitch struct {
// currentHeight is the currently best known height.
currentHeight int32

// signChannelUpdate is used when an intercepting application includes
// an unsigned channel update to be signed by us.
signChannelUpdate func(u *lnwire.ChannelUpdate) (*ecdsa.Signature,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context in which this is to be used is to update the routing policy of a virtual channel. The channel isn't broadcasted through gossip and doesn't exist in the graph on disk.

Assuming the virtual channel has a distinct pubkey (and also shared secret in the onion), then how would this work in practice? lnd would use diff logic for if it was the final hop vs the penultimate hop.

return err
}

err = s.resignChannelUpdate(update)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also commit this update to disk as well? Apologies if this was already an older review comment...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the real node isn't aware of this virtual channel at all. It's all up to the external interceptor logic to keep track of the virtual channel properties. As mentioned in #7067 (comment), the external interceptor is taking over part of the responsibilities of the switch.

update *lnwire.ChannelUpdate) error {

// The maxHTLC flag is mandatory.
if !update.MessageFlags.HasMaxHtlc() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More validation is needed here, eg: max HTLC shouldn't be greater than the channel capacity, etc, etc. I think we should just call the normal validation routine.

Copy link
Contributor Author

@joostjager joostjager Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to call

func ValidateChannelUpdateFields(capacity btcutil.Amount,

But that would create a circular reference. For now, I just copied over the additional max_htlc >= min_htlc check. Capacity checking is not possible, because the real lnd doesn't know what the capacity is. It's probably infinite because the channel is typically virtual.

If desired I can do a bigger refactor and move the validation logic into htlcswitch. But not sure if worth it.

@joostjager
Copy link
Contributor Author

Changes look pretty good, I think the last commit needs to be squashed somewhere, or re-worked slightly.

Yes, will squash once I am certain we're all on the same page with this PR.

Main question here is re the intended behavior now, and if what is wanted is actually already supported, eg: lnd node N, makes virtual nodes M_i..., then makes invoices with pubkey of node M, they can then process the onion twice to decode full contents, and utilize the existing method to send back the error as if it came from node M.

Hopefully my replies in this round have clarified this. I know it is confusing, but the scope of this PR is not to change anything about errors encrypted by M.

@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch from e415913 to f5b80e8 Compare August 8, 2023 09:18
Prepare for adding a header to a pre-encoded message.
This commit adds the possibility for users of the htlc
interception api to supply arbitrary pre-encoded failure
messages. LND will only add an hmac and encrypt the message
for the first hop.
@lightningnetwork lightningnetwork deleted a comment from Roasbeef Aug 8, 2023
@joostjager joostjager force-pushed the interceptor-unencrypted-failure branch from f5b80e8 to 58dfa61 Compare August 8, 2023 12:35
@saubyk saubyk added the P1 MUST be fixed or reviewed label Aug 8, 2023
@saubyk saubyk removed this from the High Priority milestone Aug 8, 2023
@JssDWt
Copy link
Contributor

JssDWt commented Dec 29, 2023

Is this PR ready to be merged?
I'd love to use the functionality in this PR in Breez's htlc interceptor.

@JssDWt
Copy link
Contributor

JssDWt commented Jan 26, 2024

@saubyk Is this PR planned any time soon?

@saubyk
Copy link
Collaborator

saubyk commented Jan 26, 2024

@saubyk Is this PR planned any time soon?

Hi @JssDWt I've added this pr to 18's backlog. But at this point, uncertain if it will be included, as there is still a lot of work to get through to complete the in-flight work. Will make a determination in another week or so, if it will stay in the scope or moved to a minor release

@saubyk saubyk added this to the v0.18.1 milestone Feb 6, 2024
@saubyk saubyk modified the milestones: v0.18.1, 0.19.0 Mar 21, 2024
@saubyk saubyk modified the milestones: v0.18.1, 0.19.0 May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interceptors P1 MUST be fixed or reviewed
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

9 participants