upstream_proxy_protocol: Introduce custom TLV support #37591

Open
wants to merge 28 commits into
base: main

@timflannagan timflannagan commented Dec 9, 2024

Commit Message:

This commit introduces support for injecting custom TLVs into the Proxy Protocol v2 (PP2) header for upstream transport sockets. This enables xDS control planes to build upstream PP2 headers with greater flexibility. Previously, upstream PP2 headers only passed through TLVs from downstream connections when using the Proxy Protocol listener, limiting customization.

With this change, users can define custom TLVs either by configuring the added_tlvs field in the upstream_proxy_protocol transport socket config or by specifying host metadata in a well-known namespace, which provides dynamic and more granular control over PP2 header content. For example:

      transport_socket:
        name: envoy.transport_sockets.upstream_proxy_protocol
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.proxy_protocol.v3.ProxyProtocolUpstreamTransport
          config:
            version: V2
            added_tlvs:
              - type: 150
                value: Zm9v
              - type: 151
                value: YmFy

And

      clusters:
      - name: httpbin
        load_assignment:
          ...
          endpoints:
          - lbEndpoints:
            - metadata:
                filter_metadata:
                  envoy.transport_socket_match:
                    outbound-proxy: true
                typed_filter_metadata:
                  envoy.transport_sockets.proxy_protocol:
                    "@type": type.googleapis.com/envoy.config.core.v3.ProxyProtocolConfig
                    added_tlvs:
                      - type: 0xD7
                        value: b3Zy
                      - type: 0xD8
                        value: bmV3
              ...

By decoupling upstream PP2 customization from downstream listener config, this unlocks more flexible use cases for Proxy Protocol in upstream paths.
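For readers less familiar with the wire format, here is a minimal sketch of how one such TLV could be serialized, assuming the layout from the PROXY protocol v2 spec (1-byte type, 2-byte big-endian length, then the value). `appendTlv` is a hypothetical helper written for illustration; it is not a function from this PR or from Envoy. The `value` fields in the YAML are base64, so `Zm9v` carries the bytes `foo`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not Envoy's API): append one PP2 TLV to `out`.
// Assumed wire layout: 1-byte type, 2-byte big-endian length, then the value.
// Returns false and leaves `out` untouched if the value does not fit a 16-bit length.
bool appendTlv(std::vector<uint8_t>& out, uint8_t type, const std::string& value) {
  if (value.size() > 0xFFFF) {
    return false;
  }
  const auto len = static_cast<uint16_t>(value.size());
  out.push_back(type);
  out.push_back(static_cast<uint8_t>(len >> 8));   // length, high byte
  out.push_back(static_cast<uint8_t>(len & 0xFF)); // length, low byte
  out.insert(out.end(), value.begin(), value.end());
  return true;
}
```

With this helper, the first TLV from the transport socket example (type 150, value `foo`) would serialize to the six bytes `0x96 0x00 0x03 'f' 'o' 'o'`.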

Additional Description: N/A
Risk Level: Low
Testing: Unit & integration
Docs Changes: Include in the protobuf docs
Release Notes: Included in the changelog as a new feature
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue] #18520
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

@timflannagan
Author

For more discussion on some of the implementation choices, see timflannagan#1 for an initial round of review. Also, I haven't written C++ in a while so I'm happy to make any style-related changes.

@timflannagan timflannagan changed the title from "Introduce custom TLV support for upstream PP2 headers" to "proxy_protocol: Introduce custom TLV support for upstream PP2 headers" on Dec 9, 2024
@timflannagan
Author

Looks like integration tests are failing. I'll fix that tomorrow. At a glance, the assertions expect a certain ordering for the custom TLVs being defined, and the output isn't deterministic.

This commit introduces support for injecting custom TLVs into the Proxy
Protocol v2 (PP2) header for upstream transport sockets. This enables xDS
control planes to build upstream PP2 headers with greater flexibility.

Previously, upstream PP2 headers only passed through TLVs from downstream
connections when using the Proxy Protocol listener, limiting customization.

With this change, users can define custom TLVs by specifying host metadata
in a well-known namespace, providing dynamic, granular control over PP2
header content. For example:

```yaml
clusters:
      - name: httpbin
        load_assignment:
          ...
          endpoints:
          - lbEndpoints:
            - metadata:
                filter_metadata:
                  envoy.transport_socket_match:
                    outbound-proxy: true
                typed_filter_metadata:
                  envoy.transport_sockets.proxy_protocol:
                    "@type": type.googleapis.com/envoy.extensions.transport_sockets.proxy_protocol.v3.CustomTlvMetadata
                    entries:
                      - type: 0x96
                        value: Zm9v # foo
                      - type: 0x97
                        value: YmFy # bar
               ...

```

By decoupling upstream PP2 customization from downstream listener config,
this unlocks more flexible use cases for Proxy Protocol in upstream paths.

Earlier approaches considered extending upstream_proxy_protocol to
support TLV configuration but were rejected due to added control plane
complexity. Similarly, reusing the envoy.network.proxy_protocol_options
namespace was evaluated but required significant refactoring and risk.

Signed-off-by: timflannagan <timflannagan@gmail.com>
@timflannagan timflannagan force-pushed the feat/custom-pp2-upstream-tlvs branch from ba3c628 to 09d7ce1 Compare December 10, 2024 18:42
Contributor

@ggreenway ggreenway left a comment

/wait

@ggreenway ggreenway self-assigned this Dec 10, 2024
Signed-off-by: timflannagan <timflannagan@gmail.com>
changelogs/current.yaml (outdated, resolved thread)
@timflannagan
Author

Looks like the format check is failing:

ERROR: From ./source/extensions/transport_sockets/proxy_protocol/proxy_protocol.cc
ERROR: ./source/extensions/transport_sockets/proxy_protocol/proxy_protocol.cc:217: Don't use UnpackTo() directly, use MessageUtil::unpackToNoThrow() instead
ERROR: check format failed. diff has been applied'

I don't see that function defined anywhere, but I see several PRs that seem relevant here and I think I can figure out how to silence that linting violation. Does that linting rule need to be updated as well?

@mattklein123
Member

/api lgtm

mattklein123
mattklein123 previously approved these changes Dec 12, 2024
@timflannagan
Author

@ggreenway Can you take another pass when you get a second? I need to look into getting the codecov check green, but the rest of the CI checks looked good last run.

Signed-off-by: timflannagan <timflannagan@gmail.com>
Contributor

@ggreenway ggreenway left a comment

/wait

api/envoy/config/core/v3/proxy_protocol.proto (three outdated, resolved threads)
@@ -52,6 +53,9 @@ class UpstreamProxyProtocolSocket : public TransportSockets::PassthroughSocket,
const UpstreamProxyProtocolStats& stats_;
const bool pass_all_tlvs_;
absl::flat_hash_set<uint8_t> pass_through_tlvs_{};
Contributor

This isn't related to this PR, but instead of this (and your new config) being copied by value here, the normal pattern is to hold a shared_ptr to a Config object that has already converted the protobuf config to the internal representation and done validation, such as checking for duplicates.
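A minimal sketch of that pattern, under the assumption that the config is just a list of (type, value) pairs. `TlvConfig`, `Tlv`, and `Socket` are hypothetical stand-ins rather than the classes in this PR; the point is only that conversion and duplicate checking happen once, and each socket then shares the result instead of copying it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical internal representation of one TLV.
struct Tlv {
  uint8_t type;
  std::string value;
};

// Hypothetical config object: built once, validated once, then shared read-only.
class TlvConfig {
public:
  // Converts raw (type, value) pairs; returns nullptr on an out-of-range or duplicate type.
  static std::shared_ptr<const TlvConfig>
  create(const std::vector<std::pair<uint32_t, std::string>>& raw) {
    std::set<uint8_t> seen;
    std::vector<Tlv> tlvs;
    for (const auto& [type, value] : raw) {
      if (type > 0xFF || !seen.insert(static_cast<uint8_t>(type)).second) {
        return nullptr;
      }
      tlvs.push_back({static_cast<uint8_t>(type), value});
    }
    return std::shared_ptr<const TlvConfig>(new TlvConfig(std::move(tlvs)));
  }

  const std::vector<Tlv>& tlvs() const { return tlvs_; }

private:
  explicit TlvConfig(std::vector<Tlv> tlvs) : tlvs_(std::move(tlvs)) {}
  const std::vector<Tlv> tlvs_;
};

// Each socket holds a reference-counted pointer instead of its own copy of the TLVs.
class Socket {
public:
  explicit Socket(std::shared_ptr<const TlvConfig> config) : config_(std::move(config)) {}
  const TlvConfig& config() const { return *config_; }

private:
  const std::shared_ptr<const TlvConfig> config_;
};
```

Creating many sockets from one `TlvConfig` then costs a pointer copy per socket, and a config with a duplicate type is rejected before any socket exists.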

Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Contributor

@ggreenway ggreenway left a comment

/wait

api/envoy/config/core/v3/proxy_protocol.proto (outdated, resolved thread)
std::vector<Envoy::Network::ProxyProtocolTLV> custom_tlvs = {
{0x8, std::vector<unsigned char>(65536, 'a')},
};
EXPECT_LOG_CONTAINS("warn", "Generating Proxy Protocol V2 header: TLVs exceed length limit 65535",
Contributor

This case was probably unreachable before this change, but now a bad config could result in a connection not generating a PP header at all. I think this case needs to be handled better, probably by aborting the connection, or alternatively by generating a PP header that omits the TLV that is too long. cc @botengyao who added the TLV passthrough code.

Author

@timflannagan timflannagan Jan 9, 2025

Yeah, this becomes more problematic with these changes. I think I'd lean towards truncating the custom (and passthrough) TLVs for the upstream PP2 header. At a minimum, I think we'd need to log and possibly document this behavior so users aren't caught off guard. cc @yuval-k in case you have any thoughts as well.

Author

Comment on lines +142 to 155
// Filter out TLVs that would exceed the 65535 limit.
uint64_t extension_length = 0;
bool skipped_tlvs = false;
for (auto&& tlv : combined_tlv_vector) {
  uint64_t new_size = extension_length + PROXY_PROTO_V2_TLV_TYPE_LENGTH_LEN + tlv.value.size();
  if (new_size > std::numeric_limits<uint16_t>::max()) {
    ENVOY_LOG_MISC(warn, "Skipping TLV type {} because adding it would exceed the 65535 limit.",
                   tlv.type);
    skipped_tlvs = true;
    continue;
  }
  extension_length = new_size;
  final_tlvs.push_back(tlv);
}
Author

Truncated custom or passthrough TLVs that exceed the max length in ad46d23. Some header unit tests were failing as a result, because this call site expects the function to return false in order to increment the stats:

if (!Common::ProxyProtocol::generateV2Header(options, header_buffer_, pass_all_tlvs_,
                                             pass_through_tlvs_, custom_tlvs)) {
  // There is a warn log in generateV2Header method.
  stats_.v2_tlvs_exceed_max_length_.inc();

I went back and forth on whether returning false here would be ambiguous when writing the output header was successful, but I didn't want to extend the function signature to return (bool written, bool skipped_tlvs) to handle this edge case at the call sites.
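For comparison, here is a rough sketch of what a richer return shape could look like. This is not what the PR does: `TlvFilterResult`, `filterTlvs`, and `kTlvTypeLengthLen` are hypothetical names, and the constant assumes 3 bytes of per-TLV overhead (1-byte type plus 2-byte length). The loop mirrors the filtering logic quoted above, but reports the kept TLVs and the "something was skipped" flag separately so a caller need not infer both from one bool.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for the TLV type used in the PR.
struct Tlv {
  uint8_t type;
  std::string value;
};

// Hypothetical result type: which TLVs survive, and whether any were dropped.
struct TlvFilterResult {
  std::vector<Tlv> kept;
  bool skipped_tlvs{false};
};

// Assumed per-TLV overhead: 1-byte type + 2-byte length.
constexpr uint64_t kTlvTypeLengthLen = 3;

// Keeps TLVs in order while the running total stays within the 16-bit length field;
// a TLV that would push the total past 65535 is skipped and flagged.
TlvFilterResult filterTlvs(const std::vector<Tlv>& tlvs) {
  TlvFilterResult result;
  uint64_t extension_length = 0;
  for (const auto& tlv : tlvs) {
    const uint64_t new_size = extension_length + kTlvTypeLengthLen + tlv.value.size();
    if (new_size > std::numeric_limits<uint16_t>::max()) {
      result.skipped_tlvs = true;
      continue;
    }
    extension_length = new_size;
    result.kept.push_back(tlv);
  }
  return result;
}
```

A call site could then always write the header from `kept` and bump the stat only when `skipped_tlvs` is set, at the cost of touching every existing caller of the function.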

@phlax
Member

phlax commented Jan 22, 2025

ping @ggreenway i think this is awaiting further review

@timflannagan requires main merge

// - type: 0xD8
// value: bmV3
//
// **Precedence behavior**:
Member

wondering if this should be a header

Author

Any examples on how to do that? I tried following https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections, but ci/do_ci.sh docs was failing when I ran it and I can't seem to find where the generated files live on my local box.

Member

generated files live in bazel ether - unless you build them directly (eg bazel build //docs:rst inside the container)

you can view the rendered docs eg here:

https://storage.googleapis.com/envoy-pr/3122ab3/docs/api-v3/config/core/v3/proxy_protocol.proto.html#envoy-v3-api-msg-config-core-v3-proxyprotocolpassthroughtlvs

thinking more about my heading suggestion, im wondering if we should move some of this to a reference page and link it - eg the config example

i tried setting the "heading" to h5 - it kinda works but probs we shouldnt have headings inside the descriptions - they should be relatively terse

ill comment on the config now ...

Author

@phlax I like the reference page idea after looking more at the current docset. Is this something we can tackle as a follow-up, or is this a blocker in your mind?

Member

not a blocker - but please at least fix the config indents

Member

thanks! really appreciated

Author

Fixed the indentation issues in b343e67. I also took a stab at the literalinclude approach in 7c5d4ed after mirroring the examples from #35532.

Member

yeah - using literalinclude is what is super appreciated - its much more maintainable

probs we should add some docs somewhere - i have been pondering a plan to work on the tech debt here

Author

I do think the reference docs or some high-level field documentation in the proto is still worth chasing -- I just have limited capacity right now. At a minimum, I think it would be good to have a header that explains pass_through_tlvs & added_tlvs and their interplay, as there's some nuance in their behavior.

// The PROXY protocol transport socket allows carrying connection metadata (such as
// the original source IP address) to upstream services in either a human-readable
// (Version 1) or binary (Version 2) header.
//
// Behavior
// ========
// When the PROXY protocol transport socket is configured, Envoy can prepend a PROXY
// protocol header to connections going upstream. Depending on the protocol version,
// additional metadata such as custom Type-Length-Value (TLV) entries may also be sent.
//
// Configuration
// =============
// .. code-block:: yaml
//
//   transport_socket:
//     name: envoy.transport_sockets.proxy_protocol
//     typed_config:
//       "@type": type.googleapis.com/envoy.config.core.v3.ProxyProtocolConfig
//       version: V2
//       pass_through_tlvs:
//         match_type: INCLUDE_ALL
//
// Passing & Adding TLVs
// ---------------------
// - *pass_through_tlvs*:
//   Controls which TLVs from the downstream PROXY protocol request are forwarded
//   to the upstream connection. 
// - *added_tlvs*:
//   Defines new TLVs to add into the header for the upstream request. TLVs defined
//   at a host level take precedence over those defined at the transport socket level.

And then the individual added_tlvs field documentation could introduce the two config options and the precedence behavior. I'm not sure what's best w.r.t organization so I stopped messing around with it locally.
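As a rough illustration of the precedence rule in that draft, here is a sketch that assumes conflicts are resolved per TLV type: a host-level TLV wins, and a transport-socket-level TLV is kept only when no host-level TLV uses the same type. `Tlv` and `mergeAddedTlvs` are hypothetical names, and the ordering of the merged output is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the TLV type used in the PR.
struct Tlv {
  uint8_t type;
  std::string value;
};

// Hypothetical merge: host-level entries are taken first, then any socket-level
// entry whose type has not been seen yet. The first entry for a given type wins.
std::vector<Tlv> mergeAddedTlvs(const std::vector<Tlv>& host_tlvs,
                                const std::vector<Tlv>& socket_tlvs) {
  std::vector<Tlv> merged;
  std::set<uint8_t> seen;
  for (const auto& tlv : host_tlvs) {
    if (seen.insert(tlv.type).second) {
      merged.push_back(tlv);
    }
  }
  for (const auto& tlv : socket_tlvs) {
    if (seen.insert(tlv.type).second) {
      merged.push_back(tlv);
    }
  }
  return merged;
}
```

Under these assumptions, a host that defines type 0xD7 overrides a socket-level 0xD7, while socket-level TLVs with other types still reach the upstream header.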

@phlax
Member

phlax commented Jan 22, 2025

re unpackToNoThrow ...

I don't see that function defined anywhere, but I see several PRs that seem relevant here and I think I can figure out how to silence that linting violation. Does that linting rule need to be updated as well?

it would seem it does

cc @tyxia @alyssawilk

@alyssawilk
Contributor

yeah looks like that rule should have been updated with https://github.com/envoyproxy/envoy/pull/32775/files

#38138 should fix

Signed-off-by: timflannagan <timflannagan@gmail.com>
@phlax
Member

phlax commented Jan 22, 2025

/docs

Docs for this Pull Request will be rendered here:

https://storage.googleapis.com/envoy-pr/37591/docs/index.html

The docs are (re-)rendered each time the CI envoy-presubmit (precheck docs) job completes.

Member

@phlax phlax left a comment

for configs - we strongly encourage using a literalinclude with a full bootstrap yaml, as these are linted and tested variously for validity.

this avoids bitrot and is much more helpful for end users

we also enforce least indent on yaml formatting, which is missed in code-blocks

if we shift the example to a reference page its ~easier, but it can be done for api also

api/envoy/config/core/v3/proxy_protocol.proto (outdated, resolved thread)
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>