[Proposal] Rewrite Path and Dynamic Endpoint - Resolve Conflicting Policies #1298
Replies: 3 comments
Safeguard: Prevent Ext Proc from Modifying System Headers

Since the ext proc filter is only responsible for setting the

    - name: envoy.filters.http.ext_proc
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
        grpc_service:
          envoy_grpc:
            cluster_name: ext-proc-server
        processing_mode:
          request_header_mode: SEND
        mutation_rules:
          disallow_system: true
          disallow_is_error: true

With

Additionally, the policy engine should validate whether a policy is trying to update system headers like
Problem Explanation with Example

Scenario 1: Without Dynamic Endpoint - No Issues

    Upstream: http://httpbin.org/anything
    Routes:
      - GET /abc:
          - timeout = 50s
          - if headers[env] == "dev" → path = /abc/foo
          - if headers[env] == "prod" → path = /abc/bar
          - substitutePath /abc → /anything
      - GET /abc/foo:
          - timeout = 5s
          - substitutePath /abc/foo → /anything
      - GET /abc/bar:
          - timeout = 20s
          - substitutePath /abc/bar → /anything

A request
Everything works as expected. The rewritten path is only used for forwarding to the upstream; it does not affect route matching.

Scenario 2: With Dynamic Endpoint - The Conflict

Now a Dynamic Endpoint action is added to change the upstream cluster:

    Upstreams:
      - name: httpbin
        url: http://httpbin.org/anything
      - name: httpbin2
        url: http://httpbin.org/anything2
    Routes:
      - GET /abc:
          - timeout = 50s
          - if headers[env] == "dev" → path = /abc/foo
          - if headers[env] == "prod" → path = /abc/bar
          - substitutePath /abc → /anything
          - upstream = httpbin2   ### SETTING a Dynamic Endpoint here
      - GET /abc/foo:
          - timeout = 5s
          - substitutePath /abc/foo → /anything
      - GET /abc/bar:
          - timeout = 20s
          - substitutePath /abc/bar → /anything

A request
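What goes wrong

Within ext proc's response, the header mutations (including the rewritten path) are applied first, and the route cache is cleared afterwards.

Envoy re-matches the route against the rewritten path.

The request ends up with the wrong timeout (5s instead of 50s) and potentially wrong route-level settings. If the rewritten path doesn't match any route at all, the request gets a 404 Not Found.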
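The failure mode in Scenario 2 can be sketched with a toy route table. This is only an illustration, not Envoy code: the `Route` class and `match_route` helper are hypothetical, and exact-path matching is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    method: str
    path: str       # exact-match path (a simplifying assumption)
    timeout_s: int


# Hypothetical route table mirroring the three routes of Scenario 2.
ROUTES = [
    Route("GET", "/abc", 50),
    Route("GET", "/abc/foo", 5),
    Route("GET", "/abc/bar", 20),
]


def match_route(method: str, path: str) -> Optional[Route]:
    """Return the first route matching method and path, or None (a 404)."""
    for route in ROUTES:
        if route.method == method and route.path == path:
            return route
    return None


# The initial match uses the original path.
original = match_route("GET", "/abc")
# After the policy rewrites the path (env == "dev") and the route cache is
# cleared, matching runs again on the rewritten path.
rematched = match_route("GET", "/abc/foo")
# A rewritten path that no route covers yields no match at all.
unmatched = match_route("GET", "/abc/unknown")

print(original.timeout_s, rematched.timeout_s, unmatched)  # 50 5 None
```

The re-match silently swaps the 50s timeout for the 5s one, which is the symptom described above.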
Update |
Summary

- Rewrite Path requires `clear_route_cache: false` (to avoid re-matching against the new path), while Dynamic Endpoint requires `clear_route_cache: true` (to re-evaluate the cluster from the `cluster_header`). Both cannot coexist in a single ext proc filter.
- The Lua filter (`clear_route_cache: false`) modifies `:path`/`:method` without triggering re-evaluation.
- The route configuration uses `cluster_header: x-wso2-target-upstream-cluster` and `request_headers_to_remove` to strip the internal header before forwarding upstream.
- The Lua filter's `decodeHeaders()` triggers lazy route re-evaluation before the Lua script runs. At that point the original `:path` is intact and the new cluster header is present.

Demo Docker Compose Setup with Envoy + Ext Proc + Lua filters: envoy-external-processor-sample.zip
1. Problem

In the current implementation of the policy engine, the Rewrite Path action and the Dynamic Endpoint action require different settings for `clear_route_cache` in Envoy, which leads to a conflict when both actions are present in the same policy.

1.1. Rewrite Path Action
The Rewrite Path action modifies the path of incoming requests before they are forwarded to the upstream service. To accomplish this, we change the `:path` header, and we must not set `clear_route_cache` in Envoy. Otherwise the route cache is cleared, Envoy re-evaluates the route against the new path, and the request may be routed unexpectedly.

Example:

If a policy rewrites the path from `/api-1` to `/api-2`, the request will be routed to the second route and the timeout will be 60 seconds. If the policy rewrites the path from `/api-1` to `/foo`, the request will not match any route and Envoy responds with 404 Not Found.

Hence `clear_route_cache` should be set to `false` to avoid unexpected routing results.

1.2. Dynamic Endpoint Action
The Dynamic Endpoint action changes the upstream cluster of incoming requests so that they can be forwarded to different upstream services. We can use `cluster_header` to specify the header that contains the name of the target upstream cluster. In this case, however, we have to set `clear_route_cache` to `true` to discard the already selected cluster and force Envoy to re-evaluate the route and select the new cluster specified in the header.

So we can't apply both actions at the same time, because they have conflicting requirements for `clear_route_cache`.

2. What `clear_route_cache` Does

When a request arrives, Envoy resolves the route and caches both the matched route and the cluster info. The `clear_route_cache` operation wipes this cached state. The re-evaluation does not happen immediately at the point of clearing. It happens lazily, when the next filter in the chain accesses the route (typically the router filter, or any intermediate filter that calls `route()`).

The re-evaluation is a full route match from scratch: virtual host matching, route matching (prefix/path/regex/headers), and cluster resolution (including reading `cluster_header`). All route-level settings (timeouts, retry policies, buffer limits, etc.) are also refreshed from the newly matched route.

Check section 6.1 below for more details.
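The lazy behaviour described in this section can be modelled in a few lines. The sketch below is a simplified illustration and not Envoy's implementation: the `Stream` class, the `matcher` function, and the `x-target-cluster` header name are all hypothetical.

```python
class Stream:
    """Toy model of a request stream with a lazily resolved, cached route."""

    def __init__(self, headers, matcher):
        self.headers = headers
        self._matcher = matcher      # callable: headers -> route
        self._cached_route = None
        self._has_cached = False
        self.match_count = 0         # how many full route matches have run

    def route(self):
        # Re-match only when nothing is cached.
        if not self._has_cached:
            self.match_count += 1
            self._cached_route = self._matcher(self.headers)
            self._has_cached = True
        return self._cached_route

    def clear_route_cache(self):
        # Only wipes the cache; no route matching happens here.
        self._cached_route = None
        self._has_cached = False


def matcher(headers):
    """Hypothetical matcher: records the path and the cluster header it saw."""
    return {"matched_path": headers[":path"],
            "cluster": headers.get("x-target-cluster", "")}


stream = Stream({":path": "/api-1"}, matcher)
first = stream.route()                        # initial match, header absent
stream.headers["x-target-cluster"] = "my-svc"
stale = stream.route()                        # still the cached route
stream.clear_route_cache()
count_after_clear = stream.match_count        # clearing did not re-match
fresh = stream.route()                        # next access re-matches
```

Setting the header alone changes nothing (`stale` is the same object as `first`); the new cluster is picked up only by the first `route()` call after the cache has been cleared.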
3. Solution

This solution requires `cluster_header` (`x-wso2-target-upstream-cluster`) to be configured in the route section, which tells Envoy to read the target cluster name from a request header instead of a static cluster assignment. With this in place, an ext proc filter followed by a Lua filter handles both actions. Each filter is responsible for the mutations that match its `clear_route_cache` requirement:

- The ext proc filter sets the cluster header and sets `clear_route_cache` to `true` on every request. This is required because `cluster_header` is evaluated during route matching, before any HTTP filter runs, so the initial route always has an empty cluster name. The ext proc filter must always set the cluster header and clear the route cache for the cluster to resolve correctly.
- The Lua filter modifies the `:path` and/or `:method` headers without clearing the route cache (`clear_route_cache: false`). This is necessary because clearing the route cache after changing `:path` would cause Envoy to re-match the route against the new path, which may not match any route (404) or may match a different route with different settings (wrong timeouts, retries, etc.).

The cluster header (`x-wso2-target-upstream-cluster`) is an internal routing header and should not be forwarded to the upstream service. Use `request_headers_to_remove` in the route configuration to strip it before the request is sent upstream.

The ext proc filter communicates the desired path/method changes to the Lua filter via dynamic metadata, so the Lua filter applies them locally without a gRPC round-trip.
3.1. Ext Proc Filter

- Sets the cluster header (`x-wso2-target-upstream-cluster`) with the target cluster name and sets `clear_route_cache` to `true`. This is required because `cluster_header` is evaluated during initial route matching, before any HTTP filter runs. At that point the header is not present, so the initial cached route has an empty cluster name. The ext proc filter must always set this header and clear the route cache to trigger re-evaluation with the correct cluster.
- Sets dynamic metadata (`rewrite_path` or `change_method`) to communicate the desired changes to the Lua filter. It does not modify `:path` or `:method` directly; that is delegated to the Lua filter to avoid the conflict with `clear_route_cache`.

3.2. Lua Filter

- Is configured with `clear_route_cache: false` to prevent route cache clearing on header modifications. This is critical: without it, the default Lua behavior would clear the route cache on every header modification, causing the new `:path` to be used in route re-matching.
- Reads the dynamic metadata set by the ext proc filter and updates the `:path` or `:method` header accordingly.
- Because `clear_route_cache` is disabled, modifying headers does not trigger route re-evaluation, preserving the original route and all its settings (timeouts, retries, rate limits, etc.).

3.3. Why This Works
The key insight is the lazy route re-evaluation behavior and the ordering of operations across the two filters:

1. Ext proc sets the cluster header and clears the route cache. The cache is wiped but no re-evaluation happens yet: `clearRouteCache()` only sets `cached_route_` to an empty optional.
2. The Lua filter's `decodeHeaders()` triggers route re-evaluation. At the start of `decodeHeaders()`, the Lua filter calls `getPerLuaCodeSetup()` → `resolveMostSpecificPerFilterConfig()` → `getRoute()` → `route()`. Since the cache is empty, this triggers `refreshCachedRoute()`, a full route re-match. At this point, the original `:path` is still intact (the Lua script hasn't run yet) and the new cluster header is present (ext proc already set it). The route matches correctly with the proper cluster and all original route settings.
3. The Lua script runs and modifies `:path`/`:method`. Since `clear_route_cache` is `false`, these header modifications do not clear the route cache. The correctly resolved cached route from step 2 persists.
4. The router filter uses the cached route. It sees a valid cached route with the correct cluster, original timeouts, and all other route settings intact. The modified `:path` is forwarded to the upstream as-is.

    sequenceDiagram
        participant EP as Ext Proc Filter
        participant LUA as Lua Filter<br/>(decodeHeaders)
        participant LS as Lua Script
        participant R as Router Filter
        Note over EP: cached_route_ = [old route]<br/>:path = "/api-1"<br/>cluster header = (empty)
        EP->>EP: Set cluster header = "my-svc"
        EP->>EP: clear_route_cache: true
        Note over EP: cached_route_ = [empty]<br/>:path = "/api-1"<br/>cluster header = "my-svc"
        EP->>LUA: Continue to next filter
        LUA->>LUA: getPerLuaCodeSetup()<br/>→ resolveMostSpecificPerFilterConfig()<br/>→ route() → cache empty<br/>→ refreshCachedRoute()
        Note over LUA: Route re-evaluated with<br/>original :path "/api-1" +<br/>new cluster header "my-svc"<br/>cached_route_ = [new route ✓]
        LUA->>LS: Run Lua script
        LS->>LS: Replace :path = "/new-path"<br/>(clear_route_cache: false → no cache clear)
        Note over LS: cached_route_ = [new route ✓]<br/>:path = "/new-path"<br/>cluster header = "my-svc"
        LS->>R: Continue to next filter
        R->>R: callbacks_->route()<br/>→ cache valid → use cached route
        Note over R: Routes to cluster "my-svc"<br/>with original route settings<br/>(timeouts, retries, etc.)<br/>:path "/new-path" forwarded upstream

Check section 6.2 below for more details.
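To make the ordering argument concrete, the following sketch compares a single filter that rewrites the path and clears the cache against the ext proc + Lua split. It is a simplified model, not Envoy code: the route table, the 30s timeout for `/api-1`, and all function names are assumptions for illustration; only the 60s timeout for `/api-2` and the header name come from the text above.

```python
CLUSTER_HEADER = "x-wso2-target-upstream-cluster"

# Hypothetical route table; the /api-1 timeout is an assumed value.
ROUTES = {"/api-1": {"timeout_s": 30}, "/api-2": {"timeout_s": 60}}


def resolve(headers):
    """Full route match: exact path lookup plus cluster read from the header."""
    settings = ROUTES.get(headers[":path"])
    if settings is None:
        return None  # no route matched, i.e. a 404
    return {"matched": headers[":path"],
            "cluster": headers.get(CLUSTER_HEADER, ""),
            **settings}


def single_filter(headers, cluster, new_path):
    """One response: all header mutations applied, then the cache is cleared."""
    headers = dict(headers)
    headers[CLUSTER_HEADER] = cluster
    headers[":path"] = new_path
    # The next route access re-matches against the already rewritten path.
    return headers, resolve(headers)


def ext_proc_then_lua(headers, cluster, new_path):
    """Two filters: re-match happens before the path is rewritten."""
    headers = dict(headers)
    headers[CLUSTER_HEADER] = cluster   # ext proc: set header, clear cache
    route = resolve(headers)            # Lua decodeHeaders(): lazy re-match
    headers[":path"] = new_path         # Lua script: rewrite, cache kept
    return headers, route               # router reuses the cached route


request = {":path": "/api-1"}
_, wrong_route = single_filter(request, "my-svc", "/api-2")
_, no_route = single_filter(request, "my-svc", "/foo")
final_headers, good_route = ext_proc_then_lua(request, "my-svc", "/foo")
```

In this model the single-filter variant ends up on the `/api-2` route or on no route at all, while the split variant keeps the `/api-1` route settings, resolves cluster `my-svc`, and still forwards the rewritten `:path`.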
3.4. Why a Single Ext Proc Filter Cannot Achieve Both

Within a single ext proc `HeadersResponse`, all header mutations are applied first, and `clearRouteCache` is called afterwards (processor_state.cc#L175-L185).

There is no way to split these within a single ext proc response. So if you set both the cluster header and change `:path` in the same response with `clear_route_cache: true`, the route re-evaluation will see the new `:path`, which may not match any route (404) or may match a different route with different settings.

4. Drawbacks
- Route re-evaluation on every request: because the route cache is cleared for every request, Envoy repeats virtual host matching (`findVirtualHost`), route entry matching (prefix/path/regex/header matching via `getRouteFromEntries`), cluster specifier plugin invocation (`clusterEntry`), thread-local cluster info lookup (`getThreadLocalCluster`), and refreshing timeouts, idle/flush timeouts, buffer limits, and tracing. While these are all in-memory operations with no I/O, they do add CPU overhead on the request hot path for every request.
- Ext proc cannot be bypassed: the ext proc server must set the `x-wso2-target-upstream-cluster` header with the upstream cluster name on every request. Previously, the ext proc server could skip processing when no policies were applicable. Now the ext proc request header processing cannot be bypassed, as the route will have an empty cluster name without it.

5. Alternatives
5.1. Remove Dynamic Endpoint Support from the Gateway Entirely

With this option, only the existing ext proc filter is required.

Rejected: Dynamic Endpoint support is required.

5.2. Lua Cluster Specifier Plugin

The route matching is performed at the time the request headers are received. The same `clearRouteCache` requirement applies.

Rejected: This won't work because the Lua cluster specifier plugin (and `inline_cluster_specifier_plugin` in general) is evaluated during route matching via the same `clusterEntry()` code path as `cluster_header`. All cluster specifier plugins (`cluster_header`, `inline_cluster_specifier_plugin`, `cluster_specifier_plugin`, `weighted_clusters`) are assigned to the same `cluster_specifier_plugin_` field on the route and invoked at route resolution time. The ext proc filter hasn't run yet when route matching happens, so the header value is not present. Check section 6.3 below for more details.

5.3. Change Cluster Directly - Write CPP Extension
Write a C++ extension that sets the cluster directly, based on dynamic metadata set by the ext proc filter. This would not require setting `clear_route_cache` to `true`, and so avoids the conflict between the Rewrite Path and Dynamic Endpoint actions.

This would use the `setRoute()` API available on filter callbacks to wrap the current route in a `DynamicRouteEntry` that only overrides `clusterName()`, preserving all other route settings (timeouts, retries, rate limits, etc.) without any route re-matching.

Check section 6.4 below for more details.

Rejected: This approach requires writing a custom C++ extension, which adds complexity and maintenance overhead.
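For readers unfamiliar with the delegating-route idea behind this alternative, here is a minimal sketch of the pattern. It is written in Python purely for illustration and does not reflect Envoy's C++ types: `Route` and `ClusterOverrideRoute` are hypothetical stand-ins for a route entry and a wrapper that overrides only the cluster name.

```python
class Route:
    """Hypothetical route entry carrying a cluster name and other settings."""

    def __init__(self, cluster_name, timeout_s, retries):
        self._cluster_name = cluster_name
        self.timeout_s = timeout_s
        self.retries = retries

    def cluster_name(self):
        return self._cluster_name


class ClusterOverrideRoute:
    """Delegates everything to the wrapped route except cluster_name()."""

    def __init__(self, base, cluster_name):
        self._base = base
        self._cluster_name = cluster_name

    def cluster_name(self):
        return self._cluster_name

    def __getattr__(self, name):
        # Called only when normal lookup fails, so every setting that is
        # not overridden here is forwarded to the original route.
        return getattr(self._base, name)


base = Route("httpbin", timeout_s=50, retries=3)
overridden = ClusterOverrideRoute(base, "httpbin2")
```

The wrapper reports the new cluster while the timeout and retry settings still come from the original route, and the original route itself is left unchanged. No route matching is involved at any point.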
6. Envoy Code Analysis - Detailed Explanation
6.1. Clear Route Cache and Reevaluate Route - Envoy Code Analysis
6.1.1. `clearRouteCache()` - Wipe the cached state

See conn_manager_impl.cc#L2411-L2420.

This sets `cached_route_` to an empty optional (`has_value() == false`). No route matching happens here; it only wipes the cache.

6.1.2. Lazy re-resolution when `route()` is called

See conn_manager_impl.cc#L2287-L2294.

Since `clearRouteCache()` set `cached_route_` to an empty optional, `has_value()` returns `false`, triggering `refreshCachedRoute()`. The same lazy pattern applies to `clusterInfo()` at conn_manager_impl.cc#L2278-L2286.

6.1.3. `refreshCachedRoute()` - Full route matching from scratch

See conn_manager_impl.cc#L1778-L1800.
`snapped_route_config_->route()` runs the entire route matching pipeline again against the current state of the request headers.

6.1.4. `cluster_header` is read during route matching

During route matching, once the path/headers match a route entry, `clusterEntry()` is called (config_impl.cc#L1320-L1327).

For routes configured with `cluster_header`, the plugin is `HeaderClusterSpecifierPlugin` (header_cluster_specifier.cc#L11-L24).

The cluster name is read from the header and baked into a `DynamicRouteEntry` as a `const std::string` (delegating_route_impl.h#L143-L152).

6.1.5. `setVirtualHostRoute()` - Cache the new route + cluster info

See conn_manager_impl.cc#L2313-L2344.
6.1.6. What gets re-evaluated
- `snapped_route_config_->route()` re-matches vhost
- Cluster name (`cluster_header`): `HeaderClusterSpecifierPlugin::route()` re-reads the header
- `getThreadLocalCluster(clusterName())` looks up the new cluster
- Timeouts: `refreshDurationTimeout()`, `refreshIdleAndFlushTimeouts()`
- Buffer limits: `refreshBufferLimit()`

6.2. Route Re-evaluation in `decodeHeaders()` Stage - Envoy Code Analysis

The route re-evaluation after `clearRouteCache()` does not happen immediately. It is triggered lazily when the next filter in the chain accesses the route during its `decodeHeaders()` stage. This is a common pattern across most HTTP filters and is not specific to ext proc or Lua.

Most HTTP filters that support per-route configuration access the route at the very start of `decodeHeaders()` to resolve per-route config overrides. This includes ext proc (`mergePerRouteConfig()`), Lua (`getPerLuaCodeSetup()`) and others. This route access triggers lazy re-evaluation if the route cache was previously cleared. Simple filters like health_check, ip_tagging, adaptive_concurrency, and grpc_stats do not access the route during `decodeHeaders()` and therefore would not trigger re-evaluation.

Regardless of which filter triggers it, the common code path is through the filter manager's `getRoute()` (filter_manager.cc#L300-L305).

This calls `ActiveStream::route()`, which triggers `refreshCachedRoute()` if the cache was cleared (as shown in section 6.1.2 above).

This is the key mechanism that makes the ext proc + Lua solution work: the Lua filter's `decodeHeaders()` accesses the route (via `getPerLuaCodeSetup()` → `resolveMostSpecificPerFilterConfig()` → `getRoute()`) before the Lua script runs. At that point, the original `:path` is still intact and the new cluster header is present, so the route re-evaluates correctly. Then the Lua script modifies `:path` without clearing the cache, preserving the correctly resolved route.

If no intermediate filter accesses the route, the router filter's `decodeHeaders()` will trigger re-evaluation as a last resort (router.cc#L471).

6.3. Lua Cluster Specifier Plugin - Envoy Code Analysis
All cluster specifier plugins (`cluster_header`, `inline_cluster_specifier_plugin`, `cluster_specifier_plugin`, `weighted_clusters`) are assigned to the same `cluster_specifier_plugin_` field on the route entry and share the same code path (config_impl.cc#L609-L627).

All of these are called through the same `clusterEntry()` at route matching time (config_impl.cc#L1320-L1327).

The plugin's `route()` method receives the request headers at route matching time. Since route matching occurs before any HTTP filter runs, any header value that an ext proc filter would set is not yet present. Using `inline_cluster_specifier_plugin` with a Lua cluster specifier (or any other plugin) does not change this fundamental timing issue; the same `clearRouteCache` requirement applies.

6.4. `setRoute()` - Directly Set a Route with the Desired Cluster
The `setRoute()` API is available on filter callbacks (filter.h#L350).

A custom C++ filter could get the current route and wrap it in a `DynamicRouteEntry` with the desired cluster name.

`DynamicRouteEntry` delegates everything to the original route except `clusterName()` (delegating_route_impl.h#L143-L152).

This replaces the cached route directly, with no route re-matching at all. All original route settings (timeouts, retry policies, rate limits, CORS, hash policies, etc.) are preserved. A route carries much more than just a cluster name: the `RouteEntry` interface defines over 30 route-level settings (router.h#L916-L1175).

However, no existing HTTP filter (ext proc, Lua, or any other) calls `setRoute()` today. Using this approach would require writing a custom C++ HTTP filter.

There is also `refreshRouteCluster()`, available as a lighter-weight alternative to `clearRouteCache()` (filter.h#L386). It refreshes only the cluster without re-matching the route. However, `cluster_header`'s `DynamicRouteEntry` stores the cluster name as a `const std::string` and the base `RouteEntryImplBase::refreshRouteCluster()` is a no-op (config_impl.h#L591-L592), so this approach would not work with `cluster_header` without modifying Envoy core.