forked from linkerd/linkerd2
Stable 2.12.5+plaid 1 :: [PATCH] Fix bug where topology routing would not disable while service was under load. #191
Draft: jandersen-plaid wants to merge 43 commits into `main` from `stable-2.12.5+plaid-1`
## stable-2.12.2

This stable release fixes an issue with CNI chaining that was preventing the Linkerd CNI plugin from working with other CNI plugins such as Cilium. It also fixes some sections of the Viz dashboard appearing blank, and adds an optional PodMonitor resource to the Helm chart to enable easier integration with the Prometheus Operator. Several other fixes are included.

* Proxy
  * Fixed proxies emitting some duplicate inbound metrics
* Control Plane
  * Fixed handling of `.conf` files in the CNI plugin so that the Linkerd CNI plugin can be used alongside other CNI plugins such as Cilium
  * Added a noop init container to injected pods when the CNI plugin is enabled to prevent certain scenarios where a pod can get stuck without an IP address
  * Fixed the `NotIn` label selector operator in the policy resources being erroneously treated as `In`
  * Fixed a bug where the `config.linkerd.io/proxy-version` annotation could be empty
* CLI
  * Added a `linkerd diagnostics policy` command to inspect Linkerd policy state
  * Added a check that ClusterIP services are in the cluster networks
  * Expanded the `linkerd authz` command to display AuthorizationPolicy resources that target namespaces (thanks @aatarasoff!)
  * Fixed warning logic in the "linkerd-viz ClusterRoles exist" and "linkerd-viz ClusterRoleBindings exist" checks in `linkerd viz check`
  * Fixed the CLI ignoring the `--api-addr` flag (thanks @mikutas!)
* Helm
  * Added an optional PodMonitor resource to the main Helm chart (thanks @jaygridley!)
* Dashboard
  * Fixed the dashboard sections Tap, Top, and Routes appearing blank (thanks @MoSattler!)
  * Updated Grafana dashboards to use a variable duration parameter so that they can be used when Prometheus has a longer scrape interval (thanks @TarekAS)
Currently, the `noop` init container created by the Linkerd CNI plugin causes issues when a workload with

```yaml
securityContext:
  runAsNonRoot: true
```

is injected, since it adds a container that runs as root to that workload. This branch resolves the issue by changing the Helm template for the noop init container to use the same user as the `proxyInit` init container.

I've tested this by injecting a deployment with the above `securityContext` configuration and verifying that the `noop` init container is now allowed to run.

This PR is against the `release/stable-2.12` branch, as the `noop` init container has been removed on the edge branch (it was replaced with the CNI validator init container).

Fixes linkerd#9671
… no ClusterIP (linkerd#9662) Fixes linkerd#9661 This excludes any service with no ClusterIP from this check, which includes the services of type ExternalName.
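A minimal Go sketch of the adjusted check: each service's ClusterIP is tested against the cluster network CIDRs, and services without a ClusterIP are skipped. The `service` type and `servicesOutsideNetworks` function are hypothetical. Treating the headless value `"None"` the same as an empty ClusterIP is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"net"
)

// service is a hypothetical, trimmed-down view of a Kubernetes Service.
type service struct {
	Name      string
	Type      string // e.g. "ClusterIP", "ExternalName"
	ClusterIP string // "" for ExternalName services, "None" for headless ones
}

// servicesOutsideNetworks returns the names of services whose ClusterIP is not
// contained in any of the given cluster network CIDRs.
func servicesOutsideNetworks(svcs []service, cidrs []string) ([]string, error) {
	var nets []*net.IPNet
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("invalid cluster network %q: %w", c, err)
		}
		nets = append(nets, n)
	}

	var bad []string
	for _, s := range svcs {
		// Services without a ClusterIP (e.g. ExternalName) have nothing to check.
		// Assumption: "None" (headless) is treated the same way.
		if s.ClusterIP == "" || s.ClusterIP == "None" {
			continue
		}
		ip := net.ParseIP(s.ClusterIP)
		inNetwork := false
		for _, n := range nets {
			if ip != nil && n.Contains(ip) {
				inNetwork = true
				break
			}
		}
		if !inNetwork {
			bad = append(bad, s.Name)
		}
	}
	return bad, nil
}

func main() {
	svcs := []service{
		{Name: "web", Type: "ClusterIP", ClusterIP: "10.96.0.10"},
		{Name: "external-db", Type: "ExternalName", ClusterIP: ""},
		{Name: "stray", Type: "ClusterIP", ClusterIP: "192.168.1.5"},
	}
	bad, err := servicesOutsideNetworks(svcs, []string{"10.0.0.0/8"})
	fmt.Println(bad, err) // [stray] <nil>
}
```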
Signed-off-by: Steve Jenson <stevej@buoyant.io>
When installing the multicluster extension through the CLI, the gateway's `pause` container `runAsUser` field is empty. K8s then uses the UID defined in the `pause` image, which is [65535](https://github.com/kubernetes/kubernetes/blob/master/build/pause/Dockerfile#L19). The source of the problem is that the `gateway.UID` values.yaml entry isn't backed by an entry in the multicluster `values.go`'s `Gateway` struct.

How to repro:

```bash
# before the fix
$ linkerd mc install --ignore-cluster | grep runAsUser
runAsUser:

# after the fix
$ linkerd mc install --ignore-cluster | grep runAsUser
runAsUser: 2103
```
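The root cause above is a general Go pitfall: when values are decoded into a typed struct, keys without a matching field are silently dropped. This sketch uses `encoding/json` as a stand-in for the YAML handling in the real code; the `gatewayWithoutUID`/`gatewayWithUID` types and `roundTrip` helper are hypothetical, and only the `UID` key and the `2103` value come from the repro above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gatewayWithoutUID mirrors the bug: there is no field for the UID, so the
// value is silently discarded when values are decoded into the struct.
type gatewayWithoutUID struct {
	Enabled bool `json:"enabled"`
}

// gatewayWithUID adds the missing field so the value survives decoding.
type gatewayWithUID struct {
	Enabled bool  `json:"enabled"`
	UID     int64 `json:"UID"`
}

// roundTrip decodes raw values into v and re-encodes them, roughly what
// happens when defaults are loaded into a typed struct before rendering.
func roundTrip(raw []byte, v interface{}) (string, error) {
	if err := json.Unmarshal(raw, v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	raw := []byte(`{"enabled": true, "UID": 2103}`)

	before, _ := roundTrip(raw, &gatewayWithoutUID{})
	after, _ := roundTrip(raw, &gatewayWithUID{})

	fmt.Println(before) // {"enabled":true}
	fmt.Println(after)  // {"enabled":true,"UID":2103}
}
```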
…linkerd#9575) Having the proxyProtocol listed as HTTP/1 in the multicluster gateway Server is confusing because this value is actually unused in the case of multicluster (since Linkerd wraps all multicluster traffic in its own opaque transport protocol). We delete the proxyProtocol line here altogether (unknown is the default) to invite the least confusion. Fixes linkerd#9574 Signed-off-by: Peter Smit <peter.smit@inscripta.io>
https://github.com/linkerd/linkerd2/blob/main/web/app/index_bundle.js.lodash.tmpl#L4-L17 Some browser plugins insert script tags into the HTML page, which results in wrong root paths. Fixes: linkerd#9438 Signed-off-by: Ye Sijun <junnplus@gmail.com>
Fix upgrade when using --from-manifests

When the `--from-manifests` flag is used to upgrade through the CLI, the kube client used to fetch existing configuration (from the ConfigMap) is a "fake" client. The fake client returns values from a local source. The two clients are used interchangeably to perform the upgrade; which one is initialized depends on whether a value has been passed to `--from-manifests`.

Unfortunately, this breaks CLI upgrades to any stable-2.12.x version when the flag is used. Since a fake client is used, the upgrade will fail when checking for the existence of CRDs, even if the CRDs have been previously installed in the cluster.

This change fixes the issue by first initializing an actual Kubernetes client (that will be used to check for CRDs). If the values should be read from a local source, the client is replaced with a fake one. Since this takes place after the CRD check, the upgrade will not fail on the CRD precondition.

Fixes linkerd#9788

Signed-off-by: Matei David <matei@buoyant.io>
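The ordering fix can be sketched in Go as follows: the CRD precondition always runs against the real client, and the fake client is swapped in only afterwards. The `kubeClient` interface, both client types, and `upgradeConfig` are hypothetical simplifications; the real code uses the Kubernetes client libraries.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeClient is a hypothetical stand-in for the client used by `linkerd upgrade`.
type kubeClient interface {
	CRDsInstalled() bool
	Config() string
}

// clusterClient talks to the real cluster (simulated here).
type clusterClient struct{}

func (clusterClient) CRDsInstalled() bool { return true }
func (clusterClient) Config() string      { return "config-from-cluster" }

// manifestClient is a "fake" client serving values from local manifests. It
// knows nothing about the cluster, so its CRD check would always fail.
type manifestClient struct{ path string }

func (manifestClient) CRDsInstalled() bool { return false }
func (m manifestClient) Config() string    { return "config-from-" + m.path }

var errNoCRDs = errors.New("linkerd CRDs must be installed before upgrading")

// upgradeConfig checks CRDs with the real client first, and only then replaces
// it with the fake client when --from-manifests was given.
func upgradeConfig(fromManifests string) (string, error) {
	var client kubeClient = clusterClient{}
	if !client.CRDsInstalled() {
		return "", errNoCRDs
	}
	if fromManifests != "" {
		client = manifestClient{path: fromManifests}
	}
	return client.Config(), nil
}

func main() {
	cfg, err := upgradeConfig("backup.yaml")
	fmt.Println(cfg, err) // config-from-backup.yaml <nil>
}
```

Had the fake client been chosen before the CRD check, `upgradeConfig("backup.yaml")` would have returned `errNoCRDs`, which is the failure described above.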
When calling `linkerd upgrade`, if the `linkerd-config-overrides` Secret is not found, we ask the user to run `linkerd repair`, but that command was removed from the CLI long ago. Also removed a code comment, as the error is explicit enough.
* Use self-hosted runner for ARM64 integration tests

This refactors the "ARM64 integration tests" job in `release.yaml` to use an ARM self-hosted runner tagged with `[self-hosted, Linux, ARM64]`, tied to the linkerd GitHub org.

We no longer use a local (linux/x86_64) linkerd CLI that connects to an existing k3s instance in the host. Instead, we run the ARM64 CLI binary on the host itself, after creating the cluster with k3d (which always gets torn down at the end of the tests, regardless of their success).

Please check the "ARM CI host at Equinix Metal" doc in Notion for the host setup.

## Other Changes

- The cni test was removed.
- Replaced `"$bindir"/docker` with just `docker` in `bin/image-load`, as we do elsewhere.
- Properly detect k3d arch in `bin/k3d`
The problem was our `TAG` environment variable (set to `edge-22.11.2`) which was conflicting with an env var of the same name in the k3d install.sh script.
…e network (linkerd#9819)

Maps the request port to the container's port if the request comes in from the node network and has a hostPort mapping.

Problem: When a request for a container comes in from the node network, the node port is used, ignoring the hostPort mapping.

Solution: When a request is seen coming from the node network, get the container port from the Spec.

Validation: Fixed an existing unit test and wrote a new one driving GetProfile specifically.

Fixes linkerd#9677

Signed-off-by: Steve Jenson <stevej@buoyant.io>
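A minimal Go sketch of the mapping step: for requests from the node network, a port matching a declared `hostPort` is translated to the corresponding `containerPort`. The `containerPort` type and `resolvePort` function are hypothetical simplifications of the pod spec and the destination controller logic.

```go
package main

import "fmt"

// containerPort is a hypothetical, trimmed-down ContainerPort entry.
type containerPort struct {
	ContainerPort int32
	HostPort      int32 // 0 when no hostPort mapping is declared
}

// resolvePort returns the port to use for a request. When the request came
// from the node network and targets a declared hostPort, the matching
// containerPort from the pod spec is returned instead.
func resolvePort(requestPort int32, fromNodeNetwork bool, ports []containerPort) int32 {
	if !fromNodeNetwork {
		return requestPort
	}
	for _, p := range ports {
		if p.HostPort != 0 && p.HostPort == requestPort {
			return p.ContainerPort
		}
	}
	return requestPort
}

func main() {
	ports := []containerPort{{ContainerPort: 8080, HostPort: 80}}
	fmt.Println(resolvePort(80, true, ports))  // 8080
	fmt.Println(resolvePort(80, false, ports)) // 80
}
```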
Signed-off-by: Alex Leong <alex@buoyant.io>
This change aims to solve two distinct issues that have cropped up in the proxy-init configuration.

First, it decouples `allowPrivilegeEscalation` from running proxy-init as root. At the moment, whenever the container is run as root, privilege escalation is also allowed. In more restrictive environments, this will prevent the pod from coming up (e.g. security policies may complain about `allowPrivilegeEscalation=true`). It is worth noting that privilege escalation is not necessary in many scenarios, since the capabilities are passed to the iptables child process at build time.

Second, it introduces a new `privileged` value that allows users to run the proxy-init container without any restrictions (meaning all capabilities are inherited). This is essentially the same as mapping root on the host to root in the container. This value may solve issues in distributions that run Security-Enhanced Linux, since iptables will be able to load kernel modules that it may otherwise not be able to load (privileged mode gives the container nearly the same privileges as processes running outside of a container on a host, which further allows the container to set configurations in AppArmor or SELinux).

Privileged mode is independent from running the container as root. This gives users more control over the security context in proxy-init. The value may still be used with `runAsRoot: false`.

Fixes linkerd#9718

Signed-off-by: Matei David <matei@buoyant.io>
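The decoupling can be sketched in Go by deriving each securityContext field from its own value. The `proxyInitValues` and `securityContext` types, `buildSecurityContext`, and the non-root UID `65534` are hypothetical placeholders, not the actual Helm template. One constraint is real: Kubernetes rejects `privileged: true` combined with `allowPrivilegeEscalation: false`, so the sketch forces escalation on in privileged mode.

```go
package main

import "fmt"

// proxyInitValues is a hypothetical subset of the Helm values discussed above.
type proxyInitValues struct {
	RunAsRoot                bool
	Privileged               bool
	AllowPrivilegeEscalation bool
}

// securityContext is a simplified stand-in for the container securityContext.
type securityContext struct {
	RunAsUser                int64
	RunAsNonRoot             bool
	Privileged               bool
	AllowPrivilegeEscalation bool
}

// buildSecurityContext derives each field from its own value instead of tying
// allowPrivilegeEscalation to runAsRoot.
func buildSecurityContext(v proxyInitValues, nonRootUID int64) securityContext {
	sc := securityContext{
		RunAsUser:                nonRootUID,
		RunAsNonRoot:             true,
		Privileged:               v.Privileged,
		AllowPrivilegeEscalation: v.AllowPrivilegeEscalation,
	}
	if v.RunAsRoot {
		sc.RunAsUser = 0
		sc.RunAsNonRoot = false
	}
	// Kubernetes rejects privileged: true with allowPrivilegeEscalation: false.
	if v.Privileged {
		sc.AllowPrivilegeEscalation = true
	}
	return sc
}

func main() {
	// Root without privilege escalation is now expressible.
	fmt.Printf("%+v\n", buildSecurityContext(proxyInitValues{RunAsRoot: true}, 65534))
	// Privileged mode without running as root.
	fmt.Printf("%+v\n", buildSecurityContext(proxyInitValues{Privileged: true}, 65534))
}
```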
Fixes linkerd#9896

The maps in `endpointTranslator` weren't being guarded against concurrent access, so we're adding locks at the `Add` and `Remove` methods. Also, these functions ultimately call the `SendMsg` method on the gRPC `stream`, which is not ["thread-safe"](https://github.com/grpc/grpc-go/blob/master/stream.go#L122-L126), so we're guarding against other problems as well.

A new unit test `TestConcurrency` was added that failed in the following ways before this fix:

When running the test with the `-race` flag, we immediately get the data race warning:

```bash
$ go test ./controller/api/destination/... -run TestConcurrency -race
time="2022-11-25T16:48:52-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:48:52-05:00" level=info msg="caches synced"
==================
WARNING: DATA RACE
Read at 0x00c0000c0040 by goroutine 161:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80 +0x29c
  github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92

Previous write at 0x00c0000c0040 by goroutine 162:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).sendFilteredUpdate()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:95 +0x66
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:83 +0x330
  github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92

Goroutine 161 (running) created at:
  github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1486 +0x47

Goroutine 162 (running) created at:
  github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
      /home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1486 +0x47
```

If run without the `-race` flag, we get the `concurrent map writes` panic reported in linkerd#9896:

```bash
$ go test ./controller/api/destination/... -run TestConcurrency -count=1
time="2022-11-25T16:53:25-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:53:25-05:00" level=info msg="caches synced"
fatal error: concurrent map writes

goroutine 187 [running]:
runtime.throw({0x1b57bc4?, 0x500000000000000?})
	/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00013dc80 sp=0xc00013dc50 pc=0x43a5b1
runtime.mapassign(0xc00013dec8?, 0x2?, 0x0?)
	/usr/local/go/src/runtime/map.go:595 +0x4d6 fp=0xc00013dd00 sp=0xc00013dc80 pc=0x4113b6
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add(...)
	/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
	/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x1a8 fp=0xc00013dfe0 sp=0xc00013dd00 pc=0x16d1da8
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00013dfe8 sp=0xc00013dfe0 pc=0x46d721
created by github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency
	/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x3c
```
…nkerd#9918)

When performing the HostPort mapping introduced in linkerd#9819, the `containsIP` function iterates through the pod IPs, searching for a match against `targetIP` using `ip.String()`, but that returns something like `&PodIP{IP: xxx}`. Fixed that to just use `ip.IP`, and also completed the test fixtures to include both `PodIP` and `PodIPs` in the pod manifests.

Note this wasn't affecting the end result; it was just producing an extra warning, shown below, which this change eliminates:

```bash
$ go test -v ./controller/api/destination/... -run TestGetProfiles
=== RUN TestGetProfiles
...
=== RUN TestGetProfiles/Return_profile_with_endpoint_when_using_pod_DNS
time="2022-11-29T09:38:48-05:00" level=info msg="waiting for caches to sync"
time="2022-11-29T09:38:49-05:00" level=info msg="caches synced"
time="2022-11-29T09:38:49-05:00" level=warning msg="unable to find container port as host (172.17.13.15) matches neither PodIP nor HostIP (&Pod{ObjectMeta:{pod-0 ns 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[linkerd.io/control-plane-ns:linkerd] map[] [] [] []},Spec:PodSpec{Volumes:[]Volume{},Containers:[]Container{},RestartPolicy:,TerminationGracePeriodSeconds:nil,ActiveDeadlineSeconds:nil,DNSPolicy:,NodeSelector:map[string]string{},ServiceAccountName:,DeprecatedServiceAccount:,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:nil,ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:nil,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,HostUsers:nil,},Status:PodStatus{Phase:Running,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:172.17.13.15,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},})" test=TestGetProfiles/Return_profile_with_endpoint_when_using_pod_DNS
```
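The bug is easy to reproduce in isolation: comparing a struct's `String()` output against a bare IP can never match. In this sketch, `podIP` and its `String()` method are hypothetical copies that only imitate the shape of the Kubernetes type; `containsIPBuggy` and `containsIP` contrast the two comparisons.

```go
package main

import "fmt"

// podIP imitates the shape of a Kubernetes PodIP entry (hypothetical copy).
type podIP struct{ IP string }

// String imitates a generated String() method that prints the whole struct.
func (p *podIP) String() string { return fmt.Sprintf("&PodIP{IP:%s,}", p.IP) }

// containsIPBuggy compares against String(), which never equals a bare IP.
func containsIPBuggy(ips []podIP, target string) bool {
	for i := range ips {
		if ips[i].String() == target {
			return true
		}
	}
	return false
}

// containsIP compares the IP field directly, as in the fix.
func containsIP(ips []podIP, target string) bool {
	for _, ip := range ips {
		if ip.IP == target {
			return true
		}
	}
	return false
}

func main() {
	ips := []podIP{{IP: "172.17.13.15"}}
	fmt.Println(containsIPBuggy(ips, "172.17.13.15")) // false
	fmt.Println(containsIP(ips, "172.17.13.15"))      // true
}
```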
When CNI plugins run in eBPF mode, they may rewrite the packet destination when doing socket-level load balancing (i.e. in the `connect()` call). In these cases, skipping `443` on the outbound side for control plane components is no longer sufficient; the packet is rewritten to target the actual Kubernetes API Server backend (which typically listens on port `6443`, but may be overridden when the cluster is created).

This change adds port `6443` to the list of skipped ports for control plane components. On the linkerd-cni plugin side, the ports are non-configurable. Whenever a pod with the control plane component label is handled by the plugin, we look up the `kubernetes` service in the default namespace and append the port values (of both ClusterIP and backend) to the list.

On the initContainer side, we make this value configurable in Helm and provide a sensible default (`443,6443`). Users may override this value if the ports do not correspond to what they have in their cluster. In the CLI, if no override is given, we look up the service in the same way that we do for linkerd-cni; if failures are encountered, we fall back to the default list of ports from the values file.

Closes linkerd#9817

Signed-off-by: Matei David <matei@buoyant.io>
* build(deps): bump actions/checkout from 3.0.2 to 3.1.0 (linkerd/linkerd2-proxy#1951)
* build(deps): bump arbitrary from 1.1.4 to 1.1.7 (linkerd/linkerd2-proxy#1953)
* build(deps): bump tj-actions/changed-files from 29.0.9 to 32.0.0 (linkerd/linkerd2-proxy#1952)
* build(deps): bump tj-actions/changed-files from 32.0.0 to 32.1.2 (linkerd/linkerd2-proxy#1958)
* build(deps): bump tokio-stream from 0.1.9 to 0.1.11 (linkerd/linkerd2-proxy#1954)
* build(deps): bump libfuzzer-sys from 0.4.3 to 0.4.5 (linkerd/linkerd2-proxy#1960)
* build(deps): bump anyhow from 1.0.64 to 1.0.65 (linkerd/linkerd2-proxy#1955)
* dev: Update to dev:v32 with Rust 1.64 (linkerd/linkerd2-proxy#1961)
* build(deps): bump actions/download-artifact from 3.0.0 to 3.0.1 (linkerd/linkerd2-proxy#1962)
* build(deps): bump prettyplease from 0.1.19 to 0.1.21 (linkerd/linkerd2-proxy#1963)
* build(deps): bump bumpalo from 3.11.0 to 3.11.1 (linkerd/linkerd2-proxy#1965)
* build(deps): bump actions/checkout from 3.0.2 to 3.1.0 (linkerd/linkerd2-proxy#1968)
* build(deps): bump actions/upload-artifact from 3.1.0 to 3.1.1 (linkerd/linkerd2-proxy#1966)
* build(deps): bump tj-actions/changed-files from 32.1.2 to 34.1.1 (linkerd/linkerd2-proxy#1972)
* build(deps): bump lock_api from 0.4.8 to 0.4.9 (linkerd/linkerd2-proxy#1976)
* build(deps): bump unicode-normalization from 0.1.21 to 0.1.22 (linkerd/linkerd2-proxy#1977)
* build(deps): bump extractions/setup-just from 1.4.0 to 1.5.0 (linkerd/linkerd2-proxy#1974)
* build(deps): bump tj-actions/changed-files from 34.1.1 to 34.3.2 (linkerd/linkerd2-proxy#1975)
* build(deps): bump tj-actions/changed-files from 34.3.2 to 34.3.4 (linkerd/linkerd2-proxy#1978)
* build(deps): bump rustls from 0.20.6 to 0.20.7 (linkerd/linkerd2-proxy#1979)
* build(deps): bump tonic-build from 0.8.0 to 0.8.2 (linkerd/linkerd2-proxy#1980)
* build(deps): bump syn from 1.0.99 to 1.0.103 (linkerd/linkerd2-proxy#1981)
* build(deps): bump smallvec from 1.9.0 to 1.10.0 (linkerd/linkerd2-proxy#1982)
* Bump hyper & h2 (linkerd/linkerd2-proxy#1983)
* build(deps): bump arbitrary from 1.1.7 to 1.2.0 (linkerd/linkerd2-proxy#1984)
* build(deps): bump num_cpus from 1.13.1 to 1.14.0 (linkerd/linkerd2-proxy#1985)

Signed-off-by: Oliver Gould <ver@buoyant.io>
Closes linkerd#10162. This adds resource limits to the `noop` initContainer which will allow users who require resource quotas to have a more seamless upgrade experience for stable 2.12 patches. I chose the current values by halving the current resource limits of the `proxy-init` initContainer; the `noop` initContainer basically does nothing so we shouldn't run into issues with those limits. The `noop` initContainer is replaced by the proxy-validator container in the current edge releases, so this is a temporary fix that will allow users to upgrade through the stable 2.12 patches. For this reason, I didn't add additional templating to make this configurable. Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
The output of the `linkerd viz tap` CLI command had wrong values for the latency/duration fields. This happened with the default output format and with the `-o wide` option, but `-o json` worked well, and the dashboard also showed proper values. The solution is to display the duration with `AsDuration().Microseconds()`. Updated the existing test and fixed a couple of golden files. Fixes: linkerd#9878 Signed-off-by: Oleg Vorobev <olegy2008@ya.ru>
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes linkerd#10043 Signed-off-by: Joe Bowbeer <joe.bowbeer@gmail.com>
…kerd#10013) Fixes linkerd#10003 When endpoints are removed from an EndpointSlice resource, the destination controller builds a list of addresses to remove. However, if any of the removed endpoints have a Pod as their targetRef, we will attempt to fetch that pod to build the address to remove. If that pod has already been removed from the informer cache, this will fail and the endpoint will be skipped in the list of endpoints to be removed. This results in stale endpoints being stuck in the address set and never being removed. We update the endpoint watcher to construct only a list of endpoint IDs for endpoints to remove, rather than fetching the entire pod object. Since we no longer attempt to fetch the pod, this operation is now infallible and endpoints will no longer be skipped during removal. We also add a `TestEndpointSliceScaleDown` test to exercise this. Signed-off-by: Alex Leong <alex@buoyant.io>
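The key property of the fix is that removal needs only endpoint IDs, so it cannot fail when a Pod has already left the informer cache. In this Go sketch, the `endpointID` and `addressSet` types and the `removeByID` function are hypothetical simplifications of the destination controller's address set.

```go
package main

import "fmt"

// endpointID is a hypothetical key identifying an endpoint in an address set.
type endpointID struct {
	Namespace string
	Name      string
}

// addressSet is a hypothetical, simplified address set keyed by endpoint ID.
type addressSet map[endpointID]string

// removeByID deletes endpoints using only their IDs. It never fetches the Pod
// behind an endpoint, so it cannot fail or skip an entry, even if the Pod is
// already gone from the informer cache.
func removeByID(set addressSet, ids []endpointID) {
	for _, id := range ids {
		delete(set, id)
	}
}

func main() {
	set := addressSet{
		{Namespace: "ns", Name: "pod-0"}: "10.0.0.1:8080",
		{Namespace: "ns", Name: "pod-1"}: "10.0.0.2:8080",
	}
	// pod-1 may no longer exist in the cache; removal still succeeds.
	removeByID(set, []endpointID{{Namespace: "ns", Name: "pod-1"}})
	fmt.Println(len(set)) // 1
}
```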
…d#10071) The Helm chart has an `identity.externalCA` value, but the CLI code sets `identity.issuer.externalCA` and therefore fails to produce the desired configuration. This change aligns everything to `identity.externalCA`. Signed-off-by: Dmitry Mikhaylov <anoxape@gmail.com>
Removed old `replace` directives in `go.mod` that are no longer required, and updated the entry for `containerd` to address [CVE-2022-23471](https://github.com/linkerd/linkerd2/security/dependabot/37)
Fixes linkerd#10164 The version of go-restful that we depend on has been flagged as a security vulnerability. Even though this vulnerability does not affect Linkerd, we upgrade this dependency to silence security warnings. Signed-off-by: Alex Leong <alex@buoyant.io>
…0235) Fixes linkerd#10138 Evaluating Helm expressions like `.cpu.limit` will fail with a nil pointer dereference error if `.cpu` is nil. If, for example, `.memory` is set but `.cpu` is not, the resources template will be executed but will fail. We add parentheses to cause these expressions to be evaluated as a pipeline. If the input to a pipeline stage is an empty value (such as nil), no output will be emitted to the next stage of the pipeline. This allows for more graceful dereference chaining. For example, when evaluating `(.cpu).limit`, if `(.cpu)` is nil, the rendering engine will not try to evaluate `nil.limit` but instead will emit no output for this expression. Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes linkerd#10036 The Linkerd control plane components written in Go serve liveness and readiness probe endpoints on their admin server. However, the admin server is not started until the k8s informer caches are synced, which can take a long time on large clusters. This means that liveness checks can time out, causing the controller to be restarted. We start the admin server before attempting to sync caches so that we can respond to liveness checks immediately. We fail readiness probes until the caches are synced. Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes linkerd#8270 When a listener unsubscribed from port updates in Servers, we were removing the listener from the `ServerWatcher.subscriptions` map, leaving the map's key (`podPort`, which holds the pod object and port) with an empty value. In clusters with a lot of pod churn, those keys with empty values accumulated, so this change cleans them up. The repro (basically constantly rolling emojivoto) is described in linkerd#9947. A follow-up will be up shortly adding metrics to track these, along with similar missing metrics from other parts of Destination.
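The cleanup can be sketched in Go: when the last listener for a key unsubscribes, the key itself is deleted rather than left with an empty value. The `podPort`, `listener`, and `serverWatcher` types here are hypothetical simplifications (the real key holds the pod object, not just its name).

```go
package main

import "fmt"

// podPort is a hypothetical key identifying a pod and one of its ports.
type podPort struct {
	Pod  string
	Port uint32
}

// listener is a hypothetical subscriber to Server updates. The non-empty
// struct guarantees distinct pointers for distinct listeners.
type listener struct{ id int }

// serverWatcher keeps, for each pod/port, the listeners subscribed to it.
type serverWatcher struct {
	subscriptions map[podPort][]*listener
}

// Subscribe registers l for updates on pp.
func (sw *serverWatcher) Subscribe(pp podPort, l *listener) {
	sw.subscriptions[pp] = append(sw.subscriptions[pp], l)
}

// Unsubscribe removes l and, when no listeners remain for pp, deletes the key
// itself so that empty entries do not pile up under pod churn.
func (sw *serverWatcher) Unsubscribe(pp podPort, l *listener) {
	ls := sw.subscriptions[pp]
	for i, cur := range ls {
		if cur == l {
			ls = append(ls[:i], ls[i+1:]...)
			break
		}
	}
	if len(ls) == 0 {
		delete(sw.subscriptions, pp)
		return
	}
	sw.subscriptions[pp] = ls
}

func main() {
	sw := &serverWatcher{subscriptions: make(map[podPort][]*listener)}
	key := podPort{Pod: "emoji-0", Port: 8080}
	l := &listener{id: 1}

	sw.Subscribe(key, l)
	sw.Unsubscribe(key, l)
	fmt.Println(len(sw.subscriptions)) // 0
}
```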
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nkerd#10225)

GitHub Actions has upgraded from `docker buildx 0.9.1+azure-2` to `buildx 0.10.0+azure-1`, which by default adds provenance attestation to manifests (https://github.com/docker/buildx/releases/tag/v0.10.0). This means that our platform-specific images now contain multiple manifests, because the attestation counts as a manifest:

```console
> docker buildx imagetools inspect ghcr.io/linkerd/policy-controller:edge-23.1.2-amd64 --format "{{ json .Manifest.Manifests }}"
[
  {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:1abeb519e76c71c7285b4435a3f85dd73f9c1982905a5a2ca59e0abb279f09aa",
    "size": 1055,
    "platform": {
      "architecture": "amd64",
      "os": "linux"
    }
  },
  {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:1254d52f1bd4ffd1c17688f39e01a6796b85bbcaf07bf83bdeb1c88ebe5b4657",
    "size": 566,
    "annotations": {
      "vnd.docker.reference.digest": "sha256:1abeb519e76c71c7285b4435a3f85dd73f9c1982905a5a2ca59e0abb279f09aa",
      "vnd.docker.reference.type": "attestation-manifest"
    },
    "platform": {
      "architecture": "unknown",
      "os": "unknown"
    }
  }
]
```

This causes the creation of our multi-arch image to fail, because the `docker manifest create` command expects each of the constituent images to contain a single manifest. We set `--provenance=false` to skip adding the attestation manifest.

Signed-off-by: Alex Leong <alex@buoyant.io>
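To spot the extra entry programmatically, a short Go sketch can parse the `imagetools inspect` JSON and skip descriptors annotated as attestation manifests. The `manifestDescriptor` type and `platformManifests` function are hypothetical; only the `vnd.docker.reference.type: attestation-manifest` annotation and the digests come from the output shown in this commit. This is a diagnostic aid, not a replacement for `--provenance=false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestDescriptor covers only the fields of an index entry needed here.
type manifestDescriptor struct {
	Digest      string            `json:"digest"`
	Annotations map[string]string `json:"annotations"`
}

// platformManifests returns the digests of entries that are not attestation
// manifests, i.e. the single manifest `docker manifest create` expects.
func platformManifests(indexJSON []byte) ([]string, error) {
	var descs []manifestDescriptor
	if err := json.Unmarshal(indexJSON, &descs); err != nil {
		return nil, err
	}
	var out []string
	for _, d := range descs {
		// Indexing a nil Annotations map safely yields "".
		if d.Annotations["vnd.docker.reference.type"] == "attestation-manifest" {
			continue
		}
		out = append(out, d.Digest)
	}
	return out, nil
}

func main() {
	index := []byte(`[
  {"digest": "sha256:1abeb519e76c71c7285b4435a3f85dd73f9c1982905a5a2ca59e0abb279f09aa",
   "platform": {"architecture": "amd64", "os": "linux"}},
  {"digest": "sha256:1254d52f1bd4ffd1c17688f39e01a6796b85bbcaf07bf83bdeb1c88ebe5b4657",
   "annotations": {"vnd.docker.reference.type": "attestation-manifest"}}
]`)
	digests, err := platformManifests(index)
	fmt.Println(len(digests), err) // 1 <nil>
}
```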
A missing `\` caused the `--provenance` flag to be interpreted as a separate command rather than a continuation of the previous one. This caused the build action to fail. Add the missing `\` character. Signed-off-by: Alex Leong <alex@buoyant.io>
## stable-2.12.5

This stable release fixes an incompatibility issue with the AWS CNI addon in EKS that was preventing pods from acquiring networking after scaling up nodes (thanks @frimik!). It also includes security updates for dependencies.

* Detached the linkerd-cni plugin's version from linkerd's and bumped it to v1.1.1 to fix the incompatibility with EKS' AWS CNI addon
* Bumped the memory limit for the no-op init container to 25Mi to address issues on OKE environments
* Updated the `h2` dependency in the policy controller to include a patch for a theoretical denial-of-service vulnerability discovered in CVE-2023-26964
* Updated the `openssl` dependency in the policy controller, addressing RUSTSEC-2023-0022, RUSTSEC-2023-0023 and RUSTSEC-2023-0024
…le service was under load. (github.com/linkerd#10925)

Add support for enabling and disabling topology aware routing when hints are added/removed.

The testing setup is involved because there are so many moving parts:

1) Set up a service which is layered over several availability zones.
   1a) The best way to do this is one Service object, with 3 ReplicaSets, each explicitly forced to use a specific AZ.
2) Add the `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object.
3) Use a load tester like k6 to send meaningful traffic to your service, but only in one AZ.
4) Scale up your ReplicaSets until k8s adds hints to your EndpointSlices.
5) Observe that traffic shifts to only hit pods in one AZ.
6) Scale down the ReplicaSets until k8s removes the hints from your EndpointSlices.
7) Observe that traffic shifts back to all pods across all AZs.

Note: Patch applied on top of stable-2.12.5 with small adjustments
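The behavior this patch targets can be sketched as a filter that is re-evaluated on every EndpointSlice update: zone filtering applies only while every endpoint carries hints, and it switches off again as soon as hints disappear. The `endpoint` type and `filterByZone` function are hypothetical, and falling back to all endpoints when no hint matches the local zone is an assumed safety behavior, not something this PR states.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified EndpointSlice endpoint: its address
// and the zones listed in its topology hints (nil when no hints are set).
type endpoint struct {
	Addr      string
	HintZones []string
}

// filterByZone returns only endpoints hinted for localZone, and only when
// every endpoint carries hints. If any hints are missing (e.g. after k8s
// removes them on scale-down) or nothing matches, all endpoints are returned,
// i.e. topology-aware routing is effectively disabled.
func filterByZone(eps []endpoint, localZone string) []endpoint {
	for _, ep := range eps {
		if len(ep.HintZones) == 0 {
			return eps
		}
	}
	var local []endpoint
	for _, ep := range eps {
		for _, z := range ep.HintZones {
			if z == localZone {
				local = append(local, ep)
				break
			}
		}
	}
	if len(local) == 0 {
		return eps // assumed fallback
	}
	return local
}

func main() {
	hinted := []endpoint{
		{Addr: "10.0.1.1", HintZones: []string{"us-east-1a"}},
		{Addr: "10.0.2.1", HintZones: []string{"us-east-1b"}},
	}
	unhinted := []endpoint{{Addr: "10.0.1.1"}, {Addr: "10.0.2.1"}}

	fmt.Println(len(filterByZone(hinted, "us-east-1a")))   // 1
	fmt.Println(len(filterByZone(unhinted, "us-east-1a"))) // 2
}
```

Because the filter is a pure function of the current endpoints, recomputing it on each update lets routing turn back off when hints are removed, matching steps 6 and 7 of the test plan.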
…e architectures Signed-off-by: Jack Andersen <jandersen@plaid.com>
github.com/linkerd#10925

Add support for enabling and disabling topology aware routing when hints are added/removed.

The testing setup is involved because there are so many moving parts:

1) Set up a service which is layered over several availability zones.
   1a) The best way to do this is one Service object, with 3 ReplicaSets, each explicitly forced to use a specific AZ.
2) Add the `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object.
3) Use a load tester like k6 to send meaningful traffic to your service, but only in one AZ.
4) Scale up your ReplicaSets until k8s adds hints to your EndpointSlices.
5) Observe that traffic shifts to only hit pods in one AZ.
6) Scale down the ReplicaSets until k8s removes the hints from your EndpointSlices.
7) Observe that traffic shifts back to all pods across all AZs.

Note: Patch applied on top of stable-2.12.5 with small adjustments.
Opening as a PR to keep track of the branch.