Merge pull request #1 from SumoLogic-Labs/adi-launch
Open source `token-refresher`
AdityaVallabh authored Mar 24, 2024
2 parents 17be83b + af9d1eb commit f0c6c49
Showing 15 changed files with 1,274 additions and 0 deletions.
26 changes: 26 additions & 0 deletions Dockerfile
@@ -0,0 +1,26 @@
FROM --platform=$BUILDPLATFORM golang:1.22.0 AS builder

ARG TARGETOS
ARG TARGETARCH

WORKDIR /token-refresher

COPY . .

RUN go vet ./... && \
    go test -v -race ./... && \
    CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o token-refresher

FROM alpine

WORKDIR /token-refresher

RUN addgroup token-refresher \
    && adduser -S -u 1000 -G token-refresher token-refresher \
    && chown -R token-refresher:token-refresher /token-refresher

USER token-refresher

COPY --from=builder /token-refresher/token-refresher /usr/local/bin/token-refresher

ENTRYPOINT ["token-refresher"]
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# Overview

`service-account-token-refresher` is a lightweight sidecar that keeps the service account token projected by kubelet valid. It renews the token when kubelet doesn't, working around a known Kubernetes issue in which kubelet stops refreshing the token once a pod enters its termination phase.

For more details on the issue, visit: [Kubelet stops rotating service account tokens when pod is terminating, breaking preStop hooks](https://github.com/kubernetes/kubernetes/issues/116481)

This issue particularly affects pods that require a significant amount of time, potentially hours or days, to shut down gracefully after Kubernetes sends a termination signal.

![Working](assets/token-refresher.png)

# Deployment Instructions

1. Build and push the Docker image using the provided Dockerfile to your preferred container registry.
2. Deploy the tool to your cluster using the sample Kubernetes manifest found here: [examples/token-refresher.yaml](./examples/token-refresher.yaml)

# How It Works

The token refresher runs as a sidecar container alongside the main application that depends on the service account token. It writes new tokens to a custom location, and the main container is configured to read its token from that path (in the example manifest, through `AWS_WEB_IDENTITY_TOKEN_FILE`). The refresher goes through several key phases:

1. **Initialization**

On startup, the refresher checks whether a custom token already exists. If it is missing, the refresher creates a symlink at the custom token path that points to the default projected token.

2. **Monitoring**

Then it enters a passive state where it periodically checks if the current token is expiring soon while waiting for a termination signal from Kubernetes.

If it receives a shutdown signal (either from Kubernetes or the application) or it detects that the token is about to expire, it transitions to the active state.

3. **Refreshing**

In the active state, the refresher regularly requests a new token from the Kubernetes API server before the current one expires. A failed request is retried up to `--max_attempts` times, with a `--sleep` pause between attempts, to ride out transient API server issues. This process continues until the application signals the refresher to stop.

# Usage

```sh
$ token-refresher --help
A sidecar which starts auto-refreshing the service account token when the default one is close to expiry or container receives a shutdown signal.

Usage:
  token-refresher [flags]

Flags:
      --default_token_file string      path to default service account token file (default "/var/run/secrets/eks.amazonaws.com/serviceaccount/token")
      --expiration_duration duration   token expiry duration (default 2h0m0s)
  -h, --help                           help for token-refresher
      --kubeconfig string              (optional) absolute path to the kubeconfig file (default "/home/token-refresher/.kube/config")
      --max_attempts int               max retries on token refresh failure (default 3)
  -n, --namespace string               current namespace
      --refresh_interval duration      token refresh interval (default 1h0m0s)
  -s, --service_account string         name of service account to issue token for
      --sleep duration                 sleep duration between retries (default 20s)
      --token_audience strings         comma separated token audience (default [sts.amazonaws.com])
      --token_file string              path to self-managed service account token file (default "/var/run/secrets/token-refresher/token")
```

# Backstory

While moving a microservice to Kubernetes, we encountered a scenario where the service required over 24 hours to fully drain. We set up a `preStop` hook and extended `terminationGracePeriodSeconds` to accommodate this. However, we soon faced `ExpiredTokenException` errors.

Investigation led us to a bug in Kubernetes, still unresolved as of March 2024, detailed here: [Kubelet stops rotating service account tokens when pod is terminating, breaking preStop hooks](https://github.com/kubernetes/kubernetes/issues/116481).

We attempted a workaround by extending the token expiration using the `eks.amazonaws.com/token-expiration` annotation, but it couldn't exceed 24 hours, as discussed [here](https://github.com/aws/amazon-eks-pod-identity-webhook#amazon-eks-pod-identity-webhook). We then looked at the cluster-level `service-account-max-token-expiration` flag, only to find that EKS does not let users adjust it, which is tracked in an open feature request: [Allow user to modify the kube-apiserver flag --service-account-max-token-expiration](https://github.com/aws/containers-roadmap/issues/1836).

We also considered using long-lived tokens, but they were incompatible due to a hardcoded issuer in the tokens, which was not accepted as per the error: `An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: Issuer must be a valid URL`. We needed the issuer to match the cluster's OIDC HTTP URL.

After exhausting all other available options, we decided to build our own token-refresher. It began as a simple shell script to fetch new tokens from the API server, but as complexity grew with retries and error handling, and with the need for better testing, we developed this Go-based service.

During testing, we encountered another hiccup where the refresher would start after the main container, causing errors due to the missing token. To resolve this, we added an init container to create the necessary symlink from the custom token to the default one at startup.

This refresher has proven to be very effective for us, and we hope it will be beneficial to you as well!
Binary file added assets/token-refresher.png
75 changes: 75 additions & 0 deletions cmd/root.go
@@ -0,0 +1,75 @@
package cmd

import (
	"fmt"
	"os"
	"path/filepath"
	"time"

	"github.com/SumoLogic-Labs/service-account-token-refresher/pkg/signals"
	tokenrefresher "github.com/SumoLogic-Labs/service-account-token-refresher/pkg/token-refresher"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
	"k8s.io/client-go/util/homedir"
)

// config holds the refresher settings decoded from CLI flags and env vars.
type config struct {
	tokenrefresher.TokenRefresher `mapstructure:",squash"`
}

var conf *config

var rootCmd = &cobra.Command{
	Use:   "token-refresher",
	Short: "Automatic token refresher for terminating pods",
	Long:  `A sidecar which starts auto-refreshing the service account token when the default one is close to expiry or container receives a shutdown signal.`,
	Run: func(cmd *cobra.Command, args []string) {
		stopCh := signals.SignalShutdown()
		refresher := conf.TokenRefresher
		if err := refresher.Run(stopCh); err != nil {
			// Report failures on stderr, terminated by a newline.
			fmt.Fprintf(os.Stderr, "unable to run: %v\n", err)
			os.Exit(2)
		}
		fmt.Println("Exiting")
	},
}

// Execute runs the root command and exits non-zero on failure.
func Execute() {
	err := rootCmd.Execute()
	if err != nil {
		os.Exit(1)
	}
}

func init() {
	cobra.OnInitialize(initConfig)

	// The flag names must match those from conf.TokenRefresher
	rootCmd.Flags().StringP("namespace", "n", "", "current namespace")
	rootCmd.Flags().StringP("service_account", "s", "", "name of service account to issue token for")
	rootCmd.Flags().String("default_token_file", "/var/run/secrets/eks.amazonaws.com/serviceaccount/token", "path to default service account token file")
	rootCmd.Flags().String("token_file", "/var/run/secrets/token-refresher/token", "path to self-managed service account token file")
	rootCmd.Flags().StringSlice("token_audience", []string{"sts.amazonaws.com"}, "comma separated token audience")
	rootCmd.Flags().Duration("expiration_duration", time.Hour*2, "token expiry duration")
	rootCmd.Flags().Duration("refresh_interval", time.Hour*1, "token refresh interval")
	rootCmd.Flags().Int("max_attempts", 3, "max retries on token refresh failure")
	rootCmd.Flags().Duration("sleep", time.Second*20, "sleep duration between retries")

	if home := homedir.HomeDir(); home != "" {
		rootCmd.Flags().String("kubeconfig", filepath.Join(home, ".kube", "config"), "(optional) absolute path to the kubeconfig file")
	} else {
		rootCmd.Flags().String("kubeconfig", "", "absolute path to the kubeconfig file")
	}

	// Fail fast if the flags cannot be bound instead of ignoring the error.
	cobra.CheckErr(viper.BindPFlags(rootCmd.LocalFlags()))
}

func initConfig() {
	viper.AutomaticEnv() // read in upper-cased env vars corresponding to above CLI flags
	conf = new(config)
	if err := viper.Unmarshal(conf); err != nil {
		// Do not continue with a partially decoded config.
		fmt.Fprintf(os.Stderr, "unable to decode into config struct: %v\n", err)
		os.Exit(1)
	}
}
110 changes: 110 additions & 0 deletions examples/token-refresher.yaml
@@ -0,0 +1,110 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app
  namespace: app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: token-refresher
  namespace: app
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts/token"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: token-refresher
  namespace: app
subjects:
  - kind: ServiceAccount
    name: app
    namespace: app
roleRef:
  kind: ClusterRole
  name: token-refresher
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Pod
metadata:
  name: long-draining-pod
  namespace: app
spec:
  containers:
    - image: service-account-token-refresher:latest # update this
      imagePullPolicy: Always
      name: token-refresher
      env:
        - name: DEFAULT_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        - name: TOKEN_FILE
          value: /var/run/secrets/token-refresher/token
        - name: EXPIRATION_DURATION
          value: 10m
        - name: REFRESH_INTERVAL
          value: 1m
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: SERVICE_ACCOUNT
          value: app
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
        - mountPath: /var/run/secrets/token-refresher
          name: token-refresher
    - name: long-draining-app
      image: alpine
      command:
        - sh
        - -c
        - |
          for i in `seq 1 10`
          do
            # prints the token's expiry
            echo "Now: $(date)"
            EXPIRY=$(awk -F . '{if (length($2) % 4 == 3) print $2"="; else if (length($2) % 4 == 2) print $2"=="; else print $2; }' $AWS_WEB_IDENTITY_TOKEN_FILE | tr -- '-_' '+/' | base64 -d | awk -F , '{print $2}' | awk -F : '{print "@"$2}' | xargs date -d)
            echo "Exp: $EXPIRY"
            echo
            sleep 20s
          done
      lifecycle:
        preStop:
          exec:
            command:
              - sh
              - -c
              - # custom draining logic here
                sleep 30s &&
                touch /var/run/secrets/token-refresher/shutdown
      env:
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/token-refresher/token
      volumeMounts:
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
        - name: token-refresher
          mountPath: /var/run/secrets/token-refresher
          readOnly: false
  terminationGracePeriodSeconds: 180
  serviceAccountName: app
  volumes:
    - name: token-refresher
      emptyDir: {}
    - name: aws-iam-token
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              path: token
67 changes: 67 additions & 0 deletions go.mod
@@ -0,0 +1,67 @@
module github.com/SumoLogic-Labs/service-account-token-refresher

go 1.22.0

require (
github.com/spf13/cobra v1.8.0
github.com/spf13/viper v1.18.2
k8s.io/api v0.29.2
k8s.io/apimachinery v0.29.2
k8s.io/client-go v0.29.2
)

require (
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.11.3 // indirect
github.com/evanphx/json-patch v4.12.0+incompatible // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
github.com/go-openapi/jsonreference v0.21.0 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/imdario/mergo v0.3.16 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pelletier/go-toml/v2 v2.1.1 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/sagikazarmark/locafero v0.4.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
github.com/spf13/afero v1.11.0 // indirect
github.com/spf13/cast v1.6.0 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/subosito/gotenv v1.6.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/exp v0.0.0-20240222234643-814bf88cf225 // indirect
golang.org/x/net v0.22.0 // indirect
golang.org/x/oauth2 v0.18.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/term v0.18.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/time v0.5.0 // indirect
google.golang.org/appengine v1.6.8 // indirect
google.golang.org/protobuf v1.33.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/klog/v2 v2.120.1 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
k8s.io/utils v0.0.0-20240102154912-e7106e64919e // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)