Use projected token volume for hostNetwork pods. #428
base: main
Conversation
@@ -110,6 +113,15 @@ func (si *SidecarInjector) Handle(_ context.Context, req admission.Request) admi

	// Inject container.
	injectSidecarContainer(pod, config, injectAsNativeSidecar)

	if pod.Spec.HostNetwork {
Are there any cases in which the cx would want to keep using the GCE SA? (even though it's not very safe)
Is that a common use case? This implementation will not be able to accommodate that.
	if err != nil {
		return admission.Errored(http.StatusInternalServerError, fmt.Errorf("failed to get project id: %w", err))
	}
	pod.Spec.Volumes = append(pod.Spec.Volumes, GetSATokenVolume(projectID))
Do you think we should have the logic that injects volumes in one place?
Could you elaborate?
pkg/webhook/sidecar_spec.go (Outdated)
	NobodyGID = 65534
	NobodyUID = 65534
	NobodyGID = 65534
	tokenExpiryDuration = 600
Curious, why 600? Is tokenExpiryDuration something that can be adjusted? If so, should the cx have the ability to adjust it?
Prolonged it to 3600, which should be a more reasonable token expiry duration. See more about what this value does: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#launch-a-pod-using-service-account-token-projection
It's not very useful to make tokenExpiryDuration configurable.
When this expires, we have the ability to renew because of the StartTokenServer call. Do we plan to add e2e tests in a follow-up PR to test the functionality of the token server? If we have a TokenServer, do we favor a shorter or longer tokenExpiryDuration?
I plan to add an e2e test to test GCS bucket access with hostNetwork enabled pods. That should cover the functionality of the token server.
Re tokenExpiryDuration: we will have to weigh the security benefits of shorter token lifetimes against the performance benefits of longer lifetimes. Kubernetes' approach provides a good starting point: it will refresh when it hits 80% of TTL or 24h (whichever is shorter). I think 3600s (1h) is a nice balance point between security and network traffic overhead.
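For context, here is a hedged sketch of what a projected service-account token volume such as the one returned by GetSATokenVolume(projectID) might look like, using the Kubernetes core/v1 types. The volume name, audience format, token path, and the 3600s expiry are assumptions drawn from this thread, not the PR's actual code:

	package webhook

	import (
		corev1 "k8s.io/api/core/v1"
	)

	const (
		saTokenVolumeName   = "gcsfuse-sa-token" // assumed volume name
		tokenExpiryDuration = int64(3600)        // expiry discussed in this thread
	)

	// buildSATokenVolume is a hypothetical stand-in for GetSATokenVolume.
	func buildSATokenVolume(projectID string) corev1.Volume {
		expiry := tokenExpiryDuration

		return corev1.Volume{
			Name: saTokenVolumeName,
			VolumeSource: corev1.VolumeSource{
				Projected: &corev1.ProjectedVolumeSource{
					Sources: []corev1.VolumeProjection{{
						ServiceAccountToken: &corev1.ServiceAccountTokenProjection{
							// Assumed audience: the project's GKE workload identity pool.
							Audience:          projectID + ".svc.id.goog",
							ExpirationSeconds: &expiry,
							Path:              "token",
						},
					}},
				},
			},
		}
	}

The kubelet refreshes a projected token like this when it reaches 80% of its TTL (or 24h, whichever is shorter), so the sidecar always finds a valid token at the projected path without the workload handling rotation itself.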
		WriteTimeout: 10 * time.Second,
	}

	if err := server.Serve(socket); err != nil {
Is there a way to stop the server gracefully during shutdown?
This implementation follows the standard pattern across the repo, e.g. the metrics server:
	if err := server.Serve(socket); err != nil {
Can you give me an example of gracefully shutting down a server? Is there a significant benefit to doing that?
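For reference, here is a minimal sketch of graceful shutdown for a Go HTTP server listening on a unix socket. The function name, timeouts, and context wiring are illustrative assumptions, not code from this repo:

	package tokenserver

	import (
		"context"
		"errors"
		"log"
		"net"
		"net/http"
		"time"
	)

	// serveWithGracefulShutdown serves until ctx is cancelled, then drains
	// in-flight requests before returning.
	func serveWithGracefulShutdown(ctx context.Context, socket net.Listener, handler http.Handler) {
		server := &http.Server{
			Handler:      handler,
			WriteTimeout: 10 * time.Second,
		}

		go func() {
			<-ctx.Done()
			shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			// Shutdown stops accepting new connections and waits for
			// in-flight requests to complete (up to the timeout above).
			if err := server.Shutdown(shutdownCtx); err != nil {
				log.Printf("token server shutdown error: %v", err)
			}
		}()

		// Serve returns http.ErrServerClosed after a clean Shutdown.
		if err := server.Serve(socket); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Printf("token server failed: %v", err)
		}
	}

The main benefit is that in-flight token requests finish instead of being dropped when the sidecar receives SIGTERM; for a small internal endpoint the difference is modest, so it is mostly a robustness question.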
	return audience, nil
}

func StartTokenServer(ctx context.Context) {
What's the behavior when the server fails to start? Will the GCS calls use the node's auth scopes?
The pod will not have access to the GCS bucket in this case.
And mounting will fail.
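To make the behavior being discussed more concrete, here is a hedged sketch of a token server over a unix domain socket. The socket path, route, and the fetchToken helper are assumptions for illustration only, not the actual StartTokenServer implementation in this PR:

	package sidecar

	import (
		"context"
		"log"
		"net"
		"net/http"
		"os"
		"time"
	)

	// socketPath is an assumed location for the token socket shared with gcsfuse.
	const socketPath = "/gcsfuse-tmp/token.sock"

	// fetchToken is a placeholder for whatever turns the projected Kubernetes
	// service-account token into the token gcsfuse needs (e.g. a token exchange);
	// here it simply reads the projected token file.
	func fetchToken(ctx context.Context) ([]byte, error) {
		return os.ReadFile("/var/run/secrets/tokens/token") // assumed mount path
	}

	func StartTokenServer(ctx context.Context) {
		// A unix domain socket keeps the endpoint reachable only from containers
		// that share the pod's volume, even when hostNetwork is enabled.
		_ = os.Remove(socketPath)
		socket, err := net.Listen("unix", socketPath)
		if err != nil {
			// If the server cannot start, gcsfuse gets no token, so the pod has
			// no GCS access and the mount fails, as noted above.
			log.Printf("failed to listen on %s: %v", socketPath, err)
			return
		}

		mux := http.NewServeMux()
		mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
			token, err := fetchToken(r.Context())
			if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			_, _ = w.Write(token)
		})

		server := &http.Server{
			Handler:      mux,
			WriteTimeout: 10 * time.Second,
		}
		if err := server.Serve(socket); err != nil {
			log.Printf("token server stopped: %v", err)
		}
	}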
Why we need this change
Before this change, the GCSFuse CSI driver does not support pods with hostNetwork: true. This is because the gcsfuse process runs as part of a sidecar container that is injected into the user pod. GCSFuse uses the ADC workflow to fetch the necessary token to access a GCS bucket. However, with hostNetwork enabled, the GKE metadata server cannot intercept token requests to the GET /computeMetadata/v1/instance/service-accounts/default/token endpoint.
What is in this change
- For HostNetwork=true user pods, the GCSFuse CSI webhook injects a projected SA token volume into the user pod.
- The gke-gcsfuse-sidecar container prepares a unix domain socket and starts a handler that serves requests for this token; gcsfuse is invoked with a config option pointing to the socket (gcs-auth:token-url:<path to the token>).
- The gke-gcsfuse-sidecar handles token requests from GCSFuse.
Local set up and testing
Test cases covered
Setup
Create a pod with hostNetwork=true, a GCS bucket mounted as a volume, and the service account set to the one you created in step 2. Check I/O to your bucket from a container in your pod. A sketch of such a pod follows below.
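As a rough illustration of the setup step above, here is a sketch of such a pod built with the Kubernetes Go client types. The pod name, bucket name, service account, image, and mount path are placeholders; the gke-gcsfuse/volumes annotation and the gcsfuse.csi.storage.gke.io driver name come from the GCSFuse CSI driver's documented usage, but verify them against the docs for your cluster version:

	package e2etest

	import (
		corev1 "k8s.io/api/core/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	)

	// hostNetworkTestPod builds a hostNetwork pod that mounts a GCS bucket via
	// the GCSFuse CSI driver and runs simple I/O against it.
	func hostNetworkTestPod() *corev1.Pod {
		return &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name: "gcsfuse-hostnetwork-test",
				// Opts the pod in to gke-gcsfuse-sidecar injection by the webhook.
				Annotations: map[string]string{"gke-gcsfuse/volumes": "true"},
			},
			Spec: corev1.PodSpec{
				HostNetwork:        true,
				ServiceAccountName: "test-ksa", // the KSA created in step 2 (placeholder name)
				Containers: []corev1.Container{{
					Name:    "workload",
					Image:   "busybox",
					Command: []string{"sh", "-c", "echo hello > /data/hello.txt && cat /data/hello.txt && sleep 3600"},
					VolumeMounts: []corev1.VolumeMount{{
						Name:      "gcs-volume",
						MountPath: "/data",
					}},
				}},
				Volumes: []corev1.Volume{{
					Name: "gcs-volume",
					VolumeSource: corev1.VolumeSource{
						CSI: &corev1.CSIVolumeSource{
							Driver:           "gcsfuse.csi.storage.gke.io",
							VolumeAttributes: map[string]string{"bucketName": "my-test-bucket"},
						},
					},
				}},
			},
		}
	}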