feat: use cluster identity for using azcopy in volume cloning #1156

Merged

Contributor

@umagnus umagnus commented Dec 8, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

azcopy already supports managed identity authorization; see "Authorize access to blobs with AzCopy & Microsoft Entra ID" on Microsoft Learn.
If the user provides only a storage account key, the driver should use a SAS token; otherwise it should use the AKS cluster identity (managed identity or service principal) for volume cloning, which is the preferred approach.

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Logs when using a system-assigned managed identity:

I1213 03:10:29.755936       1 azure_auth.go:130] azure: using managed identity extension to retrieve access token
I1213 03:10:29.755952       1 azure_auth.go:152] azure: using System Assigned MSI to retrieve access token
...
I1213 03:11:01.336224       1 utils.go:75] GRPC call: /csi.v1.Controller/CreateVolume
I1213 03:11:01.336931       1 utils.go:76] GRPC request: {"capacity_range":{"required_bytes":107374182400},"name":"pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290","parameters":{"csi.storage.k8s.io/pv/name":"pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290","csi.storage.k8s.io/pvc/name":"pvc-blob-cloning","csi.storage.k8s.io/pvc/namespace":"default","protocol":"fuse2","skuName":"Premium_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{"mount_flags":["-o allow_other","--file-cache-timeout-in-seconds=120","--use-attr-cache=true","--cancel-list-on-mount-seconds=10","-o attr_timeout=120","-o entry_timeout=120","-o negative_timeout=120","--log-level=LOG_WARNING","--cache-size-mb=1000"]}},"access_mode":{"mode":5}}],"volume_content_source":{"Type":{"Volume":{"volume_id":"MC_xinyuyuaebld84227171_xinyuyuaebld84227171_eastus#fuse2a8dfa02b374049e9b8#pvc-4613e68c-f401-4950-8304-dc839177e38c##default#"}}}}
I1213 03:11:01.342382       1 blob.go:830] got storage account(fuse2a8dfa02b374049e9b8) from secret(azure-storage-account-fuse2a8dfa02b374049e9b8-secret) namespace(default)
I1213 03:11:01.342410       1 controllerserver.go:818] use managed identity to authorize azcopy
I1213 03:11:01.342419       1 controllerserver.go:822] set AZCOPY_AUTO_LOGIN_TYPE=MSI successfully
I1213 03:11:01.342428       1 controllerserver.go:832] authorize by using a system-assigned managed identity
I1213 03:11:01.367727       1 controllerserver.go:749] azcopy job status: NotFound, copy percent: %, error: <nil>
I1213 03:11:01.367760       1 controllerserver.go:753] begin to copy blob container pvc-4613e68c-f401-4950-8304-dc839177e38c to pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290
I1213 03:11:06.358083       1 controllerserver.go:758] azcopy job status: NotFound, copy percent: %, error: <nil>
I1213 03:11:06.358109       1 controllerserver.go:763] copy blob container pvc-4613e68c-f401-4950-8304-dc839177e38c to pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290
I1213 03:11:08.683189       1 controllerserver.go:768] copied blob container pvc-4613e68c-f401-4950-8304-dc839177e38c to pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290 successfully
I1213 03:11:08.695285       1 controllerserver.go:442] store account key to k8s secret(azure-storage-account-fuse2a8dfa02b374049e9b8-secret) in default namespace
I1213 03:11:08.695347       1 controllerserver.go:453] create container pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290 on storage account fuse2a8dfa02b374049e9b8 successfully
I1213 03:11:08.695465       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=7.358180327 request="blob_csi_driver_controller_create_volume_from_volume" resource_group="mc_xinyuyuaebld84227171_xinyuyuaebld84227171_eastus" subscription_id="26ad903f-2330-429d-8389-864ac35c4350" source="blob.csi.azure.com" volumeid="MC_xinyuyuaebld84227171_xinyuyuaebld84227171_eastus#fuse2a8dfa02b374049e9b8#pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290##default#" result_code="succeeded"
I1213 03:11:08.695520       1 utils.go:82] GRPC response: {"volume":{"capacity_bytes":107374182400,"content_source":{"Type":{"Volume":{"volume_id":"MC_xinyuyuaebld84227171_xinyuyuaebld84227171_eastus#fuse2a8dfa02b374049e9b8#pvc-4613e68c-f401-4950-8304-dc839177e38c##default#"}}},"volume_context":{"containername":"pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290","csi.storage.k8s.io/pv/name":"pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290","csi.storage.k8s.io/pvc/name":"pvc-blob-cloning","csi.storage.k8s.io/pvc/namespace":"default","protocol":"fuse2","secretnamespace":"default","skuName":"Premium_LRS"},"volume_id":"MC_xinyuyuaebld84227171_xinyuyuaebld84227171_eastus#fuse2a8dfa02b374049e9b8#pvc-fd769913-64fe-4bdf-9f79-57c0e91ca290##default#"}}

When using a service principal or a user-assigned managed identity, the identity must be granted the "Storage Blob Data Contributor" role; otherwise azcopy returns an AuthorizationPermissionMismatch error. If the identity lacks this role, the driver falls back to generating a SAS token for azcopy.
Example AuthorizationPermissionMismatch error:

INFO: SPN Auth via secret succeeded.
INFO: Authenticating to source using Azure AD
INFO: azcopy 10.20.0: A newer version 10.22.1 is available to download

INFO: failed to list blobs in container pvc-99172431-b942-47a1-a83e-f2ce142fc248: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=AuthorizationPermissionMismatch) =====
Description=This request is not authorized to perform this operation using this permission.
RequestId:7706cdb1-201e-00a7-100e-3a3676000000
Time:2023-12-29T04:19:28.1692799Z, Details: 
   Code: AuthorizationPermissionMismatch
   GET https://fuse22627572fa4bc4e80b4.blob.core.windows.net/pvc-99172431-b942-47a1-a83e-f2ce142fc248?comp=list&delimiter=%2F&include=metadata&restype=container&timeout=901
   Authorization: REDACTED
   User-Agent: [AzCopy/10.20.0 Azure-Storage/0.15 (go1.20.2; linux)]
   X-Ms-Client-Request-Id: [389341f5-11f2-4c61-6be4-9c460facdf59]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 403 This request is not authorized to perform this operation using this permission.
   Content-Length: [279]
   Content-Type: [application/xml]
   Date: [Fri, 29 Dec 2023 04:19:27 GMT]
   Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
   X-Ms-Client-Request-Id: [389341f5-11f2-4c61-6be4-9c460facdf59]
   X-Ms-Error-Code: [AuthorizationPermissionMismatch]
   X-Ms-Request-Id: [7706cdb1-201e-00a7-100e-3a3676000000]
   X-Ms-Version: [2020-10-02]
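
The fallback described above could be sketched roughly as follows. This is only an illustration: the thread does not show how the driver detects the error, so the substring match and the helper name `needsSASFallback` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// permissionMismatch is the service code azcopy prints when the identity
// lacks a data-plane role such as "Storage Blob Data Contributor".
const permissionMismatch = "AuthorizationPermissionMismatch"

// needsSASFallback is a hypothetical helper: it reports whether the combined
// azcopy output suggests that identity-based auth was rejected, in which
// case the caller would retry the copy with a SAS token.
func needsSASFallback(azcopyOutput string) bool {
	return strings.Contains(azcopyOutput, permissionMismatch)
}

func main() {
	out := "===== RESPONSE ERROR (ServiceCode=AuthorizationPermissionMismatch) ====="
	if needsSASFallback(out) {
		fmt.Println("identity lacks the required role, falling back to SAS token")
	}
}
```

Matching on the service code rather than on the HTTP status keeps other 403 causes from triggering the fallback.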

Release note:

none

@k8s-ci-robot k8s-ci-robot added kind/feature and cncf-cla: yes labels Dec 8, 2023
@k8s-ci-robot
Contributor

Hi @umagnus. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test and size/M labels Dec 8, 2023
Member

@andyzhangx andyzhangx left a comment

Volume cloning should use config.UserAssignedIdentityID (MSI), or config.AADClientID and config.AADClientSecret (service principal).

https://github.com/kubernetes-sigs/cloud-provider-azure/blob/b9ede3fc98e9e529911336668756145ef294d8ae/pkg/provider/config/azure_auth.go#L128

Member

@andyzhangx andyzhangx left a comment

There are 3 categories:

  1. the azure file auth is using account name & key only (bring-your-own account key scenario)
  2. service principal auth
  3. managed identity auth
    3.1 system-assigned identity auth (no need to set the AZCOPY_MSI_CLIENT_ID env)
    3.2 user-assigned identity auth

https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-authorize-azure-active-directory#authorize-a-managed-identity
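
A minimal sketch of how the categories above might map to azcopy's auto-login environment variables. The `authConfig` struct and `azcopyAuthEnv` function are hypothetical stand-ins for illustration, not the driver's actual code; the `AZCOPY_*` variable names are the ones documented for azcopy.

```go
package main

import (
	"errors"
	"fmt"
)

// authConfig is a hypothetical, trimmed-down stand-in for the cloud
// provider auth config; only the fields discussed in this review are kept.
type authConfig struct {
	UseManagedIdentityExtension bool
	UserAssignedIdentityID      string
	AADClientID                 string
	AADClientSecret             string
	TenantID                    string
}

// azcopyAuthEnv returns the env entries azcopy needs to log in with the
// cluster identity. It returns an error when neither a managed identity
// nor a service principal is configured (category 1: account key only).
func azcopyAuthEnv(c authConfig) ([]string, error) {
	switch {
	case c.UseManagedIdentityExtension:
		env := []string{"AZCOPY_AUTO_LOGIN_TYPE=MSI"}
		// 3.2: a user-assigned identity needs its client ID;
		// 3.1: a system-assigned identity needs nothing more.
		if c.UserAssignedIdentityID != "" {
			env = append(env, fmt.Sprintf("%s=%s", "AZCOPY_MSI_CLIENT_ID", c.UserAssignedIdentityID))
		}
		return env, nil
	case c.AADClientSecret != "":
		// 2: service principal auth via client secret.
		if c.AADClientID == "" {
			return nil, errors.New("AADClientID must be set when using a service principal")
		}
		env := []string{
			"AZCOPY_AUTO_LOGIN_TYPE=SPN",
			fmt.Sprintf("%s=%s", "AZCOPY_SPA_APPLICATION_ID", c.AADClientID),
			fmt.Sprintf("%s=%s", "AZCOPY_SPA_CLIENT_SECRET", c.AADClientSecret),
		}
		if c.TenantID != "" {
			env = append(env, fmt.Sprintf("%s=%s", "AZCOPY_TENANT_ID", c.TenantID))
		}
		return env, nil
	default:
		return nil, errors.New("no managed identity or service principal configured")
	}
}

func main() {
	env, err := azcopyAuthEnv(authConfig{UseManagedIdentityExtension: true})
	fmt.Println(env, err)
}
```

When the function returns an error, a caller would be expected to fall back to the account key and SAS token path.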

@andyzhangx
Member

Most users run the managed blob CSI driver with System Assigned MSI. I suggest trying a local build with System Assigned MSI support first and checking whether it works with the AKS managed blob CSI driver in a standalone env.

I1207 12:58:44.436244       1 azure_auth.go:130] azure: using managed identity extension to retrieve access token
I1207 12:58:44.436274       1 azure_auth.go:152] azure: using System Assigned MSI to retrieve access token

@k8s-ci-robot k8s-ci-robot added and removed the needs-rebase label Dec 8, 2023
Member

@andyzhangx andyzhangx left a comment

Before calling copyVolume: if secrets is not empty, the driver should use a SAS token to copy the volume; otherwise it should use the cluster identity:

if err := d.copyVolume(ctx, req, accountKey, validContainerName, storageEndpointSuffix); err != nil {
	return nil, err
}
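
The decision suggested here could look roughly like the sketch below. The `copyAuth` type and `chooseCopyAuth` function are hypothetical names introduced for illustration; only the rule itself (secrets present, or no cluster identity, means SAS token) comes from this thread.

```go
package main

import "fmt"

// copyAuth names the two ways azcopy can be authorized in this sketch.
type copyAuth int

const (
	authSASToken copyAuth = iota
	authClusterIdentity
)

// chooseCopyAuth is a hypothetical helper. It picks a SAS token when the
// request carries secrets (bring-your-own account key) or when the driver
// has no managed identity or service principal; otherwise it picks the
// cluster identity.
func chooseCopyAuth(secrets map[string]string, hasClusterIdentity bool) copyAuth {
	if len(secrets) > 0 || !hasClusterIdentity {
		return authSASToken
	}
	return authClusterIdentity
}

func main() {
	// No secrets in the request and a cluster identity is available.
	if chooseCopyAuth(nil, true) == authClusterIdentity {
		fmt.Println("copy volume with cluster identity")
	}
}
```

Keeping the choice in one small function makes it easy to cover all three cases in a unit test before wiring it into CreateVolume.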

@umagnus umagnus force-pushed the use_cluster_identity_for_azcopy branch from 20fc26d to 20a2b27 Compare December 14, 2023 03:20
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Dec 14, 2023
@umagnus umagnus force-pushed the use_cluster_identity_for_azcopy branch 2 times, most recently from a45cd1d to 94cce6a Compare December 15, 2023 04:20
@andyzhangx andyzhangx changed the title [draft] use cluster identity for using azcopy in volume cloning feat: use cluster identity for using azcopy in volume cloning Dec 15, 2023
pkg/blob/controllerserver.go
if azureAuthConfig.AADClientID == "" {
	return []string{}, fmt.Errorf("AADClientID and AADClientSecret must be set when use service principal")
}
authAzcopyEnv = append(authAzcopyEnv, fmt.Sprintf(azcopySPAApplicationID+"="+azureAuthConfig.AADClientID))
Member

authAzcopyEnv = append(authAzcopyEnv, fmt.Sprintf("%s=%s", azcopySPAApplicationID, azureAuthConfig.AADClientID), fmt.Sprintf("%s=%s", ...))

Contributor Author

done

klog.V(2).Infof("use user assigned managed identity to authorize azcopy")
authAzcopyEnv = append(authAzcopyEnv, fmt.Sprintf(azcopyMSIClientID+"="+azureAuthConfig.UserAssignedIdentityID))
}
klog.V(2).Infof("use system-assigned managed identity to authorize azcopy")
Member

else {
   klog.V(2).Infof("use system-assigned managed identity to authorize azcopy")
}

Otherwise both L810 and L815 would be printed.

Contributor Author

done

return []string{}, fmt.Errorf("AADClientSecret shouldn't be nil or useManagedIdentityExtension must be set to true")
}

// getSASToken will generate sas token for azcopy if secrets is not nil or cluster not use managed identity/service principal
Member

// getSASToken will only generate a SAS token for azcopy in the following conditions:
// 1. secrets is not empty
// 2. driver is using neither managed identity nor service principal

Contributor Author

done

pkg/util/util.go Outdated
}

func (ec *ExecCommand) RunCommand(cmd string) (string, error) {
out, err := exec.Command("sh", "-c", cmd).CombinedOutput()
func (ec *ExecCommand) RunCommand(cmdStr string) (string, error) {
Member

this could make the function more generic:

RunCommand(cmdStr string, authEnv []string)

Member

if len(authEnv) > 0 {
  cmd.Env = append(os.Environ(), authEnv...)
}
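
Put together, the more generic signature suggested above might look like this sketch. It is written as a free function named `runCommand` rather than the `ExecCommand` method, purely for illustration, and it assumes a POSIX `sh` is available.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runCommand runs cmdStr through sh and returns its combined output.
// When authEnv is non-empty, its KEY=VALUE entries are added on top of the
// current process environment, so azcopy can pick up its auth settings.
func runCommand(cmdStr string, authEnv []string) (string, error) {
	cmd := exec.Command("sh", "-c", cmdStr)
	if len(authEnv) > 0 {
		cmd.Env = append(os.Environ(), authEnv...)
	}
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	out, err := runCommand("echo $AZCOPY_AUTO_LOGIN_TYPE", []string{"AZCOPY_AUTO_LOGIN_TYPE=MSI"})
	fmt.Printf("%q %v\n", out, err)
}
```

Leaving `cmd.Env` nil when `authEnv` is empty makes the child inherit the parent environment unchanged, so the SAS token path behaves as before.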

Contributor Author

done

@umagnus umagnus force-pushed the use_cluster_identity_for_azcopy branch 6 times, most recently from e448d3c to 3629254 Compare December 28, 2023 05:12
@umagnus
Contributor Author

umagnus commented Dec 28, 2023

/retest

@umagnus
Contributor Author

umagnus commented Dec 29, 2023

/retest

Member

@andyzhangx andyzhangx left a comment

Also, please squash the commits, thanks.

@umagnus umagnus force-pushed the use_cluster_identity_for_azcopy branch 2 times, most recently from 1d63250 to 42832f1 Compare December 29, 2023 04:34
@umagnus umagnus force-pushed the use_cluster_identity_for_azcopy branch from 42832f1 to 7e0f0e9 Compare December 29, 2023 06:10
@umagnus
Contributor Author

umagnus commented Dec 29, 2023

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Dec 29, 2023

@umagnus: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-blob-csi-driver-e2e-vmss 625b024 link true /test pull-blob-csi-driver-e2e-vmss

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@umagnus
Contributor Author

umagnus commented Dec 29, 2023

/retest

Member

@andyzhangx andyzhangx left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 29, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, umagnus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Dec 29, 2023
@andyzhangx andyzhangx merged commit 03a3da1 into kubernetes-sigs:master Dec 29, 2023
21 of 22 checks passed
@andyzhangx
Member

/cherrypick release-1.23

@k8s-infra-cherrypick-robot

@andyzhangx: new pull request created: #1195

In response to this:

/cherrypick release-1.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.