Backup results in Access Denied for PutObject operation to S3 #8062

Closed
chrisRedwine opened this issue Jul 30, 2024 · 9 comments · Fixed by vmware-tanzu/velero-plugin-for-aws#218

@chrisRedwine

What steps did you take and what happened:

I’m testing out Velero, and when trying to do a backup for a set of Redis PVCs/PVs, I get the following error:

rpc error: code = Unknown desc = error putting object backups/backups/redis-test-2/redis-test-2-logs.gz: operation error S3: PutObject, https response error StatusCode: 403, RequestID: <redacted>, HostID: <redacted>, api error AccessDenied: Access Denied
  • Command: velero backup create redis-test-2 --include-resources=pvc,pv --selector app.kubernetes.io/name=redis
  • Running on EKS, where the volumes in question are provisioned via EBS CSI driver (for which I’ve ensured the snapshotter sidecar is running)
  • External snapshot controller and CRDs installed
  • VolumeSnapshotClass CR created
  • Velero node agent enabled
  • EnableCSI feature flag set
  • Velero service account annotated with the IRSA role
IRSA Policy:
{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot",
                "ec2:CreateVolume",
                "ec2:CreateTags",
                "ec2:CreateSnapshot"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Ec2ReadWrite"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::<my-bucket>/*",
            "Sid": "S3ReadWrite"
        },
        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::<my-bucket>",
            "Sid": "S3List"
        }
    ],
    "Version": "2012-10-17"
}
BackupStorageLocation spec:
spec:
  accessMode: ReadWrite
  config:
    checksumAlgorithm: ''
    region: us-east-2
    serverSideEncryption: AES256
    tagging: <redacted>
  default: true
  objectStorage:
    bucket: <my-bucket>
    prefix: backups
  provider: aws
  validationFrequency: 1m
Terraform for S3 bucket:
module "velero_backup_s3_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "4.1.2"
  create_bucket = var.enable_velero
  bucket = local.velero_s3_bucket_name
  attach_deny_insecure_transport_policy = true
  attach_require_latest_tls_policy      = true
  acl = "private"
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
  control_object_ownership = true
  object_ownership         = "BucketOwnerPreferred"
  versioning = {
    status     = true
    mfa_delete = false
  }
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm = "AES256"
      }
    }
  }
  tags = local.tags_full
}

What did you expect to happen:

No 403 errors when uploading to S3

The following information will help us better understand what's going on:

The support bundle from velero debug --backup redis-test-2 contains information I'm not comfortable sharing publicly (e.g., IPs). If needed, I can DM it through Slack or redact it first; just let me know.

Anything else you would like to add:

Slack message for this issue is here.

I suspect the problem is related to how SSE is set up, to the bucket configuration, or perhaps to caCert being required, but I’m not certain. Any help would be greatly appreciated!

Environment:

  • Velero version (use velero version): v1.14.0
  • Helm chart version: 7.1.2
  • velero-plugin-for-aws version: 1.10.0
  • Velero features (use velero client config get features): This returns features: <NOT SET>, though I can confirm that the velero server container is running with --uploader-type=kopia --backup-sync-period=1m --fs-backup-timeout=4h --client-burst=30 --client-page-size=500 --client-qps=20 --default-backup-ttl=72h --default-item-operation-timeout=4h --garbage-collection-frequency=1h --log-format=json --log-level=debug --store-validation-frequency=1m --terminating-resource-timeout=10m --features=EnableCSI
  • Kubernetes version (use kubectl version): v1.29.6-eks-db838b0
  • Kubernetes installer & version: AWS EKS eks.10
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Amazon Linux 2

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Lyndon-Li
Contributor

Could you provide the velero debug bundle by running velero debug?

@chrisRedwine
Author

> Could you provide the velero debug bundle by running velero debug?

Here you go, @Lyndon-Li: bundle-2024-08-05-19-30-46.tar.gz

Thanks in advance!

@chrisRedwine
Author

FYI, @reasonerjt - I came back to this after some weeks and realized the 403 is caused by a missing s3:PutObjectTagging permission, which is included neither in the docs for velero-plugin-for-aws nor in the policy in the iam-role-for-service-accounts-eks submodule.

So,

  • I've submitted a PR to fix the docs in the velero-plugin-for-aws repo, which I've marked as resolving this issue.
  • I also submitted an issue/PR to fix the policy in the iam-role-for-service-accounts-eks submodule.
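
For anyone who lands here with the same 403, the gap can be spotted by comparing the actions a policy grants against the actions the upload needs. The snippet below is only a rough sketch: `allowed_actions` and `missing_actions` are hypothetical helper names, the policy is the S3 object statement from the issue description, and the check deliberately ignores wildcards, Deny statements, resource scoping, and conditions.

```python
import json

# S3 object statement copied from the IRSA policy posted in this issue.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadWrite",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::<my-bucket>/*"
    }
  ]
}
""")


def allowed_actions(policy):
    """Collect the literal action names granted by Allow statements.

    Simplification (assumption): wildcards such as "s3:*" are not expanded,
    and Deny statements, resources, and conditions are not evaluated.
    """
    actions = set()
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        action = statement["Action"]
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy, required):
    """Return the required actions that the policy does not literally list."""
    return sorted(set(required) - allowed_actions(policy))


# A PutObject request that also sets object tags needs both permissions.
missing = missing_actions(POLICY, ["s3:PutObject", "s3:PutObjectTagging"])
print(missing)
```

With the policy from the issue description this reports `s3:PutObjectTagging` as the only missing action, which matches the diagnosis above.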

@sseago
Collaborator

sseago commented Sep 16, 2024

@chrisRedwine Hmm. I don't think I've ever added that one. The velero doc suggestion has always worked for me. I wonder whether this is a setting that is only needed by certain S3 providers.

@shubham-pampattiwar
Collaborator

@chrisRedwine Are you using AWS S3 object tagging (i.e., passing tag keys in the BSL S3 config)? If so, that may explain why this additional permission is needed.

@chrisRedwine
Author

@sseago yes, I've got a full repro described here that's just using standard AWS resources, and utilizes tagging on the objects (which is advertised here, implemented here)

@shubham-pampattiwar
Collaborator

Hmm. Yeah, then it makes sense to add the s3:PutObjectTagging permission to the IAM policy.

@sseago
Collaborator

sseago commented Sep 17, 2024

@chrisRedwine Ahh, ok that makes sense then. Yes, by default tagging isn't used, so that permission isn't needed. But if you use that, you'd need it.

Does it make sense to note that in your PR, maybe in the description text before the policy sample? -- '"s3:PutObjectTagging" is only needed if you make use of the tagging field in the BSL definition.'
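
The rule described in this comment could be sketched roughly as follows. This is not code from Velero or the AWS plugin: `required_s3_object_actions` is a hypothetical helper, the base list is taken from the S3ReadWrite statement in the issue description, and the tagging value used in the usage lines is a made-up example.

```python
# Object-level S3 actions from the policy posted in the issue description.
BASE_S3_OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:PutObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]


def required_s3_object_actions(bsl_config):
    """Return the object-level S3 actions a BSL config is expected to need.

    Assumption: s3:PutObjectTagging is needed only when the `tagging`
    field of the BackupStorageLocation config is set to a non-empty value.
    """
    actions = list(BASE_S3_OBJECT_ACTIONS)
    if bsl_config.get("tagging"):
        actions.append("s3:PutObjectTagging")
    return actions


# Usage: a config with tagging set (hypothetical value) versus one without.
with_tagging = required_s3_object_actions(
    {"region": "us-east-2", "tagging": "team=platform"}
)
without_tagging = required_s3_object_actions({"region": "us-east-2"})
print("s3:PutObjectTagging" in with_tagging)
print("s3:PutObjectTagging" in without_tagging)
```

Keeping the extra action conditional mirrors the suggested doc note: setups that never set tagging can stay on the shorter policy.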

@chrisRedwine
Author

@sseago Makes sense to me - I've added the notes to the PR in this commit
