backupRepository (restic) can become stale if velero deployment is not running to observe bsl update/create #8279

Open
kaovilai opened this issue Oct 8, 2024 · 6 comments

Contributor

kaovilai commented Oct 8, 2024

What steps did you take and what happened:

This is an extension of #7292.

We want to invalidate backupRepositories on server startup for all pre-existing BSLs.

Red Hat QE found that the following sequence produces a failed kopia backup:

  • Install velero and run a successful backup with kopia.
  • Scale the velero deployment down to 0 replicas.
  • Delete the BSL and recreate it with a different prefix.
  • Scale the velero deployment back up to 1 replica.
  • Create another kopia backup. It fails with the status below.

status:
  completionTimestamp: "2024-01-08T11:19:25Z"
  message: 'error to initialize data path: error to boost backup repository connection
    ts-dpa-1-ocp-todolist-mariadb-kopia: error to connect backup repo: error to connect
    repo with storage: error to connect to repository: repository not initialized
    in the provided storage'
  phase: Failed
  progress: {}
  startTimestamp: "2024-01-08T11:19:23Z"

Furthermore, the BackupRepository CRs were still pointing to the old resticIdentifier.

What did you expect to happen:

@Lyndon-Li
Contributor

@kaovilai

Uninstalling velero and deleting/recreating BSL with different prefix

What does "Uninstalling velero" mean? If it means running velero uninstall, everything in the velero namespace goes away and is then recreated, so this problem should not happen.

It looks like in your test the backupRepository CR still exists and was not removed. How did that come about? Is this a valid operation?

Anyway, this problem falls under the situation where the BSL is modified but the backupRepository CR is not invalidated. IMO, whether or not the Velero server is running at the time the BSL is modified, we should go with the same solution.

Contributor Author

kaovilai commented Oct 9, 2024

I mean the velero deployment isn't running, or has been deleted.

It was not removed via the uninstall command.

Contributor Author

kaovilai commented Oct 9, 2024

The BackupRepository was created by the first successful backup, before the BSL was modified. The BSL modification happened while the velero server was absent because the deployment had been deleted.

@Lyndon-Li
Contributor

OK. Then see my above comment:
Anyway, this problem falls under the situation where the BSL is modified but the backupRepository CR is not invalidated. IMO, whether or not the Velero server is running at the time the BSL is modified, we should go with the same solution.

I think we should find a unified way to solve the problem caused by BSL modification, whether the velero server is on or off.
A rough solution: we should be able to detect a change to the BSL and, once it has changed, invalidate the BR CR. We should avoid invalidating all BSLs during velero server restart.
Additionally, it would be better to detect which fields changed in order to decide whether the BR needs to be invalidated. In other words, we should do the invalidation in the smallest scope.
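
To make the "smallest scope" idea concrete, here is a minimal, language-agnostic sketch in Python (not Velero code). It assumes a previous copy of the BSL spec is available for comparison, and the set of fields treated as repository-relevant (`provider`, `bucket`, `prefix`, `config`) is an assumption for illustration; only `prefix` is mentioned in this thread. `BSLSpec`, `changed_fields`, and `needs_invalidation` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSLSpec:
    """Hypothetical, simplified view of a BSL spec (not Velero's API types)."""
    provider: str
    bucket: str
    prefix: str = ""
    config: tuple = ()             # sorted (key, value) pairs
    backup_sync_period: str = ""   # assumed NOT to affect the repository location


# Assumption: only these fields determine where the backup repository lives.
REPO_RELEVANT_FIELDS = ("provider", "bucket", "prefix", "config")


def changed_fields(old: BSLSpec, new: BSLSpec) -> list:
    """Return the repository-relevant fields that differ between two specs."""
    return [f for f in REPO_RELEVANT_FIELDS if getattr(old, f) != getattr(new, f)]


def needs_invalidation(old: BSLSpec, new: BSLSpec) -> bool:
    """Invalidate the BR only if a repository-relevant field changed."""
    return bool(changed_fields(old, new))


if __name__ == "__main__":
    before = BSLSpec("aws", "backups", "velero")
    after = BSLSpec("aws", "backups", "velero-new")
    print(changed_fields(before, after), needs_invalidation(before, after))
```

With a check like this, a change that only touches a field outside the relevant set would leave the BR untouched.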

Contributor Author

kaovilai commented Oct 9, 2024

we should do the invalidation in the smallest scope.

Agree. Will check. Thanks!

kaovilai changed the title from "backupRepository (restic) can become stale before velero finish startup" to "backupRepository (restic) can become stale if velero deployment is not running to observe bsl update/create" on Oct 9, 2024
Contributor Author

kaovilai commented Oct 10, 2024

Invalidate the backupRepository if:

  • the generation is missing (new BSL), or
  • the bslGeneration annotation on the backupRepository does not match the BSL's generation; in that case, also update the annotation.

Potentially, if there is a validation func that runs during backup (the one that produces the "error to initialize data path" error above), try running it during startup for anything that fails the generation test, to further minimize the need to invalidate backupRepository CRs.
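
The generation check above could look roughly like the following sketch, written in Python as a language-agnostic illustration rather than actual Velero code. The `BSL` and `BackupRepository` classes, the `invalidated` flag, and the exact annotation key are assumptions; the thread only names the annotation "bslGeneration". The optional validation step is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical annotation key; the thread only calls it "bslGeneration".
BSL_GENERATION_ANNOTATION = "bslGeneration"


@dataclass
class BSL:
    name: str
    generation: int  # stands in for the object's metadata.generation


@dataclass
class BackupRepository:
    name: str
    bsl_name: str
    annotations: dict = field(default_factory=dict)
    invalidated: bool = False  # hypothetical marker for "needs re-initialization"


def reconcile_on_startup(bsls, repos):
    """Invalidate repos whose recorded BSL generation is missing or stale.

    Returns the names of the repositories that were invalidated.
    """
    bsl_by_name = {b.name: b for b in bsls}
    invalidated = []
    for repo in repos:
        bsl = bsl_by_name.get(repo.bsl_name)
        if bsl is None:
            continue  # BSL no longer exists; out of scope for this sketch
        recorded = repo.annotations.get(BSL_GENERATION_ANNOTATION)
        current = str(bsl.generation)
        if recorded != current:  # covers both "missing" and "mismatch"
            repo.invalidated = True
            repo.annotations[BSL_GENERATION_ANNOTATION] = current
            invalidated.append(repo.name)
    return invalidated
```

Because the annotation is updated after invalidation, running the check again on the next startup leaves up-to-date repositories alone. One thing to keep in mind: Kubernetes assigns a newly created object generation 1, so a deleted and recreated BSL could coincide with a previously recorded value; comparing the BSL UID as well may be worth considering.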
