Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow "sparse" option for Kopia & Restic restore #6664

Closed
draghuram opened this issue Aug 16, 2023 · 34 comments
Closed

Allow "sparse" option for Kopia & Restic restore #6664

draghuram opened this issue Aug 16, 2023 · 34 comments
Assignees
Labels
area/datamover area/fs-backup Needs Design Needs triage We need discussion to understand problem and decide the priority Reviewed Q3 2023
Milestone

Comments

@draghuram
Copy link
Contributor

Describe the problem/challenge you have
Restic, by default, restores sparse files as non-sparse, thus increasing the disk space used on the target PV compared to the source PV. In some cases, this leads to "No space left on the device" errors which are confusing to users because the source PV has no issues hosting the file. For an example, see #2298.

Describe the solution you'd like
Restic 0.15 supports --sparse option which results in a sparse file being restored as sparse file. Velero 1.11 does ship with Restic 0.15 but there is currently no way of passing this option to the command constructed by Velero. The solution is to come up a mechanism where users can pass additional arguments to the restic command. Or provide a "options" field in restore CR that accepts Restic (and Kopia) specific options. Perhaps, the scheme can be generalized so that both backup and restore can accept additional options.

Environment:

  • Velero version (use velero version): 1.11.0

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@Lyndon-Li Lyndon-Li added the 1.13-candidate issue/pr that should be considered to target v1.13 minor release label Aug 17, 2023
@Lyndon-Li Lyndon-Li changed the title Allow "--sparse" option to be passed to Restic restore command Allow "--sparse" option to be passed to Kopia & Restic restore command Aug 17, 2023
@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Aug 17, 2023

This issue also applies to Kopia path.
Kopia also supports a similar option, let's make sure Kopia path is also covered as it is used by both fs-backup and data mover.
The Velero part should be shared by the both path.

@Lyndon-Li Lyndon-Li changed the title Allow "--sparse" option to be passed to Kopia & Restic restore command Allow "sparse" option for Kopia & Restic restore Aug 17, 2023
@reasonerjt
Copy link
Contributor

Given we have made the decision to move away from Restic, I don't think we want to continue making enhancement for Restic.

@draghuram
Copy link
Contributor Author

draghuram commented Aug 17, 2023

I was thinking the same way too but we have a user who cannot move from Restic at this point. So unless it imposes a huge burden, we should make an attempt to support Restic as well.

@draghuram
Copy link
Contributor Author

BTW, Can you please assign this issue to @dzaninovic? I will be away for few days and he will work on this in the mean while.

@sseago
Copy link
Collaborator

sseago commented Aug 17, 2023

@draghuram So there are different issues with restic:

  1. what level of support does it get
  2. how long will it be around

For 1, as long as it's considered supported, at a minimum, it needs bugfixes. As for new features, in the past we've taken the approach that we won't add them, mostly due to resource constraints/priorities. If a community member provides PRs to implement features we hadn't intended to include in restic, we'd probably need to have a discussion around it.

As for 2, I believe the intent is to deprecate restic in the near future, meaning that at some point afterwards it is expected to disappear entirely (still need to get that deprecation policy PR approved/merged, though).

@Lyndon-Li Lyndon-Li assigned draghuram and unassigned draghuram Aug 18, 2023
@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Aug 18, 2023

I tried, but couldn't assign the issue to David directly. Maybe he needs to join the discussion of this issue first, i.e., make some comments or submit a PR for this issue.
Let me reassign it later when available.

@dzaninovic
Copy link
Contributor

I will look into what code changes would be needed to support this.

@Lyndon-Li Lyndon-Li assigned dzaninovic and unassigned draghuram Aug 21, 2023
@sseago
Copy link
Collaborator

sseago commented Aug 22, 2023

Possibly related to #4129 in terms of needing a higher-level configuration framework for passing options to FSB

@reasonerjt
Copy link
Contributor

IMO we don't wanna add such param in restore CR.
A quick way is to introduce a configmap to customize the behavior of the fs-restorer, but a better way may be to design a more flexible restore-policy to contain such settings.

@reasonerjt reasonerjt added Needs triage We need discussion to understand problem and decide the priority Reviewed Q3 2023 labels Aug 30, 2023
@draghuram
Copy link
Contributor Author

I wonder if configuration being discussed in #6697 would help for this option as well.

@reasonerjt
Copy link
Contributor

IMO if this is a high priority, instead of adding restic config policy, and later kopia config policy, we should first introduce a higher level "policy" as a container of different policies and define how to reference such policy in the restore CR, how to expose it via CLI, etc, and then we can add these config policies into this policy.

@draghuram
Copy link
Contributor Author

I agree that a "container" policy would be useful. How about something like this? I am repurposing the idea proposed in #6697 but with some changes:

The following will go in "resource policies" configmap.

version: v1
datamoverPolicies:
  restore:
    sparse: true

As can be seen, "sparse" option is "global" option and should be processed by any uploader that supports it (both current uploaders - restic and kopia do).

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Sep 14, 2023

I agree with the concept of unified API for all uploaders.
Meanwhile, we would like to introduce an ALL-IN-ONE policy mechanism which could contain various configurations and could be shared by backups and restores, then the datamoverPolicies should be part of the ALL-IN-ONE policy.

We expect a loose struct of the policy, so the current datamoverPolicies should be embedded into the policy as is easily.

Please just wait some time, the design is on the way, I think the initial version of the ALL-IN-ONE policy will be available soon.

@draghuram
Copy link
Contributor Author

Thanks @Lyndon-Li. I look forward to seeing the details of ALL-IN-ONE policy.

@draghuram
Copy link
Contributor Author

In today's community call, Daniel mentioned that ALL-IN-ONE policy is being dropped and most probably, new field will be added to restore CR to control "sparse" behavior.

Daniel/Lyndon, please post your thoughts in this matter so that I can propose new design.

@qiuming-best
Copy link
Contributor

@draghuram it's highly welcome you to submit a design.

It would be not suitable to put all kinds of configurations or policies ALL-IN-ONE.
Some configurations may be global and remain unchanged after the Velero pod is up and running, while others may be per backup or per restore. Therefore, it’s not quite appropriate to put all policies in one.
Such as setting the concurrency of data path, some settings in the controller could not be changed after running, maybe those configurations we could put into one global policies.

The "sparse" option is more restore-related, it could change it in every restore. We may not need to complicate this. We can directly pass the restore-related action through the create restore option and store it in the spec of the restore CR.
Same as some options for backup.

Additionally, we can’t ALL-IN-ONE put all configurations in secrets, configmaps, or CRDs because ETCD has a size limit (1.5MiB) for each request.

@draghuram
Copy link
Contributor Author

Thanks @qiuming-best. I am ok to add this as an option in restore CR. How ever, there are few choices in how we can add there and we need to pick one:

  1. Add a top level boolean field called "sparseRestore".
  2. Add a map called "dataMoverOptions" and add a boolean inside the map.

I am personally ok with either though I have slight preference for (2). CC @sseago.

@sseago
Copy link
Collaborator

sseago commented Sep 20, 2023

@draghuram is sparse restore datamover-only, or is it also relevant for FS backup? If the latter, then we probably wouldn't want it in a datamover-specific map.

@Lyndon-Li
Copy link
Contributor

The option should be for VGDP (Velero Generic Data Path) which is shared by fs-backup and data mover, so probably we call it "dataPathOptions", or a more user friendly name with the analogous meaning.

@draghuram
Copy link
Contributor Author

As @Lyndon-Li says, it should apply to both file system backup and data mover but I intended "dataMoverOptions" to mean any backup that involves data transfer, as opposed to snapshots. But I can see the confusion here. I am not very particular about the "map" name as long it is reasonably clear. BTW, does this mean, we are agreeing to not add the boolean directly at the top level?

@reasonerjt reasonerjt removed the 1.13-candidate issue/pr that should be considered to target v1.13 minor release label Sep 27, 2023
@reasonerjt reasonerjt added this to the v1.13 milestone Sep 27, 2023
@reasonerjt
Copy link
Contributor

BTW, does this mean, we are agreeing to not add the boolean directly at the top level?
I think the answer is yes. However, as non-native speaker I'm also stuck at naming it, I think we need something more general than "datamoverOption", b/c it also impact the workflow for pvr.

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Sep 27, 2023

There is indeed a contradiction here for the option name:

  • fs-backup is also a "data mover" technically, since it moves data
  • CSI snapshot data mover is also a "file system backup" technically, since it can back up data from file system level
  • There is no backup type named data mover in Velero

If we want to use data mover as a collective name of the backup types which move data and are implemented by VGDP(Velero Generic Data Path), we need to document this somewhere, but even if we do this, it is still slightly contradictable as it is very close to CSI snapshot data mover.
Otherwise, we will have to pick a new name for as a collective name of the backup types which move data and are implemented by VGDP.

Let's discuss this further.

@draghuram
Copy link
Contributor Author

It will nice if some decision can be arrived at. This is one of those cases where code changes are small but user interface is hard. One alternative is to simply have a top level option "sparseRestore" or "sparse" and document that it applies only in some cases.

@Lyndon-Li
Copy link
Contributor

@draghuram
We've discussed this in the community meeting, but still couldn't not make a decision. Let's continue the discussion @reasonerjt @sseago @shubham-pampattiwar here.

Personally, I suggest we find a user friendly name for VGDP, this is helpful not only for the current requirement but for more future requirements that need to configure VGDP.

@Lyndon-Li
Copy link
Contributor

@draghuram @reasonerjt @qiuming-best @sseago @shubham-pampattiwar
Here is another proposal:

  1. We don't expose concept of VGDP to users
  2. VGDP primarily consists of uploaders and the backup repository. For global uploader options, we create a uploaderConfig CM; for backup repository options, we have them in BackupRepository CR
  3. Uploaders check the uploaderConfig CM. If a uploader supports a config in uploaderConfig, it should follow the config, otherwise, it ignore it (it can print some log message on its own decision). So for the current case, there will be a uploaderConfig CM, inside its data, there is a sparseRestore config

@Lyndon-Li
Copy link
Contributor

For now, if we don't want to create the global uploaderConfig CM, we can also put the uploaderConfig into Backup/Restore CRDs, but the backup CRD and restore CRD can share the same uploaderConfig structure. In future, if we want to move it outside, we can do it with very small efforts.

@weshayutin
Copy link
Contributor

@mrnold FYI

@draghuram
Copy link
Contributor Author

I am fine with adding "uploaderConfig" field in restore CR and inside that field, we can add a boolean flag "sparseRestore". I am fine with adding in Backup CR as well but for the purpose of this issue, only restore matters. If there is consensus on this design, we can work on implementing it.

@Lyndon-Li
Copy link
Contributor

FYI of the Uploader Config design PR #7005

@qiuming-best
Copy link
Contributor

@draghuram
Hello, What is the progress on this issue? We will prepare RC at the end of this month.

Perhaps you can refer to this issue for designing and implementing the code.

@draghuram
Copy link
Contributor Author

I will soon create design for the issue though there is no guarantee that implementation can be finished before RC date.

@qiuming-best
Copy link
Contributor

@draghuram I've implemented theSparse option for Kopia & Restic, I would appreciate it if you could help me verify it

@draghuram
Copy link
Contributor Author

Thanks @qiuming-best. We will give it a try.

@dzaninovic
Copy link
Contributor

Hi @qiuming-best,

The restore test using a data mover and Kopia works as expected in my test at CloudCasa.
I created a 1 GB test file from /dev/zero on a PVC and when restoring it without the new option the file is using 1 GB of data on the disk.
When restoring with --write-sparse-files option the disk usage is 0 but the file size is still 1 GB so option is working as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datamover area/fs-backup Needs Design Needs triage We need discussion to understand problem and decide the priority Reviewed Q3 2023
Projects
None yet
Development

No branches or pull requests

7 participants