Multiple sets of archives stored in one repository #5645

ghost · 2021-01-23T01:48:46Z

ghost
Jan 23, 2021

Apologies if this is in the docs or just obvious to most... I've checked the docs, but didn't find an explicit answer.

I'm confused about the best practice with repositories and archives. If I have multiple sets of data on the same machine that I want to back up, is there any reason that I can't/shouldn't create uniquely named archives for each of the sets of data, and store them all in the same repository? Every example I've found shows the same data backed up in each archive in a repository.

There's nothing stopping me from having everything in one archive, or separate repositories for each set of data (though I believe this means I'd lose dedup), just curious if this would work or not?

Answered by elho

elho · 2021-01-23T11:12:54Z

elho
Jan 23, 2021

Multiple different archives are fine. Do use different prefixes for their names so you can properly match them in operations.
They can overlap as well, thanks to the beauty of deduplication.
E.g. I have hostname-set-isodate style names, and examples for set are sysconfig (/etc + select things from /var), src (source code from my $HOME), Maildir, which e.g. can run more frequently, and home and system (/, i.e. everything) to run daily. Thus e.g. home includes Maildir and src, and system includes everything, at the cost of the few kB it takes to reference the existing chunks.
While nowadays I lean towards never ever pruning, the idea of having e.g. both home and system was to at some point (e.g. after 10+ years) prune the system part and keep only home.
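
A minimal sketch of the hostname-set-isodate naming described above, assuming Borg 1.x syntax (`REPO::ARCHIVE` and the `{hostname}` / `{now:...}` placeholders). The repository path, the `BORG` override variable, the function names, and the source paths are hypothetical and only serve the illustration; they are not elho's actual configuration.

```shell
#!/bin/sh
# Assumptions: BORG, REPO and all paths below are placeholders for illustration.
BORG="${BORG:-borg}"            # set BORG=echo to print commands instead of running them
REPO="${REPO:-/backups/repo}"   # hypothetical repository location

# Build "REPO::{hostname}-SET-{now:...}". Borg itself expands the
# {hostname} and {now} placeholders when the archive is created.
archive_spec() {
    printf '%s::{hostname}-%s-{now:%%Y-%%m-%%dT%%H:%%M:%%S}\n' "$REPO" "$1"
}

# Back up one named set: first argument is the set name, the rest are paths.
backup_set() {
    set_name=$1
    shift
    "$BORG" create "$(archive_spec "$set_name")" "$@"
}

# Example calls (overlapping sets are fine, deduplication absorbs the overlap):
#   backup_set sysconfig /etc
#   backup_set home /home
#   backup_set system /
```

Because every set gets its own name component, later operations can select one set at a time by its prefix. Excludes and other options for a full-system backup are left out of this sketch.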

The only possible downside is that there are operations like check and recreate whose runtime (partly) scales with the number of archives. So my scheme above is definitely a long-term experiment, and I may reach the point where I delete (without compaction) any archives that are a strict subset of the system ones to cut down there.

In your case, which AIUI is disjoint sets of data, it'd depend on the number of sets and the backup frequency: 3 sets of daily backups wouldn't be anything to worry about, while 10 sets backed up hourly and all kept for years would get you into a situation where a prune is no longer feasible.

Sidenote: The recommended approach to work around the maximum archive size limit (the thing that shows 0-1% for almost everyone 😉) is to split data across two (or more) archives.

1 reply

ghost Jan 23, 2021

Thanks for the thorough response! Your overlapping setup is interesting, I hadn't even considered something like that - good to keep in mind.

My current use case is backing up ~8 sets of data from my media server to another machine on the network. Archives look something like this:

/backups/<host>::photos-<timestamp>
/backups/<host>::games-<timestamp>
/backups/<host>::appdata-<timestamp>
/backups/<host>::sysconfig-<timestamp>
. . .

For each dataset, create and prune --prefix .. run daily to keep 7 daily/4 weekly/6 monthly. At least for now, I'm nowhere near the archive limit, though splitting in the future if needed shouldn't be an issue. It sounds like I'm good to go with my configuration.
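
The daily job described above could be sketched roughly as follows, assuming Borg 1.x command syntax and the `--prefix` option used in this thread. The `BORG` override variable, the repository path, and the source directories are hypothetical. Newer Borg releases may prefer a glob-based option over `--prefix`, so check `borg prune --help` for your version.

```shell
#!/bin/sh
# Assumptions: BORG, REPO and the source paths are placeholders for illustration.
BORG="${BORG:-borg}"              # set BORG=echo to print commands instead of running them
REPO="${REPO:-/backups/myhost}"   # hypothetical repository path

# Create a new archive for one dataset, then prune only that dataset's
# archives, keeping 7 daily, 4 weekly and 6 monthly as in the post.
# $1 is the dataset name, $2 the directory to back up.
backup_and_prune() {
    name=$1
    src=$2
    "$BORG" create "$REPO::$name-{now:%Y-%m-%dT%H:%M:%S}" "$src" &&
    "$BORG" prune --prefix "$name-" \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$REPO"
}

# Example calls, one per dataset:
#   backup_and_prune photos  /srv/media/photos
#   backup_and_prune appdata /srv/appdata
```

Prune runs only if create succeeded, so a failed backup does not shrink the retained history for that dataset.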

Thanks again!

ThomasWaldmann · 2021-01-23T12:57:34Z

ThomasWaldmann
Jan 23, 2021
Maintainer

"Do use different prefixes for their names so you can properly match them in operations."

Just re-emphasizing this: if you use multiple backup jobs, backing up different sets of input data, it is pretty much required that you use different archive name prefixes and borg prune --prefix .... Otherwise prune will wreak havoc, deleting lots of archives you likely intended to keep. Using --dry-run until you have seen that everything works correctly is also a good idea.
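
To see why the prefix matters, here is a small simulation of prefix matching in plain shell. It does not call Borg at all; it assumes that `--prefix` selects archive names starting with the given string, as the Borg 1.x help text describes. The function name and the archive names are made up for the example.

```shell
#!/bin/sh
# Print the names read from stdin that start with the given prefix,
# mimicking which archives a prefix-restricted prune would consider.
match_prefix() {
    prefix=$1
    while IFS= read -r name; do
        case $name in
            "$prefix"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Hypothetical archive names from three different backup jobs.
names='sysconfig-2021-01-22
system-2021-01-22
photos-2021-01-22'

# A sloppy prefix catches two unrelated jobs at once:
#   printf '%s\n' "$names" | match_prefix sys
# Including the separator restricts the match to a single job:
#   printf '%s\n' "$names" | match_prefix sysconfig-
```

Before trusting a real prune, the same selection can be previewed without deleting anything, for example with `borg prune --dry-run --list --prefix photos- --keep-daily 7 REPO`, where `REPO` stands for your repository.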

0 replies

jdchristensen · 2021-01-23T15:51:13Z

jdchristensen
Jan 23, 2021

While it works to have different data sets in the same repo, unless you expect there to be a big savings due to deduplication, I would strongly recommend putting different data sets in different repos. This has various advantages. If a repo is corrupted, you don't lose everything. You can move one of the repos to another disk if space becomes an issue. Various operations on repos will be faster and use a lot less memory. Etc. That said, if there is a lot of overlap in the data sets, then putting them in the same repo can make sense.
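
For comparison, a rough sketch of the one-repository-per-data-set layout, assuming Borg 1.x syntax for `borg init` and `borg create`. The `BORG` override variable, the parent directory, the encryption mode, and the function names are assumptions made for the example.

```shell
#!/bin/sh
# Assumptions: BORG, BASE and the encryption mode are placeholders for illustration.
BORG="${BORG:-borg}"              # set BORG=echo to print commands instead of running them
BASE="${BASE:-/backups/myhost}"   # hypothetical parent directory for all repositories

# Each data set gets its own repository at BASE/NAME.
repo_for() {
    printf '%s/%s\n' "$BASE" "$1"
}

# One-time setup of the repository for a data set.
init_repo() {
    "$BORG" init --encryption=repokey "$(repo_for "$1")"
}

# Back up a directory into the data set's own repository.
# No name prefix is needed, since the repository holds only this data set.
backup_to_own_repo() {
    "$BORG" create "$(repo_for "$1")::{now:%Y-%m-%dT%H:%M:%S}" "$2"
}

# Example calls:
#   init_repo photos
#   backup_to_own_repo photos /srv/media/photos
```

The trade-off is the one named above: data shared between data sets is stored once per repository, because deduplication does not span repositories.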

0 replies
