Updated deduplication section in zfsconcepts.7 for clarity #17893
base: master
concussious left a comment
Just some comments from an enthusiast; I am not a real reviewer, and someone else still needs to review this.
| It maintains a | ||
| deduplication table (DDT) in memory, which can grow significantly depending on | ||
| the amount of stored data. | ||
| As a general guideline, at least 1.25 GiB of RAM per 1 TiB of pool storage is |
This seems shockingly low
| reducing the total amount of data stored. | ||
| If a file system has the | ||
| Deduplication is the process of eliminating redundant data blocks at the | ||
| storage level so that only one copy of each unique block is kept. |
This might be a bit misleading. IIUC, the point of dedup is that we keep only two copies, right? One in memory and one in storage, so that if we have to have a third or fourth, they'll get matched with the one in memory and just write a pointer to it instead. Right?
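As a rough illustration of what a deduplication table tracks, here is a toy model in Python. It is only a sketch under stated assumptions and not OpenZFS code: the class and method names (`DedupTable`, `write`, `free`) are hypothetical, SHA-256 stands in for the block checksum, and the block bytes are kept in the dict purely for brevity where a real table would hold a pointer to the single on-disk copy. The point it shows is that the table maps each unique checksum to one stored block plus a reference count, so a duplicate write adds a reference rather than another copy of the data.

```python
import hashlib


class DedupTable:
    """Toy content-addressed store (hypothetical, not the OpenZFS DDT).

    Maps checksum -> [stored block, reference count].
    """

    def __init__(self) -> None:
        self.entries: dict[str, list] = {}

    def write(self, block: bytes) -> str:
        # Identify the block by the checksum of its contents.
        key = hashlib.sha256(block).hexdigest()
        entry = self.entries.get(key)
        if entry is None:
            # First occurrence: keep exactly one copy of the data.
            self.entries[key] = [block, 1]
        else:
            # Duplicate: store nothing new, only count another reference.
            entry[1] += 1
        return key

    def free(self, key: str) -> None:
        # Drop one reference; release the block when none remain.
        entry = self.entries[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[key]

    def stored_blocks(self) -> int:
        return len(self.entries)


if __name__ == "__main__":
    ddt = DedupTable()
    refs = [ddt.write(b"A" * 4096) for _ in range(4)]
    ddt.write(b"B" * 4096)
    print(ddt.stored_blocks())  # four identical writes plus one unique block
```

Because every unique block needs a table entry, the table grows with the amount of unique data in the pool, which is why its memory footprint is the main cost discussed in this section.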
| production environment. | ||
| .Ss Block cloning | ||
| Block cloning is a facility that allows a file (or parts of a file) to be | ||
| Block cloning is a facility that allows a file, or parts of a file, to be |
I love this one. Unwinding parentheticals makes the same sentences so much more digestible!
| There are some limitations to block cloning. | ||
| Only whole blocks can be cloned, and blocks can not be cloned if they are not | ||
| yet written to disk, or if they are encrypted, or the source and destination | ||
| Only whole blocks can be cloned, and blocks cannot be cloned if they are not yet |
"can not" is perfectly fine, and people really hate reflowing lines unnecessarily.
I do it too, but usually the opposite way; it's a real pain for people who use traditional console geometries when someone goes and rewraps right up to the edge. Traditionally, roff text was wrapped near the middle or at commas, so that people don't have to disturb lines so much, and older greybeards with bigger fonts are looking at a mess.
| .Sy recordsize | ||
| properties differ. | ||
| The OS may add additional restrictions; | ||
| The operating system may add additional restrictions; |
This also does not improve the clarity.
| Look for | ||
| .Qq clone , | ||
| .Qq dedupe | ||
| .Qq dedupe , |
I too prefer the Oxford comma, but it is not wrong to omit it. This does not improve the clarity imo, but does disrupt git-blame.
| Unlike deduplication, this table has minimal overhead, so can be enabled at all | ||
| times. | ||
| Unlike deduplication, this table has minimal overhead, so it can be enabled at | ||
| all times. |
👍
| pool due to memory exhaustion. | ||
| For these reasons, deduplication is not generally recommended unless there is a | ||
| clear need for it, such as virtual machine images or backup datasets containing | ||
| highly duplicated data. |
Previously the document said to also have backups before enabling dedup. Why do you want to remove that?
Motivation and Context #2829
The Deduplication section in zfsconcepts.7 contained complex wording that could be made clearer for new users.
This change improves readability and clarity while maintaining the original meaning.
Description
Reworded and simplified explanations in the Deduplication section.
Improved grammar and sentence flow.
No functional, performance, or behavior changes.
Documentation-only update.
How Has This Been Tested?
This is a documentation-only change; no code modifications were made.
Verified formatting using .man syntax highlighting in VS Code and confirmed consistent structure with other sections.
Types of changes
Checklist:
Signed-off-by.