@PuvvadaBhaskar PuvvadaBhaskar commented Nov 4, 2025

Motivation and Context #2829

The Deduplication section in zfsconcepts.7 contained complex wording that could be made clearer for new users.
This change improves readability and clarity while maintaining the original meaning.

Description

Reworded and simplified explanations in the Deduplication section.

Improved grammar and sentence flow.

No functional, performance, or behavior changes.

Documentation-only update.

How Has This Been Tested?

This is a documentation-only change — no code modifications were made.
Verified formatting using .man syntax highlighting in VS Code and confirmed consistent structure with other sections.

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Quality assurance (non-breaking change which makes the code more robust against bugs)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [x] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

@behlendorf added the "Type: Documentation" and "Status: Code Review Needed" labels on Nov 6, 2025

@concussious concussious left a comment

Just some comments from an enthusiast, I am not a real reviewer and someone else still needs to review this.

It maintains a
deduplication table (DDT) in memory, which can grow significantly depending on
the amount of stored data.
As a general guideline, at least 1.25 GiB of RAM per 1 TiB of pool storage is

This seems shockingly low
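
For a sense of scale, here is a back-of-the-envelope sketch of how a dedup table's footprint might be estimated: one table entry per unique block, times an assumed size per entry. The function name `estimate_ddt_bytes` and the 320-byte entry size are hypothetical placeholders chosen for illustration, not figures taken from this PR or from the man page, so treat the printed numbers as an arithmetic exercise rather than a sizing recommendation.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def estimate_ddt_bytes(unique_data_bytes, avg_block_bytes, bytes_per_entry):
    """Rough table size: one entry per unique block (illustrative model only)."""
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # Ceiling division: a partial trailing block still needs an entry.
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)
    return unique_blocks * bytes_per_entry


# ASSUMPTION: 320 bytes per entry is a placeholder, not a documented value.
ASSUMED_ENTRY_BYTES = 320

for block_kib in (128, 64, 8):
    estimate = estimate_ddt_bytes(TIB, block_kib * 1024, ASSUMED_ENTRY_BYTES)
    print(f"{block_kib:>3} KiB blocks: {estimate / GIB:.2f} GiB per TiB of unique data")
```

Under these assumptions the estimate scales inversely with average block size, which is why any single "GiB per TiB" guideline depends heavily on the workload.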

reducing the total amount of data stored.
If a file system has the
Deduplication is the process of eliminating redundant data blocks at the
storage level so that only one copy of each unique block is kept.

This might be a bit misleading. IIUC, the point of dedup is that we keep only two copies, right? One in memory and one in storage, so that if we have to have a third or fourth, they'll get matched with the one in memory and just write a pointer to it instead. Right?
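
To make the question concrete, here is a toy model of block-level deduplication: a table keyed by checksum that stores each unique block once and counts references to it. This is a minimal sketch for discussion, not how OpenZFS implements its DDT; the class `ToyDedupStore` and its methods are hypothetical names. In this model the table holds a checksum and a reference count per block, not a second copy of the data.

```python
import hashlib


class ToyDedupStore:
    """Illustrative content-addressed store; NOT the OpenZFS implementation."""

    def __init__(self):
        self.blocks = {}     # checksum -> block bytes, stored exactly once
        self.refcounts = {}  # checksum -> number of references to that block

    def write(self, data: bytes) -> str:
        """Store a block, or just bump the refcount if it is already present."""
        key = hashlib.sha256(data).hexdigest()
        if key in self.blocks:
            self.refcounts[key] += 1
        else:
            self.blocks[key] = data
            self.refcounts[key] = 1
        return key

    def stored_bytes(self) -> int:
        """Bytes actually kept, counting each unique block once."""
        return sum(len(block) for block in self.blocks.values())


store = ToyDedupStore()
duplicate_refs = [store.write(b"A" * 4096) for _ in range(4)]
unique_ref = store.write(b"B" * 4096)

print(len(store.blocks))                   # unique blocks kept
print(store.refcounts[duplicate_refs[0]])  # references to the duplicated block
print(store.stored_bytes())                # bytes stored after deduplication
```

Four writes of the same block leave one stored copy with a reference count of four, which matches the "only one copy of each unique block is kept" wording in the proposed text.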

production environment.
.Ss Block cloning
Block cloning is a facility that allows a file (or parts of a file) to be
Block cloning is a facility that allows a file, or parts of a file, to be

I love this one. Unwinding parentheticals makes the same sentences so much more digestible!

There are some limitations to block cloning.
Only whole blocks can be cloned, and blocks can not be cloned if they are not
yet written to disk, or if they are encrypted, or the source and destination
Only whole blocks can be cloned, and blocks cannot be cloned if they are not yet

"can not" is perfectly fine, and people really hate reflowing lines unnecessarily.

I do it too, but usually the opposite way. It's a real pain for people who use traditional console geometries when someone rewraps right up to the edge. Traditionally, roff text was wrapped near the middle or at commas so that people don't have to disturb lines so much; otherwise, older greybeards with bigger fonts are looking at a mess.

.Sy recordsize
properties differ.
The OS may add additional restrictions;
The operating system may add additional restrictions;

This also does not improve the clarity.

Look for
.Qq clone ,
.Qq dedupe
.Qq dedupe ,

I too prefer the Oxford comma, but it is not wrong to omit it. This does not improve the clarity imo, but it does disrupt git-blame.

Unlike deduplication, this table has minimal overhead, so can be enabled at all
times.
Unlike deduplication, this table has minimal overhead, so it can be enabled at
all times.

👍

pool due to memory exhaustion.
For these reasons, deduplication is not generally recommended unless there is a
clear need for it, such as virtual machine images or backup datasets containing
highly duplicated data.

Previously the document said to also have backups before enabling dedup. Why do you want to remove that?
