Skip to content

Conversation

adam900710
Copy link
Collaborator

[BUG]
There is a bug report that legacy code of "btrfs rescue chunk-recover" is triggering false alerts from tree-checker, and refuse to work:

btrfs rescue chunk-recover /dev/nvme1n1p1

Scanning: DONE in dev0
corrupt leaf: root=1 block=13924671995904 slot=0, unexpected item end, have 16283 expect 0 <<< Note the "expect 0"
leaf 13924671995904 items 11 free space 12709 generation 1589644 owner ROOT_TREE
leaf 13924671995904 flags 0x1(WRITTEN) backref revision 1
[...]
Couldn't read tree root
open with broken chunk error

[CAUSE]
The item end checks is from __btrfs_check_leaf() from tree-checker, and for the first slot of a leaf, the expected end should be BTRFS_LEAF_DATA_SIZE(), which is fetched from fs_info->leaf_data_size.

However for the fs_info opened by chunk recover, it's not going through the regular open_ctree(), but open_ctree_with_broken_chunk(), which doesn't populate that member and resulting BTRFS_LEAF_DATA_SIZE() to return 0.

[FIX]
There is no need to cache leaf_data_size, as it can be easily calulated using nodesize.

And kernel is already doing that, so follow the kernel to remove btrfs_fs_info::leaf_data_size, and use a simple inline function to do the calculation instead.

Reported-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
Link: https://lore.kernel.org/linux-btrfs/CABXGCsOug_bxVZ5CN1EM0sd9U4JAz=Jf5EB2TQe8gs9=KZvWEA@mail.gmail.com/

adam900710 and others added 30 commits September 11, 2025 15:23
[BUG]
On older kernels without the fix ae4477f93756 ("btrfs: update
superblock's device bytes_used when dropping chunk"), the test case 057
will result super block device item to mismatch with the chunk one.

Normally this is not a big deal, but newer btrfs-progs will check for
such mismatch.
Although newer btrfs-progs won't report this as an error, the test case
fsck/057 will manually check for any warnings, and fail the test case:

  ====== RUN CHECK /home/runner/work/btrfs-progs/btrfs-progs/btrfs check /dev/loop1
  [1/8] checking log skipped (none written)
  [2/8] checking root items
  [3/8] checking extents
  WARNING: device 2's bytes_used was 503316480 in tree but 570425344 in superblock
  [4/8] checking free space tree
  [5/8] checking fs roots
  [6/8] checking only csums items (without verifying data)
  [7/8] checking root refs
  [8/8] checking quota groups skipped (not enabled on this FS)
  Opening filesystem to check...
  Checking filesystem on /dev/loop1
  UUID: efd3e3b9-2fac-4a6f-b378-34694dc2d446
  found 147456 bytes used, no error found
  total csum bytes: 0
  total tree bytes: 147456
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 139992
  file data blocks allocated: 0
   referenced 0

That WARNING line will fail the test case, even if we didn't error out.

[CAUSE]
The test case itself is a test case for btrfs-progs commit 0dc8b8b
("btrfs-progs: check: fix wrong total bytes check for seed device"),
which will report minor warning like the following:

  [2/7] checking extents
  WARNING: minor unaligned/mismatch device size detected
  WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it

But in this particular case, 057 should only check for the related
wanrings, not something caused by the kernel.

[FIX]
Make the warning check in fsck/057 more specific, instead of "WARNING"
use "fix-device-size" as the keyword.

This is an unfortunate workaround for the CI kernels, which doesn't have
fast enough backport of upstream kernel fixes.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
…in chunk root

The superblock of each device contains a copy of the corresponding
struct btrfs_dev_item that lives in the chunk root.

Add a check that the total_bytes and bytes_used values of these two
copies match.

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
[ Change the error to warning ]
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Extract resize argument parsing logic from check_resize_args() into a
separate parse_resize_args() function and makes the parsing logic
reusable for future features like offline resize.

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Add support for resizing btrfs filesystems while unmounted using a new
option --offline, without requiring mount privileges or dealing with
mounted filesystem constraints.

Current limitations:

- increase size
- single-device filesystem

It works on both block devices and regular files. For regular files it
also truncates the file to the new size.

This provides an alternative for users who need to resize filesystems in
environments where mounting may not be possible or desirable, such as in
containers or during system recovery.

Pull-request: kdave#1007
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Add comprehensive test coverage for the new --offline resize feature.

The test case covers:

- Valid resize operations: incremental (+1G, 1:+1G) and absolute (2G)
- Invalid operations that should fail: shrinking, multi-device, cancel
- Error conditions: invalid device IDs, malformed size arguments
- Option compatibility: --offline with --enqueue should be rejected
- Mount state validation: offline resize should fail on mounted filesystems

Each test creates temporary filesystem images, performs resize operations,
and verifies the resulting filesystem size matches expectations. Error
cases use run_mustfail to ensure proper failure handling.

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Add documentation for the new --offline.  Update the warning to reflect
that it is now possible to resize a btrfs image, but you need to specify
the --offline flag. Highlighted that --offline flag only supports
increasing the size of single device filesystems.

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
The quotas are missing the status command, so add it with some basic
information about quotas that can be found in sysfs. The information
access is unrestricted and does not use any privileged ioctls.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To read the information about filesystem and devices we use the FS_INFO
ioctl, the num_devices however does not include the seeding device. To
compensate, search_chunk_tree_for_fs_info() enumerates the devices and
counts again using the SEARCH_TREE ioctl, which is privileged.

Due to that the whole get_fs_info() fails and prevents some commands to
run properly, eg. 'btrfs fi show' on a mounted path.

Change the counting to manual iteration up to the max_id which we have
read from the FS_INFO ioctl.

Signed-off-by: David Sterba <dsterba@suse.com>
The NOATIME opening mode is not necessary and can fail when opening a
path that would be other wise possible to open in read-only mode.

Signed-off-by: David Sterba <dsterba@suse.com>
Add basic config for python linters and make it work with our python
bindings.

Signed-off-by: David Sterba <dsterba@suse.com>
This is not used and redundant.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Let's not use the ever confusing pattern of "if (!strcmp(...))" with
negated operator and positive expected outcome.

Signed-off-by: David Sterba <dsterba@suse.com>
Let's not use the ever confusing pattern of "if (!strncmp(...))" with
negated operator and positive expected outcome.

Signed-off-by: David Sterba <dsterba@suse.com>
Let's not use the ever confusing pattern of "if (!memcmp(...))" with
negated operator and positive expected outcome.

Signed-off-by: David Sterba <dsterba@suse.com>
Handle potential allocation errors of the filtering and sorting helper
structures (they have to be initialized before the option parsing). Also
drop conditional free() (reported by CodeQL scan).

Signed-off-by: David Sterba <dsterba@suse.com>
CodeQL scan reports a few cases of checked free, while this is not done
in the rest of the code. Calling free(NULL) is valid, so drop the 'ifs'
for consistency.

Signed-off-by: David Sterba <dsterba@suse.com>
CodeQL scan reports missing header guard in stubs, the actual macro
definition was missing.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…block()

The code has been disabled since 2008 when syncing progs and kernel
code.

Signed-off-by: David Sterba <dsterba@suse.com>
CodeQL reports some commented out code which is from csum switch feature
development times. Remove one and make the other a proper comment.

Signed-off-by: David Sterba <dsterba@suse.com>
readdir() could potentially return NULL in case there's no device in
the directory, which is unlikely but should be handled. Reported by
CodeQL scan.

Signed-off-by: David Sterba <dsterba@suse.com>
CodeQL reports a few unused imports, all seem to be unused since the
inital code.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The 'ci-' prefix is not used anywhere else.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
The mkfs output is quite noisy in the test output so it's run with '-q',
however for debugging make it more visible in a variable.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Some of the subvolume tests fail with ENOSPC with the image size 1G.
Change it to 4G, as 2G is still not enough. Also reset the image
completely before truncating it to the size so there are no stale data.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Run the python tests in CI, coverage is still not complete due to
various problems in the python version support. The claimed minimum
version is still 3.6, while tested minimum is 3.9.

Signed-off-by: David Sterba <dsterba@suse.com>
The latest version from 5.x is 3 years old, no need to expand the full
version history in the navbar on RTD.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
kdave and others added 23 commits September 19, 2025 17:25
[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
The value is called read_io_errs elsewhere.

Pull-request: kdave#1032
Signed-off-by: Robert Joslyn <robert.joslyn@redrectangle.org>
Signed-off-by: David Sterba <dsterba@suse.com>
…sion

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Add new option --offline that reads device stats from the device (or a
file image) directly, ie. not using the GET_DEV_STATS ioctl. For that
the argument must be a block device or a file.

  $ btrfs device stats /tmp/img1
  ERROR: not a directory: /tmp/img1
  ERROR: to read device stats from a file image use --offline option

The output is the same as for online (mounted) case, with the exception
of missing devices that may not show full path in the output. Then the
device name is "devid:N" where N is the device id:

The following output is from 4 file backed images img[1234] that are
added to a filesystem as loop devices and then removed. The files exist
but the device scan cannot see them:

  $ btrfs device stats --offline /tmp/img1
  [/tmp/img1].write_io_errs    0
  [/tmp/img1].read_io_errs     0
  [/tmp/img1].flush_io_errs    0
  [/tmp/img1].corruption_errs  0
  [/tmp/img1].generation_errs  0
  [devid:2].write_io_errs    0
  [devid:2].read_io_errs     0
  [devid:2].flush_io_errs    0
  [devid:2].corruption_errs  0
  [devid:2].generation_errs  0
  [devid:3].write_io_errs    0
  [devid:3].read_io_errs     0
  [devid:3].flush_io_errs    0
  [devid:3].corruption_errs  0
  [devid:3].generation_errs  0
  [devid:4].write_io_errs    0
  [devid:4].read_io_errs     0
  [devid:4].flush_io_errs    0
  [devid:4].corruption_errs  0
  [devid:4].generation_errs  0

Signed-off-by: David Sterba <dsterba@suse.com>
The commit 6f5b0b1 ("btrfs-progs: remove the unused fs_info
parameter for btrfs_csum_data()") changed btrfs_csum_data() but
btrfs-sb-mod was not update so build fails.

Signed-off-by: David Sterba <dsterba@suse.com>
ASAN build reports that there are some leaks:

Direct leak of 7 byte(s) in 1 object(s) allocated from:
    #0 0x7f7fa15200e4 in strdup (/lib64/libasan.so.8+0x1200e4)
    #1 0x000000579d7e in main /home/ds/x/g/btrfs-progs/btrfs-sb-mod.c:417
    kdave#2 0x7f7fa102b2fa in __libc_start_call_main (/lib64/libc.so.6+0x2b2fa)
    kdave#3 0x7ffce11eee69  ([stack]+0x20e69)
SUMMARY: AddressSanitizer: 7 byte(s) leaked in 1 allocation(s).

It's the operation name, so free it after it's processed.

Signed-off-by: David Sterba <dsterba@suse.com>
Use btrfs-sb-mod and do some arbitrary changes to super blocks. Then run
'btrfs check'. Check will fail to read the filesystem most of the time
and cannot do any actual repair but must not crash.

Signed-off-by: David Sterba <dsterba@suse.com>
The CodeQL scan points out a local variable is stored in a non-local
memory, in this case it's the uuid buffer for new chunk. There are error
paths leading to the label 'out' that do not reset it. This is not a
problem as this does not leak outside, but let's reset it in all cases.

Signed-off-by: David Sterba <dsterba@suse.com>
All the variables are for paths or path fragments so we know the upper
limit and don't need to allocate the memory dynamically.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The page flow goes from introduction right to the data structures, the
overview is a better fit.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
For better flow and overview move the ioctl list befroe the data
structures that are more specific.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Add DEV_INFO, feature bit ioctls.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ fix a typo that always fails the CI ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
…ed branches

The Codespell workflow is useful and catches a lot of typos. However it
also makes development and CI use harder. Documentation updates do not
need the full CI so it's skipped. If a typo sneaks in any followup pull
request or devel update will fail.

We want to run the CI for code and updating and restarting the CI spends
all the free minutes available for the github actions.

At the slight inconvenience of missing some typos during development
limit the codespell workflow only on the 'release-test' branch or with
the prefix 'codespell/'.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
…vice

[BUG]
There is a reproducer that can trigger btrfs to flips RO:

  # mkfs.btrfs -f -mraid1 -draid1 /dev/sdd /dev/sde
  # mount /dev/sdd /mnt/btrfs
  # echo 1 > /sys/block/sde/device/delete
  # btrfs balance start -mconvert=dup -dconvert=single /mnt/btrfs
  ERROR: error during balancing '.': Input/output error
  There may be more info in syslog - try dmesg | tail

Then btrfs will flip read-only with the following errors:

  btrfs: attempt to access beyond end of device
  sde: rw=6145, sector=21696, nr_sectors = 32 limit=0
  btrfs: attempt to access beyond end of device
  sde: rw=6145, sector=21728, nr_sectors = 32 limit=0
  btrfs: attempt to access beyond end of device
  sde: rw=6145, sector=21760, nr_sectors = 32 limit=0
  BTRFS error (device sdd): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device sdd): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 1, corrupt 0, gen 0
  btrfs: attempt to access beyond end of device
  sde: rw=145409, sector=128, nr_sectors = 8 limit=0
  BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
  BTRFS error (device sdd): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0
  btrfs: attempt to access beyond end of device
  sde: rw=14337, sector=131072, nr_sectors = 8 limit=0
  BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
  BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
  BTRFS error (device sdd): error writing primary super block to device 2
  BTRFS info (device sdd): balance: start -dconvert=single -mconvert=dup -sconvert=dup
  BTRFS info (device sdd): relocating block group 1372585984 flags data|raid1
  BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 2, corrupt 0, gen 0
  BTRFS warning (device sdd): chunk 2446327808 missing 1 devices, max tolerance is 0 for writable mount
  BTRFS: error (device sdd) in write_all_supers:4044: errno=-5 IO failure (errors while submitting device barriers.)
  BTRFS info (device sdd state E): forced readonly
  BTRFS warning (device sdd state E): Skipping commit of aborted transaction.
  BTRFS error (device sdd state EA): Transaction aborted (error -5)
  BTRFS: error (device sdd state EA) in cleanup_transaction:2017: errno=-5 IO failure
  BTRFS info (device sdd state EA): balance: ended with status: -5

[CAUSE]
The root cause is that, deleting devices using sysfs interface normally
will trigger the shutdown callback for the fs.

But btrfs doesn't handle that callback at all, thus it can not really
know that device is no longer available, thus btrfs will still try to do
usual read/write on that device.

This is fine if the user do nothing, as RAID1 can handle it properly.

But if we try to convert to SINGLE/DUP, btrfs will still use that device
to allocate new data/metadata chunks.
And if a new metadata chunk is allocated to the removed device, all the
write will be lost, and trigger the super block write/barrier errors
above.

[USER SPACE ENHANCEMENT]
For now, add extra missing devices check at btrfs-balance command.
If there is a missing devices, 'btrfs balance' will add a 10 seconds
delay and warn the possible dangerous.

The root fix is to introduce a failing/removed device detection for
btrfs, but that will be a pretty big feature and will take quite some
time before landing it upstream.

Pull-request:  kdave#946
Reported-by: Jeff Siddall <news@siddall.name>
Link: https://lore.kernel.org/linux-btrfs/2cb1d81e-12a8-4fb1-b3fc-e7e83d31e059@siddall.name/
Signed-off-by: Qu Wenruo <wqu@suse.com>
…ified

[BUG]
There is a bug report that, when using --chunk-root option, if the
specified chunk root tree block has a different level than the one in
the super block, btrfs check will reject the run:

  ERROR: root [3 0] level 0 does not match 1
  ERROR: cannot read chunk root
  ERROR: cannot open file system
  Opening filesystem to check...

[CAUSE]
During btrfs_setup_chunk_tree_and_device_map(), although it accepts a
@chunk_root_bytenr parameter, it still uses the chunk root level from
the super block.

Thus if the provided chunk root is at a different level, it will still
be rejected.

[FIX]
Read out the tree block at @chunk_root_bytenr, and use the level from
the tree block instead.

Pull-request: kdave#1037
Link: https://lore.kernel.org/linux-btrfs/CAKZK7uxiRmDxk-1goC4yj7QZPSmL-=GAoAuF=OdekbSNVrG8fg@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
As reported (cppcheck), the conditions are reversed as we first need to
check the limit and only then access the data.

Issue: kdave#1030
Signed-off-by: David Sterba <dsterba@suse.com>
…_table_head()

As reported (cppcheck), there's bitwise operator used in place of
logical. This will still work but it's a misuse.

Issue: kdave#1031
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that legacy code of "btrfs rescue chunk-recover"
is triggering false alerts from tree-checker, and refuse to work:

  # btrfs rescue chunk-recover /dev/nvme1n1p1
  Scanning: DONE in dev0
  corrupt leaf: root=1 block=13924671995904 slot=0, unexpected item end, have 16283 expect 0 <<< Note the "expect 0"
  leaf 13924671995904 items 11 free space 12709 generation 1589644 owner ROOT_TREE
  leaf 13924671995904 flags 0x1(WRITTEN) backref revision 1
  [...]
  Couldn't read tree root
  open with broken chunk error

[CAUSE]
The item end checks is from __btrfs_check_leaf() from tree-checker,
and for the first slot of a leaf, the expected end should be
BTRFS_LEAF_DATA_SIZE(), which is fetched from fs_info->leaf_data_size.

However for the fs_info opened by chunk recover, it's not going through
the regular open_ctree(), but open_ctree_with_broken_chunk(), which
doesn't populate that member and resulting BTRFS_LEAF_DATA_SIZE() to
return 0.

[FIX]
There is no need to cache leaf_data_size, as it can be easily calulated
using nodesize.

And kernel is already doing that, so follow the kernel to remove
btrfs_fs_info::leaf_data_size, and use a simple inline function to do
the calculation instead.

Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CABXGCsOug_bxVZ5CN1EM0sd9U4JAz=Jf5EB2TQe8gs9=KZvWEA@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants