Skip to content

Conversation

romanstingler
Copy link

@romanstingler romanstingler commented Sep 26, 2025

https://download.virtualbox.org/virtualbox/7.2.2/ -> VBoxGuestAdditions_7.2.2.iso

❯ sudo mkdir -p /mnt/vbox
  sudo mount -o loop VBoxGuestAdditions_7.2.2.iso /mnt/vbox
  cp /mnt/vbox/VBoxLinuxAdditions.run . 
  sudo umount /mnt/vbox
  rmdir /mnt/vbox
  ls -l VBoxLinuxAdditions.run 

Partial Write Bug in uutils dd

The partial write bug in uutils dd occurs when writing large blocks to a pipe with a slow reader, resulting in truncated output (as seen with 0+1 records out and mismatched MD5 sums). The key area of interest is how dd handles writing data to the output destination.

In dd.rs, the write_block method of the Output struct (lines 864-883) is responsible for writing a block of data to the destination.

Issue Identified

The current implementation retries writing if the write operation is interrupted (io::ErrorKind::Interrupted), which is correct. However, it does not handle the case where a partial write occurs (i.e., wlen < chunk[base_idx..].len()) without being interrupted. When writing to a pipe with a slow reader, the kernel may return a partial write (less than the requested amount) without an error, and the code exits the loop if !self.settings.iflags.fullblock is true. Since iflags.fullblock is typically not set for output operations (it's meant for input), the loop exits after the first partial write, leading to truncated output.

Root Cause

The condition if (base_idx >= full_len) || !self.settings.iflags.fullblock means that unless fullblock is set (which it often isn't for output), the function returns after the first write attempt, even if only part of the data was written. This mimics the behavior we observed in tests where uutils dd does not retry to write the remaining data, causing the 0+1 records out and mismatched byte counts/MD5 sums.

Proposed Fix

To fix the partial write issue, we need to ensure that write_block continues to retry writing until the entire block is written or an error occurs, regardless of the fullblock flag. The fullblock flag should only apply to input operations, not output. Here's how we can modify the code:

  • Remove the !self.settings.iflags.fullblock condition from the loop exit criteria in write_block.
  • Continue looping until base_idx >= full_len or a non-interrupted error occurs.

This change ensures that uutils dd matches the behavior of GNU dd in handling partial writes to slow pipes, preventing data truncation.

related
https://bugs.launchpad.net/ubuntu/+source/makeself/+bug/2125535
VirtualBox/virtualbox#226 (comment)
megastep/makeself@51e7299

I wanted to hear your opinion before adding some tests

#!/bin/bash

# Define sizes, Increased to stress pipe buffer
SKIP_BYTES=20487
BLOCK1=4194304
BLOCK2=20971520  # 20MB to exceed typical pipe buffers
TOTAL_COPY=$((BLOCK1 + BLOCK2))  # 25165824
FILE_SIZE=$((SKIP_BYTES + TOTAL_COPY))  # 25186311

# Use VBox file if available
TESTFILE="testfile"
VBOX_FILE="VBoxLinuxAdditions.run"
if [ -f "$VBOX_FILE" ]; then
    TESTFILE="$VBOX_FILE"
    echo "Using $VBOX_FILE (size: $(stat -c %s "$VBOX_FILE" 2>/dev/null) bytes)"
    ACTUAL_SIZE=$(stat -c %s "$VBOX_FILE" 2>/dev/null)
    if [ "$ACTUAL_SIZE" -lt $((SKIP_BYTES + TOTAL_COPY)) ]; then
        TOTAL_COPY=$((ACTUAL_SIZE - SKIP_BYTES))
        BLOCK2=$((TOTAL_COPY - BLOCK1))
        echo "Adjusted TOTAL_COPY=$TOTAL_COPY, BLOCK2=$BLOCK2"
    fi
else
    echo "No VBox file, generating $TESTFILE (size: $FILE_SIZE bytes)"
    dd if=/dev/urandom of="$TESTFILE" bs="$FILE_SIZE" count=1 status=progress
fi

# Expected MD5 and size
EXPECTED_MD5=$(tail -c +$((SKIP_BYTES + 1)) < "$TESTFILE" | head -c "$TOTAL_COPY" | md5sum | cut -d' ' -f1)
EXPECTED_SIZE=$TOTAL_COPY
echo "Expected MD5: $EXPECTED_MD5"
echo "Expected size: $EXPECTED_SIZE bytes"
echo "System: $(uname -a), Pipe buffer: $(cat /proc/sys/fs/pipe-max-size 2>/dev/null), GNU dd: $(dd --version | head -n1), uutils dd: $(uu-dd --version)"

# Run dd chain
function run_dd_chain() {
    local DD_CMD="$1"
    local LABEL="$2"
    local COUNT="$3"
    local PIPE="$4"
    local FLAGS="$5"
    { OUTPUT_MD5=$(sh -c "($DD_CMD ibs=$SKIP_BYTES skip=1 count=$COUNT status=noxfer && $DD_CMD bs=$BLOCK1 count=1 status=noxfer && $DD_CMD bs=$BLOCK2 count=1 $FLAGS status=noxfer) < $TESTFILE" | sh -c "$PIPE" | cut -d' ' -f1); } 2> "${LABEL}_records.txt"
    ACTUAL_SIZE=$(sh -c "($DD_CMD ibs=$SKIP_BYTES skip=1 count=$COUNT status=noxfer && $DD_CMD bs=$BLOCK1 count=1 status=noxfer && $DD_CMD bs=$BLOCK2 count=1 $FLAGS status=noxfer) < $TESTFILE" | wc -c)
    echo ""
    echo "$LABEL (count=$COUNT, flags=$FLAGS): MD5=$OUTPUT_MD5, Size=$ACTUAL_SIZE bytes"
    cat "${LABEL}_records.txt"
    if [ "$OUTPUT_MD5" == "$EXPECTED_MD5" ] && [ "$ACTUAL_SIZE" -eq "$EXPECTED_SIZE" ]; then
        echo "Match."
    else
        echo "Mismatch (truncation/extra data)."
    fi
}

# Tests
function run_tests() {
    local DD_CMD="$1" ; local BASE="$2"
    echo -e "\n=== $BASE buggy (count=1) ===" ; run_dd_chain "$DD_CMD" "${BASE}_buggy" "1" "md5sum" ""
    echo -e "\n=== $BASE fixed (count=0) ===" ; run_dd_chain "$DD_CMD" "${BASE}_fixed" "0" "md5sum" ""
    echo -e "\n=== $BASE fixed + rate-limited ===" ; run_dd_chain "$DD_CMD" "${BASE}_ratelimit" "0" "pv -L 500k | md5sum" ""
    echo -e "\n=== $BASE fixed + iflag=fullblock ===" ; run_dd_chain "$DD_CMD" "${BASE}_fullblock" "0" "md5sum" "iflag=fullblock"
    echo -e "\n=== $BASE fixed + ibs=$BLOCK2 ===" ; run_dd_chain "$DD_CMD" "${BASE}_ibs" "0" "md5sum" "ibs=$BLOCK2"
}

run_tests "dd" "GNU"
run_tests "/srv/opensource/coreutils/target/debug/dd" "uutils"

# Cleanup
rm -f *_records.txt
[ "$TESTFILE" = "testfile" ] && rm -f "$TESTFILE"

this was my dirty hack

// Test for the fix of the partial write bug in dd
#[test]
fn test_partial_write_slow_pipe_md5() {
    use std::fs::File;
    use std::process::Command;
    use tempfile::NamedTempFile;

    // Use VBoxLinuxAdditions.run as input file
    let input_path = "/srv/opensource/coreutils/VBoxLinuxAdditions.run";
    assert!(
        File::open(input_path).is_ok(),
        "Input file VBoxLinuxAdditions.run not found"
    );

    let skip_bytes = 20487;
    let block1 = 4194304;
    let block2 = 20971520;
    let count = 1;

    // Expected MD5 sum for FIXED behavior (complete data)
    let expected_fixed_md5 = "201ccc47587bb1c28745279e8b7fdd30";
    let expected_fixed_size = 6635520;

    let records_file = NamedTempFile::new().unwrap();
    let records_path = records_file.path().to_str().unwrap();

    // Run the command that previously showed the bug, now fixed
    // This matches the command structure in the test-dd-bug-v2.sh script
    let md5_command = format!(
        "{{ OUTPUT_MD5=$(sh -c \"(./target/debug/dd ibs={} skip=1 count={} status=noxfer && ./target/debug/dd bs={} count=1 status=noxfer && ./target/debug/dd bs={} count=1 status=noxfer) < {}\" | md5sum | cut -d' ' -f1); }} 2> {}; echo $OUTPUT_MD5",
        skip_bytes, count, block1, block2, input_path, records_path
    );

    let md5_output = Command::new("bash")
        .arg("-c")
        .arg(&md5_command)
        .output()
        .expect("Failed to execute MD5 command");

    let output_md5 = String::from_utf8_lossy(&md5_output.stdout)
        .trim()
        .to_string();

    // Get the size using the same command structure
    let size_command = format!(
        "sh -c \"(./target/debug/dd ibs={} skip=1 count={} status=noxfer && ./target/debug/dd bs={} count=1 status=noxfer && ./target/debug/dd bs={} count=1 status=noxfer) < {}\" | wc -c",
        skip_bytes, count, block1, block2, input_path
    );

    let size_output = Command::new("bash")
        .arg("-c")
        .arg(&size_command)
        .output()
        .expect("Failed to execute size command");

    let size: u64 = String::from_utf8_lossy(&size_output.stdout)
        .trim()
        .parse()
        .unwrap_or(0);

    // Read the stderr output to check for record counts
    let records_output = Command::new("bash")
        .arg("-c")
        .arg(format!("cat {}", records_path))
        .output()
        .expect("Failed to read records file");

    let stderr_str = String::from_utf8_lossy(&records_output.stdout);
    println!("DD stderr output:\n{}", stderr_str);
    println!("Output size: {} bytes", size);
    println!("Output MD5: {}", output_md5);

    // With the fix applied, we still expect to see '0+1 records out' for the last dd command
    // because of how the test is set up, but the data should be complete
    assert!(
        stderr_str.contains("0+1 records out"),
        "Expected '0+1 records out' in stderr, but got: {}\n",
        stderr_str
    );

    // Compare with expected FIXED MD5
    assert_eq!(
        output_md5, expected_fixed_md5,
        "MD5 sum does not match expected FIXED value. Expected: {}, Got: {}. The bug may have been fixed.",
        expected_fixed_md5, output_md5
    );

    // Check if size matches expected fixed size
    assert_eq!(
        size, expected_fixed_size,
        "Size does not match expected FIXED size. Expected: {}, Got: {}.",
        expected_fixed_size, size
    );
}

test result: ok. 3685 passed; 0 failed; 34 ignored; 0 measured; 0 filtered out; finished in 15.55s

@sylvestre

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

Copy link

codspeed-hq bot commented Sep 27, 2025

CodSpeed Performance Report

Merging #8750 will not alter performance

Comparing romanstingler:main (10c647f) with main (32eef06)

Summary

✅ 93 untouched
⏩ 1 skipped1

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@arachnist
Copy link

While this eliminates this particular way of triggering this problem, if I'm reading the main loop break condition code correctly, it would still exit in a situation where there was still data left in the write buffer, but no more data left to read from the input.

@romanstingler
Copy link
Author

As far as I understand the code the
finalize(o, rstat, wstat, start, &prog_tx, output_thread, truncate)
function, called after the main loop, explicitly flushes any remaining data in the output buffer with
output.flush()
and ensures data integrity with
output.sync().
This means that even if the loop exits when input is exhausted, no data will be lost due to unwritten buffered content -- to my understanding.

@sylvestre
Copy link
Contributor

could you please add a test ? thanks

@hmaarrfk
Copy link

hmaarrfk commented Oct 8, 2025

Xref #8840

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants