Bug Description
dx serve fails intermittently with "Operation not permitted (os error 1)" when copying the compiled server executable. This is a race condition between cargo completing the build and Dioxus attempting to copy the executable before the OS fully releases file handles.
Failure rate: 20-90% depending on build frequency (verified through stress testing)
ERROR Build failed: Failed to write executable
1: Operation not permitted (os error 1)
Environment
- OS: macOS 14.2+ (tested on Apple Silicon)
- Dioxus Version: 0.7.3
- Rust: 1.93.0
- Platform: Fullstack (web + server)
Root Cause
Location: packages/cli/src/build/request.rs:1417
BundleFormat::Server => {
    std::fs::create_dir_all(self.exe_dir())?;
    std::fs::copy(exe, self.main_exe())?; // ← FAILS HERE
}
Race condition sequence:
- Cargo builds the server executable at target/aarch64-apple-darwin/server-dev/<package>
- Cargo process exits successfully
- Race window: the OS hasn't fully released file handles/locks yet
- Dioxus immediately tries to copy the executable to target/dx/<package>/debug/web/<package>-<hash>
- std::fs::copy() fails with EPERM if the OS still has the file locked
Timing: There's a variable delay (usually 0-200ms) between cargo exit and OS file handle release. Build frequency and system load affect this window.
Reproduction
Stress Test (20-90% failure rate)
#!/bin/bash
# Rapidly restart dx serve to trigger the race condition
for i in {1..20}; do
    echo "Run #$i..."
    # Remove previously copied server executables
    rm -rf target/dx/myapp/debug/web/myapp-* 2>/dev/null
    timeout 60s dx serve --interactive=false --package myapp > "/tmp/dx-test-$i.log" 2>&1 &
    DX_PID=$!
    sleep 30
    if grep -q "Operation not permitted" "/tmp/dx-test-$i.log"; then
        echo "  ❌ FAILURE"
    else
        echo "  ✅ SUCCESS"
    fi
    kill "$DX_PID" 2>/dev/null
    sleep 2
done
Result: 4-18 failures out of 20 runs (20-90% failure rate)
Why This is a Heisenbug
The error disappears when debugging:
- ✅ Adding verbose logging/tracing → builds succeed (I/O delays fix timing)
- ✅ Running with --verbose → builds succeed
- ✅ Using DIOXUS_LOG=trace → builds succeed
- ❌ Normal dx serve → intermittent failures
Classic observation-changes-behavior pattern.
Verified Fix
Implementation: Retry logic with exponential backoff
use std::time::Duration;
// In write_executable() at line 1415:
BundleFormat::Server => {
    std::fs::create_dir_all(self.exe_dir())?;
    // Retry copy with exponential backoff to handle race conditions
    let mut attempts = 0;
    let max_attempts = 5;
    loop {
        match std::fs::copy(exe, self.main_exe()) {
            Ok(_) => {
                if attempts > 0 {
                    tracing::info!(
                        "✅ Executable copy succeeded after {} retries",
                        attempts
                    );
                }
                break;
            }
            // Retry only on EPERM (errno 1), and only while retries remain
            Err(e) if e.raw_os_error() == Some(1) && attempts < max_attempts => {
                attempts += 1;
                let delay = Duration::from_millis(10 * 2_u64.pow(attempts));
                tracing::warn!(
                    "⚠️ Failed to copy executable (attempt {}/{}), retrying in {:?}: {}",
                    attempts, max_attempts, delay, e
                );
                tokio::time::sleep(delay).await;
            }
            // Any other error, or EPERM after the last retry, fails the build
            Err(e) => return Err(e.into()),
        }
    }
}
Required import (line ~350):
use std::time::{Duration, SystemTime, UNIX_EPOCH};
Verification Results
Stress test with fix: 20/20 builds succeeded (0% failure rate)
- 18 builds succeeded immediately (no race condition)
- 2 builds needed 1 retry (detected and fixed race condition)
- Retry delays: 20ms, 40ms, 80ms, 160ms, 320ms (exponential backoff)
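The retry delays listed above follow from the `10 * 2^attempt` millisecond formula in the fix. As a minimal sketch (the function names `backoff_delay`, `backoff_schedule`, and `total_backoff` are hypothetical and not part of the Dioxus CLI), the schedule and its worst-case total can be computed like this:

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (1-based), mirroring the
/// `10 * 2^attempt` milliseconds formula used in the fix.
/// Hypothetical helper, for illustration only.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(10 * 2_u64.pow(attempt))
}

/// The full list of delays for `max_attempts` retries.
fn backoff_schedule(max_attempts: u32) -> Vec<Duration> {
    (1..=max_attempts).map(backoff_delay).collect()
}

/// Worst-case total time spent sleeping if every retry is needed.
fn total_backoff(max_attempts: u32) -> Duration {
    backoff_schedule(max_attempts).iter().sum()
}
```

With `max_attempts = 5` this yields 20, 40, 80, 160, and 320 ms, for a worst-case total of 620 ms.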
Why This Fix Works
- Catches the race window: Retries give OS time to release file handles
- Exponential backoff: Avoids tight retry loops and gives the OS progressively more time to clean up
- Bounded retries: Gives up after 5 retries (at most 620ms of total delay) to avoid infinite loops
- Minimal overhead: Only adds delay when the race actually occurs (2 of 20 builds in the stress test above)
- Observable: Logs show when retries happen for debugging
- Targeted: Only retries on EPERM (errno 1), other errors fail immediately
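The properties above (targeted at EPERM, bounded, exponential) can be separated from the copy itself. The following is a sketch of one way to do that, not the code in the Dioxus CLI: `retry_on_eperm` is a hypothetical helper, and it uses the blocking `std::thread::sleep` so that it stands alone, whereas the fix above runs in an async context and uses `tokio::time::sleep`.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of the Dioxus CLI): runs `op` and retries
/// it only when it fails with EPERM (raw OS error 1), doubling the delay
/// on each retry. Any other error is returned immediately.
fn retry_on_eperm<T>(
    max_retries: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Targeted and bounded: only EPERM, and only while retries remain.
            Err(e) if e.raw_os_error() == Some(1) && retries < max_retries => {
                retries += 1;
                // Exponential backoff: base_delay * 2^retries.
                // Blocking sleep; an async caller would await a timer instead.
                thread::sleep(base_delay * 2_u32.pow(retries));
            }
            // Non-EPERM errors, or EPERM after the last retry.
            Err(e) => return Err(e),
        }
    }
}
```

Under these assumptions, a call such as `retry_on_eperm(5, Duration::from_millis(10), || std::fs::copy(&src, &dst))` would reproduce the 20ms to 320ms schedule described earlier, where `src` and `dst` are placeholder paths.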
Alternative Solutions Considered
Option 1: Fixed delay before copy
tokio::time::sleep(Duration::from_millis(100)).await;
std::fs::copy(exe, self.main_exe())?;
❌ Rejected: Adds unnecessary delay to 90% of builds that don't need it
Option 2: Symlink instead of copy
#[cfg(unix)]
std::os::unix::fs::symlink(exe, self.main_exe())?;
❌ Rejected: Platform-specific, breaks deployment workflows expecting copied files
Option 3: Poll for file accessibility
for _ in 0..50 {
    if std::fs::File::open(exe).and_then(|f| f.sync_all()).is_ok() {
        break;
    }
    tokio::time::sleep(Duration::from_millis(20)).await;
}
❌ Rejected: Complex, may not detect all lock types, max 1s delay
Recommendation: Retry with exponential backoff (implemented above) is the most robust and performant solution.
Impact
- Affects: All macOS users (possibly Linux/Windows with different timing)
- Frequency: 20-90% of builds depending on system load and build frequency
- Workaround: Use verbose logging or add manual delays (suboptimal)
- Fix: Simple, low-risk change; 20/20 builds succeeded in the stress test with the fix applied
Related
- Comment in code (line 1402): // todo(jon): maybe just symlink this rather than copy it?
- This suggests the copy operation was always questionable
- Our fix maintains copy semantics while handling the race condition