fix: race condition in tests involving Kea (2)#315

Open
pbreton wants to merge 1 commit into NVIDIA:main from pbreton:chore/test-fails-on-recent-ubuntu
Contributor

@pbreton pbreton commented Feb 17, 2026

Problem

Running cargo test -p carbide-dhcp --test booturl can fail with:

Error: "KEA STDOUT: kea-dhcp4 already running? Daemon::createPIDFile: PID: 3788736 exists, PID file: /tmp/kea-dhcp4.kea-dhcp4.pid"

Root Cause

Tests that run in parallel share the same PID file.

Solution

Ensure each test uses its own PID file.
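
A minimal sketch of the idea, using only the standard library: each harness instance gets its own directory, and that directory is passed to Kea through the KEA_PIDFILE_DIR and KEA_LOCKFILE_DIR env vars. The helper names unique_test_dir and kea_command are hypothetical and exist only for illustration; the harness in this PR uses tempfile::TempDir rather than a hand-rolled directory.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical helper: create a directory unique to this process and call.
// The real harness uses tempfile::tempdir() instead.
fn unique_test_dir() -> std::io::Result<PathBuf> {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    let dir = std::env::temp_dir().join(format!("kea-test-{}-{}", std::process::id(), n));
    std::fs::create_dir_all(&dir)?;
    Ok(dir)
}

// Hypothetical helper: build (but do not spawn) the Kea command so that its
// PID and lock files are directed to `dir` instead of a shared location.
fn kea_command(dir: &Path) -> Command {
    let mut cmd = Command::new("/usr/sbin/kea-dhcp4");
    cmd.env("KEA_PIDFILE_DIR", dir)
        .env("KEA_LOCKFILE_DIR", dir)
        .arg("-c")
        .arg(dir.join("kea-dhcp4.conf"));
    cmd
}
```

Two harness instances built this way never share a directory, so a PID file written by one cannot be seen by the other, provided Kea is actually able to write to that directory.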

Notes

  • this allows running tests on a recent version of Kea (2.4.1) on Ubuntu 24.04.3
  • on recent Ubuntu with AppArmor enforced you also need to run sudo aa-complain /usr/sbin/kea-dhcp4
  • this is a new iteration of the previous PR #142, which seems to be broken after the latest repo changes.

Copilot AI review requested due to automatic review settings February 17, 2026 18:07
@pbreton pbreton requested a review from a team as a code owner February 17, 2026 18:07
Copilot AI left a comment

Pull request overview

Updates the Kea test harness in carbide-dhcp to avoid PID/lock-file collisions when tests run in parallel, targeting newer Kea versions on Ubuntu.

Changes:

  • Renames the per-test temp directory handle and introduces explicit pidfile_dir/lockfile_dir fields.
  • Sets KEA_PIDFILE_DIR/KEA_LOCKFILE_DIR from those new fields when spawning kea-dhcp4.

// The config file is stored in this directory
_temp_base_directory: TempDir,

// Directories for PID and lock files that AppArmor allows

Copilot AI Feb 17, 2026

The comment says these are "Directories for PID and lock files that AppArmor allows", but the implementation points them at a per-test temp directory and later notes this requires AppArmor complain mode. Please update the wording to avoid implying this path is AppArmor-allowed by default (or adjust the implementation to actually use an AppArmor-allowed location).

Suggested change
- // Directories for PID and lock files that AppArmor allows
+ // Directories for PID and lock files used by Kea during tests; these paths must be
+ // permitted by the AppArmor profile or used while AppArmor is in complain mode.

// Use the unique temp directory for each test instance to avoid conflicts
// when tests run in parallel. This requires AppArmor to be in complain mode.
let pidfile_dir = _temp_base_directory.path().to_path_buf();
let lockfile_dir = _temp_base_directory.path().to_path_buf();

Copilot AI Feb 17, 2026

pidfile_dir and lockfile_dir are initialized to the same value; you can avoid the duplicate to_path_buf() call by cloning a single PathBuf (or by keeping one field and reusing it for both env vars).

Suggested change
- let lockfile_dir = _temp_base_directory.path().to_path_buf();
+ let lockfile_dir = pidfile_dir.clone();

Comment on lines +73 to +76
  pub fn run(&mut self) -> Result<(), eyre::Report> {
      let mut process = Command::new("/usr/sbin/kea-dhcp4")
-         .env("KEA_PIDFILE_DIR", self.temp_base_directory.path())
-         .env("KEA_LOCKFILE_DIR", self.temp_base_directory.path())
+         .env("KEA_PIDFILE_DIR", &self.pidfile_dir)
+         .env("KEA_LOCKFILE_DIR", &self.lockfile_dir)

Copilot AI Feb 17, 2026

This change appears functionally equivalent to the previous behavior: KEA_PIDFILE_DIR/KEA_LOCKFILE_DIR were already set to a per-instance TempDir via self.temp_base_directory.path(), and now they’re set to the same TempDir path via pidfile_dir/lockfile_dir. If the failure is still producing /tmp/kea-dhcp4.kea-dhcp4.pid, something else is likely causing Kea to ignore these env vars (e.g., being unable to write to the provided directory). Please clarify in code/PR what behavior changed here that eliminates the shared PID file, or adjust the implementation so the PID/lock path actually changes for Kea in the failing environment.
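
One way to check whether Kea honors the env vars in a given environment is to look for a PID file inside the per-test directory once Kea has started. The sketch below uses a hypothetical helper, find_pid_file, which is not part of the harness; it assumes the PID file carries a .pid extension, as in the error message quoted in the description.

```rust
use std::ffi::OsStr;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical diagnostic helper: return the first regular file in `dir`
// whose extension is "pid". Assumes Kea names its PID file with a .pid
// suffix, as in the error message quoted in the PR description.
fn find_pid_file(dir: &Path) -> io::Result<Option<PathBuf>> {
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension() == Some(OsStr::new("pid")) {
            return Ok(Some(path));
        }
    }
    Ok(None)
}
```

If this returns None for the per-test directory while Kea is running, the PID file is being written somewhere else, which would point to the env vars being ignored or the directory being unwritable for Kea.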

Comment on lines 28 to 35
  // Hold this around so that when Kea is dropped, TempDir is dropped and cleaned up
- temp_base_directory: TempDir,
+ // The config file is stored in this directory
+ _temp_base_directory: TempDir,
+
+ // Directories for PID and lock files that AppArmor allows
+ pidfile_dir: PathBuf,
+ lockfile_dir: PathBuf,

Copilot AI Feb 17, 2026

pidfile_dir/lockfile_dir are always derived from the temp dir and never diverge from _temp_base_directory.path(). This adds state to keep in sync and forces the TempDir field to be prefixed with _ just to silence "field is never read". Consider removing pidfile_dir/lockfile_dir and using self.temp_base_directory.path() directly in run() (or keep them but also read _temp_base_directory so the extra fields aren’t needed).

@pbreton pbreton force-pushed the chore/test-fails-on-recent-ubuntu branch from 1536c6f to 0de1af9 Compare February 17, 2026 18:30

copy-pr-bot bot commented Feb 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Collaborator

ajf commented Feb 18, 2026

@pbreton make sure to sign your commits to make tests run automatically

Collaborator

ajf commented Feb 18, 2026

/ok to test 0de1af9

Contributor

@kensimon kensimon left a comment

(I'm going to ignore the Copilot noise and just review this... I'm not sure why we get Copilot chiming in on these PRs, is it something you set up?)

I don't think we need to store a separate pidfile dir and lockfile dir... the real fix here seems to be that you're specifying the (unique to this test run) temp dir as the KEA_PIDFILE_DIR and KEA_LOCKFILE_DIR env vars. But you don't need to actually store them in the struct... you can just set each env var to the same path (since that's what you're doing here anyway).

You can add a helper with:

impl Kea {
    // ... snip ...
    fn config_file_path(&self) -> PathBuf {
        self.temp_base_directory.path().join("kea-dhcp4.conf")
    }
}

The Kea struct can become:

pub struct Kea {
    dhcp_in_port: u16,
    dhcp_out_port: u16,
    process: Option<Child>,
    // Hold this around so that when Kea is dropped, TempDir is dropped and cleaned up
    // The config file is stored in this directory
    temp_base_directory: TempDir,
}

Kea::new() can become:

        let temp_base_directory = tempfile::tempdir()?;
        let kea = Kea {
            temp_base_directory,
            dhcp_in_port,
            dhcp_out_port,
            process: None,
        };
        std::fs::write(
            kea.config_file_path(),
            Kea::config(api_server_url).as_bytes(),
        )?;
        Ok(kea)

Then you run kea with:

            .env("KEA_PIDFILE_DIR", self.temp_base_directory.path())
            .env("KEA_LOCKFILE_DIR", self.temp_base_directory.path())
            .arg("-c")
            .arg(self.config_file_path())
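
The comment on temp_base_directory relies on drop-based cleanup: the directory lives exactly as long as the struct that owns it. A minimal std-only sketch of that pattern follows, with a hypothetical ScratchDir guard standing in for tempfile::TempDir and a trimmed-down, hypothetical Harness standing in for Kea.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for tempfile::TempDir: removes the directory on drop.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(name: &str) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        std::fs::create_dir_all(&path)?;
        Ok(ScratchDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = std::fs::remove_dir_all(&self.path);
    }
}

// Hypothetical, trimmed-down harness: owning the guard ties the directory's
// lifetime (config, PID and lock files included) to the harness itself.
struct Harness {
    scratch: ScratchDir,
}

impl Harness {
    fn config_file_path(&self) -> PathBuf {
        self.scratch.path().join("kea-dhcp4.conf")
    }
}
```

Because the field is read through config_file_path() and the env vars, it no longer needs a leading underscore to silence the unused-field warning.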

@pbreton pbreton force-pushed the chore/test-fails-on-recent-ubuntu branch from 0de1af9 to 479c955 Compare February 27, 2026 03:04
Contributor Author

pbreton commented Feb 27, 2026

This PR is now a no-op: it only refactors based on Ken's comment, because the initial conditions are no longer present.
I can no longer trigger the error in the description on my test Linux host from HEAD of main.

@pbreton pbreton force-pushed the chore/test-fails-on-recent-ubuntu branch from 479c955 to b72983d Compare February 27, 2026 15:33
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
@pbreton pbreton force-pushed the chore/test-fails-on-recent-ubuntu branch from b72983d to b41fc6f Compare February 27, 2026 16:44

6 participants