
fix: test_multi_threaded_lock_expiration_recovery timing race with connection latency #32

@affandar

Description


Problem

The test_multi_threaded_lock_expiration_recovery validation test in src/provider_validation/instance_locking.rs has a timing race that causes spurious failures whenever establishing a connection from the pool adds latency.

Current Test Logic

// Thread 1: Fetch and hold lock (don't ack)
let handle1 = tokio::spawn({
    async move {
        let (item, lock_token, _attempt_count) = provider1
            .fetch_orchestration_item(lock_timeout, Duration::ZERO)  // <-- Connection latency here
            .await
            .unwrap()
            .unwrap();
        // Hold lock but don't ack
        tokio::time::sleep(lock_timeout + Duration::from_millis(200)).await;
        lock_token
    }
});

// Thread 2: Try to fetch immediately (should fail)
let handle2 = tokio::spawn({
    async move {
        tokio::time::sleep(Duration::from_millis(200)).await;  // <-- Timer starts at spawn, not after thread 1 acquires lock
        // ...
    }
});

// Thread 3: Wait for expiration and then fetch (should succeed)
let handle3 = tokio::spawn({
    async move {
        tokio::time::sleep(lock_timeout + Duration::from_millis(100)).await;  // <-- Timer starts at spawn
        // ...
    }
});

The Race Condition

All three threads are spawned and start their timers simultaneously at t=0. However:

  1. Thread 1 may have connection establishment latency (e.g., 50-200ms) before acquiring the lock
  2. Thread 2 wakes at t=200ms - may wake BEFORE thread 1 acquires the lock if connection latency > 200ms
  3. Thread 3 wakes at t=lock_timeout+100ms - if connection latency delayed thread 1's lock acquisition until t=100ms, the lock does not expire until t=lock_timeout+100ms, exactly when thread 3 wakes, so its fetch races the expiration

This causes the test to be flaky on:

  • Cold connection pools (first test run)
  • Remote databases with higher latency
  • Resource-constrained CI environments

Proposed Fix

Use a barrier or signal to ensure thread 2 and thread 3 timers start AFTER thread 1 has confirmed lock acquisition:

// Thread 1: Fetch and hold lock (don't ack)
let (lock_acquired_tx, lock_acquired_rx) = tokio::sync::watch::channel(false);
let handle1 = tokio::spawn({
    async move {
        let (item, lock_token, _attempt_count) = provider1
            .fetch_orchestration_item(lock_timeout, Duration::ZERO)
            .await
            .unwrap()
            .unwrap();
        // Signal that the lock is now held
        let _ = lock_acquired_tx.send(true);
        // Hold lock but don't ack
        tokio::time::sleep(lock_timeout + Duration::from_millis(200)).await;
        lock_token
    }
});

// Thread 2: Try to fetch shortly after thread 1 acquires lock (should fail)
// A watch channel supports multiple receivers, unlike oneshot.
let lock_acquired_rx2 = lock_acquired_rx.clone();
let handle2 = tokio::spawn({
    async move {
        let mut rx = lock_acquired_rx2;
        rx.changed().await.ok();  // Wait for thread 1 to acquire lock
        tokio::time::sleep(Duration::from_millis(100)).await;  // Then wait a bit
        // ...
    }
});

// Thread 3: Wait for expiration after thread 1 acquired lock (should succeed)
let lock_acquired_rx3 = lock_acquired_rx;
let handle3 = tokio::spawn({
    async move {
        let mut rx = lock_acquired_rx3;
        rx.changed().await.ok();  // Wait for thread 1 to acquire lock
        tokio::time::sleep(lock_timeout + Duration::from_millis(100)).await;  // Then wait for expiration
        // ...
    }
});

Alternatively, add more margin (e.g., 500ms instead of 100ms for thread 3) to account for typical connection latencies.

Impact

Affects any provider implementation that:

  • Uses a connection pool with cold-start latency
  • Is tested on remote databases
  • Runs in resource-constrained environments (CI)

Metadata

Labels: duroxide-pg (Issues reported by duroxide-pg-opt provider)
