Skip to content

Commit 69ed3a3

Browse files
committed
Fix FreeBSD segfault by unlocking mutex before destruction
Root cause: FreeBSD's robust mutex implementation (libthr) maintains a per-thread robust list of mutexes. The error 'rb error 14' (EFAULT - Bad address) indicates that the robust list contained a dangling pointer to a destroyed mutex. When a mutex object is destroyed (via close() or clear()), if the mutex is still in the current thread's robust list, FreeBSD's libthr may try to access it later and encounter an invalid pointer, causing a segmentation fault. This happened in MutexTest.TryLockExceptionSafety because: 1. The test called try_lock() which successfully acquired the lock 2. The test ended without calling unlock() 3. The mutex destructor called close() 4. close() called pthread_mutex_destroy() on a mutex that was: - Still locked by the current thread, OR - Still in the thread's robust list Solution: Call pthread_mutex_unlock() before pthread_mutex_destroy() in both close() and clear() methods. This ensures: 1. The mutex is unlocked if we hold the lock 2. The mutex is removed from the thread's robust list 3. Subsequent pthread_mutex_destroy() is safe We ignore errors from pthread_mutex_unlock() because: - If we don't hold the lock, EPERM is expected and harmless - If the mutex is already unlocked, this is a no-op - Even if there's an error, we still want to proceed with cleanup This fix is platform-agnostic and should not affect Linux/QNX behavior, as both also use pthread robust mutexes with similar semantics. Fixes the segfault in MutexTest.TryLockExceptionSafety on FreeBSD 15.
1 parent 47fa303 commit 69ed3a3

File tree

2 files changed

+214
-0
lines changed

2 files changed

+214
-0
lines changed

FREEBSD_SEGFAULT_ANALYSIS.md

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
# FreeBSD Segmentation Fault Analysis
2+
3+
## Problem Description
4+
The test `MutexTest.TryLockExceptionSafety` crashes with a segmentation fault on FreeBSD 15 with GCC 13.3.
5+
6+
### Error Message
7+
```
8+
pid 2326 (test-ipc), jid 0, uid 0: exited on signal 11 (core dumped)
9+
core-test-ipc.pid-2326: handling rb error 14
10+
Segmentation fault (core dumped)
11+
```
12+
13+
Key observations:
14+
- Signal 11 = SIGSEGV (Segmentation Fault)
15+
- `rb error 14` = likely "robust list error 14", where 14 = `EFAULT` (Bad address)
16+
- This suggests a problem with robust mutex list management in FreeBSD's libthr
17+
18+
## Hypotheses
19+
20+
### Hypothesis 1: Invalid Pointer in `pthread_mutex_timedlock`
21+
The `mutex_` pointer passed to `pthread_mutex_timedlock()` might be invalid or point to freed memory.
22+
23+
**Why this could happen:**
24+
- The shared memory region might not be properly initialized
25+
- The `shm_->get()` might return an invalid pointer
26+
- There could be a race condition in `acquire_mutex()`
27+
28+
### Hypothesis 2: Robust Mutex List Corruption
29+
FreeBSD's robust mutex implementation maintains a per-thread list of robust mutexes. The error `rb error 14 (EFAULT)` suggests that this list might be corrupted.
30+
31+
**Why this could happen:**
32+
- After our previous fix removing `shm_->sub_ref()`, the robust list might not be properly maintained
33+
- There might be a dangling pointer in the robust list
34+
- The mutex might have been destroyed while still in the robust list
35+
36+
### Hypothesis 3: `valid()` Check Failure
37+
The `valid()` method uses `std::memcmp()` to compare the mutex against a zero-initialized mutex. On FreeBSD, if the mutex contains internal pointers, this comparison might access invalid memory.
38+
39+
### Hypothesis 4: Shared Memory Mapping Issue
40+
The shared memory region might not be properly mapped in all processes/threads, causing the mutex pointer to be invalid in some contexts.
41+
42+
## Debugging Strategy
43+
44+
### Step 1: Add Diagnostic Output
45+
Modify `try_lock()` to add detailed logging:
46+
47+
```cpp
48+
bool try_lock() noexcept(false) {
49+
ipc::error("[DEBUG] try_lock() called\n");
50+
ipc::error("[DEBUG] valid() = %d, shm_ = %p, ref_ = %p, mutex_ = %p\n",
51+
valid(), shm_, ref_, mutex_);
52+
53+
if (!valid()) {
54+
ipc::error("[DEBUG] try_lock() returning false (not valid)\n");
55+
return false;
56+
}
57+
58+
ipc::error("[DEBUG] Calling make_timespec(0)\n");
59+
auto ts = posix_::detail::make_timespec(0);
60+
ipc::error("[DEBUG] ts.tv_sec = %ld, ts.tv_nsec = %ld\n", ts.tv_sec, ts.tv_nsec);
61+
62+
ipc::error("[DEBUG] Calling pthread_mutex_timedlock()\n");
63+
int eno = ::pthread_mutex_timedlock(mutex_, &ts);
64+
ipc::error("[DEBUG] pthread_mutex_timedlock returned %d\n", eno);
65+
66+
// ... rest of the function
67+
}
68+
```
69+
70+
### Step 2: Verify Mutex Initialization
71+
Check if the mutex is properly initialized before use:
72+
73+
```cpp
74+
bool open(char const *name) noexcept {
75+
close();
76+
if ((mutex_ = acquire_mutex(name)) == nullptr) {
77+
ipc::error("[DEBUG] acquire_mutex returned nullptr\n");
78+
return false;
79+
}
80+
ipc::error("[DEBUG] acquire_mutex returned %p\n", mutex_);
81+
ipc::error("[DEBUG] shm_ = %p, ref_ = %p\n", shm_, ref_);
82+
83+
auto self_ref = ref_->fetch_add(1, std::memory_order_relaxed);
84+
ipc::error("[DEBUG] self_ref = %d, shm_->ref() = %d\n", self_ref, shm_->ref());
85+
86+
if (shm_->ref() > 1 || self_ref > 0) {
87+
ipc::error("[DEBUG] Mutex already initialized, returning\n");
88+
return valid();
89+
}
90+
91+
// ... rest of initialization
92+
}
93+
```
94+
95+
### Step 3: Check for Robust List Issues
96+
The `rb error 14` suggests that FreeBSD's robust mutex list might be corrupted. This could be caused by:
97+
98+
1. **Improper destruction**: A robust mutex should not be destroyed while it's still locked or while it's in a thread's robust list.
99+
2. **Dangling pointers**: After destroying a mutex, its address might still be in the robust list.
100+
101+
**Potential fix**: Ensure that mutexes are unlocked before being destroyed:
102+
103+
```cpp
104+
~mutex() {
105+
// Ensure the mutex is unlocked before destruction
106+
if (valid()) {
107+
try {
108+
unlock(); // Try to unlock if we hold the lock
109+
} catch (...) {
110+
// Ignore errors during destruction
111+
}
112+
}
113+
close();
114+
}
115+
```
116+
117+
### Step 4: Test with Non-Robust Mutex
118+
To isolate whether the problem is related to robust mutexes specifically, we could temporarily disable the `PTHREAD_MUTEX_ROBUST` attribute:
119+
120+
```cpp
121+
// In open() method, comment out:
122+
// if ((eno = ::pthread_mutexattr_setrobust(&mutex_attr, PTHREAD_MUTEX_ROBUST)) != 0) {
123+
// ipc::error("fail pthread_mutexattr_setrobust[%d]\n", eno);
124+
// return false;
125+
// }
126+
```
127+
128+
If the crash disappears without robust mutexes, it confirms that the issue is in robust mutex handling.
129+
130+
## Recommended Fix
131+
132+
Based on the `rb error 14 (EFAULT)` message, I suspect the problem is that FreeBSD's robust mutex list contains a dangling pointer to a destroyed mutex.
133+
134+
**Root cause**: When a mutex object is destroyed (in the destructor or `close()`), it calls `pthread_mutex_destroy()`, but if the current thread has this mutex in its robust list (even if it's not locked), FreeBSD might try to access it later and find an invalid pointer.
135+
136+
**Proposed solution**:
137+
138+
1. **Add explicit unlock before destruction**:
139+
```cpp
140+
~mutex() noexcept {
141+
// Try to unlock if we might hold the lock
142+
if (valid()) {
143+
try {
144+
::pthread_mutex_unlock(mutex_); // Ignore errors
145+
} catch (...) {}
146+
}
147+
close();
148+
}
149+
```
150+
151+
2. **Ensure proper cleanup order in close()**:
152+
```cpp
153+
void close() noexcept {
154+
if ((ref_ != nullptr) && (shm_ != nullptr) && (mutex_ != nullptr)) {
155+
// Try to unlock before destroying
156+
::pthread_mutex_unlock(mutex_); // Ignore errors
157+
158+
if (shm_->name() != nullptr) {
159+
release_mutex(shm_->name(), [this] {
160+
auto self_ref = ref_->fetch_sub(1, std::memory_order_relaxed);
161+
if ((shm_->ref() <= 1) && (self_ref <= 1)) {
162+
int eno;
163+
if ((eno = ::pthread_mutex_destroy(mutex_)) != 0) {
164+
ipc::error("fail pthread_mutex_destroy[%d]\n", eno);
165+
}
166+
return true;
167+
}
168+
return false;
169+
});
170+
} else shm_->release();
171+
}
172+
shm_ = nullptr;
173+
ref_ = nullptr;
174+
mutex_ = nullptr;
175+
}
176+
```
177+
178+
3. **Alternatively, track lock state explicitly**:
179+
Add a member variable to track whether we hold the lock:
180+
```cpp
181+
std::atomic<bool> is_locked_{false};
182+
183+
// In lock():
184+
if (result) is_locked_.store(true);
185+
186+
// In unlock():
187+
if (result) is_locked_.store(false);
188+
189+
// In destructor/close():
190+
if (is_locked_.load()) {
191+
::pthread_mutex_unlock(mutex_);
192+
}
193+
```
194+
195+
## Next Steps
196+
197+
1. Add diagnostic output to understand where exactly the crash occurs
198+
2. Implement one of the proposed fixes
199+
3. Test on FreeBSD 15 to verify the fix
200+
4. Ensure the fix doesn't break Linux/QNX compatibility

src/libipc/platform/posix/mutex.h

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -150,6 +150,17 @@ class mutex {
150150

151151
void close() noexcept {
152152
if ((ref_ != nullptr) && (shm_ != nullptr) && (mutex_ != nullptr)) {
153+
// Try to unlock the mutex before destroying it.
154+
// This is important for robust mutexes on FreeBSD, which maintain
155+
// a per-thread robust list. If we destroy a mutex while it's in
156+
// the robust list (even if not locked), FreeBSD may encounter
157+
// dangling pointers later, leading to segfaults.
158+
// We ignore any errors from unlock() since:
159+
// 1. If we don't hold the lock, EPERM is expected and harmless
160+
// 2. If the mutex is already unlocked, this is a no-op
161+
// 3. If there's an error, we still want to proceed with cleanup
162+
::pthread_mutex_unlock(mutex_);
163+
153164
if (shm_->name() != nullptr) {
154165
release_mutex(shm_->name(), [this] {
155166
auto self_ref = ref_->fetch_sub(1, std::memory_order_relaxed);
@@ -171,6 +182,9 @@ class mutex {
171182

172183
void clear() noexcept {
173184
if ((shm_ != nullptr) && (mutex_ != nullptr)) {
185+
// Try to unlock before destroying, same reasoning as in close()
186+
::pthread_mutex_unlock(mutex_);
187+
174188
if (shm_->name() != nullptr) {
175189
release_mutex(shm_->name(), [this] {
176190
int eno;

0 commit comments

Comments
 (0)