diff --git a/ChangeLog.md b/ChangeLog.md
index 2e1d037f5970f..0e16803fd6a68 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -20,6 +20,8 @@ See docs/process.md for more on how version tagging works.
 
 5.0.6 (in development)
 ----------------------
+- The emscripten_futux_wait API is not document to explicitly allow spurious
+  wakeups. (#26659)
 - The minimum version of node supported by the generated code was bumped from
   v12.22.0 to v18.3.0. (#26604)
 - The DETERMINISIC settings was marked as deprecated (#26653)
diff --git a/docs/design/01-precise-futex-wakeups.md b/docs/design/01-precise-futex-wakeups.md
index 70fbf22d6a2a4..f379f25789388 100644
--- a/docs/design/01-precise-futex-wakeups.md
+++ b/docs/design/01-precise-futex-wakeups.md
@@ -1,18 +1,18 @@
 # Design Doc: Precise Futex Wakeups
 
-- **Status**: Draft
+- **Status**: Completed
 - **Bug**: https://github.com/emscripten-core/emscripten/issues/26633
 
 ## Context
-Currently, `emscripten_futex_wait` (in
-`system/lib/pthread/emscripten_futex_wait.c`) relies on a periodic wakeup loop
-for pthreads and the main runtime thread. This is done for two primary reasons:
+Historically, `emscripten_futex_wait` (in
+`system/lib/pthread/emscripten_futex_wait.c`) relied on a periodic wakeup loop
+for pthreads and the main runtime thread. This was done for two primary reasons:
 
-1. **Thread Cancellation**: To check if the calling thread has been cancelled while it is blocked.
+1. **Thread Cancellation**: To check if the calling thread had been cancelled while it was blocked.
 2. **Main Runtime Thread Events**: To allow the main runtime thread (even when not the main browser thread) to process its mailbox/event queue.
 
-The current implementation uses a 1ms wakeup interval for the main runtime
-thread and a 100ms interval for cancellable pthreads. This leads to unnecessary
+The old implementation used a 1ms wakeup interval for the main runtime
+thread and a 100ms interval for cancellable pthreads. This led to unnecessary
 CPU wakeups and increased latency for events.
 
 ## Goals
@@ -23,22 +23,21 @@ CPU wakeups and increased latency for events.
 
 ## Non-Goals
 - **Main Browser Thread**: Changes to the busy-wait loop in `futex_wait_main_browser_thread` are out of scope.
-- **Direct Atomics Usage**: Threads that call `atomic.wait` directly (bypassing `emscripten_futex_wait`) will remain un-interruptible.
+- **Direct Atomics Usage**: Threads that call `atomic.wait` directly (bypassing `emscripten_futex_wait`) remain un-interruptible.
 - **Wasm Workers**: Wasm Workers do not have a `pthread` structure, so they are not covered by this design.
 
-## Proposed Design
+## Design
 
 The core idea is to allow "side-channel" wakeups (cancellation, mailbox events)
 to interrupt the `atomic.wait` call by having the waker call `atomic.wake` on the
 same address the waiter is currently blocked on.
 
-As part of this design we will need to explicitly state that
-`emscripten_futex_wait` now supports spurious wakeups.  i.e. it may return `0`
-(success) even if the underlying futex was not explicitly woken by the
-application.
+As part of this design, `emscripten_futex_wait` now explicitly supports spurious
+wakeups.  i.e. it may return `0` (success) even if the underlying futex was not
+explicitly woken by the application.
 
 ### 1. `struct pthread` Extensions
-We will add a single atomic `wait_addr` field to `struct pthread` (in
+A single atomic `wait_addr` field was added to `struct pthread` (in
 `system/lib/libc/musl/src/internal/pthread_impl.h`).
 
 ```c
@@ -57,82 +56,80 @@ _Atomic uintptr_t wait_addr;
 ```
 
 ### 2. Waiter Logic (`emscripten_futex_wait`)
-The waiter will follow this logic:
+The waiter follows this logic:
 
-1.  **Notification Loop**:
+1.  **Publish Wait Address**:
     ```c
     uintptr_t expected_null = 0;
-    while (!atomic_compare_exchange_strong(&self->wait_addr, &expected_null, (uintptr_t)addr)) {
+    if (!atomic_compare_exchange_strong(&self->wait_addr, &expected_null, (uintptr_t)addr)) {
         // If the CAS failed, it means NOTIFY_BIT was set by another thread.
-        assert(expected_null == NOTIFY_BIT);
-        // Let the notifier know that we received the wakeup notification by
-        // resetting wait_addr.
-        self->wait_addr = 0;
-        handle_wakeup(); // Process mailbox or handle cancellation
-        // Reset expected_null because CAS updates it to the observed value on failure.
-        expected_null = 0;
+        assert(expected_null & NOTIFY_BIT);
+        // We don't wait at all; instead behave as if we spuriously woke up.
+        ret = ATOMICS_WAIT_OK;
+        goto done;
     }
     ```
 2.  **Wait**: Call `ret = __builtin_wasm_memory_atomic_wait32(addr, val, timeout)`.
-3.  **Unpublish & Check**:
+3.  **Unpublish**:
     ```c
-    // Clear wait_addr and check if a notification arrived while we were sleeping.
-    if ((atomic_exchange(&self->wait_addr, 0) & NOTIFY_BIT) != 0) {
-        handle_wakeup();
-    }
+    done:
+    self->wait_addr = 0;
     ```
-4.  **Return**: Return the result of the wait.
+4.  **Handle side effects**: If the wake was due to cancellation or mailbox
+    events, these are handled after `emscripten_futex_wait` returns (or
+    internally via `pthread_testcancel` if cancellable).
 
 Note: We do **not** loop internally if `ret == ATOMICS_WAIT_OK`. Even if we
 suspect the wake was caused by a side-channel event, we must return to the user
 to avoid "swallowing" a simultaneous real application wake.
 
-### 3. Waker Logic
-When a thread needs to wake another thread for a side-channel event:
+### 3. Waker Logic (`_emscripten_thread_notify`)
+When a thread needs to wake another thread for a side-channel event (e.g.
+enqueuing work or cancellation), it calls `_emscripten_thread_notify`:
 
-1.  **Enqueue Work**: Add the task to the target's mailbox or set the cancellation flag.
-2.  **Signal**:
-    ```c
-    uintptr_t addr = atomic_fetch_or(&target->wait_addr, NOTIFY_BIT);
-    if (addr == 0 || (addr & NOTIFY_BIT) != 0) {
-        // Either the thread wasn't waiting (it will see NOTIFY_BIT later),
-        // or someone else is already in the process of notifying it.
-        return;
-    }
-    // We set the bit and are responsible for waking the target.
-    // The target is currently waiting on `addr`.
-    while (target->wait_addr == (addr | NOTIFY_BIT)) {
-        emscripten_futex_wake((void*)addr, INT_MAX);
-        sched_yield();
-    }
-    ```
+```c
+void _emscripten_thread_notify(pthread_t target) {
+  uintptr_t addr = atomic_fetch_or(&target->wait_addr, NOTIFY_BIT);
+  if (addr == 0 || (addr & NOTIFY_BIT) != 0) {
+    // Either the thread wasn't waiting (it will see NOTIFY_BIT later),
+    // or someone else is already in the process of notifying it.
+    return;
+  }
+  // We set the bit and are responsible for waking the target.
+  // The target is currently waiting on `addr`.
+  while (target->wait_addr == (addr | NOTIFY_BIT)) {
+    emscripten_futex_wake((void*)addr, INT_MAX);
+    sched_yield();
+  }
+}
+```
 
 ### 4. Handling the Race Condition
 The protocol handles the "Lost Wakeup" race by having the waker loop until the
 waiter clears its `wait_addr`. If the waker sets the `NOTIFY_BIT` just before
 the waiter enters `atomic.wait`, the `atomic_wake` will be delivered once the
 waiter is asleep. If the waiter wakes up for any reason (timeout, real wake, or
-side-channel wake), its `atomic_exchange` will satisfy the waker's loop
-condition.
+side-channel wake), its reset of `wait_addr` to `0` will satisfy the waker's
+loop condition.
 
 ## Benefits
 
 - **Lower Power Consumption**: Threads can sleep indefinitely (or for the full duration of a user-requested timeout) without periodic wakeups.
-- **Lower Latency**: Mailbox events and cancellation requests are processed immediately rather than waiting for the next 1ms or 100ms tick.
-- **Simpler Loop**: The complex logic for calculating remaining timeout slices in `emscripten_futex_wait` is removed.
+- **Lower Latency**: Mailbox events and cancellation requests are processed immediately rather than waiting for the next tick.
+- **Simpler Loop**: The complex logic for calculating remaining timeout slices in `emscripten_futex_wait` was removed.
 
 ## Alternatives Considered
 - **Signal-based wakeups**: Not currently feasible in Wasm as signals are not
   implemented in a way that can interrupt `atomic.wait`.
 - **A single global "wake-up" address per thread**: This would require the
   waiter to wait on *two* addresses simultaneously (the user's futex and its
-  own wakeup address), which `atomic.wait` does not support. The proposed
+  own wakeup address), which `atomic.wait` does not support. The implemented
   design works around this by having the waker use the *user's* futex address.
 
 ## Security/Safety Considerations
-- **The `wait_addr` must be managed carefully** to ensure wakers don't
+- **The `wait_addr` is managed carefully** to ensure wakers don't
   call `atomic.wake` on stale addresses. Clearing the address upon wake
   mitigates this.
-- **The waker loop should have a reasonable fallback** (like a yield) to prevent a
-  busy-wait deadlock if the waiter is somehow prevented from waking up (though
-  `atomic.wait` is generally guaranteed to wake if `atomic.wake` is called).
+- **The waker loop has a yield** to prevent a busy-wait deadlock if the waiter
+  is somehow prevented from waking up (though `atomic.wait` is generally
+  guaranteed to wake if `atomic.wake` is called).
diff --git a/src/struct_info_generated.json b/src/struct_info_generated.json
index 163a1cf69779c..c5c9d7e1ee7a2 100644
--- a/src/struct_info_generated.json
+++ b/src/struct_info_generated.json
@@ -1036,7 +1036,7 @@
             "p_proto": 8
         },
         "pthread": {
-            "__size__": 124,
+            "__size__": 128,
             "profilerBlock": 104,
             "stack": 48,
             "stack_size": 52,
diff --git a/src/struct_info_generated_wasm64.json b/src/struct_info_generated_wasm64.json
index 8b857fef5dbcb..18a3290fe7728 100644
--- a/src/struct_info_generated_wasm64.json
+++ b/src/struct_info_generated_wasm64.json
@@ -1036,7 +1036,7 @@
             "p_proto": 16
         },
         "pthread": {
-            "__size__": 216,
+            "__size__": 224,
             "profilerBlock": 184,
             "stack": 80,
             "stack_size": 88,
diff --git a/system/include/emscripten/threading_primitives.h b/system/include/emscripten/threading_primitives.h
index dac148f0c143e..aa6761a791a07 100644
--- a/system/include/emscripten/threading_primitives.h
+++ b/system/include/emscripten/threading_primitives.h
@@ -188,6 +188,10 @@ void emscripten_condvar_signal(emscripten_condvar_t * _Nonnull condvar, uint32_t
 
 // If the given memory address contains value val, puts the calling thread to
 // sleep waiting for that address to be notified.
+// Note: Like the Linux futex syscall, this APi *does* allow spurious wakeups.
+// This differs from the WebAssembly `atomic.wait` instruction itself which
+// does *not* allow supurious wakeups and it means that most callers will want
+// to wraps this some kind loop.
 // Returns -EINVAL if addr is null.
 int emscripten_futex_wait(volatile void/*uint32_t*/ * _Nonnull addr, uint32_t val, double maxWaitMilliseconds);
 
diff --git a/system/lib/libc/musl/src/internal/pthread_impl.h b/system/lib/libc/musl/src/internal/pthread_impl.h
index b2588344eb1ab..d335c4cd9119b 100644
--- a/system/lib/libc/musl/src/internal/pthread_impl.h
+++ b/system/lib/libc/musl/src/internal/pthread_impl.h
@@ -111,6 +111,16 @@ struct pthread {
 	// postMessage path. Once this becomes true, it remains true so we never
 	// fall back to postMessage unnecessarily.
 	_Atomic int waiting_async;
+	// The address the thread is currently waiting on in emscripten_futex_wait.
+	//
+	// This field encodes the state using the following bitmask:
+	// - NULL: Not waiting, no pending notification.
+	// - NOTIFY_BIT (0x1): Not waiting, but a notification was sent.
+	// - addr: Waiting on `addr`, no pending notification.
+	// - addr | NOTIFY_BIT: Waiting on `addr`, notification sent.
+	//
+	// Since futex addresses must be 4-byte aligned, the low bit is safe to use.
+	_Atomic uintptr_t wait_addr;
 #endif
 #ifdef EMSCRIPTEN_DYNAMIC_LINKING
 	// When dynamic linking is enabled, threads use this to facilitate the
@@ -120,6 +130,10 @@ struct pthread {
 #endif
 };
 
+#ifdef __EMSCRIPTEN__
+#define NOTIFY_BIT (1 << 0)
+#endif
+
 enum {
 	DT_EXITED,
 	DT_EXITING,
diff --git a/system/lib/libc/musl/src/signal/setitimer.c b/system/lib/libc/musl/src/signal/setitimer.c
index 5e9b4683ee067..90f3b4a2166d0 100644
--- a/system/lib/libc/musl/src/signal/setitimer.c
+++ b/system/lib/libc/musl/src/signal/setitimer.c
@@ -8,6 +8,7 @@
 #include <emscripten/emscripten.h>
 #include <emscripten/threading.h>
 #include <assert.h>
+#include <math.h>
 #include <signal.h>
 #include <stdint.h>
 #include <stdio.h>
@@ -79,6 +80,18 @@ void _emscripten_check_timers(double now)
 		}
 	}
 }
+
+double _emscripten_next_timer()
+{
+	assert(emscripten_is_main_runtime_thread());
+	double next_timer = INFINITY;
+	for (int which = 0; which < 3; which++) {
+		if (current_timeout_ms[which]) {
+			next_timer = fmin(current_timeout_ms[which], next_timer);
+		}
+	}
+	return next_timer - emscripten_get_now();
+}
 #endif
 
 int setitimer(int which, const struct itimerval *restrict new, struct itimerval *restrict old)
diff --git a/system/lib/libc/musl/src/thread/pthread_cancel.c b/system/lib/libc/musl/src/thread/pthread_cancel.c
index 3428cd5fab474..7bdba001ec6cc 100644
--- a/system/lib/libc/musl/src/thread/pthread_cancel.c
+++ b/system/lib/libc/musl/src/thread/pthread_cancel.c
@@ -108,5 +108,13 @@ int pthread_cancel(pthread_t t)
 			pthread_exit(PTHREAD_CANCELED);
 		return 0;
 	}
+#ifdef __EMSCRIPTEN__
+	// Wake the target thread in case it in emscripten_futex_wait.  Normally
+	// this is only done when the target is the main runtime thread and there
+	// is an event added to its system queue.
+	// However, all threads needs to be inturruped like this in the case they are
+	// cancelled.
+	_emscripten_thread_notify(t);
+#endif
 	return pthread_kill(t, SIGCANCEL);
 }
diff --git a/system/lib/pthread/em_task_queue.c b/system/lib/pthread/em_task_queue.c
index 0eeb8f0bf8579..d9c72f5384162 100644
--- a/system/lib/pthread/em_task_queue.c
+++ b/system/lib/pthread/em_task_queue.c
@@ -14,6 +14,7 @@
 #include "em_task_queue.h"
 #include "proxying_notification_state.h"
 #include "thread_mailbox.h"
+#include "threading_internal.h"
 
 #define EM_TASK_QUEUE_INITIAL_CAPACITY 128
 
@@ -166,6 +167,7 @@ static bool em_task_queue_grow(em_task_queue* queue) {
 }
 
 void em_task_queue_execute(em_task_queue* queue) {
+  DBG("em_task_queue_execute");
   queue->processing = 1;
   pthread_mutex_lock(&queue->mutex);
   while (!em_task_queue_is_empty(queue)) {
@@ -178,6 +180,7 @@ void em_task_queue_execute(em_task_queue* queue) {
   }
   pthread_mutex_unlock(&queue->mutex);
   queue->processing = 0;
+  DBG("done em_task_queue_execute");
 }
 
 void em_task_queue_cancel(em_task_queue* queue) {
@@ -219,6 +222,7 @@ static void receive_notification(void* arg) {
   notification_state expected = NOTIFICATION_RECEIVED;
   atomic_compare_exchange_strong(
     &tasks->notification, &expected, NOTIFICATION_NONE);
+  DBG("receive_notification done");
 }
 
 static void cancel_notification(void* arg) {
@@ -246,6 +250,7 @@ bool em_task_queue_send(em_task_queue* queue, task t) {
   notification_state previous =
     atomic_exchange(&queue->notification, NOTIFICATION_PENDING);
   if (previous == NOTIFICATION_PENDING) {
+    DBG("em_task_queue_send NOTIFICATION_PENDING already set");
     emscripten_thread_mailbox_unref(queue->thread);
     return true;
   }
diff --git a/system/lib/pthread/emscripten_futex_wait.c b/system/lib/pthread/emscripten_futex_wait.c
index 0584edccbd57e..806eea7950379 100644
--- a/system/lib/pthread/emscripten_futex_wait.c
+++ b/system/lib/pthread/emscripten_futex_wait.c
@@ -9,8 +9,11 @@
 #include "pthread_impl.h"
 #include "threading_internal.h"
 
-#include <assert.h>
 #include <emscripten/threading.h>
+#include <emscripten/console.h>
+
+#include <assert.h>
+#include <math.h>
 #include <errno.h>
 #include <math.h>
 #include <stdatomic.h>
@@ -123,6 +126,12 @@ static int futex_wait_main_browser_thread(volatile void* addr,
   return 0;
 }
 
+static double dummy() {
+  return INFINITY;
+}
+
+weak_alias(dummy, _emscripten_next_timer);
+
 int emscripten_futex_wait(volatile void *addr, uint32_t val, double max_wait_ms) {
   if ((((intptr_t)addr)&3) != 0) {
     return -EINVAL;
@@ -146,91 +155,63 @@ int emscripten_futex_wait(volatile void *addr, uint32_t val, double max_wait_ms)
     return ret;
   }
 
-  // -1 (or any negative number) means wait indefinitely.
-  int64_t max_wait_ns = ATOMICS_WAIT_DURATION_INFINITE;
-  if (max_wait_ms != INFINITY) {
-    max_wait_ns = (int64_t)(max_wait_ms * 1e6);
-  }
+  // Pass 0 here, which means we don't have access to the current time in this
+  // function.  This tells _emscripten_yield to call emscripten_get_now if (and
+  // only if) it needs to know the time.
+  _emscripten_yield(0);
+
+  DBG("emscripten_futex_wait ms=%f", max_wait_ms);
 
-#ifdef __EMSCRIPTEN_PTHREADS__
-  // When building with pthread support there are two conditions under which we
-  // need to limit the amount of time we spend in atomic.wait.
-  // 1. We are the main runtime thread.  In this case we need to be able to
-  //    process proxied events from workers.  Note that this is not always
-  //    the same as being the main browser thread.  For example, when running
-  //    under node or when launching an emscripten-built program in a Web
-  //    Worker.  In this case we limit our wait slices to 1ms intervals.
-  // 2. When the current thread has async cancellation enabled.  In this case
-  //    we limit the wait duration to 100ms intervals.
-  int64_t wakeup_interval = 0;
   bool is_runtime_thread = emscripten_is_main_runtime_thread();
   if (is_runtime_thread) {
-    // If the current thread is the main runtime thread then only wait in 1ms slices.
-    wakeup_interval = 1 * 1000000;
-  }
-  else if (cancelable) {
-    // If the current thread is async cancellable then only wait in 100ms slices.
-    wakeup_interval = 100 * 1000000;
+    max_wait_ms = fmin(max_wait_ms, fmax(0, _emscripten_next_timer()));
   }
 
-  // When wakeup_interval is set, we use end_time to track the absolute
-  // time when the wait should end.
-  double end_time = 0;
-  if (wakeup_interval) {
-    if (max_wait_ms == INFINITY) {
-      max_wait_ns = wakeup_interval;
-    } else {
-      end_time = emscripten_get_now() + max_wait_ms;
-      max_wait_ns = MIN(max_wait_ns, wakeup_interval);
-    }
+  // -1 (or any negative number) means wait indefinitely.
+  int64_t max_wait_ns = ATOMICS_WAIT_DURATION_INFINITE;
+  if (max_wait_ms != INFINITY) {
+    max_wait_ns = (int64_t)(max_wait_ms * 1e6);
   }
-
-  do {
-#endif
-    // Pass 0 here, which means we don't have access to the current time in this
-    // function.  This tells _emscripten_yield to call emscripten_get_now if (and
-    // only if) it needs to know the time.
-    _emscripten_yield(0);
-
+#ifdef __EMSCRIPTEN_PTHREADS__
 #ifdef EMSCRIPTEN_DYNAMIC_LINKING
-    // After the main thread queues dlopen events, it checks if the target threads
-    // are sleeping.
-    // If `sleeping` is set then the main thread knows that event will be
-    // processed after the sleep (before any other user code).  In this case the
-    // main thread does not wait for any kind of response form the thread.
-    // If `sleeping` is not set then we know we should wait for the thread process
-    // the queue, either from the call here directly after setting `sleeping` to
-    // 1, or from another callsite (e.g. the one in `emscripten_yield`).
-    if (!is_runtime_thread) {
-      self->sleeping = 1;
-      _emscripten_process_dlopen_queue();
-    }
+  if (!is_runtime_thread) {
+    self->sleeping = 1;
+    _emscripten_process_dlopen_queue();
+  }
 #endif
+  uintptr_t expected_null = 0;
+  if (atomic_compare_exchange_strong(&self->wait_addr, &expected_null, (uintptr_t)addr)) {
+    DBG("emscripten_futex_wait atomic.wait ns=%lld", max_wait_ns);
     ret = __builtin_wasm_memory_atomic_wait32((int*)addr, val, max_wait_ns);
+  } else {
+    DBG("emscripten_futex_wait skipping atomic.wait due to NOTIFY_BIT");
+    // CAS failed, NOTIFY_BIT must have been set.  In this case we don't
+    // actually wait at all.  Instead we behaviour as if we spuriously woke up
+    // right away.
+    assert(expected_null & NOTIFY_BIT);
+    ret = ATOMICS_WAIT_OK;
+  }
+
+  // Clear the wait_addr
+  DBG("emscripten_futex_wait done notify=%d cancelable=%d cancel=%d", !!(self->wait_addr & NOTIFY_BIT), cancelable, self->cancel);
+  self->wait_addr = 0;
 #ifdef EMSCRIPTEN_DYNAMIC_LINKING
-    if (!is_runtime_thread) {
-      self->sleeping = 0;
-      _emscripten_process_dlopen_queue();
-    }
+  if (!is_runtime_thread) {
+    self->sleeping = 0;
+    _emscripten_process_dlopen_queue();
+  }
 #endif
-#ifdef __EMSCRIPTEN_PTHREADS__
-    if (cancelable && ret == ATOMICS_WAIT_TIMED_OUT && self->cancel) {
-      __pthread_testcancel();
-      // If __pthread_testcancel does return here it means that canceldisable
-      // must be set to PTHREAD_CANCEL_MASKED.  In this case we emulate the
-      // behaviour of the futex syscall and return ECANCELLED here.
-      // See pthread_cond_timedwait.c for the only use of this flag.
-      emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_WAITFUTEX, EM_THREAD_STATUS_RUNNING);
-      return -ECANCELED;
-    }
-    if (wakeup_interval && max_wait_ms != INFINITY) {
-      double remainder_ms = end_time - emscripten_get_now();
-      if (remainder_ms <= 0) {
-        break;
-      }
-      max_wait_ns = MIN((int64_t)(remainder_ms * 1e6), wakeup_interval);
-    }
-  } while (wakeup_interval && ret == ATOMICS_WAIT_TIMED_OUT);
+  if (cancelable && self->cancel) {
+    __pthread_testcancel();
+    // If __pthread_testcancel does return here it means that canceldisable
+    // must be set to PTHREAD_CANCEL_MASKED.  In this case we emulate the
+    // behaviour of the futex syscall and return ECANCELLED here.
+    // See pthread_cond_timedwait.c for the only use of this flag.
+    emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_WAITFUTEX, EM_THREAD_STATUS_RUNNING);
+    return -ECANCELED;
+  }
+#else
+  ret = __builtin_wasm_memory_atomic_wait32((int*)addr, val, max_wait_ns);
 #endif
 
   emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_WAITFUTEX, EM_THREAD_STATUS_RUNNING);
diff --git a/system/lib/pthread/emscripten_futex_wake.c b/system/lib/pthread/emscripten_futex_wake.c
index 1e4e57474ffa7..2bff9699fa658 100644
--- a/system/lib/pthread/emscripten_futex_wake.c
+++ b/system/lib/pthread/emscripten_futex_wake.c
@@ -5,10 +5,13 @@
  * found in the LICENSE file.
  */
 
+#include "atomic.h"
+#include "pthread_impl.h"
 #include <assert.h>
 #include <errno.h>
 #include <limits.h>
-#include <atomic.h>
+#include <stdatomic.h>
+
 #include <emscripten/threading.h>
 
 // Stores the memory address that the main thread is waiting on, if any. If
@@ -47,3 +50,23 @@ int emscripten_futex_wake(volatile void *addr, int count) {
   assert(ret >= 0);
   return ret + main_thread_woken;
 }
+
+void _emscripten_thread_notify(pthread_t target) {
+  DBG("_emscripten_thread_notify %p", target);
+  uintptr_t wait_addr = atomic_fetch_or(&target->wait_addr, NOTIFY_BIT);
+  if (wait_addr == 0 || (wait_addr & NOTIFY_BIT)) {
+    // Either the thread wasn't waiting (In this case it will see NOTIFY_BIT and
+    // return early once it enters its next `emscripten_futex_wait`), or someone
+    // else is already in the process of notifying it.
+    return;
+  }
+
+  // We set the NOTIFY_BIT bit and are responsible for waking the target.
+  // The target is currently waiting on `wait_addr`.
+  while (target->wait_addr == (wait_addr | NOTIFY_BIT)) {
+    emscripten_futex_wake((void*)wait_addr, INT_MAX);
+    // TODO: Can we put some kind of yield instruction here?  For example,
+    // it we ever support an `atomics.pause` Wasm instrction this would be a
+    // good place for it.
+  }
+}
diff --git a/system/lib/pthread/emscripten_yield.c b/system/lib/pthread/emscripten_yield.c
index fa3e0398b70a2..578faa355ef50 100644
--- a/system/lib/pthread/emscripten_yield.c
+++ b/system/lib/pthread/emscripten_yield.c
@@ -11,6 +11,7 @@ static _Atomic pthread_t crashed_thread_id = NULL;
 
 void _emscripten_thread_crashed() {
   crashed_thread_id = pthread_self();
+  _emscripten_thread_notify(emscripten_main_runtime_thread_id());
 }
 
 static void dummy(double now)
diff --git a/system/lib/pthread/library_pthread.c b/system/lib/pthread/library_pthread.c
index 4f104d6549750..a89425874b113 100644
--- a/system/lib/pthread/library_pthread.c
+++ b/system/lib/pthread/library_pthread.c
@@ -79,18 +79,25 @@ int pthread_mutexattr_setprioceiling(pthread_mutexattr_t *attr, int prioceiling)
 void emscripten_thread_sleep(double msecs) {
   // We include emscripten_current_thread_process_queued_calls before and
   // after sleeping since that is how we recieve "async" signals.
-  // We include __pthread_testcancel here becuase clock_nanosleep is
+  // We include __pthread_testcancel here because clock_nanosleep is
   // a pthread cancelation point.
   emscripten_current_thread_process_queued_calls();
   __pthread_testcancel();
-  emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_RUNNING,
-                                                   EM_THREAD_STATUS_SLEEPING);
-  uint32_t dummyZeroAddress = 0;
-  emscripten_futex_wait(&dummyZeroAddress, 0, msecs);
-  emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_SLEEPING,
-                                                   EM_THREAD_STATUS_RUNNING);
-  emscripten_current_thread_process_queued_calls();
-  __pthread_testcancel();
+  if (msecs > 0) {
+    uint32_t dummyZeroAddress = 0;
+    double start = emscripten_get_now();
+    double elapsed = 0;
+    while (elapsed < msecs) {
+      emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_RUNNING,
+                                                       EM_THREAD_STATUS_SLEEPING);
+      emscripten_futex_wait(&dummyZeroAddress, 0, msecs - elapsed);
+      emscripten_conditional_set_current_thread_status(EM_THREAD_STATUS_SLEEPING,
+                                                       EM_THREAD_STATUS_RUNNING);
+      emscripten_current_thread_process_queued_calls();
+      __pthread_testcancel();
+      elapsed = emscripten_get_now() - start;
+    }
+  }
 }
 
 static struct pthread __main_pthread;
diff --git a/system/lib/pthread/proxying.c b/system/lib/pthread/proxying.c
index 7625eba8ab3a6..6a2ee7f0c46e8 100644
--- a/system/lib/pthread/proxying.c
+++ b/system/lib/pthread/proxying.c
@@ -162,7 +162,15 @@ static bool do_proxy(em_proxying_queue* q, pthread_t target_thread, task t) {
     return false;
   }
 
-  return em_task_queue_send(tasks, t);
+  bool rtn =  em_task_queue_send(tasks, t);
+  // Then proxying work to the main thread using the system queue we have
+  // a very special case in that we need the target thread to wake up from
+  // `emscripten_futex_wait` to process the queue.
+  if (rtn && is_system_queue && pthread_equal(target_thread, emscripten_main_runtime_thread_id())) {
+    DBG("waking main runtime thread using _emscripten_thread_notify");
+    _emscripten_thread_notify(target_thread);
+  }
+  return rtn;
 }
 
 bool emscripten_proxy_async(em_proxying_queue* q,
diff --git a/system/lib/pthread/thread_mailbox.c b/system/lib/pthread/thread_mailbox.c
index 701218e3a137b..ba0eac547e041 100644
--- a/system/lib/pthread/thread_mailbox.c
+++ b/system/lib/pthread/thread_mailbox.c
@@ -77,6 +77,7 @@ void _emscripten_check_mailbox() {
   // For example, in PROXY_TO_PTHREAD the atexit functions are called via
   // a proxied call, and without this call to synchronize we would crash if
   // any atexit functions were registered from a side module.
+  DBG("_emscripten_check_mailbox");
   assert(pthread_self());
   em_task_queue* mailbox = pthread_self()->mailbox;
   mailbox->notification = NOTIFICATION_RECEIVED;
diff --git a/system/lib/pthread/threading_internal.h b/system/lib/pthread/threading_internal.h
index abc6cbff7f22c..edfd779ca3444 100644
--- a/system/lib/pthread/threading_internal.h
+++ b/system/lib/pthread/threading_internal.h
@@ -9,6 +9,16 @@
 
 #include <pthread.h>
 
+// Defined THREADING_DEBUG here (or in the build system) to enabled verbose
+// logging using emscripten_dbgf.
+// #define THREADING_DEBUG
+
+#ifdef THREADING_DEBUG
+#define DBG(format, ...) emscripten_dbgf(format, ##__VA_ARGS__)
+#else
+#define DBG(format, ...)
+#endif
+
 #define EM_THREAD_NAME_MAX 32
 
 #define EM_THREAD_STATUS int
@@ -104,3 +114,8 @@ void _emscripten_run_js_on_main_thread_done(void* ctx, void* arg, double result)
 // if called from the main browser thread, this function will return zero
 // since blocking is not allowed there).
 int _emscripten_thread_supports_atomics_wait(void);
+
+// Notify a thread that it has new work to do (cancellation or mailbox).
+// This will interrupt the thread if it is currently blocked in
+// emscripten_futex_wait.
+void _emscripten_thread_notify(pthread_t thread);
diff --git a/test/codesize/test_codesize_cxx_ctors1.json b/test/codesize/test_codesize_cxx_ctors1.json
index 0b8bea69b50e7..91ad85c53860c 100644
--- a/test/codesize/test_codesize_cxx_ctors1.json
+++ b/test/codesize/test_codesize_cxx_ctors1.json
@@ -2,9 +2,9 @@
   "a.out.js": 19194,
   "a.out.js.gz": 7969,
   "a.out.nodebug.wasm": 132635,
-  "a.out.nodebug.wasm.gz": 49922,
+  "a.out.nodebug.wasm.gz": 49924,
   "total": 151829,
-  "total_gz": 57891,
+  "total_gz": 57893,
   "sent": [
     "__cxa_throw",
     "_abort_js",
diff --git a/test/codesize/test_codesize_cxx_ctors2.json b/test/codesize/test_codesize_cxx_ctors2.json
index 4603f919e7c55..ffa319647e766 100644
--- a/test/codesize/test_codesize_cxx_ctors2.json
+++ b/test/codesize/test_codesize_cxx_ctors2.json
@@ -1,10 +1,10 @@
 {
   "a.out.js": 19171,
   "a.out.js.gz": 7957,
-  "a.out.nodebug.wasm": 132061,
-  "a.out.nodebug.wasm.gz": 49579,
-  "total": 151232,
-  "total_gz": 57536,
+  "a.out.nodebug.wasm": 132063,
+  "a.out.nodebug.wasm.gz": 49580,
+  "total": 151234,
+  "total_gz": 57537,
   "sent": [
     "__cxa_throw",
     "_abort_js",
diff --git a/test/codesize/test_codesize_cxx_except_wasm.json b/test/codesize/test_codesize_cxx_except_wasm.json
index 66e3ff01b1367..4bc0805d82c50 100644
--- a/test/codesize/test_codesize_cxx_except_wasm.json
+++ b/test/codesize/test_codesize_cxx_except_wasm.json
@@ -2,9 +2,9 @@
   "a.out.js": 19026,
   "a.out.js.gz": 7904,
   "a.out.nodebug.wasm": 147926,
-  "a.out.nodebug.wasm.gz": 55308,
+  "a.out.nodebug.wasm.gz": 55309,
   "total": 166952,
-  "total_gz": 63212,
+  "total_gz": 63213,
   "sent": [
     "_abort_js",
     "_tzset_js",
diff --git a/test/codesize/test_codesize_cxx_lto.json b/test/codesize/test_codesize_cxx_lto.json
index 6c07b8cbdbf3f..4c2da355a7ebc 100644
--- a/test/codesize/test_codesize_cxx_lto.json
+++ b/test/codesize/test_codesize_cxx_lto.json
@@ -2,9 +2,9 @@
   "a.out.js": 18563,
   "a.out.js.gz": 7666,
   "a.out.nodebug.wasm": 102164,
-  "a.out.nodebug.wasm.gz": 39553,
+  "a.out.nodebug.wasm.gz": 39554,
   "total": 120727,
-  "total_gz": 47219,
+  "total_gz": 47220,
   "sent": [
     "a (emscripten_resize_heap)",
     "b (_setitimer_js)",
diff --git a/test/codesize/test_codesize_cxx_noexcept.json b/test/codesize/test_codesize_cxx_noexcept.json
index 59ac98658db69..2d84e07e951e9 100644
--- a/test/codesize/test_codesize_cxx_noexcept.json
+++ b/test/codesize/test_codesize_cxx_noexcept.json
@@ -2,9 +2,9 @@
   "a.out.js": 19194,
   "a.out.js.gz": 7969,
   "a.out.nodebug.wasm": 134657,
-  "a.out.nodebug.wasm.gz": 50774,
+  "a.out.nodebug.wasm.gz": 50769,
   "total": 153851,
-  "total_gz": 58743,
+  "total_gz": 58738,
   "sent": [
     "__cxa_throw",
     "_abort_js",
diff --git a/test/codesize/test_codesize_cxx_wasmfs.json b/test/codesize/test_codesize_cxx_wasmfs.json
index 577ddcf1fd9b6..60713bafd1b45 100644
--- a/test/codesize/test_codesize_cxx_wasmfs.json
+++ b/test/codesize/test_codesize_cxx_wasmfs.json
@@ -2,9 +2,9 @@
   "a.out.js": 7023,
   "a.out.js.gz": 3310,
   "a.out.nodebug.wasm": 172710,
-  "a.out.nodebug.wasm.gz": 63317,
+  "a.out.nodebug.wasm.gz": 63307,
   "total": 179733,
-  "total_gz": 66627,
+  "total_gz": 66617,
   "sent": [
     "__cxa_throw",
     "_abort_js",
diff --git a/test/codesize/test_codesize_hello_dylink_all.json b/test/codesize/test_codesize_hello_dylink_all.json
index 105e4113624d7..fd8921124b9cd 100644
--- a/test/codesize/test_codesize_hello_dylink_all.json
+++ b/test/codesize/test_codesize_hello_dylink_all.json
@@ -1,7 +1,7 @@
 {
   "a.out.js": 244343,
-  "a.out.nodebug.wasm": 577473,
-  "total": 821816,
+  "a.out.nodebug.wasm": 577634,
+  "total": 821977,
   "sent": [
     "IMG_Init",
     "IMG_Load",
@@ -1958,6 +1958,7 @@
     "_emscripten_find_dylib",
     "_emscripten_memcpy_bulkmem",
     "_emscripten_memset_bulkmem",
+    "_emscripten_next_timer",
     "_emscripten_run_callback_on_thread",
     "_emscripten_set_offscreencanvas_size_on_thread",
     "_emscripten_stack_alloc",
@@ -3819,6 +3820,7 @@
     "$_emscripten_find_dylib",
     "$_emscripten_memcpy_bulkmem",
     "$_emscripten_memset_bulkmem",
+    "$_emscripten_next_timer",
     "$_emscripten_run_callback_on_thread",
     "$_emscripten_set_offscreencanvas_size_on_thread",
     "$_emscripten_stack_alloc",
diff --git a/test/codesize/test_codesize_minimal_pthreads.json b/test/codesize/test_codesize_minimal_pthreads.json
index 050cb6cd65f24..2b9278e555869 100644
--- a/test/codesize/test_codesize_minimal_pthreads.json
+++ b/test/codesize/test_codesize_minimal_pthreads.json
@@ -1,10 +1,10 @@
 {
   "a.out.js": 7367,
   "a.out.js.gz": 3603,
-  "a.out.nodebug.wasm": 19003,
-  "a.out.nodebug.wasm.gz": 8786,
-  "total": 26370,
-  "total_gz": 12389,
+  "a.out.nodebug.wasm": 19074,
+  "a.out.nodebug.wasm.gz": 8817,
+  "total": 26441,
+  "total_gz": 12420,
   "sent": [
     "a (memory)",
     "b (exit)",
@@ -96,6 +96,7 @@
     "$_emscripten_thread_free_data",
     "$_emscripten_thread_init",
     "$_emscripten_thread_mailbox_init",
+    "$_emscripten_thread_notify",
     "$_emscripten_tls_init",
     "$_emscripten_yield",
     "$_main_thread",
diff --git a/test/codesize/test_codesize_minimal_pthreads_memgrowth.json b/test/codesize/test_codesize_minimal_pthreads_memgrowth.json
index 3076a1322f4a5..e2fdc1b4f1fe2 100644
--- a/test/codesize/test_codesize_minimal_pthreads_memgrowth.json
+++ b/test/codesize/test_codesize_minimal_pthreads_memgrowth.json
@@ -1,10 +1,10 @@
 {
   "a.out.js": 7769,
   "a.out.js.gz": 3809,
-  "a.out.nodebug.wasm": 19004,
-  "a.out.nodebug.wasm.gz": 8787,
-  "total": 26773,
-  "total_gz": 12596,
+  "a.out.nodebug.wasm": 19075,
+  "a.out.nodebug.wasm.gz": 8818,
+  "total": 26844,
+  "total_gz": 12627,
   "sent": [
     "a (memory)",
     "b (exit)",
@@ -96,6 +96,7 @@
     "$_emscripten_thread_free_data",
     "$_emscripten_thread_init",
     "$_emscripten_thread_mailbox_init",
+    "$_emscripten_thread_notify",
     "$_emscripten_tls_init",
     "$_emscripten_yield",
     "$_main_thread",
diff --git a/test/codesize/test_minimal_runtime_code_size_hello_wasm_worker.json b/test/codesize/test_minimal_runtime_code_size_hello_wasm_worker.json
index 678af8ef670c3..80a918a51986f 100644
--- a/test/codesize/test_minimal_runtime_code_size_hello_wasm_worker.json
+++ b/test/codesize/test_minimal_runtime_code_size_hello_wasm_worker.json
@@ -3,8 +3,8 @@
   "a.html.gz": 355,
   "a.js": 956,
   "a.js.gz": 605,
-  "a.wasm": 2584,
-  "a.wasm.gz": 1438,
-  "total": 4055,
-  "total_gz": 2398
+  "a.wasm": 2592,
+  "a.wasm.gz": 1447,
+  "total": 4063,
+  "total_gz": 2407
 }
diff --git a/test/pthread/test_pthread_cleanup.c b/test/pthread/test_pthread_cleanup.c
index 9d07817f641d0..400e7d05d771f 100644
--- a/test/pthread/test_pthread_cleanup.c
+++ b/test/pthread/test_pthread_cleanup.c
@@ -29,10 +29,12 @@ static void cleanup_handler2(void *arg) {
 }
 
 static void *thread_start1(void *arg) {
+  printf("thread_start1\n");
   pthread_cleanup_push(cleanup_handler1, (void*)(42 + (long)arg*100));
   pthread_cleanup_push(cleanup_handler2, (void*)(69 + (long)arg*100));
   pthread_cleanup_pop((int)(intptr_t)arg);
   pthread_cleanup_pop((int)(intptr_t)arg);
+  printf("thread_start1 done\n");
   pthread_exit(0);
 }
 
@@ -60,20 +62,27 @@ static void *thread_start3(void *arg) {
 pthread_t thr[4];
 
 int main() {
+  int s;
   int result = 0;
 
   pthread_cleanup_push(cleanup_handler1, (void*)9998);
   pthread_cleanup_push(cleanup_handler1, (void*)9999);
 
-  int s = pthread_create(&thr[0], NULL, thread_start1, (void*)0);
+  s = pthread_create(&thr[0], NULL, thread_start1, (void*)0);
   assert(s == 0);
+  printf("joining thread 0");
   pthread_join(thr[0], 0);
+  printf("done join");
   s = pthread_create(&thr[1], NULL, thread_start1, (void*)1);
   assert(s == 0);
+  printf("joining thread 1");
   pthread_join(thr[1], 0);
+  printf("done join\n");
   s = pthread_create(&thr[2], NULL, thread_start2, (void*)1);
   assert(s == 0);
+  printf("joining thread 2");
   pthread_join(thr[2], 0);
+  printf("done join");
 // TODO
 //   s = pthread_create(&thr[3], NULL, thread_start3, (void*)1);
 //   assert(s == 0);
diff --git a/test/pthread/test_pthread_kill.out b/test/pthread/test_pthread_kill.out
index 9c0d30aee8994..f7a70be40ea36 100644
--- a/test/pthread/test_pthread_kill.out
+++ b/test/pthread/test_pthread_kill.out
@@ -4,6 +4,6 @@ thread has started, sending SIGTERM
 SIGTERM sent
 signal: 15 onthread=1
 got term signal, sending signal back to main thread
-joined child_thread
 signal: 10 onthread=0
+joined child_thread
 got SIGUSR1. all done.
diff --git a/test/pthread/test_pthread_mandelbrot.cpp b/test/pthread/test_pthread_mandelbrot.cpp
index d0edb4a3d682c..69f8726b4f088 100644
--- a/test/pthread/test_pthread_mandelbrot.cpp
+++ b/test/pthread/test_pthread_mandelbrot.cpp
@@ -307,7 +307,9 @@ void *mandelbrot_thread(void *arg)
 
   for(;;)
   {
-    emscripten_futex_wait(&tasksPending[idx], 0, INFINITY);
+    while (tasksPending[idx] == 0) {
+      emscripten_futex_wait(&tasksPending[idx], 0, INFINITY);
+    }
     tasksPending[idx] = 0;
     double t0 = emscripten_get_now();
     int ni;
diff --git a/test/pthread/test_pthread_proxying_in_futex_wait.cpp b/test/pthread/test_pthread_proxying_in_futex_wait.cpp
index 605b75bc2be51..9190a65feba6f 100644
--- a/test/pthread/test_pthread_proxying_in_futex_wait.cpp
+++ b/test/pthread/test_pthread_proxying_in_futex_wait.cpp
@@ -35,14 +35,16 @@ int main()
 	pthread_t thread;
 	int rc = pthread_create(&thread, NULL, ThreadMain, 0);
 	assert(rc == 0);
-	rc = emscripten_futex_wait(&main_thread_wait_val, 1, 15 * 1000);
-	// An rc of 0 means no error, and of EWOULDBLOCK means that the value is
-	// not the expected one, which can happen if the pthread manages to set it
-	// before we reach the futex_wait.
-	if (rc != 0 && rc != -EWOULDBLOCK)
-	{
-		printf("ERROR! futex wait errored %d!\n", rc);
-		return 2;
+	while (main_thread_wait_val != 0) {
+		rc = emscripten_futex_wait(&main_thread_wait_val, 1, 15 * 1000);
+		// An rc of 0 means no error, and of EWOULDBLOCK means that the value is
+		// not the expected one, which can happen if the pthread manages to set it
+		// before we reach the futex_wait.
+		if (rc != 0 && rc != -EWOULDBLOCK)
+		{
+			printf("ERROR! futex wait errored %d!\n", rc);
+			return 2;
+		}
 	}
 	pthread_join(thread, 0);