Winsock Fixes #3433

Joannis · 2025-11-03T13:57:52Z

This PR gets TCP servers on Windows mostly working. I'll annotate my PR for clarity.

The one bug:
If you open a TCP client connection to a Windows TCP server, the connection doesn't read or write until a second connection comes in. This happens because reregister0 is called after WSAPoll has already been entered, and nothing wakes WSAPoll up to tell it, so it doesn't know that the new socket now requests read/write/... events.

Also, the EventLoop cannot be woken up by el.execute { .. } while it is blocked in WSAPoll, because SleepEx isn't running at that point. It seems WSAPoll doesn't respond to this wakeup signal.

This means the TCP server functionality is not yet usable, but I hoped this draft would help us figure out what I'm missing.

Alternative considered: cap WSAPoll's timeout (at 1 ms, for example) so that it wakes up periodically and still gets to these events eventually.

This results in a partially working winsock

BUG: A connection currently only becomes usable after a new connection comes in. After that, Windows TCP works.

Joannis · 2025-11-03T14:00:04Z

Sources/NIOPosix/ThreadWindows.swift

+            var realHandle: HANDLE? = nil
+            let success = DuplicateHandle(
+                GetCurrentProcess(),    // Source process
+                GetCurrentThread(),     // Source handle (pseudo-handle)
+                GetCurrentProcess(),    // Target process
+                &realHandle,           // Target handle (real handle)
+                0,                     // Desired access (0 = same as source)
+                false,                 // Inherit handle
+                DWORD(DUPLICATE_SAME_ACCESS) // Options
+            )


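Windows gives pseudo-handles by default: GetCurrentThread() returns a handle that always refers to whichever thread is using it. However, when that handle is stored and later used from another thread (e.g. when spawning work onto the EventLoop), it resolves to the calling thread rather than the EventLoop thread, so QueueUserAPC notifies the wrong thread. This in turn prevents SleepEx from waking up the EventLoop. DuplicateHandle converts the pseudo-handle into a real handle that keeps referring to the EventLoop thread.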

Joannis · 2025-11-03T14:00:11Z

Sources/NIOPosix/ThreadWindows.swift


    static var currentThread: ThreadOpsSystem.ThreadHandle {
-        GetCurrentThread()
+        var realHandle: HANDLE? = nil


Likewise here

Joannis · 2025-11-03T14:01:49Z

Sources/NIOPosix/SelectorWSAPoll.swift

        registrationID: SelectorRegistrationID
    ) throws {
-        fatalError("TODO: Unimplemented")
+        if let index = self.pollFDs.firstIndex(where: { $0.fd == UInt64(fileDescriptor) }) {


Deregistering is just removing the FD. However, we can't remove it here, because we may be iterating over the same pollFDs array at the same time: deregister0 is called further down the stack of try body((SelectorEvent(io: selectorEvent, registration: registration))). Removing an element there changes the indices of pollFDs while the loop is still using them, which causes a crash.

Joannis · 2025-11-03T14:02:17Z

Sources/NIOPosix/SelectorWSAPoll.swift

                        continue
                    }

                    try body((SelectorEvent(io: selectorEvent, registration: registration)))


This line often calls deregister0 indirectly, so we effectively can't mutate pollFDs in deregister0

Joannis · 2025-11-03T14:02:33Z

Sources/NIOPosix/SelectorWSAPoll.swift

+                // Now clean up any deregistered fds.
+                // Iterate in reverse: remove(at:) shifts every later element down by one,
+                // so walking backwards keeps the indices we have yet to visit valid.
+                for i in self.deregisteredFDs.indices.reversed() {
+                    if self.deregisteredFDs[i] {
+                        // remove this one
+                        let fd = self.pollFDs[i].fd
+                        self.pollFDs.remove(at: i)
+                        self.deregisteredFDs.remove(at: i)
+                        self.registrations.removeValue(forKey: Int(fd))
+                    }
+                }


Do the deregister0 work of cleaning up after the polling is done
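The deferred-deregistration pattern above can be sketched in isolation. This is a minimal, platform-neutral sketch, not the PR's SelectorWSAPoll code: `PollEntry`, `DeferredDeregistrationSelector`, `register(fd:events:name:)`, `deregister(fd:)` and `whenReady(_:)` are all hypothetical names, and the "ready" dispatch simply visits every live entry instead of calling WSAPoll. It shows that deregistering from inside the dispatch callback only flips a flag, and that the reverse sweep afterwards removes the flagged entries safely.

```swift
/// Hypothetical stand-in for a WSAPOLLFD entry (the real code stores SOCKETs).
struct PollEntry {
    var fd: Int
    var events: Int16
}

/// Hypothetical selector illustrating deferred deregistration: deregister only
/// flags an entry, and flagged entries are swept after dispatch finishes.
final class DeferredDeregistrationSelector {
    private(set) var pollFDs: [PollEntry] = []
    private(set) var deregisteredFDs: [Bool] = []
    private(set) var registrations: [Int: String] = [:]

    func register(fd: Int, events: Int16, name: String) {
        pollFDs.append(PollEntry(fd: fd, events: events))
        deregisteredFDs.append(false)
        registrations[fd] = name
    }

    /// Safe to call from inside whenReady's callback: it only flips a flag,
    /// so the indices the dispatch loop is walking never change.
    func deregister(fd: Int) {
        if let index = pollFDs.firstIndex(where: { $0.fd == fd }) {
            deregisteredFDs[index] = true
        }
    }

    /// Dispatches each live entry (as if the poll call reported it ready), then sweeps.
    func whenReady(_ body: (PollEntry) -> Void) {
        for index in pollFDs.indices where !deregisteredFDs[index] {
            body(pollFDs[index])
        }
        // Sweep in reverse: remove(at:) shifts later elements down, and walking
        // backwards means every index still to be visited stays valid.
        for index in deregisteredFDs.indices.reversed() where deregisteredFDs[index] {
            let fd = pollFDs[index].fd
            pollFDs.remove(at: index)
            deregisteredFDs.remove(at: index)
            registrations.removeValue(forKey: fd)
        }
    }
}

// Usage: a channel deregisters itself while its own event is being dispatched.
let selector = DeferredDeregistrationSelector()
selector.register(fd: 3, events: 1, name: "server")
selector.register(fd: 4, events: 1, name: "client-a")
selector.register(fd: 5, events: 1, name: "client-b")

var seen: [Int] = []
selector.whenReady { entry in
    seen.append(entry.fd)
    if entry.fd == 4 { selector.deregister(fd: 4) }  // like deregister0 down the stack
}
```

Keeping a parallel `[Bool]` of flags (as the PR does with deregisteredFDs) means the poll array and its flags always share indices, so no lookup is needed during the sweep.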

Joannis · 2025-11-03T14:03:00Z

Sources/NIOPosix/SelectorWSAPoll.swift

    func initialiseState0() throws {
        self.pollFDs.reserveCapacity(16)
+        self.deregisteredFDs.reserveCapacity(16)
+        self.lifecycleState = .open


The lifecycle state was never set to .open.

Joannis · 2025-11-03T14:03:22Z

Sources/NIOPosix/SocketChannel.swift

+        #if os(Windows)
+        case .winsock(WSAEWOULDBLOCK):
+            return false
+        #endif


accept() returns WSAEWOULDBLOCK when there is no pending connection.


Joannis · 2025-11-03T14:03:45Z

CC @fabianfett and @zamderax


Lukasa · 2025-11-03T22:22:33Z

Sources/NIOPosix/BaseSocketChannel.swift

+        #if os(Windows)
+        if
+            let err = err as? IOError,
+            case .winsock(WSAEWOULDBLOCK) = err.error


How is this case reached?

Joannis: What I've found is that when WSAPoll gives the server socket a .read event for an inbound client:

1. The server accepts the client through readable() -> readable0() -> ServerSocketChannel.readFromSocket().
2. socket.accept(..) finds a socket.
3. SelectableEventLoop repeats this to get another read (see maxMessagesPerRead).
4. This time, socket.accept(..) runs into WinSDK.INVALID_SOCKET.
5. The error is fetched using WSAGetLastError().
6. The error ends up as .winsock(WSAEWOULDBLOCK).

In hindsight, I just noticed that accept() returns an optional, so I'll leverage that instead.
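The "return nil instead of throwing" approach (later committed as "If accept() runs into WSAEWOULDBLOCK - return nil") can be illustrated with a small mock. This is a sketch only: `SocketError`, `FakeListeningSocket`, `rawAccept()` and `acceptPending(from:maxMessagesPerRead:)` are hypothetical and do not reflect NIO's actual Socket or ServerSocketChannel APIs. It shows the shape of the fix: "would block" becomes nil, the bounded read loop stops cleanly, and any other error would still propagate.

```swift
/// Hypothetical error type; wouldBlock stands in for .winsock(WSAEWOULDBLOCK).
enum SocketError: Error {
    case wouldBlock
}

/// Hypothetical non-blocking listening socket with a queue of pending clients.
struct FakeListeningSocket {
    var pending: [Int32]

    /// Like a raw non-blocking accept: fails with "would block" once the queue is empty.
    mutating func rawAccept() throws -> Int32 {
        guard !pending.isEmpty else { throw SocketError.wouldBlock }
        return pending.removeFirst()
    }

    /// Maps "would block" to nil, so callers see "no connection right now", not an error.
    /// Any other error would still be rethrown to the caller.
    mutating func accept() throws -> Int32? {
        do {
            return try rawAccept()
        } catch SocketError.wouldBlock {
            return nil
        }
    }
}

/// Hypothetical read loop: accepts up to maxMessagesPerRead clients, stopping at nil.
func acceptPending(from socket: inout FakeListeningSocket, maxMessagesPerRead: Int) throws -> [Int32] {
    var accepted: [Int32] = []
    for _ in 0..<maxMessagesPerRead {
        guard let client = try socket.accept() else { break }  // drained: not an error
        accepted.append(client)
    }
    return accepted
}

var listener = FakeListeningSocket(pending: [10])
let firstRead = try acceptPending(from: &listener, maxMessagesPerRead: 4)   // one client, then nil ends the loop
let secondRead = try acceptPending(from: &listener, maxMessagesPerRead: 4)  // nothing pending, no error thrown
```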

Lukasa · 2025-11-03T22:29:57Z

Sources/NIOPosix/SelectorWSAPoll.swift

        } else {
            let result = self.pollFDs.withUnsafeMutableBufferPointer { ptr in
-                WSAPoll(ptr.baseAddress!, UInt32(ptr.count), time)
+                WSAPoll(ptr.baseAddress!, UInt32(ptr.count), 1)


I'm not entirely following why this change needed to be made. Can you elaborate on this?

Joannis: When you initially create the EventLoop, there is no FD to poll, so it goes into SleepEx(INFINITE, true).

1. When the server FD is added, it wakes up the selector using wakeup0().
2. This implementation calls QueueUserAPC(..) on the EventLoop's thread.
3. QueueUserAPC wakes up SleepEx, so the EventLoop can start observing the socket.
4. At this point there are FDs, so instead of SleepEx, the EventLoop calls into WSAPoll for those FDs.
5. A client connects to the server.
6. The server accepts the socket, registering it with WSAPoll with a minimal set of events (.reset and .error IIRC).
7. The EventLoop is done with this iteration and goes into WSAPoll with this minimal set of events.
8. After some back and forth, the client socket calls reregister0 with more events.
9. WSAPoll is not aware of the change; it is still running with the previous subset of events.
10. Socket I/O is ignored, because .read and .write were not part of the events.

Then:

1. A new (second) client connects to the server.
2. It goes through the same flow, except now WSAPoll correctly polls the old socket's new events (read/write).
3. The new socket runs into the same problem.

To remedy this temporarily, I'm waking up WSAPoll every 1 ms so we don't completely block I/O for new sockets. But this is 100% not something we'll want to stick with, of course. I don't yet know how best to tackle this.

Lukasa: Why is WSAPoll not being made aware of the change? That seems like the obvious bug.

Joannis: WSAPoll cannot be woken up early (QueueUserAPC doesn't interrupt it).

I'm curious what the long-term solution here would be. I'd love to hear @fabianfett's thoughts, though I'm not sure whether he's available again.

Lukasa: OK, so assuming that statement is true (I haven't checked), the solution here is to have a self-pipe. This allows us to mostly replicate the eventfd pattern on Linux: to wake up the loop, we write to a pipe that is polled alongside the sockets.

Joannis: Oh, that's clever! I'll take a look at that.
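A minimal sketch of the self-pipe idea, assuming a POSIX system: it uses pipe() and poll() only to demonstrate the pattern. WSAPoll only accepts sockets, so a Windows version would presumably need a connected loopback socket pair in place of the pipe; that part is not shown. `SelfPipeWaker` and `waitForWakeup(_:timeoutMs:)` are hypothetical names, not NIO API. The read end is polled together with everything else, and writing one byte makes the blocking poll return, at which point a real selector would rebuild its poll set with the current event masks (e.g. after a reregister0) instead of relying on a 1 ms timeout.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#elseif canImport(Darwin)
import Darwin
#endif

/// Hypothetical self-pipe waker. The event loop adds readFD to the set it
/// polls; any thread interrupts a blocking poll by writing one byte to writeFD.
final class SelfPipeWaker {
    let readFD: CInt
    let writeFD: CInt

    init() {
        var fds: [CInt] = [-1, -1]
        let rc = pipe(&fds)
        precondition(rc == 0, "pipe() failed")
        readFD = fds[0]
        writeFD = fds[1]
    }

    /// Plays the role of wakeup0(): writes one byte; callable from any thread.
    func wakeup() {
        var byte: UInt8 = 1
        _ = write(writeFD, &byte, 1)
    }

    /// Drains pending wakeup bytes so the next poll can block again.
    func drain() {
        var buffer = [UInt8](repeating: 0, count: 64)
        _ = buffer.withUnsafeMutableBytes { raw in
            read(self.readFD, raw.baseAddress, raw.count)
        }
    }

    deinit {
        close(readFD)
        close(writeFD)
    }
}

/// One loop iteration: poll the waker. A real selector would also pass its
/// sockets here, re-reading their current events after every wakeup.
func waitForWakeup(_ waker: SelfPipeWaker, timeoutMs: CInt) -> Bool {
    var fds = [pollfd(fd: waker.readFD, events: Int16(POLLIN), revents: 0)]
    let count = nfds_t(fds.count)
    let ready = poll(&fds, count, timeoutMs)
    guard ready > 0, (fds[0].revents & Int16(POLLIN)) != 0 else { return false }
    waker.drain()
    return true
}

let waker = SelfPipeWaker()
let idle = waitForWakeup(waker, timeoutMs: 0)       // nothing written yet
waker.wakeup()                                      // e.g. after register/reregister or execute
let woken = waitForWakeup(waker, timeoutMs: 1_000)  // returns as soon as the byte is readable
```

The demo calls wakeup() on the same thread for simplicity; in the event-loop case it would be called from whichever thread schedules work or changes a registration.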

Lukasa · 2025-11-03T22:38:11Z

Sources/NIOPosix/SocketChannel.swift

+        #if os(Windows)
+        case .winsock(WSAEWOULDBLOCK):
+            return false
+        #endif


Why doesn't the current logic for syscall(blocking: true) cover this?

Lukasa · 2025-11-03T22:38:36Z

Sources/NIOPosix/SocketChannel.swift


-    private func shouldCloseOnErrnoCode(_ errnoCode: CInt) -> Bool {
-        switch errnoCode {
+    private func shouldCloseOnErrnoCode(_ errno: CInt) -> Bool {


What motivated the changes to this function?

Joannis: I was messing with that code, but the function itself shouldn't have changed. I'll revert it.


Joannis added 2 commits November 3, 2025 14:03

Fix the ability to spawn work on the Windows EventLoop

43c5d7d


Allow a TCP server on Windows to accept connection

f59cfef



Joannis marked this pull request as ready for review November 3, 2025 14:22

Temporary fix: wake up WSAPoll after 1ms - always

10bb556

This way new TCP connections will be functional

Joannis mentioned this pull request Nov 3, 2025

Windows support hummingbird-project/hummingbird#747 (Open)

Lukasa reviewed Nov 3, 2025


Joannis added 3 commits November 4, 2025 11:05

Revert Windows WSAEWOULDBLOCK logic

05caf55

If accept() runs into WSAEWOULDBLOCK - return nil

d2a1ed1

Revert function signature change

17f30b6


2 participants
